The Archive Base

Editorial Team
Asked: June 18, 2026

I have a python module that loads data directly in to a dict of

I have a Python module that loads data directly into a dict of numpy.ndarray for use in a pandas.DataFrame. However, I noticed an issue with ‘NA’ values. My file format represents NA values as -9223372036854775808 (boost::integer_traits::const_min). My non-NA values load as expected (with the right values) into the pandas.DataFrame. I believe what is happening is that my module loads into a numpy.datetime64 ndarray, which is then converted to a list of pandas.tslib.Timestamp. This conversion doesn’t seem to preserve the ‘const_min’ integer. Try the following:

>>> pandas.tslib.Timestamp(-9223372036854775808)
NaT
>>> pandas.tslib.Timestamp(numpy.datetime64(-9223372036854775808))
<Timestamp: 1969-12-31 15:58:10.448384>

Is this a Pandas bug? I think I can have my module avoid using a numpy.ndarray in this case, and use something Pandas doesn’t trip on (perhaps pre-allocate the list of tslib.Timestamp itself.)
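For context, NumPy's datetime64 stores an int64 count of time units and reserves the smallest int64 as its NaT marker, which is the same value the file format uses for NA. A minimal sketch of that overlap, assuming a recent NumPy (the names `NA_SENTINEL`, `raw`, and `stamps` are illustrative and not taken from the module in question):

```python
import numpy as np

# The NA marker used by the file format: the smallest signed 64-bit integer.
NA_SENTINEL = np.iinfo(np.int64).min

# Reinterpret raw int64 values as microsecond timestamps without copying.
raw = np.array([NA_SENTINEL, 1326834000090451], dtype=np.int64)
stamps = raw.view("M8[us]")

# np.isnat reports which entries NumPy itself treats as missing.
missing = np.isnat(stamps)
print(missing)  # [ True False]
```

Checking `np.isnat` on the underlying array is one way to confirm whether the NA survived loading, independent of how pandas prints it.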

Here is another example of unexpected things happening:

>>> npa = numpy.ndarray(1, dtype=numpy.datetime64)
>>> npa[0] = -9223372036854775808
>>> pandas.Series(npa)
0   NaT
>>> pandas.Series(npa)[0]
<Timestamp: 1969-12-31 15:58:10.448384>

Following Jeff’s comment below, I have more information about what is going wrong.

>>> npa = numpy.ndarray(2, dtype=numpy.int64)
>>> npa[0] = -9223372036854775808
>>> npa[1] = 1326834000090451
>>> npa
array([-9223372036854775808,     1326834000090451])
>>> s_npa = pandas.Series(npa, dtype='M8[us]')
>>> s_npa
0                          NaT
1   2012-01-17 21:00:00.090451

Yay! The series preserved the NA and my timestamp. However, if I attempt to create a DataFrame from that series, the NaT disappears.

>>> pandas.DataFrame({'ts':s_npa})
                      ts
0 1969-12-31 15:58:10.448384
1 2012-01-17 21:00:00.090451

Ho-hum. On a whim, I tried interpreting my integers as nanoseconds past the epoch instead. To my surprise, the DataFrame worked properly:

>>> s2_npa = pandas.Series(npa, dtype='M8[ns]')
>>> s2_npa
0                             NaT
1   1970-01-16 08:33:54.000090451
>>> pandas.DataFrame({"ts":s2_npa})
                             ts
0                           NaT
1 1970-01-16 08:33:54.000090451

Of course, my timestamp is now wrong. My point is that pandas.DataFrame is behaving inconsistently here. Why does it preserve the NaT when using dtype='M8[ns]', but not when using dtype='M8[us]'?

I am currently using this workaround to convert the microsecond values to nanoseconds, which slows things down quite a bit, but works:

>>> s = pandas.Series([1000*ts if ts != -9223372036854775808 else ts for ts in npa], dtype='M8[ns]')
>>> pandas.DataFrame({'ts':s})
                          ts
0                        NaT
1 2012-01-17 21:00:00.090451
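The per-element list comprehension above could probably be replaced by a vectorized NumPy operation that scales only the non-NA entries. The following is a sketch under that assumption; the helper name `us_to_ns_keep_na` is hypothetical, and it has not been checked against the pandas version used in this question:

```python
import numpy as np

NA_SENTINEL = np.iinfo(np.int64).min  # the file format's NA marker

def us_to_ns_keep_na(us_values):
    """Scale int64 microseconds to nanoseconds, leaving NA markers untouched."""
    ns_values = np.array(us_values, dtype=np.int64)  # copy; input stays intact
    valid = ns_values != NA_SENTINEL
    ns_values[valid] *= 1000  # only real timestamps are scaled
    return ns_values.view("M8[ns]")

npa = np.array([NA_SENTINEL, 1326834000090451], dtype=np.int64)
ts = us_to_ns_keep_na(npa)
```

The resulting array can be handed to pandas.Series in place of the list; the mask keeps the sentinel from being multiplied, which would otherwise overflow int64.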

(Several hours later…)

Okay, I have made progress. Delving into the code, I found that the repr function on Series eventually calls ‘_format_datetime64’, which checks ‘isnull’ and prints ‘NaT’. That explains the difference between these two:

>>> pandas.Series(npa)
0   NaT
>>> pandas.Series(npa)[0]
<Timestamp: 1969-12-31 15:58:10.448384>

The former seems to honor the NA, but it only does so when printing. I suppose there may be other pandas functions that call ‘isnull’ and act based on the answer, which might seem to partially work for NA timestamps in this case. However, I know that the Series is incorrect due to the type of element zero. It is a Timestamp, but should be a NaTType. My next step is to dive into the constructor for Series to figure out when/how pandas uses the NaT value during construction. Presumably, it is missing a case when I specify dtype=’M8[us]’… (more to come).

Following Andy’s suggestion in the comments, I tried using a pandas Timestamp to resolve the issue. It didn’t work. Here is an example of those results:

>>> npa = numpy.ndarray(1, dtype='i8')
>>> npa[0] = -9223372036854775808
>>> npa
array([-9223372036854775808])
>>> pandas.tslib.Timestamp(npa.view('M8[ns]')[0]).value
-9223372036854775808
>>> pandas.tslib.Timestamp(npa.view('M8[us]')[0]).value
-28909551616000
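For comparison, here is how the unit conversion looks when NumPy is asked to do it explicitly. This is a sketch assuming a recent NumPy, where casting between datetime64 units keeps NaT as NaT; older releases such as the one used above may behave differently:

```python
import numpy as np

raw = np.array([np.iinfo(np.int64).min, 1326834000090451], dtype=np.int64)

# Interpret the integers as microseconds, then let NumPy convert the unit.
as_ns = raw.view("M8[us]").astype("M8[ns]")

print(np.isnat(as_ns))   # [ True False]
print(as_ns.view("i8"))  # underlying int64 nanosecond values
```

Here the view only relabels the integers, while astype performs the actual microsecond-to-nanosecond scaling.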
1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 12:00 pm

    Answer: No

    Technically speaking, that is. I posted the bug on github and got a response here:
    https://github.com/pydata/pandas/issues/2800#issuecomment-13161074

    “Units other than nanoseconds are not supported right now in indexing etc. This should be strictly enforced”

    All of the tests I’ve run with ‘ns’ rather than ‘us’ work fine. I’m looking forward to a future release.

    For anyone interested, I modified my C++ python module to iterate over the int64_t arrays that I loaded from disk, and multiply everything by 1000, except for NA values (boost::integer_traits::const_min). I was worried about the performance, but the difference in load time is tiny for me. (Doing the same in Python is very, very slow.)
