Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8413031
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 10, 20262026-06-10T00:49:51+00:00 2026-06-10T00:49:51+00:00

I have a large Pandas DataFrame <class ‘pandas.core.frame.DataFrame’> DatetimeIndex: 3425100 entries, 2011-12-01 00:00:00 to

  • 0

I have a large Pandas DataFrame

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3425100 entries, 2011-12-01 00:00:00 to 2011-12-31 23:59:59
Data columns:
sig_qual    3425100  non-null values
heave       3425100  non-null values
north       3425099  non-null values
west        3425097  non-null values
dtypes: float64(4)

I select a subset of that DataFrame using .ix[start_datetime:end_datetime] and I pass this to a peakdetect function which returns the index and value of the local maxima and minima in two seperate lists. I extract the index position of the maxima and using DataFrame.index I get a list of pandas TimeStamps.

I then attempt to extract the relevant subset of the large DataFrame by passing the list of TimeStamps to .ix[] but it always seems to return an empty DataFrame. I can loop over the list of TimeStamps and get the relevant rows from the DataFrame but this is a lengthy process and I thought that ix[] should accept a list of values according to the docs?
(Although I see that the example for Pandas 0.7 uses a numpy.ndarray of numpy.datetime64)

Update:
A small 8 second subset of the DataFrame is selected below, # lines show some of the values:

y = raw_disp['heave'].ix[datetime(2011,12,30,0,0,0):datetime(2011,12,30,0,0,8)]
#csv representation of y time-series 
2011-12-30 00:00:00,-310.0
2011-12-30 00:00:01,-238.0
2011-12-30 00:00:01.500000,-114.0
2011-12-30 00:00:02.500000,60.0
2011-12-30 00:00:03,185.0
2011-12-30 00:00:04,259.0
2011-12-30 00:00:04.500000,231.0
2011-12-30 00:00:05.500000,139.0
2011-12-30 00:00:06.500000,55.0
2011-12-30 00:00:07,-49.0
2011-12-30 00:00:08,-144.0

index = y.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-12-30 00:00:00, ..., 2011-12-30 00:00:08]
Length: 11, Freq: None, Timezone: None

#_max returned from the peakdetect function, one local maxima for this 8 seconds period
_max = [[5, 259.0]]

indexes = [x[0] for x in _max]
#[5]

timestamps = [index[z] for z in indexes]
#[<Timestamp: 2011-12-30 00:00:04>]

print raw_disp.ix[timestamps]
#Empty DataFrame
#Columns: array([sig_qual, heave, north, west, extrema], dtype=object)
#Index: <class 'pandas.tseries.index.DatetimeIndex'>
#Length: 0, Freq: None, Timezone: None

for timestamp in timestamps:
    print raw_disp.ix[timestamp]
#sig_qual      0
#heave       259
#north        27
#west        132
#extrema       0
#Name: 2011-12-30 00:00:04

Update 2:
I created a gist, which actually works because when the data is loaded in from csv the index columns of timestamps are stored as numpy array of objects which appear to be strings. Unlike in my own code where the index is of type <class 'pandas.tseries.index.DatetimeIndex'> and each element is of type <class 'pandas.lib.Timestamp'>, I thought passing a list of pandas.lib.Timestamp would work the same as passing individual timestamps, would this be considered a bug?

If I create the original DataFrame with the index as a list of strings, querying with a list of strings works fine. It does increase the byte size of the DataFrame significantly though.

Update 3:
The error only appears to occur with very large DataFrames, I reran the code on varying sizes of DataFrame ( some detail in a comment below ) and it appears to occur on a DataFrame above 2.7 million records. Using strings as opposed to TimeStamps resolves the issue but increases memory usage.

Fixed
In latest github master (18/09/2012), see comment from Wes at bottom of page.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-10T00:49:53+00:00Added an answer on June 10, 2026 at 12:49 am

    df.ix[my_list_of_dates] should work just fine.

    In [193]: df
    Out[193]:
                A  B  C  D
    2012-08-16  2  1  1  7
    2012-08-17  6  4  8  6
    2012-08-18  8  3  1  1
    2012-08-19  7  2  8  9
    2012-08-20  6  7  5  8
    2012-08-21  1  3  3  3
    2012-08-22  8  2  3  8
    2012-08-23  7  1  7  4
    2012-08-24  2  6  0  6
    2012-08-25  4  6  8  1
    
    In [194]: row_pos = [2, 6, 9]
    
    In [195]: df.ix[row_pos]
    Out[195]:
                A  B  C  D
    2012-08-18  8  3  1  1
    2012-08-22  8  2  3  8
    2012-08-25  4  6  8  1
    
    In [196]: dates = [df.index[i] for i in row_pos]
    
    In [197]: df.ix[dates]
    Out[197]:
                A  B  C  D
    2012-08-18  8  3  1  1
    2012-08-22  8  2  3  8
    2012-08-25  4  6  8  1
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have large sets of data but somewhere in columns there is missing data
I have large amount of data to be plotted on iPad using a core
I have large chunks of data, normally at around 2000+ entries, but in this
I have large data in two files each with about two million (different) entries.
I have been testing out pandas and pytables for some large financial data sets,
Interpolating Large Datasets I have a large data set of about 0.5million records representing
I have large table. consisting of only 3 columns (id(INT),bookmarkID(INT),tagID(INT)).I have two BTREE indexes
I have large amounts of data (a few terabytes) and accumulating... They are contained
I have large amounts of data formatted in JSON formats, I recently scripted the
I have large amounts of scientific data that I need to store (150 TB+

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.