I noticed some strange behavior when using IX on large pandas dataframes. When I

Question

Asked: June 15, 2026 (2026-06-15T09:41:36+00:00)

I noticed some strange behavior when using .ix on large pandas DataFrames.

When I looped with .ix over the same DataFrame 50 times, it ran roughly 6 times faster (3.7 seconds versus 22.0 seconds in the output below) than when I looped with .ix over 50 different DataFrames.

Is .ix caching something behind the scenes? I also noticed that the second loop roughly doubles my memory usage. Why would memory usage increase?

Is there any way to modify this behavior?

Note that with plain NumPy arrays both loops ran in 7.4 seconds with no memory increase, which is what led me to believe pandas was caching.

Obviously you never want to call .ix on each individual element…

import pandas as pd
import numpy as np
import datetime as dt
print 'pandas', pd.__version__

li_list = []
for i in range(50):
    li_list.append(pd.DataFrame(data=np.random.randn(50, 17000)))

print 'starting'

dt_start = dt.datetime.now()
a = 0
for i in range(50):
    b = li_list[0] #Only access first element
    for j in b.columns:
        a += b.ix[i, j]
print (dt.datetime.now()-dt_start).total_seconds()


dt_start = dt.datetime.now()
a = 0
for i in range(50):
    b = li_list[i] #Access all in list
    for j in b.columns:
        a += b.ix[i, j]
print (dt.datetime.now()-dt_start).total_seconds()

Output:

pandas 0.9.1
starting
3.651
22.009

1 Answer

Editorial Team · Answer 1 · 2026-06-15T09:41:37+00:00

Note: the first time you look up a location in an axis index, pandas populates a hash table for that index. That is probably what you are seeing here, and timeit would obscure it because the hash table is computed once, stored, and reused. It also explains the increased memory usage.
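The effect can be illustrated with a toy model. The sketch below is not pandas' actual implementation; `LazyIndex`, `get_loc` as written here, and the `builds` counter are hypothetical names used only to show how a lazily built, cached lookup table makes repeated lookups on one index cheap while every fresh index pays the build cost again.

```python
class LazyIndex:
    """Toy index that builds its label -> position hash table on first lookup."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._mapping = None   # not built until the first lookup
        self.builds = 0        # counts how many times the table was populated

    def get_loc(self, label):
        if self._mapping is None:
            # One-off O(n) cost, plus extra memory that stays allocated.
            self._mapping = {lab: pos for pos, lab in enumerate(self.labels)}
            self.builds += 1
        return self._mapping[label]


# Same index reused 50 times: the table is built once, then reused.
same = LazyIndex(range(17000))
for _ in range(50):
    same.get_loc(16999)

# 50 different indexes: each one pays the build cost on its first lookup.
different = [LazyIndex(range(17000)) for _ in range(50)]
for idx in different:
    idx.get_loc(16999)

print(same.builds, sum(idx.builds for idx in different))  # prints: 1 50
```

In this model the 50 separate tables also remain in memory for as long as their indexes are alive, which mirrors the memory growth described in the question.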

In a future version of pandas I plan to improve the performance of this type of code on simple data with simple sequential axis indexes. I’ll record your use case on the GitHub issue tracker.

https://github.com/pydata/pandas/issues/2420
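In the meantime, the per-element loop can usually be avoided altogether. The following is a minimal sketch, assuming the goal is just to sum one row as in the question. The variable names and the smaller frame size are illustrative, and `.at` stands in for `.ix`, which was removed in later pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(50, 200))
i = 3

# Per-element access: one label lookup per value.
slow = 0.0
for j in df.columns:
    slow += df.at[i, j]

# Vectorized: take the whole row as a NumPy array and sum it in one call.
fast = df.values[i].sum()

print(np.isclose(slow, fast))
```

The vectorized form does no per-element label lookups, so its cost does not depend on how many scalar lookups the index has to serve.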
