I have a rather big dataset (2678271, 52) and a 5-dimensional index which consumes

Asked: June 17, 2026 (2026-06-17T02:05:50+00:00)

I have a rather big dataset (2678271, 52) and a 5-dimensional index which consumes 6.5% of the machine’s memory.
When I call

df.sortlevel(k)

I receive the following error:



MemoryError                               Traceback (most recent call last)
 in ()
----> 1 df = df.sortlevel(4)

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in sortlevel(self, level, axis, ascending)
   2978             raise Exception('can only sort by level with a hierarchical index')
   2979 
-> 2980         new_axis, indexer = the_axis.sortlevel(level, ascending=ascending)
   2981 
   2982         if self._data.is_mixed_dtype():

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in sortlevel(self, level, ascending)
   1856         indexer = _indexer_from_factorized((primary,) + tuple(labels),
   1857                                            (primshp,) + tuple(shape),
-> 1858                                            compress=False)
   1859         if not ascending:
   1860             indexer = indexer[::-1]

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _indexer_from_factorized(labels, shape, compress)
   2124         max_group = np.prod(shape)
   2125 
-> 2126     indexer, _ = lib.groupsort_indexer(comp_ids.astype(np.int64), max_group)
   2127 
   2128     return indexer

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/lib.so in pandas.lib.groupsort_indexer (pandas/src/tseries.c:55052)()

MemoryError:

Is there a hard-coded condition which throws this error? Or is it possible that even though the data only uses 6.5% of the memory (according to htop) the operation eats up the remaining memory?

1 Answer

Editorial Team · Answer 1 · 2026-06-17T02:05:51+00:00

Can you move this to GitHub? I need to review the code, but there are a number of edge cases where I didn't test really deeply "leveled" hierarchical indexes, so this is probably a legitimate bug.

EDIT: this has been fixed in v0.10.1
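
On the question of whether the operation itself can eat the remaining memory: the traceback shows `max_group = np.prod(shape)` being passed to `lib.groupsort_indexer` with `compress=False`. One plausible reading, which the answer above does not confirm, is that the sort allocates a counter for every possible combination of level values rather than for every row. The sketch below only does that arithmetic. The level sizes are made up for illustration (the question does not give them), and the assumption of one 8-byte counter per possible group is mine.

```python
def dense_group_count(level_sizes):
    """Size of the full cartesian product of the index levels."""
    total = 1
    for size in level_sizes:
        total *= int(size)
    return total


def counting_buffer_bytes(level_sizes, itemsize=8):
    """Bytes for one counter per possible group, plus one spare slot.

    Assumption: the sort keeps an int64 counter per possible group.
    """
    return (dense_group_count(level_sizes) + 1) * itemsize


# Hypothetical sizes for the 5 index levels; NOT taken from the question.
level_sizes = [50, 40, 30, 20, 100]
n_rows = 2678271  # row count from the question

groups = dense_group_count(level_sizes)
buffer_gib = counting_buffer_bytes(level_sizes) / 2.0 ** 30

print("rows:            %d" % n_rows)
print("possible groups: %d" % groups)
print("counter buffer:  %.2f GiB" % buffer_gib)
```

With these made-up sizes the number of possible groups is about 45 times the number of rows, so a buffer sized by groups can be much larger than the frame that htop reports. With larger levels the product grows multiplicatively, which would fit a MemoryError on a frame that uses only 6.5% of memory.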
