I am using Scipy’s KDTree implementation to read a large file of 300 MB.


Asked: May 21, 2026 (2026-05-21T16:19:03+00:00)


I am using Scipy’s KDTree implementation to read a large file of 300 MB. Is there a way I can just save the data structure to disk and load it again, or am I stuck with reading the raw points from the file and constructing the data structure each time I start my program? I am constructing the KDTree as follows:

def buildKDTree(self):
        # Read whitespace-separated coordinates and reshape to (n_points, NDIM)
        self.kdpoints = numpy.fromfile("All", sep=' ')
        self.kdpoints.shape = self.kdpoints.size // self.NDIM, self.NDIM
        self.kdtree = KDTree(self.kdpoints, leafsize = self.kdpoints.shape[0]+1)
        print "Preparing KDTree... Ready!"

Any suggestions please?


1 Answer

Editorial Team · Answer 1 · 2026-05-21T16:19:04+00:00

KDTree uses nested classes to define its node types (innernode, leafnode). Pickle finds a class by looking up its name as an attribute of the module where it was defined, so it only works on module-level class definitions, and a nested class trips it up:

import cPickle

class Foo(object):
    class Bar(object):
        pass

obj = Foo.Bar()
print obj.__class__
cPickle.dumps(obj)

<class '__main__.Bar'>
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed

However, there is a (hacky) workaround: monkey-patch the nested class definitions into scipy.spatial.kdtree at module scope so the pickler can find them. As long as all of your code that reads or writes pickled KDTree objects installs these patches first, this hack should work fine:

import cPickle
import numpy
from scipy.spatial import kdtree

# patch module-level attribute to enable pickle to work
kdtree.node = kdtree.KDTree.node
kdtree.leafnode = kdtree.KDTree.leafnode
kdtree.innernode = kdtree.KDTree.innernode

x, y = numpy.mgrid[0:5, 2:8]
t1 = kdtree.KDTree(zip(x.ravel(), y.ravel()))
r1 = t1.query([3.4, 4.1])
raw = cPickle.dumps(t1)

# read in the pickled tree
t2 = cPickle.loads(raw)
r2 = t2.query([3.4, 4.1])
print t1.tree.__class__
print repr(raw)[:70]
print t1.data[r1[1]], t2.data[r2[1]]

Output:

<class 'scipy.spatial.kdtree.innernode'>
"ccopy_reg\n_reconstructor\np1\n(cscipy.spatial.kdtree\nKDTree\np2\nc_
[3 4] [3 4]
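
The example above pickles to an in-memory string. To keep the tree between program runs, as the question asks, the same data can be written to a file instead. Below is a minimal sketch, written for Python 3 (where the cPickle module no longer exists and pickle is used instead) rather than the Python 2 used above. The names save_object and load_object are hypothetical helpers, not part of SciPy, and a plain list stands in for the tree so the snippet runs without SciPy installed. With the SciPy version discussed here, the patches above would presumably need to be installed before both saving and loading.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Write obj to path as a binary pickle (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Read back an object previously written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Stand-in for the KDTree: any picklable object round-trips the same way.
    points = [(0.0, 2.0), (3.0, 4.0), (4.0, 7.0)]
    path = os.path.join(tempfile.mkdtemp(), "tree.pkl")
    save_object(points, path)
    assert load_object(path) == points
```

Only load pickle files that you wrote yourself, since unpickling can execute arbitrary code.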
