I have defined a list of ball tree objects as below, where input1 is a NumPy array with shape (100, 320).
```python
from sklearn.neighbors import BallTree

bt = []
bt.append(BallTree(input1))
```
I take one of the rows of input1 as a sample query, where sample_index is assumed to be a valid row index of input1.
```python
sample_query = input1[sample_index, :]
# Find the nearest neighbour; query() expects a 2-D array of shape (n_queries, n_features)
distance, index = bt[0].query(sample_query.reshape(1, -1), k=1)
```
Here, distance[0] is 0, as expected, since ‘sample_query’ is a row of input1.
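For reference, here is a self-contained version of the setup above, assuming a recent scikit-learn where `BallTree` is imported from `sklearn.neighbors`. The random `input1` and `sample_index = 5` are placeholders, not the original data. It also shows that `query` returns 2-D arrays of shape `(n_queries, k)`, so for a single query the distance is `distance[0, 0]`.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
input1 = rng.random((100, 320))  # random placeholder for the real (100, 320) data
sample_index = 5                 # hypothetical row index; any value in 0..99 works

bt = [BallTree(input1)]
sample_query = input1[sample_index, :]

# query() returns two arrays of shape (n_queries, k): distances and row indices
distance, index = bt[0].query(sample_query.reshape(1, -1), k=1)

print(distance.shape, index.shape)  # (1, 1) (1, 1)
print(distance[0, 0], index[0, 0])  # 0.0 5
```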
```python
# Add another BallTree instance to the list
# input2 is a NumPy array with shape (70, 320)
bt.append(BallTree(input2))
distance, index = bt[0].query(sample_query.reshape(1, -1), k=1)
print(distance[0])
# The output here is NOT zero (not expected!)
```
Why would the nearest-neighbour distance for ‘sample_query’ in bt[0] change when I append another BallTree object to the list ‘bt’? I would expect bt[0] to be unaffected when I append another object to bt. Is my expectation correct?
I found a gap in my understanding of BallTree with this example.
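After a bit of digging, I now understand that (borrowing the notation from the question) bt[0].data actually points to the memory of the input NumPy array rather than to a copy of it (no copy is made when the input already has the dtype and memory layout the tree uses). I was reusing the same input array to create further ball trees, so the data seen by bt[0] was clobbered every time. The tree's node structure, however, had been built from the original values, so queries against the overwritten data gave wrong results.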
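The aliasing can be reproduced with NumPy alone. In this sketch `np.asarray` stands in for a constructor that keeps a reference to its input instead of copying it (an illustration of the effect, not BallTree's actual code), and the small buffer and its values are arbitrary. Note that `input1[sample_index, :]` in the question is also a view, so refilling a reused buffer changes the query point as well as the data.

```python
import numpy as np

buf = np.zeros((4, 3))        # a scratch buffer that gets reused for each data set
stored_ref = np.asarray(buf)  # no copy: same object, like a tree that keeps a reference
stored_copy = buf.copy()      # an independent copy
row_view = buf[1, :]          # basic slicing returns a view, like sample_query

buf[:] = 7.0                  # "load" the next data set into the same buffer

print(stored_ref[1, 0], stored_copy[1, 0], row_view[0])  # 7.0 0.0 7.0
print(np.shares_memory(stored_ref, buf), np.shares_memory(stored_copy, buf))  # True False
```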
If I make sure that a new NumPy array is created (allocated, in C terms) for each BallTree instance, the query results are consistent.
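A minimal sketch of that fix, assuming the data sets are loaded into one reused scratch buffer. The helper `load_into`, the buffer size and `sample_index` are hypothetical. Each BallTree is given its own copy via `.copy()`, so refilling the buffer for `input2` no longer affects `bt[0]`; the query point is taken from that copy too, since a row view of the scratch buffer would be overwritten along with it.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
buf = np.empty((100, 320))  # one scratch buffer that is refilled for each data set

def load_into(buffer, n_rows):
    """Hypothetical loader: overwrites the first n_rows of buffer in place."""
    buffer[:n_rows] = rng.random((n_rows, buffer.shape[1]))
    return buffer[:n_rows]

bt = []

# First data set: copy it out of the scratch buffer so the tree owns its data.
input1 = load_into(buf, 100).copy()
bt.append(BallTree(input1))

sample_index = 5                        # hypothetical; any valid row index works
sample_query = input1[sample_index, :]  # view into input1, which is never overwritten

# Second data set reuses the scratch buffer, but is copied again before use.
input2 = load_into(buf, 70).copy()
bt.append(BallTree(input2))

distance, index = bt[0].query(sample_query.reshape(1, -1), k=1)
print(distance[0, 0], index[0, 0])
```

Copying costs extra memory per data set; having the loader return a freshly allocated array each time achieves the same independence without the intermediate buffer.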