I need to run fast searches over a huge in-memory array of objects, filtering on some of the objects' fields, so that I can select the subset of objects satisfying some criteria.
The criteria may be specified as a single floating-point value or as a range of such values (e.g. 2.5..10).
The problem is that the float property to be searched on is far from uniformly distributed: there could be only a few objects with values in the range 10-20 (for example), another million objects with values in 0-1, and another million with values in 100-150.
So, is it possible to build an index for searching those objects efficiently, and if so, how? Code samples are welcome.
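I fail to see what the distribution of values has to do with building an index (with the possible exception of exact duplicates). Since the data fits in memory, just extract the values of the field together with each object's original position, sort those pairs by value, and use a binary search as suggested by @MattiLyra. A range query is then two binary searches, one for each end of the range, and everything between the two hits is a match, no matter how the values are clustered.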
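Here is a minimal sketch of that idea in Python, using the standard `bisect` module. The names `FloatIndex`, `items` and `score` are made up for illustration, the range is treated as inclusive at both ends, and the values are assumed to contain no NaN (which would break the sort order).

```python
from bisect import bisect_left, bisect_right


class FloatIndex:
    """Sorted (value, position) index over one float field (hypothetical helper)."""

    def __init__(self, objects, key):
        # Pair every value with the position of its object in the original array,
        # then sort by value (ties are broken by position).
        pairs = sorted((key(obj), pos) for pos, obj in enumerate(objects))
        self._keys = [k for k, _ in pairs]
        self._positions = [p for _, p in pairs]

    def range(self, lo, hi):
        """Positions of all objects with lo <= value <= hi, in value order."""
        start = bisect_left(self._keys, lo)   # first key >= lo
        stop = bisect_right(self._keys, hi)   # one past the last key <= hi
        return self._positions[start:stop]

    def exact(self, value):
        """Positions of all objects whose value equals `value` (duplicates included)."""
        return self.range(value, value)


# Made-up sample data, clustered unevenly like the values in the question.
items = [
    {"name": "a", "score": 0.25},
    {"name": "b", "score": 12.5},
    {"name": "c", "score": 0.75},
    {"name": "d", "score": 120.0},
    {"name": "e", "score": 2.5},
]

index = FloatIndex(items, key=lambda o: o["score"])
hits = [items[i]["name"] for i in index.range(2.5, 10)]
```

Building the index costs one sort, O(n log n). Each query costs two binary searches, O(log n), plus the size of the result. Duplicates simply sit next to each other in the sorted list, so they need no special handling.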
Are we missing something?