In MongoDB I’m doing a geonear query on a collection containing ~3.5 million objects to return results near a certain lat/long. This query runs great if I have a basic 2d index on the object:
db.Listing.ensureIndex( { Coordinates: "2d" } );
However, I now also want to filter by other fields (Price, Property Type, Year Built, Beds, Baths, etc.) within the geonear query. When I add conditions such as Price <= 10000000, the query starts to slow down. I don’t have any indexes on these other fields, so I’m wondering what the best approach is performance-wise.
I tried adding a separate index for each of the other fields (11 indexes in total on the collection), but this made the query time out every time. I guess that is because a collection can only handle so many indexes?
db.Listing.ensureIndex( { Coordinates: "2d" } );
db.Listing.ensureIndex( { Price: 1 } );
db.Listing.ensureIndex( { Beds: 1 } );
db.Listing.ensureIndex( { Baths: 1 } );
etc...
Now I’m thinking of having just 1 compound index on the collection like so:
db.Listing.ensureIndex( { Coordinates: "2d", Price: 1, PropertyType: 1, YearBuilt: 1, Beds: 1, Baths: 1, HouseSize: 1, LotSize: 1, Stories: 1 } );
Is this the correct approach or is there a better way?
Yes, a compound index is probably the way to go. See http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-CompoundIndexes for details.
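As a rough sketch of what that could look like in the shell, here is a compound index with the 2d key first and a single secondary key, plus a matching query. Keeping only Price, the sample coordinates, and the helper name nearWithMaxPrice are my own assumptions for illustration, not something taken from your schema:

```javascript
// Sketch only: 2d key first, then one secondary field.
// Which secondary field to keep is an assumption; verify it with explain().
const indexSpec = { Coordinates: "2d", Price: 1 };

// Hypothetical helper: builds a $near query with a price ceiling.
// Legacy 2d coordinate pairs are ordered [longitude, latitude].
function nearWithMaxPrice(lng, lat, maxPrice) {
  return {
    Coordinates: { $near: [lng, lat] },
    Price: { $lte: maxPrice }
  };
}

// Made-up coordinates, with the price limit from the question.
const query = nearWithMaxPrice(-73.99, 40.73, 10000000);

// Only touches the database when run inside the mongo shell, where `db` exists.
if (typeof db !== "undefined") {
  db.Listing.ensureIndex(indexSpec); // called createIndex in newer shells
  printjson(db.Listing.find(query).limit(20).toArray());
}
```

The point of building the query from the same fields as the index is that the price filter can be checked while the geo index is being walked, instead of after each document is fetched.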
The only issue I see is that the index has a lot of fields, which will make it rather big, so you may want to index only the fields with high cardinality. Use explain() to check what the query actually does and to compare alternatives.
Also, given your dataset, it might be hard to keep the index right-balanced; once it no longer fits in physical memory, queries will start hitting disk, which will slow things down considerably.
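One rough way to judge cardinality before deciding what goes into the index is to count the distinct values each field takes. The sketch below does that over a small in-memory sample. The sample documents and the distinctCount helper are made up for illustration; against the real collection you would use db.Listing.distinct(...) and then look at the plan with explain():

```javascript
// Hypothetical helper: number of distinct values a field takes in a sample.
function distinctCount(docs, field) {
  return new Set(docs.map(function (d) { return d[field]; })).size;
}

// Made-up sample documents, only to show the idea.
const sample = [
  { Price: 250000, Beds: 3, Baths: 2 },
  { Price: 310000, Beds: 3, Baths: 2 },
  { Price: 475000, Beds: 4, Baths: 2 },
  { Price: 980000, Beds: 4, Baths: 3 }
];

// Distinct-value count per candidate field.
const counts = {};
["Price", "Beds", "Baths"].forEach(function (f) {
  counts[f] = distinctCount(sample, f);
});
// counts is { Price: 4, Beds: 2, Baths: 2 }

// Only touches the database when run inside the mongo shell, where `db` exists.
if (typeof db !== "undefined") {
  // Low-cardinality field: few distinct values, so it narrows results little.
  print(db.Listing.distinct("Beds").length);
  // Inspect how the filtered geo query is executed.
  printjson(db.Listing.find({
    Coordinates: { $near: [-73.99, 40.73] },
    Price: { $lte: 10000000 }
  }).explain());
}
```

Fields like Beds or Baths, which take only a handful of values, add size to the index without narrowing the result set much, whereas something like Price is a more plausible candidate to keep.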