Say I have the following design:
id | participant_ids
...| [ObjectId(...), ObjectId(...)]
Now I’m querying it this way:
db.events.find({
participant_ids: ObjectId(...)
});
Which is identical to this:
db.events.find({
participant_ids: {
$in: [ObjectId(...)]
}
});
I assume there isn’t a difference in performance between those two (but correct me if I’m wrong!).
For each event, there are at least 1 and at most 2 participants. So I could also use the following design:
id | participant_1_id | participant_2_id
... and query it like this:
db.events.find({
$or: [
{ participant_1_id: ObjectId(...) },
{ participant_2_id: ObjectId(...) }
]
});
If I weren’t using indexes, this probably wouldn’t make much of a difference, but of course I am.
For the first design, I’d go with the following index:
db.events.ensureIndex({
participant_ids: 1
});
For the second one, I’d go with this:
db.events.ensureIndex({
participant_1_id: 1,
participant_2_id: 1
});
Both have downsides when it comes to performance.
- 1st query: Using an Array is probably slower than using a plain key.
- 2nd query: Using the $or operator isn’t very fast.
- 2nd query: It isn’t very scalable. If I wanted to lift the limit on participants at some point, that wouldn’t be possible (you’d have unlimited keys and unlimited items in the $or part of the queries).
My questions are:
– What design should I use?
– Can I index Arrays? The docs don’t say anything about this, and I’m not sure Arrays can be indexed (since their contents can vary a lot).
I don’t think so. It should be the exact same index-based access path whether you have one value (“plain key”) or multiple (“Array”).
participant_1_id, participant_2_id is just terrible.
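To illustrate why the access path should be the same, here is a minimal conceptual sketch in plain JavaScript. MongoDB calls an index on an array field a multikey index and creates one index entry per array element, so looking up a participant works like looking up a plain key. The helpers buildIndex and lookup and the sample data are hypothetical; they only model the idea and are not MongoDB functions or its actual implementation.

```javascript
// Conceptual model only: one index entry per array element.
// buildIndex and lookup are hypothetical helpers, not MongoDB APIs.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    // Treat a scalar as a one-element array so both cases share one code path.
    const keys = Array.isArray(value) ? value : [value];
    // A Set avoids duplicate entries when an array repeats a value.
    for (const key of new Set(keys)) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// A lookup is a single key probe, regardless of how the value was stored.
function lookup(index, key) {
  return index.get(key) || [];
}

// Sample data (made up for illustration); strings stand in for ObjectIds.
const events = [
  { _id: 1, participant_ids: ["alice", "bob"] },
  { _id: 2, participant_ids: ["bob"] },
  { _id: 3, participant_ids: "carol" },
];

const index = buildIndex(events, "participant_ids");
console.log(lookup(index, "bob"));   // [ 1, 2 ]
console.log(lookup(index, "carol")); // [ 3 ]
```

Under this model, lifting the two-participant limit only adds more entries to the same index, whereas the participant_1_id / participant_2_id design would need a new field and a new $or clause for every extra participant.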