I use MongoDB + PHP for a “facebookish” newsfeed with different kinds of feeds (post, photo, poll, etc.) and comments.
Each feed belongs to a “channel”, which is currently either a user or a group (more channel types may be added in the future).
Any user can subscribe to any channel or unsubscribe from it.
Now let’s say there are tons of channels and tons of feeds. What is the best structure for channels/feeds/comments?
I’m considering the following approaches:
1) A feeds collection with a list of subscribers in each feed:
feeds:
[
  {
    date_added: ...,
    last_update: ...,
    title: ...,
    text: ...,
    channel: ...,
    channel_subscribers: [...],
    comments_subscribers: [...],
    comments: [...]
  },
  {...},
  {...},
  {...}
]
To get the latest feeds:
var weekAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
db.feeds.find({date_added: {$gte: weekAgo}, channel_subscribers: "my_login"});
To get feeds with new comments:
db.feeds.find({last_update: {$gte: weekAgo}, comments_subscribers: "my_login"});
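A minimal in-memory sketch of the two approach-1 reads, assuming `date_added` and `last_update` are stored as real `Date` values. This is plain JavaScript, not a MongoDB client: `Array.prototype.includes` stands in for MongoDB's rule that `{channel_subscribers: "my_login"}` matches when any element of the array equals the value. The helper names and sample data are hypothetical.

```javascript
// In-memory model of approach 1 (hypothetical helpers, not a MongoDB driver).
const DAY_MS = 24 * 60 * 60 * 1000;

// Roughly: db.feeds.find({date_added: {$gte: weekAgo}, channel_subscribers: login})
function latestFeeds(feeds, login, now) {
  const weekAgo = new Date(now.getTime() - 7 * DAY_MS);
  return feeds.filter(
    (f) => f.date_added >= weekAgo && f.channel_subscribers.includes(login)
  );
}

// Roughly: db.feeds.find({last_update: {$gte: weekAgo}, comments_subscribers: login})
function feedsWithNewComments(feeds, login, now) {
  const weekAgo = new Date(now.getTime() - 7 * DAY_MS);
  return feeds.filter(
    (f) => f.last_update >= weekAgo && f.comments_subscribers.includes(login)
  );
}

// Usage with made-up sample data.
const now = new Date("2013-05-20T12:00:00Z");
const sampleFeeds = [
  {
    title: "recent post",
    date_added: new Date("2013-05-19T10:00:00Z"),
    last_update: new Date("2013-05-19T11:00:00Z"),
    channel: "group_1",
    channel_subscribers: ["my_login", "bob"],
    comments_subscribers: ["bob"],
  },
  {
    title: "old photo",
    date_added: new Date("2013-04-01T10:00:00Z"),
    last_update: new Date("2013-05-18T09:00:00Z"),
    channel: "user_7",
    channel_subscribers: ["my_login"],
    comments_subscribers: ["my_login"],
  },
];

console.log(latestFeeds(sampleFeeds, "my_login", now).map((f) => f.title)); // [ 'recent post' ]
console.log(feedsWithNewComments(sampleFeeds, "my_login", now).map((f) => f.title)); // [ 'old photo' ]
```

Both reads are single-collection queries, which is where the "simple and fast" expectation comes from.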
Pros:
- Simple and fast reads(?)
Cons:
- To subscribe to or unsubscribe from a channel, I have to go through all feeds of that channel and push/pull my name to/from their channel_subscribers lists; this could be slow when there are tons of feeds.
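To make that cost concrete, here is a minimal in-memory sketch of subscribing and unsubscribing under approach 1. The helper names and sample data are hypothetical; the comments show rough MongoDB shell equivalents (`updateMany` requires MongoDB 3.2 or later). The returned count is the number of feed documents the update has to visit, which grows with the number of feeds in the channel.

```javascript
// Approach 1: the subscriber list is copied into every feed of the channel,
// so (un)subscribing has to visit every feed document of that channel.

// Roughly: db.feeds.updateMany({channel: channel}, {$addToSet: {channel_subscribers: login}})
function subscribe(feeds, channel, login) {
  let visited = 0;
  for (const f of feeds) {
    if (f.channel !== channel) continue;
    visited++;
    // $addToSet semantics: add only if not already present
    if (!f.channel_subscribers.includes(login)) f.channel_subscribers.push(login);
  }
  return visited;
}

// Roughly: db.feeds.updateMany({channel: channel}, {$pull: {channel_subscribers: login}})
function unsubscribe(feeds, channel, login) {
  let visited = 0;
  for (const f of feeds) {
    if (f.channel !== channel) continue;
    visited++;
    // $pull semantics: remove every occurrence of the value
    f.channel_subscribers = f.channel_subscribers.filter((s) => s !== login);
  }
  return visited;
}

// Usage: a channel with 1,000 feeds means 1,000 documents per (un)subscribe.
const feeds = [];
for (let i = 0; i < 1000; i++) {
  feeds.push({ channel: "group_1", channel_subscribers: ["bob"] });
}
feeds.push({ channel: "user_7", channel_subscribers: [] });

console.log(subscribe(feeds, "group_1", "my_login"));   // 1000
console.log(unsubscribe(feeds, "group_1", "my_login")); // 1000
```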
2) A separate “channels” collection:
Same data, but the list of subscribers is kept in the channels collection instead:
channels:
[
  {channel_id: ..., last_update: ..., subscribers: [...]},
  {channel_id: ..., last_update: ..., subscribers: [...]}
]
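First, I have to query the IDs of the channels I’m subscribed to that were updated today:
var today = new Date(); today.setHours(0, 0, 0, 0);
var subscribes = db.channels.distinct("channel_id", {last_update: {$gte: today}, subscribers: "my_login"});
Now find my feeds:
db.feeds.find({channel: {$in: subscribes}, date_added: {$gte: today}});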
Pros:
- Simple, fast and safer subscribing/unsubscribing: only one channel document has to be updated.
Cons:
- I suspect I should avoid $in because it’s slow(?), especially when there are lots of subscriptions to put inside this operator.
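A minimal in-memory sketch of approach 2, with hypothetical helper names and made-up data. Reading takes two steps (collect the IDs of recently updated channels, then keep the feeds whose `channel` is in that list, which is what `$in` expresses), while subscribing touches exactly one channel document. A JavaScript `Set` stands in for the `$in` list; it says nothing about how MongoDB actually executes `$in`.

```javascript
// Approach 2: subscribers live on the channel document; feeds only reference the channel.

// Step 1, roughly: db.channels.distinct("channel_id", {last_update: {$gte: since}, subscribers: login})
function recentChannelIds(channels, login, since) {
  return channels
    .filter((c) => c.last_update >= since && c.subscribers.includes(login))
    .map((c) => c.channel_id);
}

// Step 2, roughly: db.feeds.find({channel: {$in: ids}, date_added: {$gte: since}})
function feedsInChannels(feeds, ids, since) {
  const wanted = new Set(ids);
  return feeds.filter((f) => wanted.has(f.channel) && f.date_added >= since);
}

// Subscribing, roughly: db.channels.updateOne({channel_id: id}, {$addToSet: {subscribers: login}})
// Only one document is involved, however many feeds the channel has.
function subscribeChannel(channels, id, login) {
  const c = channels.find((ch) => ch.channel_id === id);
  if (c && !c.subscribers.includes(login)) c.subscribers.push(login);
  return c ? 1 : 0; // documents matched
}

// Usage with made-up data.
const since = new Date("2013-05-20T00:00:00Z");
const channels = [
  { channel_id: "group_1", last_update: new Date("2013-05-20T09:00:00Z"), subscribers: ["my_login"] },
  { channel_id: "user_7", last_update: new Date("2013-05-10T09:00:00Z"), subscribers: ["my_login"] },
  { channel_id: "group_2", last_update: new Date("2013-05-20T10:00:00Z"), subscribers: [] },
];
const feeds = [
  { title: "poll", channel: "group_1", date_added: new Date("2013-05-20T09:00:00Z") },
  { title: "photo", channel: "group_2", date_added: new Date("2013-05-20T10:00:00Z") },
];

const ids = recentChannelIds(channels, "my_login", since);
console.log(ids); // [ 'group_1' ]
console.log(feedsInChannels(feeds, ids, since).map((f) => f.title)); // [ 'poll' ]
```

The `$in` list here only contains channels that changed since `since`, so its size depends on recent activity rather than on the total number of subscriptions.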
3) Keep each user’s subscriptions in the users collection (so each user has an array of his own subscriptions):
users:
[
  {_id: ..., login: ..., email: ..., subscribes: [...]}
]
Cons:
- In this case the array passed to $in will be even bigger than in approach #2, because it holds all of my subscriptions rather than only the channels updated recently.
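A minimal in-memory sketch of approach 3, with hypothetical names and made-up data. The user’s whole `subscribes` array becomes the `$in` list, so its length equals the number of channels the user follows, regardless of which of them were updated recently.

```javascript
// Approach 3: each user document carries the full list of subscribed channel IDs.

// Step 1, roughly: db.users.findOne({login: login}, {subscribes: 1})
// Step 2, roughly: db.feeds.find({channel: {$in: user.subscribes}, date_added: {$gte: since}})
function feedsForUser(users, feeds, login, since) {
  const user = users.find((u) => u.login === login);
  if (!user) return { inListSize: 0, result: [] };
  const wanted = new Set(user.subscribes);
  const result = feeds.filter((f) => wanted.has(f.channel) && f.date_added >= since);
  return { inListSize: user.subscribes.length, result };
}

// Usage: a user following 300 channels sends all 300 IDs to $in,
// even though only one of those channels has a new feed.
const users = [{ login: "my_login", subscribes: [] }];
for (let i = 0; i < 300; i++) users[0].subscribes.push("channel_" + i);
const feeds = [
  { title: "post", channel: "channel_42", date_added: new Date("2013-05-20T08:00:00Z") },
];

const { inListSize, result } = feedsForUser(users, feeds, "my_login", new Date("2013-05-20T00:00:00Z"));
console.log(inListSize);                 // 300
console.log(result.map((f) => f.title)); // [ 'post' ]
```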
4) Your suggestions?
OK, I’ll answer this myself. I ran a test on my laptop (Windows 7 32-bit, 2 GB RAM).
I created a “feeds” collection and filled it with 500 feeds; each feed’s “subscribers” array holds a list of 2,000 short random string names.
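A sketch of how a comparable data set could be generated. The 8-character lowercase names and the exact document layout are assumptions (the post only says “short random string names”); the documents are built in memory here, and in the mongo shell each one could then be passed to `db.feeds.insert(...)`.

```javascript
// Assumed name format: 8 random lowercase letters (hypothetical; not stated in the post).
function randomName(length) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let s = "";
  for (let i = 0; i < length; i++) {
    s += letters[Math.floor(Math.random() * letters.length)];
  }
  return s;
}

// Builds `feedCount` feed documents, each with `namesPerFeed` random subscriber names.
function makeTestFeeds(feedCount, namesPerFeed) {
  const feeds = [];
  for (let i = 0; i < feedCount; i++) {
    const subscribers = [];
    for (let j = 0; j < namesPerFeed; j++) subscribers.push(randomName(8));
    feeds.push({ title: "feed " + i, date_added: new Date(), subscribers });
  }
  return feeds;
}

// The post's setup: 500 feeds, 2,000 names each.
const testFeeds = makeTestFeeds(500, 2000);
console.log(testFeeds.length);                // 500
console.log(testFeeds[0].subscribers.length); // 2000
// In the mongo shell one could then run, e.g.:
//   testFeeds.forEach(function (f) { db.feeds.insert(f); });
```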
First, I have to mention that my DB grew in size from 60 MB to 1.5 GB.
Then, when I ran the shell command
db.feeds.ensureIndex({subscribers: 1})
it hung for ~3 minutes and then stopped with the error "can't map file memory - mongo requires 64 bit build for larger datasets" (32-bit builds of MongoDB are limited to about 2 GB of data). So it’s definitely not a good idea to create such large multikey fields inside Mongo documents.
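Part of why this index build is so heavy: an index on an array field is a multikey index, and MongoDB creates an index key for each element of the array rather than one key per document. A small sketch of that arithmetic; the helper name is hypothetical and it ignores key size and B-tree overhead.

```javascript
// Approximate number of keys in an index on `field`:
// one key per array element, and one key for a non-array value.
function multikeyIndexKeys(docs, field) {
  return docs.reduce((n, d) => n + (Array.isArray(d[field]) ? d[field].length : 1), 0);
}

// The post's data set shape, with placeholder names.
const docs = Array.from({ length: 500 }, () => ({
  subscribers: Array.from({ length: 2000 }, (_, j) => "name" + j),
}));

console.log(multikeyIndexKeys(docs, "subscribers")); // 1000000
console.log(docs.length); // 500, the key count for a single-valued field
```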