I will have an activity feed per user that displays all activity related to the events the user is subscribed to; the feed pulls in the 20 or so most recent activities. The way I have it set up, all activity, regardless of the event it’s related to, is stored in one collection, and each document has an “event” property that I index and query against. The basic query is just: select activities from the collection where event is in the user’s event subscription list, ordered by date.
I store a hash of the user’s event subscription list and cache the query results for xx seconds using that hash as the key, so if another user is subscribed to exactly the same events I can pull the results from the cache instead. I’m not concerned about results being xx seconds stale.
Edit: added model and query example
Models:
User
{
// useless properties excluded
// fixed: hashset not list, can be quite large
HashSet<string> Subscriptions { get; set; }
string SubscriptionHash { get; set; } // precomputed hash of all strings in subscriptions
}
Activity
{
// Useless properties excluded
string ActivityType { get; set; }
}
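For reference, a minimal sketch of how the precomputed SubscriptionHash could be built. SubscriptionHasher is a made-up helper name, and the sketch assumes event names never contain a newline, since that is used as the separator. The set is sorted first because a HashSet has no guaranteed enumeration order, and two users with the same subscriptions must end up with the same key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the real models.
public static class SubscriptionHasher
{
    public static string Compute(IEnumerable<string> subscriptions)
    {
        // Ordinal sort so the same set always produces the same string,
        // regardless of insertion order or current culture.
        var sorted = subscriptions.OrderBy(s => s, StringComparer.Ordinal);

        // Assumption: '\n' never appears inside an event name.
        var joined = string.Join("\n", sorted);

        using (var sha = SHA256.Create())
        {
            var bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(joined));
            // Hex string without dashes, usable directly as a cache key.
            return BitConverter.ToString(bytes).Replace("-", "");
        }
    }
}
```

The hash only needs recomputing when the user subscribes to or unsubscribes from an event.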
Query:
var results = cache[user.SubscriptionHash] as List<Activity>;
if (results == null)
{
    // In() is the Raven.Client.Linq extension for "field is one of these values"
    results = session.Query<Activity>()
        .Where(a => a.ActivityType.In(user.Subscriptions))
        .Take(20)
        .ToList();
    // add results to cache
}
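The "cache for xx seconds" part can be sketched with System.Runtime.Caching.MemoryCache, which ships with .NET 4.5. FeedCache and GetOrLoad are names made up for illustration, and the loader delegate stands in for the Raven query above.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

// Hypothetical wrapper around the process-wide MemoryCache.
public static class FeedCache
{
    public static List<T> GetOrLoad<T>(string key, TimeSpan ttl, Func<List<T>> load)
    {
        var cache = MemoryCache.Default;

        // Cache hit: another user with the same subscription hash already ran the query.
        var cached = cache.Get(key) as List<T>;
        if (cached != null)
            return cached;

        // Cache miss: run the query and keep the result until the absolute expiry.
        var fresh = load();
        cache.Set(key, fresh, DateTimeOffset.UtcNow.Add(ttl));
        return fresh;
    }
}
```

Usage would look like FeedCache.GetOrLoad(user.SubscriptionHash, TimeSpan.FromSeconds(30), () => /* query */), where 30 is only a placeholder for xx. Two requests that miss at the same moment will both run the query, which is acceptable when slightly stale results are fine anyway.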
My concern is whether this is the best approach or if there’s better RavenDB voodoo to use. The single collection could grow into the millions if there are a lot of activities, and I could end up storing thousands of keys in the cache when there are thousands of users with endless combinations of subscription lists. These feeds are on the user’s landing page, so they get hit a lot, and I don’t want to just throw more hardware at the problem.
So the answer I’m really looking for is whether this is the best query to use, or if there’s a better way to do it in Raven when I could be querying against millions of documents using list.Contains.
This is an ASP.NET 4.5 MVC 4 project using RavenDB.
Now here is how I would approach it. This is based on RaccoonBlog PostComments.
I would store each user’s events in a separate document (i.e. UserEvent in the example below), with the user document holding an additional property that links to it, along with the number of events and a timestamp of the last event associated with the user. This keeps the user document much smaller while still carrying most of the important information.
UserEvent would be a simple document holding an id, a link to the id of the user it belongs to, an “event” collection, and a lasteventid. This way each “event” becomes a sub-document that can be maintained individually if needed.
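A rough sketch of what those two documents could look like. Every class and property name here is an assumption made for illustration; only the overall shape (a small user document pointing at a separate document that holds the event collection and a last-event counter) comes from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical user document: small, with a pointer to the events document.
public class User
{
    public string Id { get; set; }
    public string UserEventsId { get; set; }          // link to the UserEvent document
    public int NumberOfEvents { get; set; }           // denormalized count
    public DateTimeOffset? LastEventAt { get; set; }  // timestamp of the latest event
}

// Hypothetical per-user events document.
public class UserEvent
{
    public UserEvent()
    {
        Events = new List<Event>();
    }

    public string Id { get; set; }
    public string UserId { get; set; }        // link back to the user
    public List<Event> Events { get; set; }   // the "event" collection
    public int LastEventId { get; set; }      // last id handed out to a sub-document

    // Gives each event a document-local id so it can be edited or removed later.
    public Event AddEvent(string name, DateTimeOffset occurredAt)
    {
        var evt = new Event { Id = ++LastEventId, Name = name, OccurredAt = occurredAt };
        Events.Add(evt);
        return evt;
    }

    public class Event
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public DateTimeOffset OccurredAt { get; set; }
    }
}
```

Whoever calls AddEvent would also update NumberOfEvents and LastEventAt on the user document in the same session, so both documents are saved together.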
Lastly, an index on UserEvent that allows you to query the data easily.
Then all you have to do to query is something like.
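A minimal sketch of such an index and the query against it. It assumes a UserEvent document with a UserId property (a hypothetical shape), and the namespaces are those of the 2.x/3.x RavenDB client; newer clients moved them under Raven.Client.Documents. The index and helper names are made up.

```csharp
using System.Linq;
using Raven.Client;
using Raven.Client.Indexes;

// Assumed minimal shape of the per-user events document.
public class UserEvent
{
    public string Id { get; set; }
    public string UserId { get; set; }
}

// Static index mapping each UserEvent document by the user it belongs to.
public class UserEvents_ByUserId : AbstractIndexCreationTask<UserEvent>
{
    public UserEvents_ByUserId()
    {
        Map = docs => from doc in docs
                      select new { doc.UserId };
    }
}

public static class UserEventQueries
{
    // Loads the events document for one user through the index above.
    public static UserEvent ForUser(IDocumentSession session, string userId)
    {
        return session.Query<UserEvent, UserEvents_ByUserId>()
                      .Where(x => x.UserId == userId)
                      .FirstOrDefault();
    }
}
```

The index has to be deployed once at application startup, for example with IndexCreation.CreateIndexes(typeof(UserEvents_ByUserId).Assembly, store). If the user document already stores the id of its UserEvent document, session.Load<UserEvent>(id) skips the index entirely.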