I’m trying to implement behavioral analysis for targeted marketing on my e-commerce website. The basic idea is as follows (I assume MongoDB, but I’m open to other recommendations):
- every website `Category` has a list of associated `tags`,
- every content `Article` also has a list of `tags`,
- every `User` has a unique cookie ID assigned to him/her on the first visit,
- every time the user browses a `Category` or reads an `Article`, we plan to increment the `User-tag` dictionary like this:

```javascript
db.tagviews.update(
  {_id: user_id},
  {$inc: {'tags.foo': 1, 'tags.bar': 1, 'tags.baz': 1}},
  true /* upsert */
)
```
So if we want to see the interests of a particular user, we can fetch the `tagviews` document for him/her and look through the tags to see which ones have the most views.
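Looking through a single user's tags is easy on the client side. A minimal sketch, assuming the document has already been fetched in the dictionary format shown above; `topTags` is a hypothetical helper and the sample data is made up.

```javascript
// Hypothetical helper: given a tagviews document in the question's
// format ({_id, tags: {<tag>: <views>}}), return the n most-viewed tags,
// highest count first.
function topTags(tagviewsDoc, n) {
  return Object.entries(tagviewsDoc.tags || {})
    // Sort by view count, descending; Array.prototype.sort is stable
    // (ES2019+), so ties keep their original order.
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([tag, views]) => ({ tag, views }));
}

// Sample document (made-up data for illustration only).
const doc = { _id: "cookie-123", tags: { android: 7, phones: 3, books: 12, google: 5 } };
console.log(topTags(doc, 2)); // books (12), then android (7)
```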
However, I’ve stumbled on a pretty trivial thing: how to fetch users based on tag criteria. For example, we’ve got the Google Galaxy Nexus in stock at an attractive price and want to send marketing emails to the users most interested in [android, phones, gadgets, google].
As far as I understand, we would have to create an index on every `tags.*` field in the `tagviews` collection, which is, of course, unacceptable. The other possible solution is to duplicate the data in another dimension (incrementing a tag-user combination instead of user-tag). But synthetic tests look very unpromising in terms of disk space and flexibility.
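To make the "other dimension" concrete: you would keep one document per (tag, user) pair and upsert a counter on every view. The sketch below only builds the filter/update pairs; the `{tag, user, count}` layout and the helper name `buildTagUserUpserts` are illustrative assumptions. Each pair would be sent as an upsert, e.g. `db.tagusers.updateOne(op.filter, op.update, op.options)` in mongosh, where `tagusers` is a hypothetical collection name.

```javascript
// Hypothetical helper: one upsert per tag for a tag-user collection
// whose documents look like {tag, user, count}.
function buildTagUserUpserts(userId, tags) {
  return tags.map((tag) => ({
    // When an upsert inserts, equality fields from the filter are copied
    // into the new document, so tag and user are set automatically.
    filter: { tag: tag, user: userId },
    update: { $inc: { count: 1 } },
    options: { upsert: true },
  }));
}

const ops = buildTagUserUpserts("cookie-123", ["android", "phones"]);
console.log(ops.length); // 2
```

Finding the top users for one tag then becomes an indexed query (for example with an index on `{tag: 1, count: -1}`), but, as noted above, you pay with one document per tag-user pair and one write per tag on every view.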
What would you suggest to efficiently fetch the most interested users based on tag criteria?
Thanks!
From your example, I understand that you are using tag names as keys (i.e. field names) in the `tagviews` collection.
Don't do that; it leaves you in a nightmare when you need to create indexes. Instead, store the tags as an array of embedded documents inside each `tagviews` document, one element per tag holding the tag name and its view count. You can then index that array by tag name and use it in your filters.
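A minimal sketch of the suggested layout, assuming the embedded documents use the field names `tagname` and `count` (any names would work). The hypothetical helper `toEmbeddedTags` converts a document from the question's dictionary format into the array format, and `tagIndexSpec` is the index specification you would pass to `db.tagviews.createIndex(...)`.

```javascript
// Hypothetical helper: converts {_id, tags: {foo: 2, bar: 1}} into
// {_id, tags: [{tagname: "foo", count: 2}, {tagname: "bar", count: 1}]}.
function toEmbeddedTags(doc) {
  return {
    _id: doc._id,
    tags: Object.entries(doc.tags || {}).map(([tagname, count]) => ({ tagname, count })),
  };
}

// Index on the embedded tag names. Because "tags" is an array, MongoDB
// builds a multikey index with one entry per array element.
// In mongosh: db.tagviews.createIndex(tagIndexSpec)
const tagIndexSpec = { "tags.tagname": 1 };

console.log(JSON.stringify(toEmbeddedTags({ _id: "cookie-123", tags: { foo: 2, bar: 1 } })));
// prints {"_id":"cookie-123","tags":[{"tagname":"foo","count":2},{"tagname":"bar","count":1}]}
```

The point of the design is that this single index serves every tag, however many distinct tags exist.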
You can then increment a specific tag's view count for a user whenever he or she views content carrying that tag.
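A minimal sketch of that increment, assuming the `{tagname, count}` element layout. A single positional `$inc` cannot add a tag the user has not viewed yet, so the work is split into two steps; both helper names are hypothetical.

```javascript
// Step 1: bump the counter of an existing array element. The positional
// operator "$" refers to the element matched by "tags.tagname" in the
// filter. Run without upsert.
function incrementExistingTag(userId, tag) {
  return {
    filter: { _id: userId, "tags.tagname": tag },
    update: { $inc: { "tags.$.count": 1 } },
  };
}

// Step 2 (only if step 1 matched no document): append a new element.
// The $ne condition prevents a duplicate element for the same tag, and
// upsert creates the user's document on the first visit.
function pushNewTag(userId, tag) {
  return {
    filter: { _id: userId, "tags.tagname": { $ne: tag } },
    update: { $push: { tags: { tagname: tag, count: 1 } } },
    options: { upsert: true },
  };
}
```

In mongosh you would run `db.tagviews.updateOne(s1.filter, s1.update)` and, if its `matchedCount` is 0, `db.tagviews.updateOne(s2.filter, s2.update, s2.options)`. The two steps are not atomic together, so concurrent first views of the same tag can still collide (e.g. a duplicate-key error on the upsert) and may need a retry.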
So, to your real question: you can filter the users with an `$in` query on the embedded tag names.
This retrieves all the users who have viewed at least one of the tags in your list (e.g. [android, phones, gadgets, google]).
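A minimal sketch of that filter, again assuming the `tagname` field name; `usersInterestedIn` is a hypothetical helper. Because the condition is on `tags.tagname`, the query can use the multikey index on that field.

```javascript
// Hypothetical helper: matches users whose tags array contains at least
// one element whose tagname is in `wanted`.
function usersInterestedIn(wanted) {
  return { "tags.tagname": { $in: wanted } };
}

const query = usersInterestedIn(["android", "phones", "gadgets", "google"]);
console.log(JSON.stringify(query));
// prints {"tags.tagname":{"$in":["android","phones","gadgets","google"]}}
```

In mongosh this would be used as `db.tagviews.find(query, {_id: 1})` to return only the user IDs.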
Alternatively, you can use the per-tag `count` to narrow the results down to the most interested users.
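One way to use `count` is to rank users by their total views across the requested tags. A sketch of an aggregation pipeline, assuming the `{tagname, count}` layout; `topInterestedUsers` and the `score` field are hypothetical names. For a simple per-tag threshold instead, a filter such as `{tags: {$elemMatch: {tagname: "android", count: {$gte: 5}}}}` matches users with at least 5 views of that one tag.

```javascript
// Hypothetical helper: builds a pipeline that scores each user by the
// sum of view counts over the wanted tags and keeps the top `limit`.
function topInterestedUsers(wanted, limit) {
  return [
    // Pre-select candidate users; this stage can use the multikey index.
    { $match: { "tags.tagname": { $in: wanted } } },
    // One document per (user, tag element).
    { $unwind: "$tags" },
    // Drop elements for tags we do not care about.
    { $match: { "tags.tagname": { $in: wanted } } },
    // Score = total views of the wanted tags.
    { $group: { _id: "$_id", score: { $sum: "$tags.count" } } },
    { $sort: { score: -1 } },
    { $limit: limit },
  ];
}

const pipeline = topInterestedUsers(["android", "phones", "gadgets", "google"], 100);
console.log(pipeline.length); // 6
```

In mongosh: `db.tagviews.aggregate(pipeline)`. Only the first `$match` can use the index; the later stages work on the candidate users' tag arrays.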
Hope this helps.