We have a requirement to analyze posts: for a given post, we need to return a list of the posts most closely related to it, where relatedness is measured by the number of tags the two posts have in common. For example:
postA = {"author":"abc",
"title":"blah blah",
"tags":["japan","japanese style","england"],
}
There may be other posts with tags like:
postB:["japan", "england"]
postC:["japan"]
postD:["joke"]
So when compared against the tags in postA, postB gets a count of 2 and postC gets a count of 1. postD gets 0 and will not be included in the result.
My current understanding is that map/reduce could produce this result. I understand the basic usage of map/reduce, but I can’t figure out a solution for this specific purpose.
Any help? Or is there a better way, such as a custom sorting function? I’m currently using pymongo, as I’m a Python developer.
You should create an index on tags:
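A minimal sketch of that index with PyMongo follows. The helper name `ensure_tag_index` and the database and collection names in the comment are assumptions for illustration; `create_index` is the standard Collection method.

```python
def ensure_tag_index(posts):
    """Create an ascending index on the `tags` field.

    `posts` is assumed to be a pymongo Collection, for example
    pymongo.MongoClient()["blog"]["posts"] (hypothetical names).
    Because `tags` holds an array, MongoDB indexes each element
    separately (a multikey index), so lookups by any single tag
    can use the index.
    """
    return posts.create_index("tags")
```

The index only needs to be created once; calling `create_index` again with the same specification is harmless.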
and search for posts that share at least one tag with postA:
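One way to express that search is an `$in` filter on the tags field. The sketch below only builds the filter document; the helper name `related_posts_query` is hypothetical, and it assumes the post was loaded from the collection and therefore carries an `_id`.

```python
def related_posts_query(post):
    """Build a filter matching other posts that share at least one tag.

    Assumes `post` has an `_id` and a `tags` list.
    """
    return {
        # Matches documents whose tags array contains any of post's tags.
        "tags": {"$in": post["tags"]},
        # Excludes the post itself from its own related list.
        "_id": {"$ne": post["_id"]},
    }

# Usage with a pymongo Collection named `posts` (hypothetical):
#   candidates = list(posts.find(related_posts_query(post_a)))
```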
and finally, sort by intersection in Python:
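A minimal sketch of that sorting step, using set intersection to count shared tags. The function name `rank_by_shared_tags` is an assumption, and `candidates` stands for whatever documents the search returned.

```python
def rank_by_shared_tags(post, candidates):
    """Return (count, candidate) pairs, most shared tags first.

    Candidates that share no tags with `post` are dropped.
    """
    wanted = set(post["tags"])
    scored = []
    for candidate in candidates:
        shared = len(wanted & set(candidate.get("tags", [])))
        if shared > 0:
            scored.append((shared, candidate))
    # The sort is stable, so ties keep their original relative order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


post_a = {"tags": ["japan", "japanese style", "england"]}
others = [
    {"name": "postC", "tags": ["japan"]},
    {"name": "postD", "tags": ["joke"]},
    {"name": "postB", "tags": ["japan", "england"]},
]
ranked = rank_by_shared_tags(post_a, others)
print([(n, c["name"]) for n, c in ranked])  # [(2, 'postB'), (1, 'postC')]
```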
Note that if postA shares at least one tag with a large number of other posts, this won’t perform well, because you’ll send so much data from Mongo to your application; unfortunately, a plain find() query has no way to sort and limit by the size of the intersection inside Mongo itself.