I’m setting up a Twitter-style “trending topics” box for my forum. I’ve got the most popular /words/, but can’t even begin to think how I will get popular phrases, like Twitter does.
As it stands, I just pull the content of the last 200 posts into one string, split it into words, and sort by which words are used the most. How can I turn this from the most popular words into the most popular phrases?
One technique you might consider is the use of ZSETs in Redis for something like this. If you’ve got very large sets of data, you’ll find that you can do something like this:
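As a rough sketch of the idea (in Python, with an in-memory `Counter` standing in for the Redis sorted set so it runs without a server): split each post into runs of consecutive words and bump a score for every run. The function names and the two-to-three-word phrase length are my assumptions, not anything fixed by Redis. Against a real server, the increment would be a `ZINCRBY key 1 phrase` call.

```python
import re
from collections import Counter

def extract_phrases(text, min_words=2, max_words=3):
    """Yield every run of min_words..max_words consecutive words in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(min_words, max_words + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def record_post(scores, text):
    """Bump each phrase's score by 1, the way ZINCRBY would on a sorted set."""
    for phrase in extract_phrases(text):
        scores[phrase] += 1

# Hypothetical sample posts; in the forum this would be the last 200 posts.
scores = Counter()
for post in ["Redis sorted sets are great", "I love Redis sorted sets"]:
    record_post(scores, post)
```

Handling each post separately keeps phrases from spanning the boundary between two unrelated posts, which would happen if everything were joined into one big string first.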
To retrieve the top phrases, you’d use this:
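Something along these lines, again sketched in Python with a `Counter` in place of the sorted set (`top_phrases` and the sample scores are made up for illustration). In Redis itself the equivalent read is `ZREVRANGE key 0 9`, which returns the ten highest-scoring members.

```python
from collections import Counter

def top_phrases(scores, limit=10):
    """Return up to `limit` phrases, highest score first."""
    return [phrase for phrase, _count in scores.most_common(limit)]

# Hypothetical scores, as they might look after counting a few posts.
scores = Counter({"sorted sets": 5, "redis sorted": 3, "i love": 1})
trending_phrases = top_phrases(scores)
```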
$trending_phrases will be an array of the top ten trending phrases. To get recent trending phrases (as opposed to a persistent, global set of phrases), duplicate all of the Redis interactions above. For each interaction, use keys that encode today's and tomorrow's day number (i.e. days since Jan 1, 1970). When retrieving the results into $trending_phrases, just retrieve both today's and tomorrow's (or yesterday's) key and use array_merge and array_unique to find the union.
Hope this helps!
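For the day-keyed variant, a minimal sketch of the key naming and the merge step (Python again; the `trending:` prefix and both helper names are my own invention). `merge_unique` does what array_merge followed by array_unique would do: concatenate the lists and keep the first occurrence of each phrase.

```python
import time

SECONDS_PER_DAY = 86400

def day_key(timestamp, offset_days=0, prefix="trending"):
    """Build a key from the number of whole days since Jan 1, 1970 (UTC)."""
    day = int(timestamp // SECONDS_PER_DAY) + offset_days
    return f"{prefix}:{day}"

def merge_unique(*lists):
    """Concatenate lists, keeping only the first occurrence of each item."""
    seen = set()
    merged = []
    for items in lists:
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

now = time.time()
today_key = day_key(now)          # key to read and write for today
yesterday_key = day_key(now, -1)  # key holding yesterday's phrases
```

Note that this union only tells you which phrases appear in either day's top list; it does not re-rank them by their combined scores.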