I’m not even sure this is possible to do efficiently, but here’s my problem:
I’m writing what’s essentially a blog engine where a blog post and all replies to each blog post can be tagged.
So, I could have a blog post tagged ‘stack’, and a reply to that post tagged ‘overflow’.
Right now, I’m trying to generate a list of the most popular tags when a user hits a special page in my application. It should return the n most popular tags, ordered by descending number of blog posts, along with the number of blog posts associated with each tag. A post counts toward a tag even if only a reply in that post, and not the post itself, carries the tag.
So, if BlogPost A is tagged with ‘foo’, and a reply in BlogPost B is tagged with ‘foo’, the popular tag summary should count that as two blog posts in total, even though BlogPost B is not technically tagged.
Here’s a description of the tables/fields that might be relevant:
BlogPosts    | id    # Primary key for all tables, Rails-style
BlogComments | id | blog_post_id
Tags         | id | name    # 'foo'
Taggings     | id | tag_id | blog_post_id | blog_comment_id
There’s some denormalization in Taggings for the sake of convenience. If someone tags a BlogPost, it fills in the blog_post_id field, and blog_comment_id remains NULL. If someone tags a comment on a post, it fills in both blog_post_id and blog_comment_id.
Is there some way to return a sorted list of the most popular tags in one or several SQL queries? I’m thinking I might need to just run a computationally-expensive script every few minutes on a cron job and render the cached output instead of running this every time somebody hits the page…
Thanks!
So far I see nothing complicated in your request:
If you want to count ‘affected blog posts’ only, I think that’s the way:
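Here is a minimal sketch of one way to count affected blog posts, not necessarily the exact query intended above. It relies on the fact, stated in the question, that Taggings.blog_post_id is filled in for both post tags and comment tags, so COUNT(DISTINCT blog_post_id) per tag counts each post once no matter how many of its comments carry the tag. The lowercase table names, the popular_tags and demo helpers, and the use of an in-memory SQLite database are assumptions made for illustration only; the SQL itself should carry over to other databases with little change.

```python
import sqlite3

# Assumed schema: lowercase versions of the Tags and Taggings tables
# described in the question (only the columns the query needs).
SCHEMA = """
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE taggings (
    id INTEGER PRIMARY KEY,
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    blog_post_id INTEGER NOT NULL,
    blog_comment_id INTEGER
);
"""

# blog_post_id is set for post tags and comment tags alike, so counting
# distinct post ids per tag gives the number of affected posts.
POPULAR_TAGS_SQL = """
SELECT t.name, COUNT(DISTINCT tg.blog_post_id) AS post_count
FROM taggings AS tg
JOIN tags AS t ON t.id = tg.tag_id
GROUP BY t.id, t.name
ORDER BY post_count DESC, t.name ASC
LIMIT ?
"""


def popular_tags(conn, n):
    """Return up to n (tag name, affected post count) rows, most popular first."""
    return conn.execute(POPULAR_TAGS_SQL, (n,)).fetchall()


def demo():
    """Build a tiny in-memory data set and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO tags (id, name) VALUES (?, ?)",
        [(1, "foo"), (2, "bar")],
    )
    conn.executemany(
        "INSERT INTO taggings (tag_id, blog_post_id, blog_comment_id) "
        "VALUES (?, ?, ?)",
        [
            (1, 1, None),  # post 1 itself tagged 'foo'
            (1, 2, 10),    # comment 10 on post 2 tagged 'foo'
            (1, 2, 11),    # another comment on post 2 tagged 'foo'
            (2, 1, None),  # post 1 itself tagged 'bar'
        ],
    )
    return popular_tags(conn, 10)


if __name__ == "__main__":
    for name, post_count in demo():
        print(name, post_count)
```

In this sample data 'foo' is counted for two posts, even though post 2 is only tagged through its comments. If the query turns out to be slow on a large Taggings table, a composite index on (tag_id, blog_post_id) is a reasonable first thing to try before falling back to a cached result from a cron job.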