I have a comments table with almost 2 million rows. We receive roughly 500 new comments per day. Each comment is assigned to a specific ID. I want to grab the most popular “discussions” based on that ID.
I have an index on the ID column.
What is best practice? Do I just group by this ID and then sort by the number of comments each ID has? Is this the most efficient approach for a table this size?
That’s pretty much how I would do it. Let’s just assume you want to retrieve the top 50:
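A minimal sketch of that query, assuming a `comments` table with a hypothetical `discussion_id` column (the real table and field names are not known). It is wrapped in Python with an in-memory SQLite database only so that it can be run as-is; the `SELECT` itself is the part that matters:

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, discussion_id INTEGER, body TEXT)"
)
conn.execute("CREATE INDEX idx_comments_discussion ON comments (discussion_id)")
conn.executemany(
    "INSERT INTO comments (discussion_id, body) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

# Group by the discussion ID, count the comments in each group,
# and keep the 50 groups with the highest counts.
TOP_DISCUSSIONS_SQL = """
    SELECT discussion_id, COUNT(*) AS comment_count
    FROM comments
    GROUP BY discussion_id
    ORDER BY comment_count DESC, discussion_id
    LIMIT 50
"""

top = conn.execute(TOP_DISCUSSIONS_SQL).fetchall()
print(top)
```

The extra `discussion_id` in the `ORDER BY` is only there to make ties come back in a stable order.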
If your users are executing this query quite frequently in your application and you’re finding that it’s not running quite as fast as you’d like, one way you could optimize it is to store the result of the above query in a separate table (`topdiscussions`), and perhaps have a script or cron job that runs every five minutes or so to update that table.
Then in your application, just have your users select from the `topdiscussions` table so that they only need to select from 50 rows rather than 2 million.
The downside of this, of course, is that the selection will no longer be in real time, but rather out of sync by up to five minutes or however often you update the table. How real-time you actually need it to be depends on the requirements of your system.
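The refresh step could look roughly like the sketch below. The `refresh_top_discussions` function, the `discussion_id` and `comment_count` columns, and the layout of `topdiscussions` are all assumptions for illustration; in practice the script would be invoked by cron or a similar scheduler against your real database rather than in-memory SQLite.

```python
import sqlite3


def refresh_top_discussions(conn, limit=50):
    """Rebuild the (hypothetical) topdiscussions summary table."""
    # "with conn" wraps both statements in one transaction: it commits on
    # success and rolls back on error, so readers never see an empty table.
    with conn:
        conn.execute("DELETE FROM topdiscussions")
        conn.execute(
            """
            INSERT INTO topdiscussions (discussion_id, comment_count)
            SELECT discussion_id, COUNT(*) AS comment_count
            FROM comments
            GROUP BY discussion_id
            ORDER BY comment_count DESC, discussion_id
            LIMIT ?
            """,
            (limit,),
        )


# Demo data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, discussion_id INTEGER, body TEXT)"
)
conn.execute(
    "CREATE TABLE topdiscussions (discussion_id INTEGER PRIMARY KEY, comment_count INTEGER)"
)
conn.executemany(
    "INSERT INTO comments (discussion_id, body) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c")],
)
conn.commit()

refresh_top_discussions(conn)

# What the application would run instead of the 2-million-row query.
rows = conn.execute(
    "SELECT discussion_id, comment_count FROM topdiscussions "
    "ORDER BY comment_count DESC"
).fetchall()
print(rows)
```

Doing the delete and the insert in a single transaction is a design choice: it keeps the summary table consistent for readers while the refresh is in progress.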
Edit: As per your comments to this answer, I know a little more about your schema and requirements. The following query retrieves the discussions that are the most active within the past day:
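A minimal sketch, assuming each comment row carries a hypothetical `created_at` timestamp next to the `discussion_id`. It runs against in-memory SQLite, so the cutoff is written with SQLite's `datetime('now', '-1 day')`; other databases spell this differently (in MySQL, for instance, something like `NOW() - INTERVAL 1 DAY`).

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, discussion_id INTEGER, created_at TEXT)"
)
# An index starting with the timestamp lets the date filter avoid a full scan.
conn.execute("CREATE INDEX idx_comments_created ON comments (created_at, discussion_id)")

# Sample rows, timestamped relative to the current time.
samples = [
    (1, "-2 hours"), (1, "-5 hours"),            # discussion 1: two recent comments
    (2, "-1 hours"),                             # discussion 2: one recent comment
    (3, "-3 days"), (3, "-4 days"), (3, "-5 days"),  # discussion 3: only old comments
]
conn.executemany(
    "INSERT INTO comments (discussion_id, created_at) VALUES (?, datetime('now', ?))",
    samples,
)

# Same grouping as before, restricted to comments from the last 24 hours.
RECENT_SQL = """
    SELECT discussion_id, COUNT(*) AS comment_count
    FROM comments
    WHERE created_at >= datetime('now', '-1 day')
    GROUP BY discussion_id
    ORDER BY comment_count DESC, discussion_id
    LIMIT 50
"""

recent = conn.execute(RECENT_SQL).fetchall()
print(recent)
```

Discussion 3 has the most comments overall but none in the past day, so it drops out of the result.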
I don’t know your field names, but that’s the general idea.