Let’s say I have three normalized tables: one for threads, one for comments, and one that connects the two.
I want to display the number of comments in a thread, and that involves finding every comment belonging to a certain thread.
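For concreteness, here is roughly what I mean. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (threads, comments, thread_comments) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE threads  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT);
    -- Link table connecting comments to threads.
    CREATE TABLE thread_comments (
        thread_id  INTEGER NOT NULL REFERENCES threads(id),
        comment_id INTEGER NOT NULL REFERENCES comments(id),
        PRIMARY KEY (thread_id, comment_id)
    );
""")

conn.executemany("INSERT INTO threads (id, title) VALUES (?, ?)",
                 [(1, "First thread"), (2, "Second thread")])
conn.executemany("INSERT INTO comments (id, body) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO thread_comments (thread_id, comment_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3)])
conn.commit()


def count_comments(conn, thread_id):
    """Count the comments linked to one thread via the link table."""
    row = conn.execute(
        "SELECT COUNT(*) FROM thread_comments WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    return row[0]
```

With the sample rows above, count_comments(conn, 1) gives 2 and count_comments(conn, 2) gives 1.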
Obviously, I don’t want to run this query every time I display a page, so I need to cache the number of comments in a thread. My two options (as I see them) are:
- Add a number_comments column to the threads table, and update it whenever a comment is added or removed.
- Cache the value in memory, either by telling MySQL to cache it or by using something like APC / memcached.
What are the pros / cons of each?
I’m thinking the first one is simple and a bit slower, but you get redundancy, and storage is a lot cheaper than memory. It does, though, muck up the database with a “dynamic”, ever-changing value (also note that I’ll need to save “upvotes” for comments, so this question applies to more than one “dynamic” value).
The second one performs better, but it introduces a new technology that you have to tie in just to cache a count.
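Roughly, the second option would look something like the sketch below. A plain Python dict stands in for APC / memcached here, and CountCache, get_or_compute, and the 60-second time-to-live are names and values I invented for illustration, not any real library's API.

```python
import time


class CountCache:
    """Tiny in-process cache with a time-to-live, standing in for APC/memcached."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}      # key -> (value, expiry time)

    def get_or_compute(self, key, compute):
        """Return the cached value, calling compute() on a miss or after expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        value = compute()
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop a cached entry, e.g. after a comment is added or removed."""
        self._entries.pop(key, None)
```

The page code would call something like cache.get_or_compute("thread:1", run_count_query), where run_count_query is whatever function runs the real query, and call invalidate whenever a comment is added or removed.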
This project would have a relatively small number of users, but I want to know which is preferable for a highly visited site as well (e.g. how does Facebook store the number of comments? [I’m guessing both in the database and in memory]).
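Keep in mind this quote from Donald Knuth:
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”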
I think caching, or “database denormalization” in this case, is a perfectly valid option, but it’s an option that is best considered when the “normal” approaches are no longer adequate.
You say that you “obviously” don’t want to run an extra query to grab the comment count on every page view, but it’s actually not that obvious. If your database is properly configured, you should already have an index on the thread_id field in your comments table (or whatever you happen to call the field). Running a query based on an indexed field, particularly when the query only returns a generated COUNT() field rather than a huge list of rows, doesn’t actually have a lot of overhead. I think it’s easier to just run that query and be done with it.
That said, there is value in denormalizing a database when performance reasons demand it. In that case, I would add a comments_count field to the threads table, which is incremented whenever a comment is added and decremented whenever one is deleted. You need to remember to add extra code surrounding your INSERT and DELETE queries, and possibly your UPDATE queries as well, depending on whether your comments table keeps track of active/deleted states.
Again, this is a premature optimization in most cases. The question you need to ask yourself is, “Is this site so busy/under such heavy load that the added complexity of managing a computed field is less costly than just running a quick COUNT() query?” If it is, then by all means go for the denormalization route, but it probably shouldn’t be the first option you go for.
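To make the two routes concrete, here is a rough sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and it assumes a comments table with a thread_id column. The index name and the helper functions (count_comments, add_comment, delete_comment, cached_count) are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE threads (
        id INTEGER PRIMARY KEY,
        title TEXT,
        comments_count INTEGER NOT NULL DEFAULT 0  -- denormalized counter
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER NOT NULL REFERENCES threads(id),
        body TEXT
    );
    -- Index that keeps the COUNT() lookup cheap.
    CREATE INDEX idx_comments_thread_id ON comments (thread_id);
""")
conn.execute("INSERT INTO threads (id, title) VALUES (1, 'Example thread')")
conn.commit()


def count_comments(conn, thread_id):
    """The 'normal' approach: COUNT() on the indexed thread_id column."""
    return conn.execute(
        "SELECT COUNT(*) FROM comments WHERE thread_id = ?", (thread_id,)
    ).fetchone()[0]


def add_comment(conn, thread_id, body):
    """Insert a comment and bump the counter in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO comments (thread_id, body) VALUES (?, ?)",
            (thread_id, body),
        )
        conn.execute(
            "UPDATE threads SET comments_count = comments_count + 1 WHERE id = ?",
            (thread_id,),
        )
    return cur.lastrowid


def delete_comment(conn, comment_id):
    """Delete a comment and decrement its thread's counter in one transaction."""
    with conn:
        row = conn.execute(
            "SELECT thread_id FROM comments WHERE id = ?", (comment_id,)
        ).fetchone()
        if row is None:
            return False  # nothing to delete, counter left untouched
        conn.execute("DELETE FROM comments WHERE id = ?", (comment_id,))
        conn.execute(
            "UPDATE threads SET comments_count = comments_count - 1 WHERE id = ?",
            (row[0],),
        )
    return True


def cached_count(conn, thread_id):
    """Read the denormalized counter straight from the threads row."""
    return conn.execute(
        "SELECT comments_count FROM threads WHERE id = ?", (thread_id,)
    ).fetchone()[0]
```

Note how much code the counter needs compared with the single COUNT() query. If comments are soft-deleted through an active/deleted flag, the same counter adjustment would have to accompany that UPDATE as well.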