So I’m trying to create a comment system in which you can reply to comments that are already replies (allowing you to create theoretically infinite threads of replies). I want them to display in chronological order (newest on top), but of course the replies should be directly underneath the original comment. If there are multiple comments replying to the same comment, the replies should also be in chronological order (still underneath the original comment). I also want to limit the number of comment groups (a top-level comment that is not a reply at all, together with all of its replies) to, say, 25. How should I set up the MySQL table, and what sort of query would I use to extract what I want?
Here’s a simplified version of my DB:
ID int(11) NOT NULL AUTO_INCREMENT,
DatePosted datetime NOT NULL,
InReplyTo int(11) NOT NULL DEFAULT '0',
Sorry if this is kind of confusing, I’m not sure how to word it any differently. I’ve had this problem in the back of my mind for a couple months now, and every time I solve one problem, I end up with another…
There are many ways. Here’s one approach that I like (and use on a regular basis).
The database
Consider the following database structure:
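Here is a minimal sketch of such a structure. To keep it runnable as-is, it uses an in-memory SQLite database through Python’s built-in sqlite3 module as a stand-in for MySQL. The parent_id, parent_path and date_posted names come from the text below, while id and body are assumptions. Root comments get a NULL parent_id rather than the 0 used in the question, because a foreign key column must hold either NULL or an existing id.

```python
import sqlite3

# Sketch of a materialized-path comments table (SQLite standing in for MySQL;
# in MySQL you would use INT AUTO_INCREMENT, DATETIME and VARCHAR on InnoDB).
SCHEMA = """
CREATE TABLE comments (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    parent_id   INTEGER,                -- NULL for a root-level comment
    parent_path TEXT    NOT NULL,       -- ancestors + self, e.g. '/000001/000003/'
    date_posted TEXT    NOT NULL,       -- ISO-8601 timestamp
    body        TEXT    NOT NULL        -- hypothetical payload column
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt, pk)
columns = [row[1] for row in conn.execute("PRAGMA table_info(comments)")]
print(columns)
```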
Your data will look like this:
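Here are some hypothetical sample rows. They assume parent_path holds every ancestor id followed by the comment’s own id (the /{id}/{id2}/{id3}/ shape mentioned at the end of this answer), with each id zero-padded to 6 digits so that string order matches numeric order. The ids, dates and bodies are made up for illustration, and SQLite again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE comments (
    id INTEGER PRIMARY KEY AUTOINCREMENT, parent_id INTEGER,
    parent_path TEXT NOT NULL, date_posted TEXT NOT NULL, body TEXT NOT NULL
)""")

# (id, parent_id, parent_path, date_posted, body). Each path is the parent's
# path plus the comment's own id, zero-padded to 6 digits (an assumption).
rows = [
    (1, None, "/000001/",               "2024-01-01 10:00", "first root"),
    (2, None, "/000002/",               "2024-01-01 11:00", "second root"),
    (3, 1,    "/000001/000003/",        "2024-01-01 12:00", "reply to 1"),
    (4, 3,    "/000001/000003/000004/", "2024-01-01 13:00", "reply to 3"),
    (5, 1,    "/000001/000005/",        "2024-01-01 14:00", "second reply to 1"),
    (6, 2,    "/000002/000006/",        "2024-01-01 15:00", "reply to 2"),
]
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", rows)

stored = conn.execute(
    "SELECT id, parent_id, parent_path FROM comments ORDER BY id").fetchall()
for row in stored:
    print(row)
```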
It’s fairly easy to select everything in a usable way:
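A runnable sketch of that select, reusing the same made-up rows on the SQLite stand-in: ordering by the path string yields thread order, and each comment’s depth falls out of the number of segments in its path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER,
    parent_path TEXT NOT NULL, date_posted TEXT NOT NULL, body TEXT NOT NULL)""")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", [
    (1, None, "/000001/", "2024-01-01 10:00", "first root"),
    (2, None, "/000002/", "2024-01-01 11:00", "second root"),
    (3, 1, "/000001/000003/", "2024-01-01 12:00", "reply to 1"),
    (4, 3, "/000001/000003/000004/", "2024-01-01 13:00", "reply to 3"),
    (5, 1, "/000001/000005/", "2024-01-01 14:00", "second reply to 1"),
    (6, 2, "/000002/000006/", "2024-01-01 15:00", "reply to 2"),
])

# Sorting on the path string gives depth-first (thread) order: each reply
# appears under its parent, before the parent's next sibling. Zero-padding
# matters: unpadded, '/10/' would sort before '/2/'.
thread = conn.execute(
    "SELECT id, parent_path, body FROM comments ORDER BY parent_path, date_posted"
).fetchall()
for cid, path, body in thread:
    depth = path.count("/") - 2          # 0 for a root comment
    print("    " * depth + f"[{cid}] {body}")
```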
Ordering by parent_path, date_posted will usually produce results in the order you’ll need them when you generate your page, but you’ll want to be sure that you have an index on the comments table that properly supports this. Otherwise the query works, but it’s really, really inefficient.

For any given single comment, it’s easy to get that comment’s entire tree of child comments. Just add a where clause that keeps only the rows whose parent_path starts with that comment’s own parent_path.

The added where clause will make use of the same index we already defined, so we’re good to go.
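A sketch of the supporting index and the sub-tree query under the same SQLite assumptions; subtree_ids is a hypothetical helper name. In MySQL, a LIKE pattern with a constant prefix such as '/000001/%' can be resolved as a range scan on an index whose first column is parent_path. SQLite only applies its LIKE optimization under extra conditions, so this sketch does not depend on the index actually being used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER,
    parent_path TEXT NOT NULL, date_posted TEXT NOT NULL)""")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", [
    (1, None, "/000001/", "2024-01-01 10:00"),
    (2, None, "/000002/", "2024-01-01 11:00"),
    (3, 1, "/000001/000003/", "2024-01-01 12:00"),
    (4, 3, "/000001/000003/000004/", "2024-01-01 13:00"),
    (5, 1, "/000001/000005/", "2024-01-01 14:00"),
    (6, 2, "/000002/000006/", "2024-01-01 15:00"),
])

# Composite index matching ORDER BY parent_path, date_posted.
# MySQL equivalent: CREATE INDEX idx_comments_path ON comments (parent_path, date_posted);
conn.execute("CREATE INDEX idx_comments_path ON comments (parent_path, date_posted)")

def subtree_ids(conn, comment_id):
    """Ids of a comment and all of its descendants, in thread order."""
    (path,) = conn.execute(
        "SELECT parent_path FROM comments WHERE id = ?", (comment_id,)).fetchone()
    # Every descendant's path starts with this comment's own path.
    rows = conn.execute(
        "SELECT id FROM comments WHERE parent_path LIKE ? ORDER BY parent_path, date_posted",
        (path + "%",))
    return [r[0] for r in rows]

print(subtree_ids(conn, 1), subtree_ids(conn, 3))
```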
Notice that we haven’t used the parent_id yet. In fact, it’s not strictly necessary, but I include it because it allows us to define a traditional foreign key to enforce referential integrity and to implement cascading deletes and updates if we want to. In MySQL, foreign key constraints and cascading rules only work on InnoDB tables, not MyISAM.

Managing the hierarchy
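Part of managing the hierarchy can be left to the database through the parent_id foreign key. The sketch below uses SQLite as a stand-in, which only enforces foreign keys after PRAGMA foreign_keys = ON; in MySQL the equivalent FOREIGN KEY (parent_id) REFERENCES comments(id) ON DELETE CASCADE clause needs an InnoDB table. MySQL’s documentation also caps InnoDB cascades at 15 nested levels, which can matter for very deep threads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.execute("""
CREATE TABLE comments (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    parent_id   INTEGER REFERENCES comments(id) ON DELETE CASCADE,
    parent_path TEXT NOT NULL,
    date_posted TEXT NOT NULL
)""")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", [
    (1, None, "/000001/", "2024-01-01 10:00"),
    (2, None, "/000002/", "2024-01-01 11:00"),
    (3, 1, "/000001/000003/", "2024-01-01 12:00"),
    (4, 3, "/000001/000003/000004/", "2024-01-01 13:00"),
    (5, 1, "/000001/000005/", "2024-01-01 14:00"),
    (6, 2, "/000002/000006/", "2024-01-01 15:00"),
])

# Deleting a root removes its replies, and their replies, via the cascade.
conn.execute("DELETE FROM comments WHERE id = 1")
remaining = [r[0] for r in conn.execute("SELECT id FROM comments ORDER BY id")]
print(remaining)

# A reply to a comment that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO comments VALUES (7, 99, '/000099/000007/', '2024-01-02 09:00')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```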
In order to use this approach, of course, you’ll have to make sure you set the parent_path properly when you insert each comment. And if you move comments around (which would admittedly be a strange use case), you’ll have to manually update the parent_path of every comment that is subordinate to the moved comment. But those are both fairly easy things to keep up with.

If you really want to get fancy (and if your database supports it), you can write triggers to manage the parent_path transparently. I’ll leave this as an exercise for the reader, but the basic idea is that insert and update triggers would fire before a new insert is committed. They would walk up the tree (using the parent_id foreign key relationship) and rebuild the value of the parent_path accordingly.

It’s even possible to break the parent_path out into a separate table that is managed entirely by triggers on the comments table, with a few views or stored procedures to implement the various queries you need. This completely isolates your middle-tier code from the need to know or care about the mechanics of storing the hierarchy info.

Of course, none of the fancy stuff is required by any means. It’s usually quite sufficient to just drop the parent_path into the table and write some code in your middle tier to ensure that it gets managed properly along with all the other fields you already have to manage.
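A minimal middle-tier sketch of that bookkeeping, again on SQLite. The names add_comment, move_comment, path_of and seg are hypothetical helpers, not part of any library. Because the new id is only known after the INSERT, the path is written in a second statement; in production you would want both statements in one transaction.

```python
import sqlite3

def seg(comment_id: int) -> str:
    """One path segment; zero-padded so string order matches numeric order."""
    return f"{comment_id:06d}/"

def path_of(conn, comment_id):
    return conn.execute(
        "SELECT parent_path FROM comments WHERE id = ?", (comment_id,)).fetchone()[0]

def add_comment(conn, body, date_posted, parent_id=None):
    """Insert a comment, then fill in its parent_path."""
    cur = conn.execute(
        "INSERT INTO comments (parent_id, parent_path, date_posted, body) VALUES (?, '', ?, ?)",
        (parent_id, date_posted, body))
    new_id = cur.lastrowid               # LAST_INSERT_ID() in MySQL
    prefix = "/" if parent_id is None else path_of(conn, parent_id)
    conn.execute("UPDATE comments SET parent_path = ? WHERE id = ?",
                 (prefix + seg(new_id), new_id))
    return new_id

def move_comment(conn, comment_id, new_parent_id):
    """Re-parent a comment and rewrite the paths of its whole sub-tree."""
    old_path = path_of(conn, comment_id)
    new_prefix = path_of(conn, new_parent_id)
    if new_prefix.startswith(old_path):
        raise ValueError("cannot move a comment underneath itself or its own reply")
    new_path = new_prefix + seg(comment_id)
    conn.execute("UPDATE comments SET parent_id = ? WHERE id = ?",
                 (new_parent_id, comment_id))
    # Swap the old prefix for the new one on the comment and all descendants
    # (substr is 1-indexed, so the remainder starts at len(old_path) + 1).
    conn.execute(
        "UPDATE comments SET parent_path = ? || substr(parent_path, ?) WHERE parent_path LIKE ?",
        (new_path, len(old_path) + 1, old_path + "%"))

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (id INTEGER PRIMARY KEY AUTOINCREMENT, parent_id INTEGER,
    parent_path TEXT NOT NULL, date_posted TEXT NOT NULL, body TEXT NOT NULL)""")
a = add_comment(conn, "root A", "2024-01-01 10:00")
b = add_comment(conn, "root B", "2024-01-01 11:00")
c = add_comment(conn, "reply to A", "2024-01-01 12:00", parent_id=a)
d = add_comment(conn, "reply to that reply", "2024-01-01 13:00", parent_id=c)
move_comment(conn, c, b)   # c, together with d, now hangs under b
```

move_comment rewrites every descendant with a single UPDATE by swapping path prefixes, and it refuses a move that would put a comment underneath its own reply, since that would create a cycle.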
Imposing limits
MySQL (and some other databases) allows you to select “pages” of data using the LIMIT clause; for example, LIMIT 25 OFFSET 0 returns the first 25 rows of a result.

Unfortunately, when dealing with hierarchical data like this, the LIMIT clause alone won’t yield the desired results.
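A quick sketch of the problem, using the same made-up rows on SQLite: paging the thread-ordered rows with a plain LIMIT 3 splits comment 1’s thread and pushes its second reply onto the next page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER,
    parent_path TEXT NOT NULL, date_posted TEXT NOT NULL)""")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", [
    (1, None, "/000001/", "2024-01-01 10:00"),
    (2, None, "/000002/", "2024-01-01 11:00"),
    (3, 1, "/000001/000003/", "2024-01-01 12:00"),
    (4, 3, "/000001/000003/000004/", "2024-01-01 13:00"),
    (5, 1, "/000001/000005/", "2024-01-01 14:00"),
    (6, 2, "/000002/000006/", "2024-01-01 15:00"),
])

# Plain paging over thread-ordered rows: LIMIT counts rows, not threads.
first_page = [r[0] for r in conn.execute(
    "SELECT id FROM comments ORDER BY parent_path LIMIT 3 OFFSET 0")]
second_page = [r[0] for r in conn.execute(
    "SELECT id FROM comments ORDER BY parent_path LIMIT 3 OFFSET 3")]
print(first_page, second_page)  # comment 5, a reply to 1, lands on page two
```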
Instead, we need to do a separate select at the level where we want to impose the limit, and then join that back together with our “sub-tree” query to give the final desired results.
Something like this:
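Here is a runnable sketch of such a query on SQLite with the same made-up rows; page_of_threads is a hypothetical helper. The inner select pages through root comments only (newest first), and the join pulls in every comment whose path starts with a selected root’s path. In MySQL the pattern would be built with CONCAT(roots.parent_path, '%') instead of SQLite’s || operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER,
    parent_path TEXT NOT NULL, date_posted TEXT NOT NULL)""")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", [
    (1, None, "/000001/", "2024-01-01 10:00"),
    (2, None, "/000002/", "2024-01-01 11:00"),
    (3, 1, "/000001/000003/", "2024-01-01 12:00"),
    (4, 3, "/000001/000003/000004/", "2024-01-01 13:00"),
    (5, 1, "/000001/000005/", "2024-01-01 14:00"),
    (6, 2, "/000002/000006/", "2024-01-01 15:00"),
])

PAGE_SQL = """
SELECT c.id
FROM (
    SELECT parent_path, date_posted
    FROM comments
    WHERE parent_id IS NULL        -- root-level comments only
    ORDER BY date_posted DESC      -- newest thread first
    LIMIT ? OFFSET ?               -- the limit applies to threads, not rows
) AS roots
JOIN comments AS c
  ON c.parent_path LIKE (roots.parent_path || '%')
ORDER BY roots.date_posted DESC, c.parent_path
"""

def page_of_threads(conn, limit=25, offset=0):
    """Ids of every comment in one page of root-level threads."""
    return [r[0] for r in conn.execute(PAGE_SQL, (limit, offset))]

print(page_of_threads(conn))
```

Within each thread, replies keep the oldest-first order of their ids, while the threads themselves come out newest first.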
Notice the clause limit 25 offset 0, buried in the middle of the inner select. It retrieves the most recent 25 “root-level” comments.

Hope this helps.

[edit: You may find that you have to play with things a bit to get the ability to order and/or limit results exactly the way you like. This may include adding information to the hierarchy that’s encoded in parent_path. For example, instead of /{id}/{id2}/{id3}/, you might decide to include the post_date as part of the parent_path: /{id}:{post_date}/{id2}:{post_date2}/{id3}:{post_date3}/. This would make it very easy to get the order and hierarchy you want, at the expense of having to populate the field up front and manage it as the data changes.]
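A sketch of that idea on SQLite, with made-up data whose ids don’t follow posting order (as could happen with imported comments). It deliberately puts the date before the id in each segment, a variation on the format in the note: with the id first, the unique ids would decide every comparison and the date would never be consulted. Dates must be fixed-width (here a compact 20240101T0900 form) for string order to match time order.

```python
import sqlite3

def segment(date_posted: str, comment_id: int) -> str:
    """Path segment with the date first; fixed-width so string order == time order."""
    return f"{date_posted}:{comment_id:06d}/"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER,
    parent_path TEXT NOT NULL, date_posted TEXT NOT NULL)""")

# Hypothetical imported thread whose ids do NOT follow posting order.
root = "/" + segment("20240101T0900", 7)
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", [
    (7, None, root, "20240101T0900"),
    (8, 7, root + segment("20240101T1500", 8), "20240101T1500"),
    (9, 7, root + segment("20240101T1200", 9), "20240101T1200"),
])

# Replies come out by date (9 before 8), not by id.
order = [r[0] for r in conn.execute("SELECT id FROM comments ORDER BY parent_path")]
print(order)
```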
Good luck!