What is the most efficient way to determine how many comments a particular blog post has? We want to store the data for a new web app. We have a list of permalink URLs as well as the RSS feeds.
If I understand correctly, you want a heuristic to estimate the number of comments in an HTML page which is known to be a blog post, yes?
Very often, a specific blog will have some feature that makes the count easy to work out. If you look at mine over at http://kstruct.com/ you’ll see that every page with comments says ‘X Responses’, so if you are able to do some work on a per-blog basis, it’s probably not very difficult.
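As a rough sketch of that per-blog approach in Python: the pattern below assumes the page announces its count as "N Responses" (or "1 Response"), as on the blog mentioned above. The names `RESPONSES_RE` and `count_from_label` are made up for illustration, and every other blog would need its own pattern.

```python
import re
from typing import Optional

# Hypothetical per-blog pattern: matches labels such as "12 Responses" or "1 Response".
RESPONSES_RE = re.compile(r"(\d+)\s+Responses?\b", re.IGNORECASE)


def count_from_label(html: str) -> Optional[int]:
    """Return the comment count announced on the page, or None if no label is found."""
    match = RESPONSES_RE.search(html)
    return int(match.group(1)) if match else None
```

A page with no matching label returns `None` rather than 0, so you can tell "no comments section detected" apart from a real count.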
If you needed something generic, I guess there are a few common features that comments have that you might be able to detect. For one, any links in them are quite likely to have rel="nofollow" attributes, so seeing that within a block might imply that it’s a comment.
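Here is a minimal sketch of the nofollow signal, using only Python's standard `html.parser`. It just counts links whose `rel` attribute contains `nofollow`; the class and function names are invented for this example, and the count is only a hint, since one comment can hold several links or none at all.

```python
from html.parser import HTMLParser


class NofollowCounter(HTMLParser):
    """Counts <a> tags whose rel attribute includes the token 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel = dict(attrs).get("rel") or ""
        if "nofollow" in rel.lower().split():
            self.count += 1


def count_nofollow_links(html):
    parser = NofollowCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```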
The main interesting thing to look for would be changes in the structure of posts from the same site. For example, there’s also a very good chance that each comment will have its own anchor so people can link directly to it, so you could look at the differing numbers of <a name="XXX"> tags in a given page on the same site to get an idea of the relative numbers of comments.
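The anchor comparison could look something like the following sketch. It counts `<a name="...">` tags per page and subtracts the smallest count seen on the site, on the assumption (mine, not guaranteed) that the page with the fewest named anchors has no comments and so shows the template's baseline. The function names are hypothetical.

```python
from html.parser import HTMLParser


class NamedAnchorCounter(HTMLParser):
    """Collects the name attribute of every <a name="..."> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def count_named_anchors(html):
    parser = NamedAnchorCounter()
    parser.feed(html)
    parser.close()
    return len(parser.names)


def relative_comment_estimate(pages):
    """pages maps URL -> HTML for posts from ONE site; returns URL -> estimate."""
    counts = {url: count_named_anchors(html) for url, html in pages.items()}
    # Assumption: the lowest count is the template's own anchors (no comments).
    baseline = min(counts.values(), default=0)
    return {url: count - baseline for url, count in counts.items()}
```

The result is only meaningful relative to other pages from the same site, which is the point made above: it tells you which posts have more comments, not necessarily the exact number.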
As Michael Stum pointed out, if the pages have a Comment-RSS feed, your life is made a lot easier because you can get the comment data in a structured format.
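If a comment feed is available, counting its entries is straightforward with the standard library. This is a sketch assuming the feed is either RSS 2.0 (entries in `<item>`) or Atom (entries in `<entry>`); `count_feed_items` is a name I made up. Bear in mind that a feed may list only the most recent entries, so the result is best treated as a lower bound.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def count_feed_items(feed_xml):
    """Count entries in an RSS 2.0 or Atom feed given as bytes or str."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0: <rss><channel><item>...</item></channel></rss>
    rss_items = root.findall("./channel/item")
    # Atom: <feed xmlns="http://www.w3.org/2005/Atom"><entry>...</entry></feed>
    atom_entries = root.findall(ATOM_NS + "entry")
    return len(rss_items) + len(atom_entries)
```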
All in all, though, I think it’s going to be quite a challenging problem to solve in general.