The Archive Base

Editorial Team
Asked: June 7, 2026 (2026-06-07T15:00:34+00:00)

I was trying to parse a comment thread on the forum news.ycombinator.com. However, after looking at the HTML, there seems to be no hierarchy that nests the comments, which makes the thread difficult to parse. For example, here is a parent comment and its child:

<!-- This part below draws the upvote/downvote images -->
<table border=0><tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td><td valign=top><center><a id=up_4241971 href="vote?for=4241971&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4241971></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; ">


<!-- This part below is user/time and permalink info for a parent comment -->
<span class="comhead"><a href="user?id=JshWright">JshWright</a> 7 hours ago  | <a href="item?id=4241971">link</a></span></div><br>


<!-- This part below is actual Comment -->
<span class="comment"><font color=#000000>I just got my Verizon Galaxy S3, and ordered the 20-pack of NFC tags offered by <a href="http://tagsfordroid.com" rel="nofollow">http://tagsfordroid.com</a><p>I think I know what my Dad felt like when he got his first label printer... Within days it seemed like every object in his office was labeled...<p>I've got a tag in my car to automatically send my wife a "Headed home" SMS, a tag on my night stand to toggle between 'night' (silent) and 'day' (loud) volume settings, a tag by my back door to launch CardioTrainer when I go out for a run (this one may have crossed the "I've run out of ideas" line...). I'm using the keychain tag to dial a response number for the fire department I'm a member of.</font></span><p><font size=1><u><a href="reply?id=4241971&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr>


<!-- This part below is upvote/downvote arrow for child of parent -->
<tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td><td valign=top><center><a id=up_4242025 href="vote?for=4242025&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4242025></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; ">

<!-- This part has user/time/permalink for child comment -->
<span class="comhead"><a href="user?id=msbmsb">msbmsb</a> 7 hours ago  | <a href="item?id=4242025">link</a></span></div><br>

<!-- This part is the content of the  child comment -->
<span class="comment"><font color=#000000>I did the same thing. Tag next to the entry-way light switch for changing to an "at-home" profile, tag next to the bed for switching between night mode and morning mode, tag at work, keychain tag for switching between car mode and quiet mode.<p>And profile switching is just the basics. You can have a tag that connects guests' NFC-enabled phones to your wifi without having to hand out the password, for instance.<p>NFC task launcher + tasker is an amazing combination that opens up all kinds of possibilities.</font></span><p><font size=1><u><a href="reply?id=4242025&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr><tr><td>

So how does Hacker News store the hierarchical structure of the comments, and how can I replicate it when scraping their data?

1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 3:00 pm

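    The tables themselves are not nested. The indentation is done with spacer image tags whose width attribute encodes the depth (width=0 for the parent, width=40 for its child):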

    ...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td>...
    ...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td>...
    

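    You would read and parse those width values. The threading can then be reconstructed by keeping a stack of the widths seen so far: a comment whose width is greater than the one on top of the stack is a child of that comment, while an equal or smaller width means popping the stack back to the nearest shallower comment first.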
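
The stack idea can be sketched in Python with only the standard library. This is a rough illustration, not a tested scraper: it assumes the markup looks like the sample in the question (an s.gif spacer image carrying the width, followed by a relative item?id= permalink for the same comment), and the live site's HTML may differ. The names CommentRows, build_tree and parse_thread are made up for this example.

```python
from html.parser import HTMLParser


class CommentRows(HTMLParser):
    """Collect (spacer width, comment id) pairs in page order.

    Assumption: each comment has an s.gif spacer image followed by a
    relative "item?id=<id>" permalink, as in the sample from the question.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._width = None  # width of the most recent spacer image

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        href = attrs.get("href") or ""
        if tag == "img" and src.endswith("/s.gif"):
            self._width = int(attrs.get("width") or 0)
        elif tag == "a" and href.startswith("item?id=") and self._width is not None:
            # First permalink after a spacer belongs to that comment.
            self.rows.append((self._width, href.split("=", 1)[1]))
            self._width = None


def build_tree(rows):
    """Nest comments by indentation using a stack of (width, node)."""
    roots, stack = [], []
    for width, comment_id in rows:
        node = {"id": comment_id, "children": []}
        # Pop until the top of the stack is strictly shallower: that is the parent.
        while stack and stack[-1][0] >= width:
            stack.pop()
        siblings = stack[-1][1]["children"] if stack else roots
        siblings.append(node)
        stack.append((width, node))
    return roots


def parse_thread(html):
    parser = CommentRows()
    parser.feed(html)
    return build_tree(parser.rows)


# Reduced version of the two comments from the question.
sample = (
    '<img src="http://ycombinator.com/images/s.gif" height=1 width=0>'
    '<a href="item?id=4241971">link</a>'
    '<img src="http://ycombinator.com/images/s.gif" height=1 width=40>'
    '<a href="item?id=4242025">link</a>'
)
tree = parse_thread(sample)
```

Here tree holds one top-level comment (4241971) with 4242025 as its only child. Comparing widths against the stack, rather than dividing by a fixed step such as 40, avoids relying on the exact indent size.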

© 2021 The Archive Base. All Rights Reserved