I have an HTML doc to parse and read a bunch of stuff from

Asked: June 1, 2026

I have an HTML document to parse and need to read a bunch of data from it. The problem is that the HTML contains multiple tables, and I am only interested in one of them. I also want to read only the lines that have some useful content. Here is a sample HTML page: it has two tables with no ID, and I want only the second table, and only the lines that are useful to humans.

<HTML>
<BODY>

<TABLE>
  <TR>
    <TD> I don't want this table </TD></TR>
  <TR>
    <TD></TD>
    <TD> No No No <br></TD>
  </TR>
....
</TABLE>


<TABLE>
  <TR>
    <TD>04/13/2012 22:51  I want this table </TD></TR>
  <TR>
    <TD></TD>
    <TD> First - something there <br></TD>
  </TR>
  <TR>
    <TD>04/13/2012 23:23  Update from xyz</TD></TR>
  <TR>
    <TD></TD>
    <TD>Second - something here <br></TD>
  </TR>
</TABLE>


</BODY>
</HTML>

I am trying this code, which is obviously not working. The output is not the text I want: it includes both tables, and I only want the second one. Help!

require 'curb'
require 'nokogiri'
c = Curl::Easy.perform("http://server/cgi-bin/page.cgi?id=123456")
html_doc = Nokogiri::HTML(c.body_str.to_s)
puts html_doc.xpath("//table/tr/td")


1 Answer

Editorial Team · Answer 1 · Added on June 1, 2026 at 6:24 pm

Have you tried the XPath //table[2]/tr/td to get the second table? If you can change the source of the HTML, the best solution would be to give your tables id attributes.
