I need to extract the first table (not the material in the first table

Question

Asked: June 18, 2026 (2026-06-18T16:31:17+00:00)

I need to extract the first table (not just the contents of the first table tag) from a section of an HTML document. A table may span multiple pages, so it may be split across several table tags, and the section may contain more than one table. My rule is: if there is a text node between two table tags, they are different tables; if there is no text node between them, they are parts of one table. How can I implement this?

I didn't use XPath to find the first table because I first need to identify the appropriate section by checking each text node against a regular expression.

html='<body>
<table border="1">
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>
<table border="1">
<tr>
<td>row 3, cell 1</td>   
<td>row 3, cell 2</td>
</tr>
<tr> 
<td>row 4, cell 1</td>
<td>row 4, cell 2</td>
</tr>
</table>
<p>text </p> <!-- Split by text: the table below is a different table -->
<table border="1">
<tr>
<td>row 5, cell 1</td>
<td>row 5, cell 2</td>
</tr>
<tr>
<td>row 6, cell 1</td>
<td>row 6, cell 2</td>
</tr>
</table>
</body>'

This is my current code, which only picks up the first table tag rather than the first logical table (rows 1-4 in my sample). I use the table_parser gem to extract the table.

require 'nokogiri'
require 'table_parser'

doc = Nokogiri::HTML(html)
table = Array.new

i = 0
doc.traverse do |node|
    if node.name == 'table' && i == 0
        table = TableParser::Parser::extract_table(node, node.path)
        i +=1
    end
end

puts table

1 Answer

Editorial Team · Answer 1 · 2026-06-18T16:31:18+00:00

It sounds like you want to merge consecutive tables:

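# Find each table that immediately follows another table, then reverse the
# list so that you iterate from bottom to top.
doc.search('table + table').to_a.reverse.each do |table|
  # Move each of this table's rows into the preceding table.
  # previous_element is used instead of previous because previous would
  # return the whitespace text node that sits between the two table tags.
  target = table.previous_element
  table.search('tr').each { |tr| target.add_child(tr) }
  # Then remove the now-empty table.
  table.remove
end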
