I would like to parse a table using Nokogiri. I’m doing it this way

Question

Asked: May 20, 2026 (2026-05-20T00:39:44+00:00)

I would like to parse a table using Nokogiri. I’m doing it this way:

require 'nokogiri'

def parse_table_nokogiri(html)
    doc = Nokogiri::HTML(html)

    doc.search('table > tr').each do |row|
        # Only matches text wrapped in a <font> element inside the cell.
        row.search('td/font/text()').each do |col|
            p col.to_s
        end
    end
end

Some of the tables that I have contain rows like this:

<tr>
  <td>
     Some text
  </td>
</tr>

…and some have rows like this:

<tr>
  <td>
     <font> Some text </font>
  </td>
</tr>

My XPath expression works for the second scenario but not the first. Is there an XPath expression that I could use that would give me the text from the innermost node of the cell so that I can handle both scenarios?

I’ve incorporated the suggested changes into my snippet:

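require 'cgi'
require 'nokogiri'

def parse_table_nokogiri(html)
    doc = Nokogiri::HTML(html)
    # Pick the table that contains the most rows.
    table = doc.xpath('//table').max_by { |t| t.xpath('.//tr').length }

    # Skip the first row of that table.
    rows = table.search('tr')[1..-1]
    rows.each do |row|
        # Collect every text node under each cell, trimmed and unescaped.
        cells = row.search('td//text()').collect { |text| CGI.unescapeHTML(text.to_s.strip) }
        cells.each do |col|
            puts col
            puts "_____________"
        end
    end
end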

1 Answer

Editorial Team · Answer 1 · 2026-05-20T00:39:45+00:00

Use:

td//text()[normalize-space()]

This selects every text-node descendant of any td child of the current node (the tr already selected in your code), skipping text nodes that contain only white space.

Or, if you want to select all text-node descendants, regardless of whether they are white-space-only or not:

td//text()

UPDATE:

The OP has signaled in a comment that he is getting an unwanted td whose content is just a '&#160;' (aka non-breaking space).

To also exclude tds whose content consists only of one or more nbsp characters, use:

td//text()[translate(normalize-space(), '&#160;', '')]
