I’m trying to use Nokogiri to extract data from an HTML file using the code below:
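require 'nokogiri'
@doc = Nokogiri::HTML("<table >
<tr BGCOLOR=\"#eeeeee\">
<td>SPILLED</td>
</tr>
<tr BGCOLOR=\"#eeeeee\">
<td >RUSTING</td>
</tr>
</table>")
# Outer block: iterate over each matching row.
@doc.xpath('//tr[@bgcolor="#eeeeee"]').each do |record|
  print record
  # Inner block: intended to visit only the cells of the current row.
  record.xpath("//td").each do |cell|
    print cell
  end
end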
The first block seems to work as expected: each time through, record contains just one of the rows. The second block, on the other hand, accesses the <td> elements for BOTH rows, which mystifies me, given that record holds the data for just one row before entering the inner block.
How does “record” end up with the data for both rows inside the inner block?
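Figured it out: the “//” preceding the “td” makes the search start from the root of the document instead of from the node in record. The elements in record still have a relationship to their parent elements, so an absolute path like “//td” matches every <td> in the document, not just the ones in the current row. Eliminating the “//” solved it, because a plain “td” is evaluated relative to record and matches only that row’s cells.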