I am having trouble parsing an HTML table using Nokogiri and Ruby. My HTML table structure looks like this:
<table>
<tbody>
<tr>
<td>Firstname</td>
<td>Lastname</td>
<td>Middle</td>
</tr>
<tr>
<td>ding</td>
<td>dong</td>
<td>ling</td>
</tr>
....
....
.... {more tr's and td's with similar data exists.}
....
....
....
....
....
</tbody>
</table>
In the above HTML table, I would like to remove the first <tr> and its <td> elements entirely, i.e. drop Firstname, Lastname and Middle. I want to start extracting the text only from the second <tr>, so that I get the contents of the table from tr[2] onward and no column headers.
Can someone please show me code for how to do this?
Thanks.
After you’ve run either one of the above 2 snippets, rows will contain every <tr>...</tr> after the first one. For example, puts rows.to_xml prints the markup of those rows.
To get the inner text, removing all the HTML tags, run
puts rows.text
To get the inner text of the td tags only, run
rows.xpath('td').map {|td| td.text }