I need to parse HTML using Rails and Nokogiri. Here is the HTML:
<body>
<div id="mama">
<div class="test1">text</div>
<div class="test2">text2</div>
</div>
<div id="mama">
<div class="test1">text</div>
<div class="test2">text2</div>
</div>
<div id="mama">
<div class="test1">text</div>
<div class="test2">text2</div>
</div>
</body>
How should I form the loop? I've tried many times but still get an error or wrong results…
…
doc.xpath('//div[@id='mama']/?or what?').each do |node|
parse_file.puts text1
parse_file.puts text2
parse_file.puts text1
parse_file.puts \n
end
The result should look like this:
text from first mama
text2 from first mama
text from first mama
text from second mama
and so on...
First, note that the HTML you posted is invalid: an id attribute value must be unique within a document, so it is illegal to have more than one element with id="mama". If you have control over your HTML, you should fix this problem. Even with that same (invalid) HTML, however, Nokogiri has no trouble:
If you want to use XPath directly (Nokogiri translates CSS selectors into XPath behind the scenes), you would do this: