I have the following HTML:
<ul class="filtering_new" width="50%">
<li class="filter">1</li>
<li class="filter">2</li>
<script>Alert('1');</script>
<li class="filter">3</li>
</ul>
How can I get the li whose inner_html is 3?
I tried this:
page.search("//ul.filtering_new").each do |list|
puts list.search("li").size
end
where page is the HTML document.
The size is 2, but it should be 3.
I tried following the manual at https://github.com/hpricot/hpricot/wiki/hpricot-challenge
but I cannot even find the <script> element.
list.search("script")
returns nothing.
Most XML/HTML parsing in Ruby uses Nokogiri these days, so I’ll recommend that parser. However, both Hpricot and Nokogiri support XPath and CSS, so they are fairly interchangeable.
I’d go about it this way:
That finds the candidate nodes and returns them as a NodeSet, which is then iterated over to select or reject each node based on its text.
That offloads more of the comparison to the underlying libxml2 library, where it runs a lot faster.