I have a document A and want to build a new one, B, using A's node values.
Given that A looks like this:
<html>
<head></head>
<body>
<div id="section0">
<h1>Section 0</h1>
<div>
<p>Some <b>important</b> info here</p>
<div>Some unimportant info here</div>
</div>
</div>
<div id="section1">
<h1>Section 1</h1>
<div>
<p>Some <i>important</i> info here</p>
<div>Some unimportant info here</div>
</div>
</div>
</body>
</html>
When building document B, I use a.at_css("#section#{n} h1").text to grab the data from A's h1 tags, like this:
require 'nokogiri'
a = Nokogiri::HTML(html)
Nokogiri::HTML::Builder.new do |doc|
...
doc.h1 a.at_css("#section#{n} h1").text
...
end
So there are three questions:
- How do I grab the content of <p> tags while preserving the tags inside <p>? Currently, a.at_css("#section#{n} p").text returns plain text, which is not what's needed. If I use .to_html or .inner_html instead of .text, the HTML appears escaped. So I get, for example, &lt;p&gt; instead of <p>.
- Is there a proper way of assigning nodes at the document-building stage, so that I don't have to deal with the text method at all? I.e., how do I assign the doc.h1 node the value of the a.at_css("#section#{n} h1") node at the building stage?
- What's the benefit of the Nokogiri::Builder.with(...) method? I wonder if I can make use of it…

How do I grab the content of <p> tags preserving tags inside <p>?
Use .inner_html. The entities are not escaped when you access them; they are escaped only if you pass the string as text content, as in builder.node_name raw_html. Instead, append the string with the builder's << method (for example, builder << raw_html), which parses it into nodes.

Is there any known true way of assigning nodes at the document building stage?
Similar to the above, one way is:
Voila! The node has moved from one document to the other.
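
What's the profit of Nokogiri::Builder.with(...) method?
That's rather unrelated to the rest of your question. Per the documentation, Nokogiri::XML::Builder.with creates a builder that starts at an existing node, for cases where you already have a document and want to augment it with builder methods.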
I don’t think it would be useful to you here.
In general, I find the Builder convenient when writing a large number of custom nodes from scratch with a known hierarchy. When you are not doing that, you may find it simpler to just create a new document and use DOM methods to add nodes as appropriate. It's hard to tell how much of your document will be hard-coded nodes and hierarchy versus procedurally created content.
One other, alternative suggestion: perhaps you should create a template XML document and then augment that with details from the other, scraped HTML?