I need to process an HTML fragment and replace the ids or classes of certain elements with other values. A plain regexp substitution doesn’t fit my needs, because the searched-for value can also occur in the text content, and those occurrences must be kept.
For example, I have this HTML:
<div id='sweet'>
Bla bla sweet bla bla...
</div>
When I replace id='sweet' with ‘bitter’, I want to get:
<div id='bitter'>
Bla bla sweet bla bla...
</div>
I can do this with Nokogiri without any problems, but sometimes the input is invalid HTML, and I need to return the markup exactly as it was, apart from the replacement. The problem is that Nokogiri repairs the markup: it drops stray closing tags and closes unclosed nodes.
Example:
</table>
<div id='sweet'>
Bla bla sweet bla bla...
</div>
I will receive only this:
<div id='bitter'>
Bla bla sweet bla bla...
</div>
Example 2:
</div>
<div id='sweet'>
Bla bla sweet bla bla...
</div>
<table>
<tr>
<td>
Some text
I will get this:
<div id='bitter'>
Bla bla sweet bla bla...
</div>
<table>
<tr>
<td>
Some text
</td>
</tr>
</table>
How can I get this from the second example instead?
</div>
<div id='bitter'>
Bla bla sweet bla bla...
</div>
<table>
<tr>
<td>
Some text
You can use regexes, but with a little more context:
will only change the first instance of ‘sweet’.
Similarly,
handles ‘sweet’ only within a class attribute.
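A minimal Ruby sketch of the "regex with more context" idea, offered as one possible approach rather than the exact code of this answer. It only rewrites attributes inside opening tags and never reparses the document, so broken markup passes through unchanged. The helper names `replace_id` and `replace_class` and the `OPEN_TAG` constant are hypothetical; the sketch assumes attribute values are quoted with `'` or `"` and contain no `<` or `>` characters.

```ruby
# Matches an opening tag such as <div id='sweet'>; closing tags (</div>)
# and text content never match, so they are left exactly as they were.
# Assumption: no attribute value contains '<' or '>'.
OPEN_TAG = /<[a-zA-Z][^<>]*>/

# Hypothetical helper: change id='old_id' to id='new_id' inside opening tags.
def replace_id(html, old_id, new_id)
  # (?<![\w-]) keeps attributes like data-id from matching.
  attr = /((?<![\w-])id\s*=\s*)(['"])#{Regexp.escape(old_id)}\2/
  html.gsub(OPEN_TAG) do |tag|
    tag.sub(attr) { "#{$1}#{$2}#{new_id}#{$2}" }
  end
end

# Hypothetical helper: swap one class name inside a class attribute,
# keeping any other class names on the same element.
def replace_class(html, old_class, new_class)
  attr  = /((?<![\w-])class\s*=\s*)(['"])(.*?)\2/m
  # Whole-token match: 'sweet' matches, 'sweetness' does not.
  token = /(?<!\S)#{Regexp.escape(old_class)}(?!\S)/
  html.gsub(OPEN_TAG) do |tag|
    tag.sub(attr) do
      # Copy the captures before the inner gsub overwrites them.
      prefix, quote, value = $1, $2, $3
      "#{prefix}#{quote}#{value.gsub(token) { new_class }}#{quote}"
    end
  end
end

broken = "</div>\n<div id='sweet'>\nBla bla sweet bla bla...\n</div>\n<table>\n<tr>\n<td>\nSome text\n"
puts replace_id(broken, 'sweet', 'bitter')
```

Run against the second example from the question, this keeps the stray `</div>` and the unclosed `<table>`, `<tr>`, and `<td>`, and leaves the word "sweet" in the text alone. The trade-off is that a regex does not understand HTML: comments, `<script>` bodies, or unquoted attribute values would need extra handling.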