I’m trying to sanitize some HTML and just remove a single tag (and I’d really like to avoid using nokogiri, etc). So I’ve got the following string that I want to get rid of:
<div class="the_class">Some junk here that's different every time</div>
This appears exactly once in my string, and I’d like to find a way to remove it. I’ve tried coming up with a regex to capture it all but I can’t find one that works.
I’ve tried /<div class="the_class">(.*)<\/div>/m and that works, but it’ll also match up to and including any further </div> tags in the document, which I don’t want.
Any ideas on how to approach this?
I believe you’re looking for a non-greedy regex, like this:
/<div class="the_class">(.*?)<\/div>/m
Note the added ?. Now the capturing group will capture as little as possible (non-greedy), instead of as much as possible (greedy).
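To see the non-greedy match in action, here is a minimal Ruby sketch. The sample document and the "keep" class are made up for illustration; only the the_class div comes from the question. It assumes the target div appears exactly once and contains no nested div tags.

```ruby
# Hypothetical sample input: the "keep" divs are invented for illustration.
html = <<~HTML
  <div class="keep">Before</div>
  <div class="the_class">Some junk here
  that spans lines</div>
  <div class="keep">After</div>
HTML

# .*? is non-greedy, so the match ends at the first </div> it reaches.
# The /m flag lets . match newlines, so multi-line junk is covered too.
pattern = /<div class="the_class">.*?<\/div>/m

# String#sub replaces only the first match, which fits a tag that
# appears exactly once in the string.
cleaned = html.sub(pattern, "")

puts cleaned
```

If the junk inside the div can itself contain a div, the non-greedy match will stop at the inner closing tag and leave a stray </div> behind, so in that case a real HTML parser is probably the safer choice.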