I want to be able to match text in between two tags, starting at an opening tag and ending in a closing tag.
Say I have this block of text in a variable called ‘text’:
some text some text some text some text some text
<some_tag>
some text some text some text some text some text
</some_tag>
some text some text some text some text some text
I want to parse the contents of ‘text’, doing nothing until it finds an opening tag (in this case ‘some_tag’). Once it finds an opening tag, I want it to capture everything until the tag closes.
I’ve been fooling around with blocks and regular expressions for about an hour now and cannot seem to figure out a good way to work this out.
I’d appreciate any and all pointers, thanks!
You should use a parser for HTML. Regex and HTML tend to make a volatile mix that leads to insanity in large doses.
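To see why, here is a small sketch using only Ruby's built-in regex support. The sample strings and variable names are made up for illustration. A non-greedy pattern handles the simple, non-nested case from the question, but it stops at the first closing tag it sees, so it cuts nested markup short.

```ruby
# Sample input modeled on the question; the exact text is illustrative.
text = <<~TEXT
  some text some text
  <some_tag>
  inner text
  </some_tag>
  some text some text
TEXT

# The m flag lets . match newlines; .*? is non-greedy, so it stops at the
# first closing tag. The second argument (1) returns capture group 1.
simple = text[%r{<some_tag>(.*?)</some_tag>}m, 1]
# simple is "\ninner text\n"

# With nested tags the same pattern ends at the inner closing tag.
nested = "<div>outer <div>inner</div> tail</div>"
broken = nested[%r{<div>(.*?)</div>}m, 1]
# broken is "outer <div>inner", which is not the full contents of the outer div
```

So a regex can be enough if you control the input and know the tag never nests, but it breaks down quickly on real HTML.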
Using Nokogiri:
This searches through the HTML fragment, looking for <p> tags. For each one it finds, it’ll extract the inner text. I’m using Nokogiri’s CSS mode, by using "p". I could use XPath instead, but CSS is understood by more people.