My question looks like some other questions on Stack Overflow, but I did not find exactly what I was looking for.
I need to retrieve a whole phrase that contains a specific word. The phrase also sits between `>` and `<`.
For example:
Text:
"<div>bla bla bla</div><div>blu blu GOLD blu</div><form> bla bla...."
What I need is:
blu blu GOLD blu
I'm trying to do this in Perl. What I have so far is:
my $specific_word = 'GOLD';
while ($var =~ /[>]?(?<phrase>(.*?)\Q$specific_word\E(.*?))</ig) {
    # script.....
}
What I get with this regex, given the example above, is:
<div>bla bla bla</div><div>blu blu GOLD blu
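The over-long match happens because the regex engine tries start positions from left to right and reports the first one that succeeds. The lazy `(.*?)` only makes each attempt as short as possible; it does not move the start. The sketch below, which assumes `$var` holds the example text, runs the question's regex once and prints where the match begins:

```perl
use strict;
use warnings;

# Example text from the question (assumed to be what $var holds).
my $var = '<div>bla bla bla</div><div>blu blu GOLD blu</div><form> bla bla....';
my $specific_word = 'GOLD';

my ($phrase, $start);
# At offset 0, [>]? matches nothing and the lazy (.*?) keeps growing
# until it reaches GOLD, so this very first attempt already succeeds.
if ($var =~ /[>]?(?<phrase>(.*?)\Q$specific_word\E(.*?))</i) {
    $phrase = $+{phrase};   # named capture from %+
    $start  = $-[0];        # offset where the whole match begins
    print "phrase: $phrase\n";
    print "match starts at offset $start\n";
}
```

The match starts at offset 0, which is why the captured phrase drags in everything from the beginning of the string.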
How do I find the first `>` before my specific word, rather than the first `>` in the entire text?
HTML::TreeBuilder is a better way to parse HTML in Perl.
But to answer the question, you probably want to match
`/[^>]*${specific_word}[^<]*/g`, which says that the text to the left of the word may not contain `>` and the text to the right may not contain `<`, so the match cannot extend past the surrounding tags.
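A minimal sketch of that pattern dropped into the question's `while` loop, assuming `$var` holds the example text. The named capture `phrase`, the `/i` flag and the `\Q...\E` quoting are carried over from the question's regex rather than from the answer's pattern; `\Q...\E` keeps any regex metacharacters in `$specific_word` from being interpreted:

```perl
use strict;
use warnings;

# Example text from the question (assumed to be what $var holds).
my $var = '<div>bla bla bla</div><div>blu blu GOLD blu</div><form> bla bla....';
my $specific_word = 'GOLD';

my @phrases;
# [^>]* cannot cross a '>' and [^<]* cannot cross a '<', so each match
# is confined to the text between one tag's '>' and the next tag's '<'.
while ($var =~ /(?<phrase>[^>]*\Q$specific_word\E[^<]*)/ig) {
    push @phrases, $+{phrase};
}

print "$_\n" for @phrases;
```

For the example text this collects a single phrase, `blu blu GOLD blu`. Note that `[^>]*` may still contain a `<` (and `[^<]*` a `>`) when the markup is unusual, which is one reason a real parser such as HTML::TreeBuilder is more robust.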