*Note: The Array ( ... ) output below comes from PHP's print_r().*
I have this HTML tag:
<tr>
<td width="40" align="left"><div class="icSkill" id="skill4"></div></td>
<td colspan="2">SOME_VALUE_I_WANT </td>
</tr>
I really want to extract this with a regex and don’t want to use an HTML parser in this case.
I use this regex (with the s flag, so that . also matches the file’s newlines):
\<tr\>\<td\swidth="40"\salign="left"\>\<div\s+class="icSkill"\s+id="skill(\d+)".*\<\/tr\>
The problem is that the regex doesn’t stop at the first closing TR tag it finds, but I want it to. I know it probably has something to do with assertions, but I don’t know how to use them.
Array
(
[0] => <tr><td width="40" align="left"><div class="icSkill" id="skill4"></div></td><td colspan="2">SOME_VALUE_I_WANT
</td></tr><tr><td rowspan="2" align="left"><div class="icGuard" id="guard9"></div></td></tr>
[1] => 4
)
Basic examples like /[^<]*/ won’t work in this case. Is there also a way to tell the regex something like:
/[^A_STRING]*/ (in words: keep matching until you find A_STRING)
OR, A BETTER EXAMPLE:
/[^A_STRING_FIRST_TIME]*/ (in words: keep matching until you find A_STRING for the FIRST_TIME)
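The problem is greediness.
.* consumes as much as it can. You can make it ungreedy by appending a ?, which gives .*? and makes the match stop at the first </tr> it can. Also, there is really no need for so much escaping: < and > are not special characters in a regex, and / only needs a backslash when it is also the pattern delimiter. The extra backslashes only hinder legibility.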
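The difference between the greedy and the ungreedy quantifier can be seen in a small sketch. It uses Python's re module purely for illustration (an assumption; the question uses PHP, where the same pattern would go into preg_match with the s flag), and the sample string is modelled on the print_r() output in the question.

```python
import re

# Sample input modelled on the question: two adjacent table rows.
html = (
    '<tr><td width="40" align="left"><div class="icSkill" id="skill4"></div></td>'
    '<td colspan="2">SOME_VALUE_I_WANT\n</td></tr>'
    '<tr><td rowspan="2" align="left"><div class="icGuard" id="guard9"></div></td></tr>'
)

# Same pattern as in the question, minus the unnecessary backslashes.
prefix = r'<tr><td\swidth="40"\salign="left"><div\s+class="icSkill"\s+id="skill(\d+)"'

# re.DOTALL plays the role of the s flag: "." also matches newlines.
greedy = re.search(prefix + r'.*</tr>', html, re.DOTALL)
lazy = re.search(prefix + r'.*?</tr>', html, re.DOTALL)

print(greedy.group(0).count('</tr>'))  # number of rows swallowed by .*
print(lazy.group(0).count('</tr>'))    # number of rows swallowed by .*?
print(lazy.group(1))                   # the captured skill id
```

The greedy version runs on to the last </tr> in the input, while the ungreedy one ends at the first.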
An alternative way to make repetition ungreedy is to use the modifier U, which makes all repetition ungreedy globally in the whole pattern. I prefer the local variant (using ?), though.
In any case, there is a different possibility which mimics [^A_STRING]* (which doesn’t work, because it matches any string of characters that do not include A, _, S, T, R, I, N or G). You can use a negative lookahead at every position of the repetition: (?:(?!A_STRING).)* (substitute this for .* or .*?). It should be equivalent in most cases, but execution time might differ. Plus, it’s a little harder to decipher.
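A minimal sketch of the lookahead variant, again in Python's re module as an illustrative assumption (the lookahead syntax is the same in PCRE). The sample strings are made up for the demonstration; the first part contrasts the character class with the lookahead, the second applies the lookahead to the row from the question.

```python
import re

# A character class excludes single characters, not the string as a whole,
# so it already stops at the first A, _, S, T, R, I, N or G it meets.
text = 'foo BAR A_STRING'
char_class = re.match(r'[^A_STRING]*', text).group(0)

# The lookahead version consumes one character at a time, but only while
# the text at the current position does not start with "A_STRING".
lookahead = re.match(r'(?:(?!A_STRING).)*', text).group(0)

print(repr(char_class))
print(repr(lookahead))

# Applied to the question: repeat "any character" only while no </tr> starts here.
html = (
    '<tr><td width="40" align="left"><div class="icSkill" id="skill4"></div></td>'
    '<td colspan="2">SOME_VALUE_I_WANT\n</td></tr>'
    '<tr><td rowspan="2" align="left"><div class="icGuard" id="guard9"></div></td></tr>'
)
pattern = re.compile(
    r'<tr><td\swidth="40"\salign="left"><div\s+class="icSkill"\s+id="skill(\d+)"'
    r'(?:(?!</tr>).)*</tr>',
    re.DOTALL,  # counterpart of the s flag
)
match = pattern.search(html)
print(match.group(1))
print(match.group(0).count('</tr>'))
```

Because the repetition can never step over a </tr>, the match cannot extend past the first closing tag, whether the quantifier is greedy or not.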