So, I’m still a regex dummy and have only been using regular expressions for the past 2 days. However, my problem seems odd, to me at least.
The following pattern correctly matches this string for me:
<td valign=3D\"top\">For:</td>(\\s)+(=)?(.|\r\n|\n)+<td>(([a-z]|[A-Z]|=|\\s)+)<br>
Original String (taken from the html document which is being fed to the regex as input):
<td valign=3D"top">For:</td> = <td>XXXXXX XXXXX<br>
and the matched string:
<td valign=3D"top">For:</td> = <td>XXXXXX XXXXX<br>
However for this string:
<td valign=3D"top">For:</td> <td>YYYYYYY= YYYYY<br>
it matched the entire HTML document. I don’t understand why this is happening, since after my (([a-z]|[A-Z]|=|\\s)+) I specified that there should be a <br> tag.
Add the indicated question marks for non-greedy matching:
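A minimal sketch of that change, written in Python for illustration (the question does not name its language, and the doubled backslashes suggest a Java or C# string literal, where each \s here would be written \\s). The two-row sample document and the variable names are assumptions, not taken from the real input.

```python
import re

# Hypothetical two-row document; the real HTML is not shown in full.
html = (
    '<td valign=3D"top">For:</td> = <td>XXXXXX XXXXX<br>\n'
    '<td valign=3D"top">Other:</td> <td>ZZZ<br>\n'
)

# Pattern from the question: every quantifier is greedy.
greedy = re.compile(
    r'<td valign=3D"top">For:</td>(\s)+(=)?(.|\r\n|\n)+<td>(([a-z]|[A-Z]|=|\s)+)<br>'
)

# Same pattern with ? appended to each + so it matches as little as possible.
lazy = re.compile(
    r'<td valign=3D"top">For:</td>(\s)+?(=)?(.|\r\n|\n)+?<td>(([a-z]|[A-Z]|=|\s)+?)<br>'
)

# Group 4 is the text between <td> and <br>.
greedy_name = greedy.search(html).group(4)
lazy_name = lazy.search(html).group(4)
print(repr(greedy_name))
print(repr(lazy_name))
```

On this sample the greedy pattern runs on to the last <td>...<br> pair in the document and captures the wrong cell, while the lazy one stops at the first <br> after "For:".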
EDIT:
Further, you can simplify into a character class instead of using alternation:
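As a rough sketch, again in Python and with an assumed sample string, the alternation ([a-z]|[A-Z]|=|\s) can be collapsed into the single class [a-zA-Z=\s]. The helper name extract_name is hypothetical.

```python
import re

# The letter/equals/whitespace alternation becomes one character class.
pattern = re.compile(
    r'<td valign=3D"top">For:</td>(\s)+?(=)?(.|\r\n|\n)+?<td>([a-zA-Z=\s]+?)<br>'
)

def extract_name(html):
    """Return the text between <td> and <br> after the For: cell, or None."""
    match = pattern.search(html)
    return match.group(4) if match else None

sample = '<td valign=3D"top">For:</td> = <td>XXXXXX XXXXX<br>'
print(extract_name(sample))
```

The class matches the same characters as the alternation but needs one fewer capturing group and gives the engine less to backtrack over.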
My only question is why your \\s is escaped while your \r\n are not…
EDIT 2:
Use * instead of + where, for example, spaces aren’t mandatory; non-greedy quantifiers are probably always helpful in these cases:
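A sketch of what this could look like in Python, under the same assumptions as before (invented sample rows, hypothetical helper name find_name). It also shows why the second string from the question fails with the original pattern: (.|\r\n|\n)+ demands at least one character between the whitespace and <td>, and in that string there is none, so the engine has to look for a later <td> elsewhere in the document.

```python
import re

# * lets the whitespace and the filler match zero characters;
# *? and +? keep each part as short as possible.
pattern = re.compile(
    r'<td valign=3D"top">For:</td>(\s)*(=)?(.|\r\n|\n)*?<td>([a-zA-Z=\s]+?)<br>'
)

def find_name(html):
    """Return the text between <td> and <br> after the For: cell, or None."""
    match = pattern.search(html)
    return match.group(4) if match else None

# Each sample has a trailing row (an assumption) to show the match stops early.
trailer = '\n<td valign=3D"top">Other:</td> <td>ZZZ<br>\n'
first = '<td valign=3D"top">For:</td> = <td>XXXXXX XXXXX<br>' + trailer
second = '<td valign=3D"top">For:</td> <td>YYYYYYY= YYYYY<br>' + trailer

print(find_name(first))
print(find_name(second))
```

With these quantifiers both strings from the question yield only their own cell, regardless of what follows in the document.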