I have a large, malformed test HTML document that I need to extract numbers from.
I’d like to get the primary ratio out. I’m using this regular expression:
(?<=Primary ratio</TD><TD>--</TD><TD>).*(?=</TD>)
On this string:
Primary ratio</TD><TD>--</TD><TD>10.52</TD><TD>14.97</TD><TD></TD></TR><TR align='right'><TD align='left'>Flip Ratio</TD><TD>-122.81</TD><TD>1.13</TD><TD>1.50</TD><TD></TD></TR><TR align='right'><TD align='left'>Secondary Ratio</TD><TD>--</TD><TD>0.70</TD><TD>0.70</TD><TD></TD></TR><TR align='right'><TD align='left'>RM Ratio</TD><TD>--</TD><TD>2.02</TD>
But I get this as a result:
10.52</TD><TD>14.97</TD><TD></TD></TR><TR align='right'><TD align='left'>Flip Ra
tio</TD><TD>-122.81</TD><TD>1.13</TD><TD>1.50</TD><TD></TD></TR><TR align='right
'><TD align='left'>Secondary Ratio</TD><TD>--</TD><TD>0.70</TD><TD>0.70</TD><TD>
</TD></TR><TR align='right'><TD align='left'>RM Ratio</TD><TD>--</TD><TD>2.02
I don’t want that, I just want the 10.52 number in the first tag.
I mean, it found the start of the string perfectly, but it didn’t stop at the first </TD>.
What am I doing wrong?
Replace .* with .*? near the end of your regex; that should stop it from matching too much. Normally it’ll match as much as possible that fits the pattern; by adding the ?, you ask it to match as little as possible instead.
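To see the difference side by side, here is a minimal sketch. The question doesn't say which regex engine is in use, so Python's re module is an assumption here, and the sample string is a shortened copy of the one in the question. The last pattern, using [^<]* instead of a lazy quantifier, is an alternative I'm adding, not something from the question.

```python
import re

# Shortened copy of the sample string from the question.
html = (
    "Primary ratio</TD><TD>--</TD><TD>10.52</TD><TD>14.97</TD><TD></TD></TR>"
    "<TR align='right'><TD align='left'>Flip Ratio</TD><TD>-122.81</TD>"
)

prefix = r"(?<=Primary ratio</TD><TD>--</TD><TD>)"

# Greedy: .* grabs as much as it can, then backs off to the LAST </TD>.
greedy = re.search(prefix + r".*(?=</TD>)", html).group()

# Lazy: .*? grows one character at a time and stops at the FIRST </TD>.
lazy = re.search(prefix + r".*?(?=</TD>)", html).group()

# Alternative (my addition): match anything that is not '<',
# so the match cannot run past the start of the next tag.
no_tags = re.search(prefix + r"[^<]*", html).group()

print(lazy)     # 10.52
print(no_tags)  # 10.52
```

The greedy version runs all the way to the final </TD> in the input, which is exactly the oversized result shown in the question.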