I have an HTML file that contains a table of information, and I’m trying to extract specific columns. The pattern looks like this, with alternating “TableDarkRow” and “TableLightRow” classes:
'>817338284254611</A></td><td Class='TableDarkRow' NOWRAP> 01/14/2011</td>
I’m trying to extract an array of number and date pairs:
817338284254611
01/14/2011
I tried and came up with this:
>([0-9])+</A>(.*)NOWRAP> ?([0-9]{2}\/[0-9]{2}\/[0-9]{4})
But the (.*) is allowing the entire document to be selected between the first and last occurrences.
Replace the .* with .*? for non-greedy matching. Reference: Watch Out for The Greediness!
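
Here is a minimal sketch of the difference in Python, assuming the rows look like the snippet in the question. The `html` string below is a made-up two-row sample, not the real file. Note that the sketch also moves the `+` inside the first group (`([0-9]+)` instead of `([0-9])+`), since a repeated group only keeps its last repetition, which here would be a single digit.

```python
import re

# Hypothetical two-row sample modelled on the snippet in the question.
html = (
    "<td Class='TableLightRow'><A HREF='x'>817338284254611</A></td>"
    "<td Class='TableDarkRow' NOWRAP> 01/14/2011</td>"
    "<td Class='TableDarkRow'><A HREF='y'>123456789012345</A></td>"
    "<td Class='TableLightRow' NOWRAP> 02/20/2011</td>"
)

date = r"([0-9]{2}/[0-9]{2}/[0-9]{4})"

# Greedy: .* runs to the end of the input and backtracks to the LAST date.
greedy = re.compile(r">([0-9]+)</A>.*NOWRAP> ?" + date)

# Non-greedy: .*? stops at the FIRST "NOWRAP>" that lets the rest match.
lazy = re.compile(r">([0-9]+)</A>.*?NOWRAP> ?" + date)

greedy_pairs = greedy.findall(html)
pairs = lazy.findall(html)

print(greedy_pairs)  # [('817338284254611', '02/20/2011')]
print(pairs)         # [('817338284254611', '01/14/2011'), ('123456789012345', '02/20/2011')]
```

If a number and its date can sit on different lines in the real file, `.` will not cross the newline by default; passing `re.DOTALL` to `re.compile` is one way to allow that.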