I have a webpage converted to a string, and I'm trying to extract three numbers from this line:
<td class="col_stat">1</td><td class="col_stat">0</td><td class="col_stat">1</td>
From the line above, I already have it extracting the first '1' using this:
String filePattern = "<td class=\"col_stat\">(.+)</td>";
Pattern pattern = Pattern.compile(filePattern);
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
    String number = matcher.group(1);
    System.out.println(number);
}
Now I want to extract the 0 and the last 1, but any time I try to edit the regular expression above, it just outputs the complete webpage on the console. Does anyone have any suggestions?
Thanks
Regex matching is greedy. Try looking only for (\d+) instead of (.+), which matches everything up to the last </td> on the line. Also call matcher.find() in a while loop instead of an if, so that each call picks up the next number.
On a related note, I completely agree with others' suggestions to use a more structured approach to interpreting HTML.
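A minimal sketch of the regex fix described above, assuming the page is already in a String named text as in the question. The class name ColStatExtractor and the helper extractStats are hypothetical, chosen only for illustration. It combines the digits-only group with a while loop so all three cells are visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColStatExtractor {

    // (\d+) only accepts digits, so it cannot run past a closing </td>
    // the way the greedy (.+) does.
    private static final Pattern COL_STAT =
            Pattern.compile("<td class=\"col_stat\">(\\d+)</td>");

    // Hypothetical helper: returns every number found inside
    // <td class="col_stat"> cells, in the order they appear.
    static List<String> extractStats(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = COL_STAT.matcher(text);
        // Each call to find() resumes after the previous match,
        // so the loop collects every cell instead of just the first.
        while (matcher.find()) {
            numbers.add(matcher.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String text = "<td class=\"col_stat\">1</td><td class=\"col_stat\">0</td><td class=\"col_stat\">1</td>";
        for (String number : extractStats(text)) {
            System.out.println(number); // prints 1, 0, 1 on separate lines
        }
    }
}
```

A reluctant quantifier, (.+?), would also stop at the first </td>; (\d+) is stricter because it rejects anything that is not a number. Either way the pattern still depends on the exact markup (attribute order, quoting, whitespace), which is why a real HTML parser is the sturdier option.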