In the following code, I need to get only the value between the <em> tags:
String regex = "Item#: <em>.*</em>";
String content = "xxx Item#: <em>something</em> yyy";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
    System.out.println(matcher.group());
}
It prints:
Item#: <em>something</em>
But I just need the value “something”.
I know I can use .substring(begin, end) to get the value,
but is there a more elegant way?
It prints the whole string because matcher.group() returns the complete match. To get a specific part of the matched string, change your regex to capture the content between the tags in a group:
String regex = "Item#: <em>(.*?)</em>";
Also note the reluctant quantifier (.*?), which matches the fewest characters possible before a </em> is encountered.
Then, inside the if, print matcher.group(1) instead of matcher.group().
That said, you should not use regex to parse HTML. Regex is not powerful enough for that task in general. You should probably use an HTML parser such as HTML Cleaner. Also see the link provided in one of the comments on the original post; it is a very nice explanation of the problems you can face.
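For what it's worth, here is a minimal sketch of the capturing-group approach described above, with a greedy version alongside to show why the reluctant quantifier matters. The class name EmExtractor, the helper firstValue, the constant names, and the second sample string are all made up for illustration; only the first sample string comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative, not from the question.
public class EmExtractor {
    // Group 1 captures the text between the tags; .*? is the reluctant quantifier.
    public static final Pattern RELUCTANT = Pattern.compile("Item#: <em>(.*?)</em>");
    // Same pattern with a greedy quantifier, included only for comparison.
    public static final Pattern GREEDY = Pattern.compile("Item#: <em>(.*)</em>");

    // Returns group 1 of the first match, or null when nothing matches.
    public static String firstValue(Pattern pattern, String content) {
        Matcher matcher = pattern.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample string from the question.
        String content = "xxx Item#: <em>something</em> yyy";
        System.out.println(firstValue(RELUCTANT, content)); // something

        // Assumed input with two <em> elements on one line.
        String two = "Item#: <em>a</em> and <em>b</em>";
        System.out.println(firstValue(RELUCTANT, two)); // a
        System.out.println(firstValue(GREEDY, two));    // a</em> and <em>b
    }
}
```

With a single <em> element both patterns give the same result; the difference only shows up when a later </em> appears in the same input, where the greedy version runs on to the last closing tag.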