If I use a delimiter on a string:
Scanner scanString = new Scanner(line).useDelimiter("<.*>");
I want to know why this won’t preserve the text in
<a href="https://post.craigslist.org/c/snj?lang=en">post to classifieds</a>
but it will in a line with only
<option value="ccc">community
While
Scanner scanString = new Scanner(line).useDelimiter("<.*?>");
will work for both.
As I understand it, "<.*>" should exclude text starting with "<", followed by any character zero or more times, until it reaches a ">". So shouldn't it stop excluding there, and not start again until it reaches another "<"?
This is because the second expression uses a reluctant (as opposed to greedy) quantifier, which means that it does not try to consume the entire string and back off from there, as the first one does.
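The expression "<.*>" tries to advance as far as possible into your input string, so .* first goes all the way to the end of the line. From there it backs off one character at a time until the final > of the pattern can match, which means the match runs from the first < to the last > in the line. In your first line the last > is the final character, so the whole line is treated as one delimiter and no text survives; in the second line the only > comes before "community", so that text is preserved. The reluctant version "<.*?>" does not do that: it matches up to the first > and stops.
This article provides a great read on quantifiers.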
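To see the difference concretely, here is a minimal sketch that runs both delimiters over the two lines from the question. The class name QuantifierDemo and the helpers tokens and firstMatch are made up for illustration; only Scanner, Pattern, and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Hypothetical helper: collects every token the Scanner yields
    // when the given regex is used as its delimiter.
    static List<String> tokens(String line, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line).useDelimiter(delimiter)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    // Hypothetical helper: returns the first substring the regex matches,
    // i.e. what the Scanner would treat as the first delimiter.
    static String firstMatch(String line, String regex) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String anchor = "<a href=\"https://post.craigslist.org/c/snj?lang=en\">post to classifieds</a>";
        String option = "<option value=\"ccc\">community";

        // Greedy: the match runs from the first '<' to the LAST '>' in the line.
        System.out.println(firstMatch(anchor, "<.*>"));  // the whole anchor line
        System.out.println(tokens(anchor, "<.*>"));      // []
        System.out.println(tokens(option, "<.*>"));      // [community]

        // Reluctant: each match stops at the FIRST '>' after a '<'.
        System.out.println(firstMatch(anchor, "<.*?>")); // <a href="https://post.craigslist.org/c/snj?lang=en">
        System.out.println(tokens(anchor, "<.*?>"));     // [post to classifieds]
        System.out.println(tokens(option, "<.*?>"));     // [community]
    }
}
```

With the greedy delimiter the entire anchor line is swallowed as a single delimiter, so the Scanner has nothing left to return. The option line only "works" because its single > happens to sit before the text you want.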