Can anyone explain to me, step by step, why this regex fails:
<.++>
when matched against this string: <em>
The same string is found with lazy or greedy quantifiers, so what steps are involved in the possessive case?
I use the Java regex flavor.
From the Java Pattern documentation:
In your example, the < in your regex matches < in the string, then .++ matches the entire rest of the string, em>. You still have a > in your regex, but there are no characters left in the string for it to match (because .++ consumed them all). So the match fails.
If the quantifier were greedy, i.e. if it were .+ instead of .++, at this point the regular expression engine would try reducing the portion matched by .+ by one character, to just em, and try again. This time the match would succeed, because there would be a > left in the string for the > in the regex to match.
EDIT: A lazy quantifier would work like a greedy quantifier in reverse. Instead of starting by trying to match the whole rest of the string and backing off character by character, the lazy quantifier would start by trying to match a single character, in this case just e. If that doesn’t allow the full regex to match (which it wouldn’t here, because you’d have > in the regex trying to match m in the string), the lazy quantifier would move up to matching two characters, em. Then the > in the regex would line up with > in the string and the match would succeed. If it didn’t work out, though, the lazy quantifier would move up to three characters, and so on.
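
If you want to check the three behaviours yourself, here is a minimal sketch using java.util.regex. The class name QuantifierDemo and the helper found are made up for illustration; only Pattern.compile, Pattern.matcher and Matcher.find come from the standard library.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the question or the Java docs.
class QuantifierDemo {

    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "<em>";

        // Greedy: .+ first takes "em>", then gives back ">" so the final > can match.
        System.out.println("<.+>  : " + found("<.+>", input));

        // Lazy: .+? first takes "e", then grows to "em", after which > matches.
        System.out.println("<.+?> : " + found("<.+?>", input));

        // Possessive: .++ takes "em>" and never gives anything back,
        // so the final > in the regex has nothing left to match.
        System.out.println("<.++> : " + found("<.++>", input));
    }
}
```

On the string <em> the greedy and lazy lines should print true and the possessive line false, which is the difference described above.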