Can someone please tell me why my pattern <p(\s+(.*)?)?>(.[^</p>]*)?</p> does not work correctly? Example matches:
<p>This is a test and anything can be here even other <tags>tags</tags></p><p style="test">This is a test</p><p></p>
If the above were all on one line, it should find 3 separate matches. The link below demonstrates its true behaviour which is very odd…
A match should always start as soon as it finds <p and stop as soon as it finds </p>.
There are a couple of problems with your regex. Let’s see what they look like.
Here’s your regex: <p(\s+(.*)?)?>(.[^</p>]*)?</p>
1. (.*)? is not doing what you think. It does not make the * quantifier reluctant. Rather, it applies an optional quantifier (?) to a group that contains a greedy *. It simply means: match 0 or 1 repetition of (.*). To make it reluctant, you need to move the ? inside the brackets. So, you need to use (.*?) instead of (.*)?.

2. [^</p>] does not negate the sequence </p>; rather, it negates <, /, p and > as separate characters. Note that in a character class, each of these characters is taken literally. There is no grouping in there. So, (.[^</p>]*) means: match any one character, followed by 0 or more characters that are none of <, /, p or >. That is not what you want. If you want to match a sequence that does not contain </p>, you can use a negative look-ahead like this: ((?!</p>).)*. This first checks that the text at the current position is not </p>, and only then matches the next character.

So, your regex pattern should be:
Or, you can even simplify your regex to:
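
As a rough sketch of the two points above, here is how they play out in Python's re module (my choice of language; the question does not name one). The patterns called fixed and simple are my own illustrations of the advice, not necessarily the exact patterns meant above, and find_paragraphs is just a hypothetical helper for this demo.

```python
import re

# Sample input from the question, all on one line.
html = ('<p>This is a test and anything can be here even other '
        '<tags>tags</tags></p><p style="test">This is a test</p><p></p>')

expected = [
    '<p>This is a test and anything can be here even other <tags>tags</tags></p>',
    '<p style="test">This is a test</p>',
    '<p></p>',
]

# Point 1: (.*)? is an optional greedy group, (.*?) is a reluctant one.
print(re.match(r'<(.*)?>', '<a><b>').group(0))   # <a><b>
print(re.match(r'<(.*?)>', '<a><b>').group(0))   # <a>

# Point 2: [^</p>] excludes the single characters <, /, p and >,
# so the run stops at the first 'p', not at the sequence </p>.
print(re.match(r'[^</p>]*', 'stop').group(0))    # sto

# The pattern from the question.
original = re.compile(r'<p(\s+(.*)?)?>(.[^</p>]*)?</p>')

# Assumed fix: reluctant attribute part, plus a negative look-ahead
# so the body can never run past the first </p>.
fixed = re.compile(r'<p(\s+.*?)?>((?:(?!</p>).)*)</p>')

# Assumed simplification: a reluctant body also stops at the first </p>.
simple = re.compile(r'<p(\s[^>]*)?>(.*?)</p>')


def find_paragraphs(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


for name, pattern in (('original', original), ('fixed', fixed), ('simple', simple)):
    print(name, len(find_paragraphs(pattern, html)))
```

With this input, fixed and simple each find the three separate <p>...</p> matches, while the original pattern does not. Note that . does not match newlines by default in Python, so input spanning several lines would need the re.DOTALL flag.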