I have a text that will contain strings surrounded by #[ ]. I need to match these strings and find out the string inside.
My example text:
Lorem ipsum dolor #[This is my first string.] sit amet, consectetur elit,
sed do eiusmod tempor incididunt #[This is my second string?] ut et dolore magna.
With this text I would like to have two matches:
#[This is my first string.]
#[This is my second string?]
Now I have written my regular expression:
\#\[([\w\s\W]*)\]
I added \W because I want to include dots, question marks and other characters that are not letters. This causes a problem, because #, [ and ] are now included as well, so my text produces only one match:
#[This is my first string.] sit amet, consectetur elit,
sed do eiusmod tempor incididunt #[This is my second string?]
Of course it matches on the first occurrence of #[ and the last occurrence of ].
How can I solve this? I can accept not allowing #, [ and ] inside my strings, but all other non-letter characters should be included if possible.
Your problem is not the \W, it's the *. The * is greedy and will match the longest string possible. So it matches the first [ with the last ] and takes everything in between. Try this:
\#\[([\w\s\W]*?)\]
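As a minimal sketch of the difference, assuming Python's re module (the question does not name a language or regex engine), the following runs the greedy pattern from the question and a lazy variant with ? added after the * against the example text. The variable names are only illustrative.

```python
import re

# Example text from the question, including the line break.
text = (
    "Lorem ipsum dolor #[This is my first string.] sit amet, consectetur elit,\n"
    "sed do eiusmod tempor incididunt #[This is my second string?] ut et dolore magna."
)

# Greedy: * takes as much as possible, so the single match runs from the
# first "#[" to the last "]" in the text.
greedy = re.findall(r"#\[([\w\s\W]*)\]", text)

# Lazy: *? takes as little as possible, so each match stops at the nearest "]".
lazy = re.findall(r"#\[([\w\s\W]*?)\]", text)

print(len(greedy))  # 1
print(lazy)         # ['This is my first string.', 'This is my second string?']
```

The # does not need escaping in this engine unless verbose mode is enabled, which is why the sketch writes it unescaped; \# works as well.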
In fact, you should be able to simplify it to just:
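One simplification that fits this advice is to replace the [\w\s\W] class with a plain dot, giving #\[(.*?)\]. This is a sketch under assumptions rather than a definitive pattern: it uses Python's re module, and it assumes the bracketed strings never span a line break, since . does not match a newline there unless re.DOTALL is set.

```python
import re

text = (
    "Lorem ipsum dolor #[This is my first string.] sit amet, consectetur elit,\n"
    "sed do eiusmod tempor incididunt #[This is my second string?] ut et dolore magna."
)

# Assumed simplified pattern: "." stands in for [\w\s\W], and *? keeps it lazy.
# Pass re.DOTALL as a second argument if a string may contain line breaks.
pattern = re.compile(r"#\[(.*?)\]")

# Whole matches, including the surrounding #[ and ].
full_matches = [m.group(0) for m in pattern.finditer(text)]

# Only the text captured inside the brackets.
inner_strings = pattern.findall(text)

print(full_matches)   # ['#[This is my first string.]', '#[This is my second string?]']
print(inner_strings)  # ['This is my first string.', 'This is my second string?']
```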
The ? after a quantifier (* or +) forces a minimal match, i.e. it makes the quantifier non-greedy.