I am trying to build a regex that stops at the first occurrence. I know I can make a quantifier non-greedy by adding `?` after it.
Consider a string:
"This is sample text located at first line and located at second line."
Here, I am searching for pattern1 using pattern2.
pattern1 is "text", pattern2 is "located at".
In the above string, I want to extract "text", and my search pattern is "located at", so I am using the following regex:
/is.*sample(.*)located at?/
How do I make located at non-greedy? I am using http://rubular.com/ to verify my regex.
Your regex isn’t correct.
If you want a single “word” that occurs before the first “located at”, you could use:
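For example, here is a minimal Ruby sketch (the variable names are only illustrative, and the pattern is not necessarily the exact one originally posted). `String#[]` with a regex and a capture index returns just the captured group:

```ruby
text = "This is sample text located at first line and located at second line."

# (\S+) captures the run of non-whitespace right before the first
# "located at"; the regex engine returns the leftmost match, so the
# second "located at" is never reached. String#[] with a capture index
# returns just that group (or nil if nothing matches).
word = text[/(\S+)\s+located at/, 1]
puts word # prints text
```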
I’m defining “word” to mean non-whitespace characters using `\S`, so punctuation and numbers are going to be included with the alpha characters. Other classes could be used, such as `\w` if you want `[A-Za-z0-9_]`. Otherwise, use `[a-z]` to restrict it to lowercase letters.

If you want any text that occurs between “sample” and the first “located at”, you could use:
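A minimal Ruby sketch of that case, using a lazy `.+?` so the capture stops at the first “located at”. A greedy `.+` is shown for comparison, along with a letters-only `[a-z]+` variant that uses the `i` flag so capitals would match too. The variable names are hypothetical and the patterns are illustrations, not necessarily the originals:

```ruby
text = "This is sample text located at first line and located at second line."

# Lazy: .+? grows one character at a time, so it stops as soon as
# " located at" can match, i.e. at the FIRST occurrence.
between = text[/sample (.+?) located at/, 1]
puts between # prints text

# Greedy: .+ runs to the end of the string and backtracks only as far
# as the LAST " located at", so the capture overshoots.
greedy = text[/sample (.+) located at/, 1]
puts greedy # prints text located at first line and

# Letters-only: [a-z]+ with the i flag so uppercase letters also match.
letters = text[/([a-z]+) located at/i, 1]
puts letters # prints text
```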
In your pattern, `/is.*sample(.*)located at?/`, you’re using multiple `.*`, which means zero-or-more of any character (in Ruby, `.` matches anything except a newline unless you add the `m` flag, but that’s deeper than we need to go right now). That “more” is the part you’re colliding with, because it’s greedy: it grabs as much as it can and gives back only enough for the rest of the pattern to match. And, because you use it twice, it’s doubly greedy. You could use the non-greedy variant by adding `?`, but the pattern would still be looser than it needs to be, because you’re giving the regex engine too much rope to play with; for instance, the capture would still include the spaces around “text”. My patterns tighten that all up, reducing the need to use the `?` modifier in the first two. My third example needs it because, again, `.+` would have been greedy and needed to be moderated.

Finally, `at?` in your pattern isn’t applying `?` to modify the `.*`; it’s acting on the preceding `t`, causing the engine to think “zero-or-one ‘t’ must be found”, which isn’t what you want, because that would match “located a” as well as “located at”.
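To see both points on the string from the question, here is a small Ruby sketch (variable names are illustrative) that runs the original pattern, the same pattern with a lazy `(.*?)`, and the `at?` case on its own:

```ruby
text = "This is sample text located at first line and located at second line."

# The pattern from the question: the greedy (.*) backtracks only as far
# as the LAST "located a", so the capture overshoots.
greedy = text[/is.*sample(.*)located at?/, 1]
p greedy # => " text located at first line and "

# Making the group lazy with (.*?) stops at the first "located at",
# but the capture still carries the surrounding spaces.
lazy = text[/is.*sample(.*?)located at?/, 1]
p lazy # => " text "

# at? means "a" followed by an optional "t", so it also accepts "located a".
p("located a" =~ /located at?/) # => 0
p("located a" =~ /located at/)  # => nil
```

The lazy group does fix the “first occurrence” problem, but only a tighter pattern around the capture drops the extra spaces.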