I’m trying to split up a string into two parts using regex. The string is formatted as follows:
text to extract<number>
I’ve been using (.*?)< and <(.*?)>, which work fine, but after reading into regex a little, I’ve started to wonder why I need the ? in these expressions. I only wrote them that way after finding them on this site, so I’m not exactly sure what the difference is.
It is the difference between greedy and non-greedy quantifiers.
Consider the input 101000000000100. In the pattern 1.*1, the .* is greedy: it matches all the way to the end of the string and then backtracks until the final 1 can match, leaving you with 1010000000001. By contrast, .*? is non-greedy: it first matches nothing, then takes one extra character at a time until the 1 after it can match, so 1.*?1 matches 101.

All quantifiers have a non-greedy mode: .*?, .+?, .{2,6}?, and even .??.

In your case, a similar pattern could be <([^>]*)>, which matches anything but a greater-than sign (strictly speaking, it matches zero or more characters other than > between < and >). See Quantifier Cheat Sheet.
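Running the patterns side by side is a quick way to see the difference. Below is a minimal sketch that uses Python's re module. That module is an assumption, because the question doesn't name a regex engine, although these constructs behave the same way in most backtracking engines. The strings line and two_tags are hypothetical examples. The first follows the question's "text to extract<number>" format. The second contains two <...> groups, which is the case where the ? actually changes the result.

```python
import re

# Sample input from the answer: more than one '1' follows the first '1'.
digits = "101000000000100"

# Greedy: .* consumes to the end, then backtracks to the LAST '1'.
greedy = re.search(r"1.*1", digits).group()
# Non-greedy: .*? starts empty and grows only until the NEXT '1' can match.
lazy = re.search(r"1.*?1", digits).group()

# Hypothetical string in the question's "text to extract<number>" format.
line = "text to extract<42>"
text = re.search(r"(.*?)<", line).group(1)
number = re.search(r"<([^>]*)>", line).group(1)

# Hypothetical string with two <...> groups, where greediness matters.
two_tags = "a<1> b<2>"
greedy_tag = re.search(r"<(.*)>", two_tags).group(1)   # runs to the last '>'
lazy_tag = re.search(r"<(.*?)>", two_tags).group(1)    # stops at the first '>'
class_tag = re.search(r"<([^>]*)>", two_tags).group(1) # can never cross a '>'

print(greedy, lazy)                     # 1010000000001 101
print(repr(text), repr(number))         # 'text to extract' '42'
print(greedy_tag, lazy_tag, class_tag)  # 1> b<2 1 1
```

With only one <...> in the string, (.*)< and (.*?)< return the same text, and so do <(.*)> and <(.*?)>. That is likely why your patterns seemed to work either way. The ? (or the [^>] class) only matters once a delimiter can appear more than once.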