I am trying to write a regex to match optionally quoted values (valid quotes are ", ' and `).
The rule is that two consecutive quotes inside a quoted value represent one escaped quote.
Here is the regex I came up with:
(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote).)|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)
And now in readable form (with comments indicating what I think it does):
(?P<quote>["'`])? #named group Quote (any quoting character?)
(?P<value> #name this group "value", what I am interested in
(?(quote) #if quoted
((?!(?P=quote).)|((?=(?P=quote)).){2})* #see below
#match either anything that is not the quote
#or match 2 quotes
|
[^\s;]* #match anything that is not whitespace or ; (my separators if there are no quotes)
)
)
(?(quote)(?P=quote)|) #if we had a leading quote we need to consume a closing quote
It performs fine for unquoted strings, but quoted strings crash it with:
match = re.match(regexValue, line)
File "****/jython2.5.1/Lib/re.py", line 137, in match
return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion depth exceeded
What am I doing wrong?
edit: Example input => output (the desired content of the capturing group "value"):
text => text
'text' => text
te xt => te
'te''xt' => te''xt #quote=' => strreplace("''","'") => desired result: te'xt
'te xt' => te xt
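For reference, the input/output table above can be reproduced with a simpler pattern that uses plain alternation instead of the conditional group. This is only a sketch under my own assumptions, not the pattern from the question: the names VALUE_RE and parse_value are made up for illustration, and it was written against CPython's re module, not tested on Jython 2.5.1.

```python
import re

# Hypothetical helper pattern (not the regex from the question).
# Branch 1: a quoted value, where a doubled quote is an escaped quote.
# Branch 2: an unquoted value that ends at whitespace or ';'.
VALUE_RE = re.compile(r"""
    (?P<quote>["'`])                 # opening quote character
    (?P<quoted>
        (?: (?!(?P=quote)) .         # any character that is not the quote
          | (?P=quote){2}            # or two quotes in a row (escaped quote)
        )*
    )
    (?P=quote)                       # closing quote
  |
    (?P<bare>[^\s;]*)                # unquoted: stop at whitespace or ';'
""", re.VERBOSE)


def parse_value(line):
    """Return the value at the start of line, with doubled quotes unescaped."""
    m = VALUE_RE.match(line)
    if m is None:
        return None
    quote = m.group("quote")
    if quote is None:
        return m.group("bare")
    # Collapse the escaped (doubled) quote into a single quote.
    return m.group("quoted").replace(quote * 2, quote)


for sample in ["text", "'text'", "te xt", "'te''xt'", "'te xt'"]:
    print(sample, "=>", parse_value(sample))
```

Every step of the quoted branch consumes at least one character, so the repeated group can never loop on an empty match.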
edit2: While looking at it I noticed a mistake, see below. However, I believe the pattern above is still a valid regex, so the crash might be a Jython bug. The corrected version still does not do what I want, though. (The difference is very subtle: the dot moved out of the lookahead group.)
new:(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote)).|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)
old:(?P<quote>["'`])?(?P<value>(?(quote)((?!(?P=quote).)|((?=(?P=quote)).){2})*|[^\s;]*))(?(quote)(?P=quote)|)
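The effect of moving the dot can be seen in isolation. In the sketch below a literal ' stands in for the (?P=quote) backreference (my simplification), and old_piece and new_piece are hypothetical names. With the dot inside the lookahead the whole piece is zero-width, so repeating it with * never advances through the string; that may be related to the recursion error, although I have not confirmed this against Jython's engine.

```python
import re

# Simplification (assumption): a literal ' replaces the (?P=quote) backreference.
old_piece = re.compile(r"(?!'.)")   # dot inside the lookahead: nothing is consumed
new_piece = re.compile(r"(?!').")   # dot outside: one non-quote character is consumed

old_match = old_piece.match("abc")
new_match = new_piece.match("abc")

print(old_match.end())           # 0, the match is empty
print(new_match.end())           # 1, the 'a' was consumed
print(new_piece.match("'abc"))   # None, a quote is rejected
```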
I found the solution after a bit of fiddling:
And no, I don’t understand the difference.