I am trying to remove quotes from a string. Example:
"hello", how 'are "you" today'
should return
hello, how are "you" today
I am using PHP's preg_replace.
I’ve got a couple of solutions at the moment:
(\'|")(.*)\1
The problem with this is that (.*) is greedy and matches all characters (including quotes) in the middle, so the result ($2) is
hello", how 'are "you today'
Backreferences cannot be used in character classes, so I can’t use something like
(\'|")([^\1\r\n]*)\1
to not match the first backreference in the middle.
Second solution:
(\'[^\']*\'|"[^"]*")
The problem is that this includes the quotes in the backreference, so the replacement doesn't actually change anything. The result ($1):
"hello", how 'are "you" today'
Instead of:
(\'[^\']*\'|"[^"]*")
Simply write:
\'([^\']*)\'|"([^"]*)"
Now one of the two groups will capture the quoted content.
In most flavors, when a group that failed to match is referred to in a replacement string, the empty string gets substituted in, so you can simply replace with $1$2: one will be the successful capture (depending on the alternate) and the other will substitute in the empty string.
Here’s a PHP implementation (as seen on ideone.com):
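A minimal sketch of that replacement, assuming the sample string from the question. The variable names are illustrative and not necessarily those of the linked snippet:

```php
<?php
// Sample input from the question.
$text = '"hello", how \'are "you" today\'';

// Group 1 captures single-quoted content, group 2 captures double-quoted
// content. Only one of them participates in any given match.
$pattern = '/\'([^\']*)\'|"([^"]*)"/';

// The group that did not participate is substituted as an empty string.
$result = preg_replace($pattern, '$1$2', $text);

echo $result, "\n"; // hello, how are "you" today
```

The inner "you" keeps its quotes because it sits inside the single-quoted match and is never scanned as a separate pair.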
A closer look
Let’s use 1 and 2 for the quotes (for clarity). Whitespace will also be added (for clarity).
Before, you had, as your second solution, this pattern:
( 1[^1]*1 | 2[^2]*2 )
As you correctly pointed out, this matches a pair of quotes correctly (assuming that quotes can’t be escaped), but it doesn’t capture the content part.
This may not be a problem depending on context (e.g. you can simply trim one character from the beginning and end to get the content), but at the same time, it’s also not that hard to fix the problem: simply capture the content from the two possibilities separately:
1([^1]*)1 | 2([^2]*)2
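With the real quote characters put back, the separate captures can be inspected match by match. The following is a rough sketch, not a definitive implementation. It assumes PHP's default behaviour of omitting trailing groups that did not participate (that is, no PREG_UNMATCHED_AS_NULL flag), and the $found array is a hypothetical structure used only for illustration:

```php
<?php
$text = '"hello", how \'are "you" today\'';
$pattern = '/\'([^\']*)\'|"([^"]*)"/';

// PREG_SET_ORDER yields one array per match: [whole match, group 1, group 2].
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    // By default PHP drops trailing groups that did not participate, so
    // index 2 exists only when the double-quote alternative matched.
    if (isset($m[2])) {
        $found[] = ['quote' => '"', 'content' => $m[2]];
    } else {
        $found[] = ['quote' => "'", 'content' => $m[1]];
    }
}

foreach ($found as $f) {
    echo $f['quote'], ' => ', $f['content'], "\n";
}
```

Testing for the existence of group 2, rather than for a non-empty group 1, keeps an empty pair such as '' distinguishable from "".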
Now either group 1 or group 2 will capture the content, depending on which alternate was matched. As a “bonus”, you can check which quote was used, i.e. if group 1 succeeded, then 1 was used.
Appendix
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
The (…) is used for grouping. (pattern) is a capturing group and creates a backreference. (?:pattern) is non-capturing.
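As a small illustration of these constructs in PHP, here is a sketch where the sample strings are made up for the example:

```php
<?php
// [aeiou] matches one lowercase vowel.
$vowels = preg_replace('/[aeiou]/', '*', 'hello');   // h*ll*

// [^aeiou] matches one character that is not a lowercase vowel.
$others = preg_replace('/[^aeiou]/', '*', 'hello');  // *e**o

// (?:ab) only groups, so (c) is the first and only capturing group.
preg_match('/(?:ab)+(c)/', 'ababc', $m);

echo $vowels, ' ', $others, ' ', $m[0], ' ', $m[1], "\n"; // h*ll* *e**o ababc c
```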