I’ve got a line from a CSV file as a string, with " as the field encloser and , as the field separator. Sometimes there are " characters in the data that break the field enclosers. I’m looking for a regex to remove these ".
My string looks like this:
my $csv = qq~"123456","024003","Stuff","","28" stuff with more stuff","2"," 1.99 ","",""~;
I’ve looked at this but I don’t understand how to tell it to only remove quotes that are
1. not at the beginning of the string
2. not at the end of the string
3. not preceded by a ,
4. not followed by a ,
I managed to tell it to remove 3 and 4 at the same time with this line of code:
$csv =~ s/(?<!,)"(?!,)//g;
However, I cannot fit the ^ and $ in there since the lookahead and lookbehind both do not like being written as (?<!(^|,)).
Is there a way to achieve this only with a regex besides splitting the string up and removing the quote from each element?
This should work:
Conditions 1 and 2 imply that there must be at least one character before and after the quote, hence the positive lookarounds. Conditions 3 and 4 imply that these characters can be anything but a comma.
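A substitution matching that description could look like the following sketch. The pattern is an assumption reconstructed from the explanation: a positive lookbehind and a positive lookahead, each requiring one character that is not a comma. The variable name `$fixed` is only for illustration.

```perl
use strict;
use warnings;

# Sample line from the question; the stray quote sits after "28".
my $csv = qq~"123456","024003","Stuff","","28" stuff with more stuff","2"," 1.99 ","",""~;

# (?<=[^,])  a character must exist before the quote, and it is not a comma
# (?=[^,])   a character must exist after the quote, and it is not a comma
# Requiring a character on each side excludes the first and last quote of
# the string, so ^ and $ do not need to appear in the pattern at all.
(my $fixed = $csv) =~ s/(?<=[^,])"(?=[^,])//g;

print "$fixed\n";
```

Empty fields such as "" are left alone, because each of their quotes has a comma or a string boundary on one side. Note that this sketch cannot repair a stray quote that sits directly next to a comma inside the data, since that case is indistinguishable from a real field boundary.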