Let me preface this by saying I’m a complete amateur when it comes to RegEx and only started a few days ago. I’m trying to solve a problem formatting a file and have hit a hitch with a particular type of data. The input file is structured like this:
Two words,Word,Word,Word,'Number, number'
What I need to do is format it like this…
'Two words','Word','Word','Word','Number, number'
I have had a RegEx pattern of
s/,/','/g
working, except it also replaces the comma in the already quoted Number, number section, which causes the field to separate and breaks the file. Essentially, I need to modify my pattern to replace a comma with ‘,’ [quote comma quote], but only when that comma isn’t followed by a space. Note that the other fields will never have a space following the comma, only the delimited number list.
I managed to write up
s/,[A-Za-z0-9]/','/g
which, while matching the appropriate strings, would replace the comma AND the following letter. I have heard of backreferences and think that might be what I need to use? My understanding was that
s/(,)[A-Za-z0-9]\b
should work, but it doesn’t.
Anyone have an idea?
s/,([^ ])/','$1/g
will match a ‘,’ followed by a ‘not-a-space’, capturing the not-a-space, then replacing the whole match with ‘,’ [quote comma quote] followed by the captured part. Depending on which regex engine you’re using, you might be writing \1 or other things instead of $1.
If you’re using Perl or otherwise have access to a regex engine with negative lookahead,
s/,(?! )/','/g
(a ‘,’ not followed by a space) works. Because the lookahead doesn’t consume the character after the comma, there is nothing to capture and put back.
Your input looks like CSV, though, and if it actually is, you’d be better off parsing it with a real CSV parser rather than with regexes. There are lots of other odd corner cases to worry about.
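If it helps to see these side by side, here is a rough sketch in Python. Python is my choice for illustration, not something from your post, and the function names and the SAMPLE constant are made up for the example. It assumes your fields are comma-separated and that the single quote is the only quoting character. Note that Python writes the backreference as \1 rather than $1.

```python
import csv
import io
import re

# Sample line copied from the question.
SAMPLE = "Two words,Word,Word,Word,'Number, number'"


def quote_with_backreference(line: str) -> str:
    """Replace ',X' with "','X", where X is any character except a space."""
    return re.sub(r",([^ ])", r"','\1", line)


def quote_with_lookahead(line: str) -> str:
    """Replace a comma with "','" only when it is not followed by a space."""
    return re.sub(r",(?! )", "','", line)


def quote_with_csv(line: str) -> str:
    """Parse the line as CSV with ' as the quote character, then re-quote every field."""
    fields = next(csv.reader([line], quotechar="'"))
    out = io.StringIO()
    writer = csv.writer(out, quotechar="'", quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return out.getvalue().rstrip("\n")


if __name__ == "__main__":
    print(quote_with_backreference(SAMPLE))
    print(quote_with_lookahead(SAMPLE))
    print(quote_with_csv(SAMPLE))
```

On the sample line, both regex versions give Two words','Word','Word','Word',''Number, number' which leaves the comma inside the number list alone but still lacks the opening quote and doubles the quote in front of the field that was already quoted, so you would need extra substitutions to clean that up. The CSV version gives 'Two words','Word','Word','Word','Number, number' directly, which is one of the corner cases I had in mind.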