Say I have a normal CSV like
# helloworld.csv
hello,world,,,"please don't replace quoted stuff like ,,",,
If I want mysqlimport to understand that some of those fields are NULL, then I need:
# helloworld.mysql.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,,",\N,\N
I got some help from another question — Why does sed not replace overlapping patterns — but note the problem:
$ perl -pe 'while (s#,,#,\\N,#) {}' -pe 's/,$/,\\N/g' helloworld.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,\N,",\N,\N
^^
How can I write the regex so it doesn’t replace ,, if they’re between quotes?
FINAL ANSWER
Here’s the final perl I used, thanks to the accepted answer below:
perl -pe 's/^,/\\N,/; while (s/,(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)/,\\N/g) {}; s/,$/,\\N/' helloworld.csv
That takes care of leading, trailing, and unquoted empty strings.
Assuming that you won’t have escaped quotes, you can make sure that you only replace
,,if it’s followed by an even number of quotes:Input:
Output: