How do you parse a CSV file using gawk? Simply setting FS="," is not enough, because a quoted field that contains a comma is split into multiple fields.
Example using FS="," that does not work:
file contents:
one,two,'three, four',five
'six, seven',eight,'nine'
gawk script:
BEGIN { FS="," }
{
    for (i=1; i<=NF; i++)
        printf "field #%d: %s\n", i, $(i)
    printf "---------------------------\n"
}
bad output:
field #1: one
field #2: two
field #3: 'three
field #4:  four'
field #5: five
---------------------------
field #1: 'six
field #2:  seven'
field #3: eight
field #4: 'nine'
---------------------------
desired output:
field #1: one
field #2: two
field #3: 'three, four'
field #4: five
---------------------------
field #1: 'six, seven'
field #2: eight
field #3: 'nine'
---------------------------
The short answer is ‘I wouldn’t use gawk to parse CSV if the CSV contains awkward data’, where ‘awkward’ means things like commas in the CSV field data.
The next question is ‘What other processing are you going to be doing?’, since that will influence which alternatives you use.
I’d probably use Perl and the Text::CSV or Text::CSV_XS modules to read and process the data. Remember, Perl was originally written in part as an
awk and sed killer, hence the a2p and s2p programs still distributed with Perl, which convert awk and sed scripts (respectively) into Perl.