Normally, I do something like
IFS=','
columns=( $LINE )
where $LINE is a line from a csv file I’m reading.
However, how do I handle a csv file with embedded commas? I have to handle several hundred gigs of file so everything needs to be done quickly, i.e., no multiple readings of a line, definitely no loops (last time I tried that slowed it down several factors).
The general structure of the code is as follows
FILENAME=$1
cat $FILENAME | while read LINE
do
IFS=","
columns=( $LINE )
# affect columns changes here
newline="${columns[*]}"
echo "$newline"
done
Preferably, I need something that goes
FILENAME=$1
cat $FILENAME | while read LINE
do
IFS=","
# code to tell bash to ignore if IFS is within an open quote
columns=( $LINE )
# affect columns changes here
newline="${columns[*]}"
echo "$newline"
done
Any tips would be appreciated. Otherwise, I’ll probably switch to using another language to handle this stuff.
Probably embedded commas is just the first obvious problem that you encountered while parsing those CSV files.
Future problems that might popped are:
I generally tend to follow the philosophy that If there is a (reputable) module that parses some
format you have to parse, use it instead of making a homebrew
I don’t think there is such a thing for bash, but there are some for Perl. I’d go for
Text::CSV_XS. Being written in C I expect it to be very fast.