I have a 16GB CSV that is ;-separated, and the fields are always quoted. I need to quickly filter out rows where the second field is blank.
"12345";"987";"..." # keep it
"67890";"";"..." # omit it
The first two fields are numbers only, if that matters for performance.
I figure that awk might be the most performant tool for this, but I can’t seem to get it right. I tried this, but it wrongly omits most lines:
cat huge.csv | awk '/^"\d+";"\d/' > filtered.csv
Of course it doesn’t have to be awk; any command-line tool commonly found on Linux and OS X will do.
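One likely reason the attempt above drops lines: awk patterns are POSIX extended regular expressions, which have no \d shorthand for digits, so the bracket expression [0-9] is needed instead. Below is a minimal sketch of the same regex approach with that one change. The file names sample.csv and filtered_regex.csv and the two sample rows are assumptions made for illustration, mirroring the rows in the question.

```shell
# Hypothetical two-line sample mirroring the rows shown in the question.
printf '%s\n' '"12345";"987";"..."' '"67890";"";"..."' > sample.csv

# Keep lines whose second quoted field starts with a digit.
# [0-9] replaces \d, which POSIX awk regular expressions do not define.
awk '/^"[0-9]+";"[0-9]/' sample.csv > filtered_regex.csv

cat filtered_regex.csv
```

On the real file, awk can read huge.csv directly as an argument, so the cat and the pipe are not needed.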
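Another solution simply uses awk's input field separator instead of a regular expression.
Setting the input field separator to " makes the text between the second pair of quotes the 4th field, so the command only has to check that field. If it’s non-zero, awk prints the line implicitly. Tested with GNU awk 3.1.6.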