I have a lot of large csv files with NULL values stored as ,, (i.e., no entry). Using LOAD DATA INFILE makes these NULL values into zeros, even if I create the table with a string like var DOUBLE DEFAULT NULL. After a lot of searching I found that this is a known “bug”, although it may be a feature for some users. Is there a way that I can fix this on the fly without pre-processing? These data are all numeric, so a zero value is very different from NULL.
Or if I have to do pre-processing, is there one that is most promising for dealing with tens of csv files of 100mb to 1gb? Thanks!
With minimal preprocessing with
sed, you can have your data ready for import.That should do an in-place edit of your CSV files and replace the blank values with
\N. Update the glob, *.csv, to match your needs.The reason there are two identical regular expressions matching
,,is because I couldn’t figure out another way to make it replace two consecutive blank values. E.g.,,,.