I have a rather large CSV file (17GB) which I’m trying to sanity check. I’ve written a little script which looks like this:
#!/usr/bin/php
<?php
$f = fopen($argv[1],'r');
$i=0;
while (!feof($f)) {
    $row = fgetcsv($f);
    $i++;
}
print $i."\n";
?>
Which should just read in the number of rows and print it out. This script outputs:
60770881
But if I do a wc -l the result is 60777200.
My csv file was generated from MySQL using:
INTO OUTFILE '/tmp/file.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n'
So it shouldn’t have any unescaped newlines or anything like that. Does anyone have any idea what could be wrong?
A CSV record can span multiple lines. If any of the values contain line breaks, the file will have two or more physical lines for that record (as counted by wc), but fgetcsv will read them as a single CSV record.
Also, you don’t need to check feof($f), because fgetcsv returns FALSE at end-of-file.
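To make both points concrete, here is a minimal sketch of a counting loop that relies on fgetcsv returning FALSE instead of checking feof. The helper name countCsvRecords and the tiny temporary demo file are made up for illustration; the delimiter, enclosure, and escape arguments are passed explicitly on the assumption that they should match the INTO OUTFILE options from the question.

```php
<?php
// Hypothetical helper (not from the original post): counts CSV records,
// which may be fewer than the number of physical lines in the file.
function countCsvRecords(string $path): int
{
    $f = fopen($path, 'r');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    // fgetcsv() returns false at end-of-file, so no feof() check is needed.
    // Length 0 means "no line length limit"; the other arguments mirror
    // FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'.
    while (($row = fgetcsv($f, 0, ',', '"', '\\')) !== false) {
        $count++;
    }

    fclose($f);
    return $count;
}

// Demo data (made up): the first record contains a line break inside a
// quoted value, so it occupies two physical lines.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "\"1\",\"first line\nsecond line\"\n\"2\",\"plain\"\n");

$physicalLines = substr_count(file_get_contents($tmp), "\n"); // what wc -l counts
$records = countCsvRecords($tmp);

print "physical lines: $physicalLines, records: $records\n";
unlink($tmp);
```

On the demo file this reports three physical lines but only two records, which is the same kind of gap as between the wc -l figure and the script's count. Reading the whole file with file_get_contents is only for the small demo; for a 17GB file you would compare against wc -l instead.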