I am running the following code and getting this error:
> rat <- scan("sortedratings.csv",nlines=760,sep=",",what=rat.cols,multi.line=FALSE);
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 755 did not have 8 elements
>
And here is the line that is causing all the trouble (line 755, the one whose username ends in \"):
ubuntu@ip-10-28-6-239:/data/csv$ sed -n "750,760p" sortedratings.csv
"281656475","2.5.0","Jul 17, 2011","","","KK9876",4,0
"281656475","2.5.0","Jul 17, 2011","","","Lyteskin45",4,0
"281656475","2.5.0","Jul 17, 2011","","","Mrs. Felton",5,0
"281656475","2.5.0","Jul 17, 2011","","","Nick Bartoszek",4,0
"281656475","2.5.0","Jul 17,2011","","","SANFRANPSYCHO",5,0
"281656475","2.5.0","Jul 17, 2011","","","Wxcgfduytrewjgf@!?$(:@&&$&@\"",5,0
"281656475","2.5.0","Jul 18, 2011","","","Downs58",5,0
"281656475","2.5.0","Jul 18, 2011","","","kitty1019",5,0
"281656475","2.5.0","Jul 18, 2011","","","Rj&e",4,0
"281656475","2.5.0","Jul 18, 2011","","","Robin Kinzer",5,0
"281656475","2.5.0","Jul 18, 2011","","","Roderick Palmer",5,0
ubuntu@ip-10-28-6-239:/data/csv$ s
I have tried different fixes but I can’t figure out the correct one. Any ideas?
I don’t mind simply removing the backslash.
Oh, I forgot to add: the file is 1.4GB, so I can’t read the whole file in or just rewrite it with sed, since it’s too big for my system.
In the “Details” part of `?scan` (used by `read.table`, `read.csv` and so on), the documentation explains that when `sep` is non-default, fields may be quoted in the style of .csv files, and a quote is placed inside a string by doubling it.
So it looks like your problem is the “escaped” quote `\"` in that line: R expects a quote embedded in a CSV field to be written as a doubled quote `""`, not as a backslashed quote `\"`.
I think your best bet here is to replace the escaped quotes with doubled quotes, either with Linux tools or in R.
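A minimal sketch of that replacement in R, followed by reading the sanitized text back with `scan`. The two sample lines are made up to match the shape of the file, and the `rat.cols` defined here is only a hypothetical stand-in for the one in the question (six character columns and two numeric ones); in practice `txt` would come from something like `readLines("sortedratings.csv", n = 760)`.

```r
# Two sample lines shaped like the file; the second contains the \" sequence.
txt <- c(
  '"281656475","2.5.0","Jul 17, 2011","","","SANFRANPSYCHO",5,0',
  '"281656475","2.5.0","Jul 17, 2011","","","Wxcgfduytrewjgf@!?$(:@&&$&@\\"",5,0'
)

# Replace every backslash-quote with a doubled quote.
# fixed = TRUE makes gsub treat the pattern as a literal string, not a regex.
txt <- gsub('\\"', '""', txt, fixed = TRUE)

# Hypothetical stand-in for the question's rat.cols: 6 character, 2 numeric columns.
rat.cols <- list("", "", "", "", "", "", 0, 0)

# textConnection() lets scan() read the character vector as if it were a file.
con <- textConnection(txt)
rat <- scan(con, sep = ",", what = rat.cols, multi.line = FALSE, quiet = TRUE)
close(con)

# The sixth column holds the usernames; the second one now ends in a plain quote.
print(rat[[6]])
```

This assumes that every `\"` in the data is meant as an embedded quote. A field that legitimately ends in a backslash just before its closing quote would be altered too, so it may be worth checking a sample first.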
Then you can use `read.csv` or `scan` like before (note the `textConnection(txt)`, which converts the string into a file-like object for `scan` to use).

Edit/Addition
Re OP’s comment – the file is 1.4GB and there are difficulties reading it all into R at once, so how to do the sanitizing?
Option 1
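You appear to be on Linux, so you could use `sed` to replace every `\"` with `""`. `sed` works through the file as a stream rather than loading it all at once, so memory should not be the limiting factor.
(Depending on where your data comes from, perhaps you could adjust the program that outputs the data so that it writes the format you require in the first place, but this is not always possible.)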
Option 2
If you are averse to using Linux or want an in-house R solution, use the `n` parameter of `readLines` to read in only a few lines at a time.
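One way the chunked approach could look is sketched below. The function name `sanitize_csv`, the chunk size, and the temporary demo files are all assumptions made for illustration; the idea is simply to keep both files open as connections, so that each `readLines` call continues where the previous one stopped.

```r
# Hypothetical helper: copy infile to outfile, replacing \" with "" chunk by chunk.
sanitize_csv <- function(infile, outfile, chunk = 10000L) {
  fin  <- file(infile,  open = "r")
  fout <- file(outfile, open = "w")
  on.exit({ close(fin); close(fout) })

  repeat {
    # On an open connection, readLines() resumes from the current position.
    lines <- readLines(fin, n = chunk)
    if (length(lines) == 0L) break  # end of file reached
    writeLines(gsub('\\"', '""', lines, fixed = TRUE), fout)
  }
  invisible(outfile)
}

# Small demonstration on a temporary file (made-up data).
demo_in  <- tempfile(fileext = ".csv")
demo_out <- tempfile(fileext = ".csv")
writeLines(c('"a","x\\"",1', '"b","y",2', '"c","z\\"",3'), demo_in)

sanitize_csv(demo_in, demo_out, chunk = 2L)
result <- readLines(demo_out)
print(result)
```

Only `chunk` lines are held in memory at any time, so the chunk size can be tuned to the machine. The sanitized copy is written to a separate file, which means enough free disk space for a second 1.4GB file is assumed.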