I am trying to read this file (3.8 MB) using its fixed-width structure as described in the following link.
This command:
a <- read.fwf('~/ccsl.txt',c(2,30,6,2,30,8,10,11,6,8))
Produces an error:
line 37 did not have 10 elements
After reproducing the issue with different values of the skip option, I figured out that the lines causing the problem all contain the “#” symbol.
Is there any way to get around it?
As @jverzani already commented, the problem is probably that the # sign is commonly used to mark the start of a comment. Setting the comment.char argument of read.fwf to something other than # (for example, the empty string) could fix the problem. I’ll leave my answer below as a more general case that you can use on any character that causes problems (e.g. the 's in the Dutch city name 's Gravenhage).
I’ve had this problem occur with other symbols. The approach I took was to simply replace the # with either nothing or a character that does not generate the error. In my case it was no problem to simply replace the character, but this might not be possible in your case.
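A minimal sketch of the comment.char fix mentioned above. The sample records, the widths c(2, 10, 4), and the temporary file are made up for illustration; they are not the layout of ccsl.txt. It relies on read.fwf passing extra arguments on to read.table, where an empty comment.char turns comment handling off.

```r
# Hypothetical sample data: two fixed-width records (widths 2, 10, 4);
# the second record contains a '#'.
tmp <- tempfile(fileext = ".txt")
writeLines(
  sprintf("%-2s%-10s%-4s",
          c("01", "02"),
          c("Amsterdam", "Room #12"),
          c("0100", "0200")),
  tmp
)

# comment.char = "" disables comment handling, so '#' is read as data.
# strip.white and colClasses are only here to keep the result easy to inspect.
a <- read.fwf(tmp,
              widths = c(2, 10, 4),
              comment.char = "",
              strip.white = TRUE,
              colClasses = "character")
print(a)
```

For the file in the question, the same idea would presumably amount to adding comment.char = "" to the original read.fwf call.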
So my approach would be to delete the symbol that generates the error, or replace it with another character. This can be done using a text editor (find and replace), in an R script, or using Linux tools such as grep and sed. If you want to do this in an R script, use scan or readLines to read the lines. Once the text is in memory, you can use sub to replace the character (or gsub, if it can occur more than once per line). Keep in mind that in a fixed-width file, deleting a character shifts every column after it on that line, so a one-for-one replacement is safer.
If you cannot replace the character permanently, I would try the following approach: replace it with a character that does not generate an error, read the file into R using read.fwf, and finally replace that character with # again.
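The replace, read, and restore steps described above could look roughly like this. It is only a sketch: the sample records, the widths, and the choice of "~" as placeholder are assumptions, and the placeholder must be a single character that does not already occur in the data so that the column positions stay intact.

```r
# Hypothetical sample data (widths 2, 10, 4); the second record contains a '#'.
src <- tempfile(fileext = ".txt")
writeLines(
  sprintf("%-2s%-10s%-4s",
          c("01", "02"),
          c("Amsterdam", "Room #12"),
          c("0100", "0200")),
  src
)

# 1. Read the raw lines and swap '#' for a placeholder of the same width.
raw_lines <- readLines(src)
placeholder <- "~"  # assumption: not present anywhere in the data
stopifnot(!any(grepl(placeholder, raw_lines, fixed = TRUE)))
safe_lines <- gsub("#", placeholder, raw_lines, fixed = TRUE)

# 2. Write the cleaned lines to a second file and read that with read.fwf.
cleaned <- tempfile(fileext = ".txt")
writeLines(safe_lines, cleaned)
b <- read.fwf(cleaned,
              widths = c(2, 10, 4),
              strip.white = TRUE,
              colClasses = "character")

# 3. Put the '#' back in every (character) column.
b[] <- lapply(b, function(col) gsub(placeholder, "#", col, fixed = TRUE))
print(b)
```

gsub with fixed = TRUE is used instead of sub so that every occurrence on a line is replaced and the character is not interpreted as a regular expression.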