Whenever I read in a file using read.csv() with the option header=T, the headers change in odd (but predictable) ways. For instance, a header name that ought to read "P(A>B)" becomes "P.A.B.":
> # when header=F:
> myfile1 <- read.csv(fullpath,sep="\t",header=F,nrow=3)
> myfile1
V1 V2 V3
1 ID Name P(A>B)
2 AB001 Alice 0.997
3 AB002 Bob 0.497
>
> # When header=T:
> myfile2 <- read.csv(fullpath,sep="\t",header=T,nrow=3)
> myfile2
ID Name P.A.B.
1 AB001 Alice 0.997
2 AB002 Bob 0.497
3 AB003 Charles 0.732
I tried to fix it like this, but it didn’t work:
> names(myfile2) <- myfile1[1,]
> myfile2
3 3 3
1 AB001 Alice 0.997
2 AB002 Bob 0.497
3 AB003 Charles 0.732
So then I tried to use sub() to write a function that would take any vector "arbitrary.lengths.here." and return a vector "arbitrary(lengths>here)", but I didn’t really get anywhere, and I started to suspect that I was making this problem more complicated than it had to be.
How would you deal with this problem of headers? Was I on the right track with sub()?
Set check.names=FALSE in read.csv(). From the help for ?read.csv:
check.names
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.
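As a minimal sketch of the difference, the snippet below writes a small tab-separated file that mirrors the data in the question and reads it back both ways. The temporary file and the variable names `tmp`, `mangled` and `kept` are made up for illustration; substitute your own `fullpath`.

```r
# Hypothetical sample file mirroring the question's data (tab-separated).
tmp <- tempfile(fileext = ".tsv")
writeLines(c("ID\tName\tP(A>B)",
             "AB001\tAlice\t0.997",
             "AB002\tBob\t0.497",
             "AB003\tCharles\t0.732"), tmp)

# Default (check.names = TRUE): header names are passed through make.names(),
# so "P(A>B)" comes back as "P.A.B.".
mangled <- read.csv(tmp, sep = "\t", header = TRUE)
names(mangled)

# check.names = FALSE: header names are kept exactly as written in the file.
kept <- read.csv(tmp, sep = "\t", header = TRUE, check.names = FALSE)
names(kept)

# A name like "P(A>B)" is not a syntactically valid R name, so refer to the
# column with [[ ]] or with backticks after $.
kept[["P(A>B)"]]
kept$`P(A>B)`
```

The trade-off is that columns with non-syntactic names can no longer be written as a bare `kept$name`; they need backticks or `[[ ]]`, as in the last two lines.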