I am new to R and am trying to read a public Google spreadsheet into an R data frame with numeric columns. The problem seems to be that the exported spreadsheet writes large numbers with commas, such as "13,061.422", and read.csv() treats those columns as factors. I tried stringsAsFactors=FALSE and colClasses=c(rep("numeric",7)), but neither worked. Is there a way to coerce values with commas and decimals to numeric, either within read.csv() or afterwards, once they are factors in the data frame? Here is my code:
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE) #ssl.verifypeer=FALSE gets around certificate issues I don't understand.
fullmatrix <- read.csv(textConnection(myCsv))
str(fullmatrix)
which results in:
'data.frame': 4927 obs. of 7 variables:
$ wave. : Factor w/ 4927 levels "1,000.8900","1,002.8190",..: 4875 4874 4873 4872 4871 4870 4869 4868 4867 4866 ...
$ wavelength : Factor w/ 4927 levels "1,000.074","1,000.267",..: 1 2 3 4 5 6 7 8 9 10 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
Thanks for any help! I am new to R, so guessing (hoping) this is an easy one!
Yes. Two methods. The easiest to understand at first is probably just to use as.is=TRUE to preserve them as character vectors and then use gsub to remove the commas and any currency symbols before converting to numeric. The second is a bit more difficult, but I think more kewl: create an as-method for the format you are using. Then you can use colClasses to do it in one step.
I see @EDi already did version #1 (using stringsAsFactors rather than as.is), so I will document strategy #2.
as-methods are coercive. There are many such methods in base R, such as as.list, as.numeric, and as.character. In each case they attempt to take input that is in one mode and make a sensible copy of it in a different mode. For instance, it makes sense to coerce a matrix to a dataframe because they both have two dimensions. It makes a bit less sense to coerce a dataframe to a matrix (but it does succeed, with loss of all the attributes of the columns and coercion to a common mode).
In the present case I am taking a character string as input, removing any commas, and coercing the character values to numeric. Then I use read.table's (in this case by way of read.csv) colClasses argument to dispatch to the as-method I registered with setAs. You may want to go to the help(setAs) page for more details. The S4 class system confuses a lot of people, me included. This is about the only area of success I have had with S4 methods.
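
Here is a minimal sketch of both strategies, not a drop-in solution. It reads a tiny made-up sample from a string instead of downloading the spreadsheet, so the values and the two-column layout are assumptions for illustration only. The class name num.with.commas is also just a hypothetical label; any unused class name should work the same way.

```r
library(methods)  # provides setClass() and setAs(); attached by default in interactive R

# Made-up sample standing in for the exported spreadsheet.
# Fields containing commas are quoted, as in a normal CSV export.
csv_text <- 'wavelength,d2o
"1,000.074",85.2
"13,061.422",87.7'

# Strategy 1: keep the column as character, then strip commas and convert.
df1 <- read.csv(text = csv_text, as.is = TRUE)
df1$wavelength <- as.numeric(gsub(",", "", df1$wavelength))

# Strategy 2: register a coercion from character to a hypothetical class,
# then name that class in colClasses so read.csv applies it per column.
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))

df2 <- read.csv(text = csv_text,
                colClasses = c("num.with.commas", "numeric"))

str(df2)  # both columns should now be numeric
```

For the real spreadsheet you would pass one colClasses entry per column, using the custom class for the columns that contain commas and "numeric" for the rest.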