I’ve imported a CSV file with lots of columns and several sections of data.
v <- read.csv2('200109.csv', header=TRUE, sep=',', skip=6, na.strings=c(''))
The layout of the file is something like this:
Dataset1
time, data, .....
0 0 0 <NA> 0 0
Dataset2
time, data, .....
00:00 0 0 <NA> 0 0
(The headers of the different datasets are exactly the same.)
Now, I can plot the first dataset with:
plot(as.numeric(as.character(v$Calls.served.by.agent[1:30])), type='l')
I am curious if there is a better way to:
- Get all the numbers read as numbers, without having to convert.
- Address the different datasets in the file in some meaningful way.
Any hints would be appreciated. Thank you.
Status update:
I haven’t really found a good solution in R yet, but I’ve started writing a script in Lua to separate each individual time series into its own file. I’m leaving this open for now, because I’m curious how well R will deal with all these files. I’ll get 8 files per day.
What I personally would do is to make a script in some scripting language to separate the different data sets before the file is read into R, and possibly do some of the necessary data conversions, too.
If you want to do the splitting in R, look up readLines and scan; read.csv2 is too high-level and is meant for reading a single data frame. You could write the different data sets into different files, or if you are ambitious, cook up file-like R objects that are usable with read.csv2 and read from the correct parts of the underlying big file.

Once you have dealt with separating the data sets into different files, use read.csv2 on those (or whichever read.table variant is best; if those are not tabs but fixed-width fields, see read.fwf). If <NA> indicates ‘not available’ in your file, be sure to specify it as part of na.strings. If you don’t do that, R thinks you have non-numeric data in that field, but with the right na.strings, you automatically get the field converted into numbers. It seems that one of your fields can include time stamps like 00:00, so you need to use colClasses and specify a class to which your time stamp format can be converted. If the built-in Date class doesn’t work, just define your own timestamp class and an as.timestamp function that does the conversion.
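
As a rough sketch of the approach described above, the splitting and the conversions could look something like this in R. Several things here are assumptions rather than facts about the real file: that every dataset begins with a title line starting with "Dataset", that fields are comma-separated, that the columns are called time, calls and waiting, and that a time stamp is best stored as minutes since midnight. The helpers split_datasets and read_block and the "timestamp" class are hypothetical names made up for the illustration. Instead of an as.timestamp function, the sketch registers the conversion with setAs, because that is what read.table looks up for classes named in colClasses.

```r
library(methods)  # setClass() and setAs()

# Hypothetical sample standing in for readLines('200109.csv');
# the real file layout may differ.
lines <- c(
  "Dataset1",
  "time,calls,waiting",
  "00:00,0,<NA>",
  "00:30,3,1",
  "Dataset2",
  "time,calls,waiting",
  "00:00,5,2",
  "00:30,<NA>,4"
)

# Hypothetical "timestamp" class: "HH:MM" text becomes minutes since midnight.
# read.table calls methods::as(column, "timestamp") for classes it does not know.
setClass("timestamp")
setAs("character", "timestamp", function(from) {
  parts <- strsplit(from, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
})

# Cut the lines into one block per dataset, using the title lines as markers.
split_datasets <- function(lines, title_pattern = "^Dataset") {
  starts <- grep(title_pattern, lines)
  stopifnot(length(starts) > 0)
  ends <- c(starts[-1] - 1, length(lines))
  blocks <- Map(function(s, e) lines[(s + 1):e], starts, ends)
  names(blocks) <- lines[starts]
  blocks
}

# Parse one block (header line plus data rows) into a data frame.
# Listing "<NA>" in na.strings lets the numeric columns convert automatically.
read_block <- function(block) {
  read.csv(text = block, header = TRUE,
           na.strings = c("", "<NA>"),
           colClasses = c(time = "timestamp"))
}

datasets <- lapply(split_datasets(lines), read_block)
str(datasets$Dataset1)
```

With a real file, lines would come from readLines('200109.csv'), minus any preamble lines that the original call dropped with skip. Each dataset can then be addressed by name, for example plot(datasets$Dataset1$calls, type='l'), with no as.numeric(as.character(...)) round trip.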