Suppose I have:
R> str(data)
'data.frame': 4 obs. of 2 variables:
$ datetime: Factor w/ 4 levels "2011-01-05 09:30:00.001",..: 1 2 3 4
$ price : num 18.3 18.3 18.3 18.3
R> data
datetime price
1 2011-01-05 09:30:00.001 18.31
2 2011-01-05 09:30:00.321 18.33
3 2011-01-05 09:30:01.511 18.33
4 2011-01-05 09:30:02.192 18.34
When I try to load this into an xts object, the timestamps are subtly altered:
R> x <- xts(data[-1], as.POSIXct(strptime(data$datetime, '%Y-%m-%d %H:%M:%OS')))
R> str(x)
An ‘xts’ object from 2011-01-05 09:30:00.000 to 2011-01-05 09:30:02.191 containing:
Data: num [1:4, 1] 18.3 18.3 18.3 18.3
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "price"
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
NULL
R> x
price
2011-01-05 09:30:00.000 18.31
2011-01-05 09:30:00.321 18.33
2011-01-05 09:30:01.510 18.33
2011-01-05 09:30:02.191 18.34
Notice that the timestamps have changed. The first entry is now 09:30:00.000 instead of the original 09:30:00.001, and the third and fourth rows are each one millisecond low as well (01.510 and 02.191 instead of 01.511 and 02.192).
What’s causing this? Am I doing something fundamentally wrong? I’ve tried various incantations to get the data into an xts object and they all seem to exhibit this behavior.
EDIT: Added sessionInfo() output:
R> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xts_0.8-2 zoo_1.7-4
loaded via a namespace (and not attached):
[1] grid_2.13.1 lattice_0.19-30 tools_2.13.1
EDIT 2: If I modify my source data to be microsecond precision as follows:
datetime,price
2011-01-05 09:30:00.001000,18.31
2011-01-05 09:30:00.321000,18.33
2011-01-05 09:30:01.511000,18.33
2011-01-05 09:30:02.192000,18.34
And then load it so I have:
R> test
datetime price
1 2011-01-05 09:30:00.001000 18.31
2 2011-01-05 09:30:00.321000 18.33
3 2011-01-05 09:30:01.511000 18.33
4 2011-01-05 09:30:02.192000 18.34
And then, finally, convert it into an xts object and set the index format:
R> x <- xts(test[,-1], as.POSIXct(strptime(test$datetime, '%Y-%m-%d %H:%M:%OS')))
R> indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
R> x
[,1]
2011-01-05 09:30:00.000999 18.31
2011-01-05 09:30:00.321000 18.33
2011-01-05 09:30:01.510999 18.33
2011-01-05 09:30:02.191999 18.34
The same effect shows up here: three of the four timestamps are one microsecond low. I was hoping that adding the extra precision would help, but unfortunately it does not.
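The displayed digits can be reproduced in base R without xts. The following is a minimal sketch, assuming the fractional seconds are truncated rather than rounded when they are formatted. The UTC base time and the variable names are illustrative assumptions, not part of the original session.

```r
# Epoch seconds for 2011-01-05 09:30:00, assumed UTC here.
# A whole-second timezone offset does not change the fractional part.
base <- as.numeric(as.POSIXct("2011-01-05 09:30:00", tz = "UTC"))

# Fractional offsets of the four rows in the input file.
offsets <- c(0.001, 0.321, 1.511, 2.192)

# What a double actually holds after adding the offset to the epoch value.
stored <- base + offsets
frac <- stored - floor(stored)

# Microsecond digits under truncation versus rounding to nearest.
truncated_us <- floor(frac * 1e6)
rounded_us <- round(frac * 1e6)

print(data.frame(offsets, truncated_us, rounded_us))
```

If the truncated column matches the xts output above while the rounded column matches the input file, that points at formatting rather than at the stored data.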
EDIT 3: Please see @DWin’s answer for an end-to-end test case that reproduces this behavior.
EDIT 4: The behavior is not specific to millisecond resolution. Timestamps with microsecond resolution are altered in the same way. If I change my input data to:
R> data
datetime price
1 2011-01-05 09:30:00.001001 18.31
2 2011-01-05 09:30:00.321001 18.33
3 2011-01-05 09:30:01.511001 18.33
4 2011-01-05 09:30:02.192005 18.34
And then create an xts object:
R> x <- xts(data[-1],
as.POSIXct(strptime(as.character(data$datetime), '%Y-%m-%d %H:%M:%OS')))
R> indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
R> x
price
2011-01-05 09:30:00.001000 18.31
2011-01-05 09:30:00.321001 18.33
2011-01-05 09:30:01.511001 18.33
2011-01-05 09:30:02.192004 18.34
EDIT 5: It would appear to be a floating point precision issue. Observe:
R> t <- as.POSIXct("2011-01-05 09:30:00.001001")
R> t
[1] "2011-01-05 09:30:00.001 CST"
R> as.numeric(t)
[1] 1294241400.0010008812
This reproduces the error: the stored value is slightly below .001001 and prints as .001, which is consistent with the example in EDIT 4. However, using a timestamp that did not show the error:
R> t <- as.POSIXct("2011-01-05 09:30:01.511001")
R> t
[1] "2011-01-05 09:30:01.511001 CST"
R> as.numeric(t)
[1] 1294241401.5110011101
So is xts, or some underlying component, truncating (rounding down) rather than rounding to the nearest value?
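One way to check the floating point hypothesis is to look at the spacing of doubles at this magnitude. This is a rough sketch in base R; `ulp`, `grid`, and `side` are names made up for illustration.

```r
# Epoch value corresponding to the first timestamp in EDIT 5.
x <- 1294241400.001001

# Gap between adjacent doubles near x (53-bit significand).
ulp <- 2^(floor(log2(x)) - 52)
cat(sprintf("stored value: %.10f\n", x))
cat(sprintf("spacing:      %.3e seconds\n", ulp))

# Position of each fractional part on that grid, and whether the nearest
# representable value falls below or above the decimal that was typed.
secs <- c(0.001001, 1.511001)
grid <- secs / ulp
side <- ifelse(round(grid) < grid, "below", "above")

print(data.frame(secs, side))
```

A value stored just below the intended decimal loses its last digit under truncation, while one stored just above keeps it, which would match the two cases shown in EDIT 5.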
It seems the problem is only in printing. Using the OP’s original data: