I have an ffdf object with ~100 million rows, from which I'd like to draw a random sample of ~5 million rows.
I tried the following code on a small reproducible example, but it fails:
> library(ffbase)
> library(ff)
> rowSamp1 <- c(1,3,5,7,9,11)
> ff1 <- ff(runif(20))
> ff2 <- ff(runif(20))
> ff3 <- ff(runif(20))
> ffdf1 <- ffdf(ff1, ff2, ff3)
> dim(ffdf1)
[1] 20 3
> ffdf2 <- ffdf(ffdf1[rownames(ffdf1) %in% rowSamp1,])
Error in as.hi.integer(x, maxindex = maxindex, dim = dim, vw = vw, pack = pack) :
NAs in as.hi.integer
Any suggestions?
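Index the ffdf directly with a vector of sampled row numbers instead of matching against rownames(), for example with ffbase's bigsample():
ffdf1[bigsample(x=100000000, size=5000000, replace = FALSE), ]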
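A minimal sketch of the same idea that needs only the ff package: draw row positions with base R's sample.int() (swapped in here for ffbase's bigsample()), sort them, and use them as the row index. The toy data and sample size of 6 mirror the 20-row example above. This assumes the sample fits in RAM, because `[` on an ffdf returns an ordinary in-memory data.frame; as.ffdf() then writes it back to disk.

```r
# Sketch: random row sample from an ffdf by integer position (ff package only).
library(ff)

set.seed(42)  # make the sample reproducible

# Toy ffdf with 20 rows and 3 columns, as in the question
ffdf1 <- ffdf(ff1 = ff(runif(20)), ff2 = ff(runif(20)), ff3 = ff(runif(20)))

# Sample row positions without replacement; sorting keeps disk reads in order
n_sample <- 6
idx <- sort(sample.int(nrow(ffdf1), n_sample))

# Indexing with integer positions returns an in-memory data.frame
samp_df <- ffdf1[idx, ]

# Convert back to an on-disk ffdf if the sample should stay out of RAM later
samp_ffdf <- as.ffdf(samp_df)
dim(samp_ffdf)
```

For the full data, set n_sample to 5e6; nrow(ffdf1) supplies the population size, so the 100 million does not need to be hard-coded.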