I’ve got a column in a CSV file that looks like c("","1","1 1e-3") (i.e. whitespace-separated values). I’m trying to run through all the entries, returning the sum() of the values where there is at least one and NA otherwise.
My code currently does something like this:
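x <- c("", "1", "1 2 3")
# Start with NA everywhere; entries with no values stay NA.
x2 <- rep(NA_real_, length(x))
for (i in seq_along(x)) {
  # Parse the whitespace-separated numbers in this entry.
  si <- scan(text = x[[i]], quiet = TRUE)
  if (length(si) > 0)
    x2[[i]] <- sum(si)
}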
I’m struggling to make this fast; x is really a set of columns from a CSV file containing a few hundred thousand rows, and I thought it should be possible to do this in R.
(These are thinned samples from the posterior of a reversible jump MCMC algorithm. The dimensionality changes throughout the file, so I’m combining multiple values to get useful columns.)
This seems to perform a bit faster and may work for you.
This will return a zero instead of NA, but maybe that isn’t a deal breaker for you?
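For illustration, here is a minimal sketch of one approach that fits that description, assuming base R's strsplit() and vapply(); it is not necessarily the exact code this answer used. The helper name sum_fields and its na_if_empty argument are made up for this example. The zero comes from sum() over an empty vector, and the optional argument swaps those zeros back to NA if you do need them.

```r
# Hypothetical helper: sum the whitespace-separated numbers in each string.
sum_fields <- function(x, na_if_empty = FALSE) {
  # Split every element on runs of whitespace in one vectorised call.
  # trimws() avoids a leading "" field when a string starts with a space.
  parts <- strsplit(trimws(x), "[[:space:]]+")
  # An empty string splits to character(0), and sum(numeric(0)) is 0.
  out <- vapply(parts, function(p) sum(as.numeric(p)), numeric(1))
  # Optionally restore NA for elements that held no values.
  if (na_if_empty) out[lengths(parts) == 0L] <- NA_real_
  out
}

x <- c("", "1", "1 2 3")
sum_fields(x)                      # 0 1 6
sum_fields(x, na_if_empty = TRUE)  # NA 1 6
```

The saving, if any, would come from calling strsplit() once on the whole vector rather than opening a text connection per element with scan().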
Let’s see how this scales to a larger problem:
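A rough way to check this yourself is sketched below, assuming synthetic data in place of the real CSV column. The names big, scan_loop and split_sum are illustrative; split_sum is a stand-in split-and-sum approach and may differ from the code this answer originally timed. Timings depend on the machine, so none are quoted here.

```r
# Synthetic test data (made up for illustration): each entry holds 0 to 3 numbers.
set.seed(1)
n <- 1e4  # raise towards a few hundred thousand to match the real file
big <- vapply(
  sample(0:3, n, replace = TRUE),
  function(k) paste(round(runif(k), 3), collapse = " "),
  character(1)
)

# The loop from the question, wrapped in a function.
scan_loop <- function(x) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    si <- scan(text = x[[i]], quiet = TRUE)
    if (length(si) > 0) out[[i]] <- sum(si)
  }
  out
}

# A split-and-sum alternative (returns 0, not NA, for empty strings).
split_sum <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE),
         function(p) sum(as.numeric(p)), numeric(1))
}

system.time(a <- scan_loop(big))
system.time(b <- split_sum(big))

# The two results agree wherever the entry was non-empty.
stopifnot(isTRUE(all.equal(a[!is.na(a)], b[!is.na(a)])))
```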