I’m translating R code to C++ and I’d like to find an equivalent (ideally optimal) structure that allows the same kind of operations as a data frame, but in C++.
The operations are:
- add elements (rows)
- remove elements (rows) by index
- get the index of the lowest value
For example:
a <- data.frame(i = c(4, 9, 3, 1, 8, 2, 7, 10, 6, 6),
                j = c(8, 8, 8, 4, 3, 9, 1, 4, 8, 9),
                v = c(1.9, 18, 1.3, 17, 1.5, 14, 11, 1.4, 18, 2.0),
                o = c(3, 3, 3, 3, 1, 2, 1, 2, 3, 3))
a[which.min(a$v), c('i', 'j')] # find the lowest v value and get its i, j values
a <- a[-which.min(a$v), ] # remove that row by index
a <- rbind(a, data.frame(i = 3, j = 9, v = 2, o = 2)) # add a row
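For comparison, a minimal C++ sketch of the same three operations, assuming rows are stored as a hypothetical `Row` struct in a plain `std::vector` (not an Rcpp type). `std::min_element` plays the role of `which.min` and, like R, returns the first minimum on ties; note that C++ indices are 0-based while R's are 1-based.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical row type; field names mirror the R columns i, j, v, o.
struct Row {
    int i;
    int j;
    double v;
    int o;
};

// Counterpart of which.min(a$v): 0-based index of the row with the smallest v.
// Ties resolve to the first occurrence, as in R. Precondition: rows is non-empty.
std::size_t whichMinV(const std::vector<Row>& rows) {
    auto it = std::min_element(rows.begin(), rows.end(),
                               [](const Row& a, const Row& b) { return a.v < b.v; });
    return static_cast<std::size_t>(it - rows.begin());
}

// Counterpart of a <- a[-k, ]: remove row k, keeping the order of the others.
// This is O(n) because every later row is shifted down by one.
void removeRow(std::vector<Row>& rows, std::size_t k) {
    rows.erase(rows.begin() + static_cast<std::ptrdiff_t>(k));
}

// Counterpart of a <- rbind(a, data.frame(...)): append one row (amortised O(1)).
void addRow(std::vector<Row>& rows, const Row& r) {
    rows.push_back(r);
}
```

Unlike `rbind`, which copies the whole data frame on every call, `push_back` only reallocates occasionally, which is where much of the speed-up over the R version would come from.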
As I’m using Rcpp, Rcpp::DataFrame might be an option (although I don’t know how I would apply which.min to it), but I suspect it would be quite slow for this task, since these operations are repeated many times and I don’t need to pass the result back to R.
EDIT:
Target. Just to make it clear: the goal here is to gain speed. That is the obvious reason to translate code from R to C++ (there might be others, which is why I am clarifying). Maintainability and ease of implementation come second.
More detail on the operations. The algorithm is: add a lot of data to the array (many rows at a time), then extract the lowest value and delete it. Repeat.
That’s why I wouldn’t go for a sorted vector; instead I search for the lowest value on demand, since the array is updated (by additions) frequently. I think that’s faster, but maybe I’m wrong.
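The loop described above (append many rows, then pull out and delete the minimum) can be sketched with an unsorted `std::vector` and a linear scan. This is a sketch under one stated assumption: row order does not matter, so the deleted row can be replaced by the last one ("swap and pop"), making removal O(1) instead of the O(n) shift that `erase` performs. `Row`, `addRows` and `popMin` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical row type mirroring the R columns i, j, v, o.
struct Row {
    int i;
    int j;
    double v;
    int o;
};

// Append a whole batch of rows; nothing is kept sorted, so this stays cheap.
void addRows(std::vector<Row>& rows, const std::vector<Row>& batch) {
    rows.insert(rows.end(), batch.begin(), batch.end());
}

// Find the row with the lowest v by a linear scan, remove it and return it.
// Removal moves the last row into the hole, so row order is NOT preserved
// (assumed acceptable because only the minimum is ever looked up).
// Precondition: rows is not empty.
Row popMin(std::vector<Row>& rows) {
    auto it = std::min_element(rows.begin(), rows.end(),
                               [](const Row& a, const Row& b) { return a.v < b.v; });
    Row best = *it;
    *it = rows.back();  // harmless self-copy if the minimum was already last
    rows.pop_back();
    return best;
}
```

Each extraction still costs one pass over the data, so whether this beats keeping the data sorted depends on how many rows are added per extraction.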
I think a vector of vectors should do what you want. You would need to implement the min-finding manually (two nested loops), which is the fastest you can do without adding the overhead of an extra data structure.
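The answer doesn't say what the inner vectors hold. One reading, and it is only an assumption, is that `i` and `j` are grid coordinates (in the example every (i, j) pair is distinct), so the table becomes a 2-D grid with `grid[i][j] = v`. In this sketch empty cells hold +infinity, adding a row means writing a cell, removing it means resetting the cell, and the minimum is found with two nested loops. `Grid`, `Cell`, `makeGrid`, `findMin` and `kEmpty` are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Assumption: grid[i][j] holds v; empty cells hold +infinity.
using Grid = std::vector<std::vector<double>>;
constexpr double kEmpty = std::numeric_limits<double>::infinity();

struct Cell {
    std::size_t i;
    std::size_t j;
};

Grid makeGrid(std::size_t nRows, std::size_t nCols) {
    return Grid(nRows, std::vector<double>(nCols, kEmpty));
}

// Two nested loops over the whole grid: coordinates of the smallest v.
// If every cell is empty the result is {0, 0} and that cell holds kEmpty.
Cell findMin(const Grid& grid) {
    Cell best{0, 0};
    double bestV = kEmpty;
    for (std::size_t i = 0; i < grid.size(); ++i) {
        for (std::size_t j = 0; j < grid[i].size(); ++j) {
            if (grid[i][j] < bestV) {
                bestV = grid[i][j];
                best = {i, j};
            }
        }
    }
    return best;
}
```

With R's 1-based coordinates up to 10, an 11 x 11 grid lets `i` and `j` be used directly as indices; adding is then `grid[i][j] = v` and removing is `grid[i][j] = kEmpty`.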
You can speed up the min-finding by storing, alongside each row, the position of that row’s smallest element.
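A minimal sketch of that per-row caching, under the same assumption that `grid[i][j]` holds `v` and empty cells hold +infinity (`MinTrackingGrid` and its members are hypothetical). Writing a cell only touches its own row's cached position; the row is rescanned only when its current minimum gets larger, and the global minimum then needs one loop over the rows instead of two nested loops over every cell.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

class MinTrackingGrid {
public:
    static constexpr double kEmpty = std::numeric_limits<double>::infinity();

    // Precondition: nRows >= 1 and nCols >= 1.
    MinTrackingGrid(std::size_t nRows, std::size_t nCols)
        : grid_(nRows, std::vector<double>(nCols, kEmpty)), rowMin_(nRows, 0) {}

    // Add or overwrite a value; only row i's cached minimum can change.
    void set(std::size_t i, std::size_t j, double v) {
        if (j == rowMin_[i]) {
            const bool grew = v > grid_[i][j];
            grid_[i][j] = v;
            if (grew) rescanRow(i);  // the old row minimum got larger
        } else {
            grid_[i][j] = v;
            if (v < grid_[i][rowMin_[i]]) rowMin_[i] = j;  // new row minimum
        }
    }

    // Remove a value by marking the cell empty.
    void clear(std::size_t i, std::size_t j) { set(i, j, kEmpty); }

    // One loop over the rows using the cached positions: {row, column}.
    std::pair<std::size_t, std::size_t> findMin() const {
        std::size_t bestRow = 0;
        for (std::size_t i = 1; i < grid_.size(); ++i) {
            if (grid_[i][rowMin_[i]] < grid_[bestRow][rowMin_[bestRow]]) bestRow = i;
        }
        return {bestRow, rowMin_[bestRow]};
    }

private:
    // Full scan of one row, needed only when its minimum increased.
    void rescanRow(std::size_t i) {
        std::size_t best = 0;
        for (std::size_t j = 1; j < grid_[i].size(); ++j) {
            if (grid_[i][j] < grid_[i][best]) best = j;
        }
        rowMin_[i] = best;
    }

    std::vector<std::vector<double>> grid_;
    std::vector<std::size_t> rowMin_;  // column of the smallest v in each row
};
```

Extracting and deleting the lowest value is then `findMin()` followed by `clear(...)` on the returned cell, which rescans only that one row.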