I’m working on a dataset that consists of ~10^6 values which clustered into a

Question

Asked: June 6, 2026 (2026-06-06T19:55:10+00:00)

I’m working on a dataset that consists of ~10^6 values which are clustered into a variable number of bins. In the course of my analysis, I am trying to randomize my clustering while keeping the bin sizes constant. As a toy example (in pseudocode), this would look something like this:

data <- list(c(1,5,6,3), c(2,4,7,8), c(9), c(10,11,15), c(12,13,14));
sizes <- lapply(data, length);
for (rand in 1:no.of.randomizations) {
    rand.data <- partition.sample(seq(1,15), partitions=sizes, replace=F)
}

So, I am looking for a function like “partition.sample” that will take a vector (like seq(1,15)) and randomly sample from it, returning a list with the data partitioned into the right bin sizes given already by “sizes”.

I’ve been trying to write one such function myself, since the task seems to be not so hard. However, the partitioning of a vector into given bin sizes looks like it would be a lot faster and more efficient if done “under the hood”, meaning probably not in native R. So I wonder whether I have just missed the name of the appropriate function, or whether someone could please point me to a smart solution that is around 🙂

Your help & time are very much appreciated! 🙂

Best,

Lymond

UPDATE:

By “no.of.randomizations” I mean the actual number of times I run through the whole “randomization loop”. This will, later on, obviously include more steps than just the actual sampling.

Moreover, I would in addition be interested in a trick to do the above feat for sampling without replacement.

Thanks in advance, your help is very much appreciated!

1 Answer

Editorial Team · Answer 1 · 2026-06-06T19:55:12+00:00

Revised: This should be fairly efficient. Its complexity should be primarily in the permutation step:

# A single step: shuffle all 15 values, then cut the result into the original bin sizes
x <- sample(unlist(data))
list(one=x[1:4], two=x[5:8], three=x[9], four=x[10:12], five=x[13:15])

As mentioned in the question’s update, “no.of.randomizations” may be the number of repeated applications of this process, in which case you may want to wrap replicate around it:

replic <- replicate(n=4, { x <- sample(unlist(data))
   list( x[1:4], x[5:8], x[9], x[10:12], x[13:15]) }  )
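
The hard-coded index ranges above only fit the toy example. For arbitrary bin sizes, the same idea can be sketched as a small helper. Note that partition.sample is the hypothetical name from the question, not an existing R function, and that the replace argument and the handling of empty bins are assumptions made for this sketch:

```r
# Hypothetical helper (name taken from the question, not a base R function):
# shuffle x and cut the result into consecutive bins of the given sizes.
partition.sample <- function(x, sizes, replace = FALSE) {
  sizes <- unlist(sizes)                      # accept a list (from lapply) or a vector
  n <- sum(sizes)
  stopifnot(replace || n <= length(x))        # cannot draw more than length(x) without replacement
  # Sample indices rather than x itself, so a length-1 numeric x is not
  # silently expanded to 1:x by sample().
  shuffled <- x[sample.int(length(x), size = n, replace = replace)]
  # Label each position with its bin number; fixing the factor levels keeps empty bins.
  bin <- factor(rep(seq_along(sizes), times = sizes), levels = seq_along(sizes))
  unname(split(shuffled, bin))
}

data  <- list(c(1,5,6,3), c(2,4,7,8), c(9), c(10,11,15), c(12,13,14))
sizes <- lengths(data)
rand.data <- partition.sample(unlist(data), sizes)

# Repeated randomizations, kept as a plain list of partitions:
rand.list <- replicate(4, partition.sample(unlist(data), sizes), simplify = FALSE)
```

With replace = TRUE the same helper draws with replacement, so values can repeat within and across bins. The cost is still dominated by the single shuffle; split() only groups the already-permuted vector.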
