I am reading the weka implementation on re-sampling an array based on a given


Asked: June 15, 2026 (2026-06-15T13:24:37+00:00)


I am reading the Weka implementation of resampling an array based on a given weight vector. After reading through the code, I am not sure which algorithm underlies this implementation. In addition, I am confused about the purpose of these two lines of code:

  Utils.normalize(probabilities, sumProbs / sumOfWeights);

and

// Make sure that rounding errors don't mess things up
probabilities[numInstances() - 1] = sumOfWeights;

I do not know what they are used for. The following is the code, copied from Weka:

// Method of weka.core.Instances
public Instances resampleWithWeights(Random random, double[] weights) {

  if (weights.length != numInstances()) {
    throw new IllegalArgumentException("weights.length != numInstances.");
  }
  Instances newData = new Instances(this, numInstances());
  if (numInstances() == 0) {
    return newData;
  }
  double[] probabilities = new double[numInstances()];
  double sumProbs = 0, sumOfWeights = Utils.sum(weights);
  for (int i = 0; i < numInstances(); i++) {
    sumProbs += random.nextDouble();
    probabilities[i] = sumProbs;
  }
  Utils.normalize(probabilities, sumProbs / sumOfWeights);

  // Make sure that rounding errors don't mess things up
  probabilities[numInstances() - 1] = sumOfWeights;
  int k = 0; int l = 0;
  sumProbs = 0;
  while ((k < numInstances() && (l < numInstances()))) {
    if (weights[l] < 0) {
      throw new IllegalArgumentException("Weights have to be positive.");
    }
    sumProbs += weights[l];
    while ((k < numInstances()) &&
           (probabilities[k] <= sumProbs)) {
      newData.add(instance(l));
      newData.instance(k).setWeight(1);
      k++;
    }
    l++;
  }
  return newData;
}


1 Answer

Editorial Team · Answer 1 · 2026-06-15T13:24:38+00:00

The first code fragment:

Utils.normalize(probabilities, sumProbs / sumOfWeights);

just divides each element of probabilities by the second argument, sumProbs / sumOfWeights. Because probabilities holds running sums, its last (and largest) element is sumProbs before the call; after the division that element is sumOfWeights, up to rounding. The second piece of code:

probabilities[numInstances() - 1] = sumOfWeights;

just ensures that the last (maximum) element actually is sumOfWeights and wasn’t thrown off by some sort of rounding error.
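To make the scaling concrete, here is a minimal sketch with made-up numbers. The `normalize` helper below is a hypothetical stand-in that only mimics the "divide every element by the second argument" behavior described above; it is not Weka's `Utils.normalize` itself, and the class name and values are assumptions chosen for illustration.

```java
import java.util.Arrays;

class NormalizeSketch {
    // Hypothetical stand-in for the behavior described above:
    // divide every element of the array, in place, by `divisor`.
    static void normalize(double[] values, double divisor) {
        for (int i = 0; i < values.length; i++) {
            values[i] /= divisor;
        }
    }

    public static void main(String[] args) {
        // Running sums of three random draws; the last entry equals sumProbs.
        double[] probabilities = {0.5, 1.25, 2.0};
        double sumProbs = 2.0;
        double sumOfWeights = 8.0;

        // Dividing by sumProbs / sumOfWeights (0.25 here) stretches the
        // array so that its last element becomes sumOfWeights.
        normalize(probabilities, sumProbs / sumOfWeights);

        // Pin the last element in case the division left a rounding error.
        probabilities[probabilities.length - 1] = sumOfWeights;

        System.out.println(Arrays.toString(probabilities)); // prints [2.0, 5.0, 8.0]
    }
}
```

With these particular values the division is exact, so the pinning step changes nothing; with arbitrary random draws the last element could end up a tiny amount above or below sumOfWeights, which is what the assignment guards against.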

EDIT Here’s the theory about how the entire method works. The first half (up to the declaration of k and l) fills probabilities with random numbers that are increasing (and therefore not independent), the last of which is the sum of the weights. These numbers form a random partition of the interval [0, sumOfWeights]. The cumulative sums of the weights form a second partition of the same interval. Implicitly, each existing instance is assigned to one element of the weight-based partition: instance l owns the interval that ends at the cumulative weight up to and including l.

The second half of the method simply steps along the weights partition (using index l). It samples the l-th instance once for every point of the random partition that falls inside the l-th weight interval. Perhaps a picture of what’s going on will help:

0                                                   sumOfWeights
↓                                                       ↓

|     *   *         *       *               * *     *   * ← Random partition
|    ^      ^           ^      ^     ^     ^         ^  ^ ← Weights partition

   0     2        1        1       0     0       3     1  ← # of samples

The second half of the method simply counts how many random partition boundaries (denoted by *) are in each weight interval (bounded by ^). A little consideration should convince you that this is a valid method of randomly sampling with replacement according to the given weights.
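The counting described above can be sketched without any Weka classes. The following is a simplified illustration, not Weka's code: the class name `WeightedResampleSketch` and the method `sampleCounts` are hypothetical, and instead of building a new data set it returns how many times each index would be drawn. It also rejects a weight sum of zero, which is an added assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Random;

class WeightedResampleSketch {
    // Returns counts[l] = number of times item l is drawn.
    // The counts add up to weights.length, as in the picture above.
    static int[] sampleCounts(double[] weights, Random random) {
        int n = weights.length;
        int[] counts = new int[n];
        if (n == 0) {
            return counts;
        }
        double sumOfWeights = 0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("Weights must not be negative.");
            }
            sumOfWeights += w;
        }
        if (sumOfWeights <= 0) {
            throw new IllegalArgumentException("Sum of weights must be positive.");
        }

        // First half: increasing random points, scaled onto [0, sumOfWeights].
        double[] points = new double[n];
        double sumProbs = 0;
        for (int i = 0; i < n; i++) {
            sumProbs += random.nextDouble();
            points[i] = sumProbs;
        }
        double divisor = sumProbs / sumOfWeights;
        for (int i = 0; i < n; i++) {
            points[i] /= divisor;
        }
        points[n - 1] = sumOfWeights; // guard against rounding error

        // Second half: walk the cumulative weights and count the random
        // points (the * marks) that fall inside each weight interval.
        int k = 0;
        double cumulative = 0;
        for (int l = 0; l < n && k < n; l++) {
            cumulative += weights[l];
            while (k < n && points[k] <= cumulative) {
                counts[l]++;
                k++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 0.0, 3.0, 4.0};
        int[] counts = sampleCounts(weights, new Random(42));
        System.out.println(Arrays.toString(counts));
    }
}
```

An item with zero weight owns an empty interval, so it is not drawn once any earlier item has a positive weight, while heavier items own wider intervals and tend to catch more of the random points.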
