How do I select an action with Boltzmann exploration in Q-learning when there are at least 10 actions?

Asked: June 9, 2026 (2026-06-09T08:25:51+00:00)

I am using Boltzmann exploration in Q-learning, with at least 10 actions in each state. I know that with only two actions, Boltzmann exploration can be applied quite simply as follows:

1. Calculate pr1 and pr2 for the two actions with the Boltzmann exploration equation.
2. Generate a uniform random number r in [0, 1).
3. Assuming pr1 > pr2: if r <= pr1, take the action with probability pr1; otherwise take the action with probability pr2.

However, how can I do this with 10 actions? At each decision step, I update the probabilities of all the actions. This gives me a probability distribution over all the actions, in which the best action has the highest probability. How do I select an action from this distribution using Boltzmann exploration?

1 Answer

Editorial Team · Answer 1 · 2026-06-09T08:25:52+00:00

Selecting an action from that distribution is a weighted random sampling problem. Here is an excellent discussion of the topic: Darts, Dice, and Coins.
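As a concrete starting point, the two-action rule from the question generalizes directly: compare r against the running (cumulative) sum of the probabilities and take the first action whose cumulative sum exceeds r. The following is a minimal Python sketch of that idea, not the asker's actual code. It assumes the usual Boltzmann form P(a) = exp(Q(a)/T) / sum_b exp(Q(b)/T); the names boltzmann_probabilities, sample_action, q_values and temperature are made up for illustration.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Turn Q-values into Boltzmann (softmax) action probabilities."""
    # Subtracting the maximum leaves the result unchanged but avoids
    # overflow in exp() when Q-values are large or temperature is small.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(probabilities, rng=random):
    """Pick an index with the given probabilities (cumulative-sum scan)."""
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for action, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return action
    # Only reached if rounding makes the probabilities sum to slightly
    # less than 1; fall back to the last action.
    return len(probabilities) - 1


# Hypothetical Q-values for one state with 10 actions.
q_values = [0.1, 0.5, 0.2, 0.9, 0.3, 0.0, 0.4, 0.7, 0.6, 0.8]
probabilities = boltzmann_probabilities(q_values, temperature=0.5)
action = sample_action(probabilities)
```

The scan is O(n) per decision, which is negligible for 10 actions, and it needs no preprocessing when the probabilities change at every step.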

And here is my implementation of Vose's alias method:

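class WeightedRandom
{
    // prob[i]  : chance of keeping column i once it has been picked
    // alias[i] : index returned instead when column i is not kept
    private alias : array[int];
    private prob  : array[double];

    private random : Random;

    public this(p : array[double], random : Random)
    {
        this.random = random;

        def n = p.Length;

        alias = array(n);
        prob  = array(n);

        def small = Queue(n);
        def large = Queue(n);

        // Scale the probabilities so that their average is 1.
        def p = p.Map(_ * n : double);

        // Split the indices into those below and those at or above the average.
        foreach (x in p with i)
            (if (x < 1.0) small else large).Enqueue(i);

        // Pair each small entry with a large one that fills up its column.
        while (!small.IsEmpty && !large.IsEmpty)
        {
            def s = small.Dequeue();
            def l = large.Dequeue();
            prob[s]  = p[s];
            alias[s] = l;
            // The large entry gives away (1 - p[s]) of its mass.
            p[l] = p[l] + p[s] - 1;
            (if (p[l] < 1.0) small else large).Enqueue(l);
        }

        // Whatever is left equals 1 up to rounding error, so it is always kept.
        while (!large.IsEmpty)
            prob[large.Dequeue()] = 1.0;

        while (!small.IsEmpty)
            prob[small.Dequeue()] = 1.0;
    }

    public NextIndex() : int
    {
        // Pick a column uniformly, then keep it or take its alias.
        def i = random.Next(prob.Length);
        if (random.NextDouble() < prob[i])
            i;
        else
            alias[i];
    }
}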
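For readers who do not use Nemerle, here is a rough Python rendering of the same class. It is a sketch that mirrors the structure above rather than a tested port; the names AliasSampler and next_index are made up for illustration.

```python
import random
from collections import deque


class AliasSampler:
    """Weighted random index sampling with Vose's alias method."""

    def __init__(self, probabilities, rng=None):
        self.rng = rng if rng is not None else random.Random()
        n = len(probabilities)
        # Scale so that the average entry is exactly 1.
        scaled = [p * n for p in probabilities]
        # Default: every column is always kept and aliases to itself.
        self.prob = [1.0] * n
        self.alias = list(range(n))

        small = deque(i for i, x in enumerate(scaled) if x < 1.0)
        large = deque(i for i, x in enumerate(scaled) if x >= 1.0)

        # Pair each small entry with a large one that tops up its column.
        while small and large:
            s = small.popleft()
            l = large.popleft()
            self.prob[s] = scaled[s]
            self.alias[s] = l
            scaled[l] = scaled[l] + scaled[s] - 1.0
            (small if scaled[l] < 1.0 else large).append(l)
        # Leftover entries keep prob 1.0 (they equal 1 up to rounding).

    def next_index(self):
        # Pick a column uniformly, then keep it or take its alias.
        i = self.rng.randrange(len(self.prob))
        return i if self.rng.random() < self.prob[i] else self.alias[i]


# Hypothetical usage: build the table once, then draw in O(1) per sample.
sampler = AliasSampler([0.1, 0.2, 0.7], random.Random(42))
action = sampler.next_index()
```

Note the trade-off for Q-learning: building the table costs O(n), and it has to be rebuilt whenever the action probabilities change. If they change at every decision step, a plain linear scan over the cumulative probabilities costs the same O(n) and is simpler; the alias method pays off when many samples are drawn from one fixed distribution.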
