I’m looking at the ‘Monte Carlo Tree Search’ algorithm’s ‘Upper Confidence Bounds’. C is

Question


Asked: May 27, 2026


I’m looking at the ‘Monte Carlo Tree Search’ algorithm’s ‘Upper Confidence Bounds’.

C is a weight for exploration over exploitation.
score = wins / played
sum = wins + played
UCB = score + C * sqrt(naturalLog(parent's sum) / sum)

The issue occurs when played is 0. I’m considering these possibilities.

score = 0
Because the node has never won, although it's never lost either.

score = 0.5
Because the node's value is completely uncertain and 0.5 is halfway.

Does anyone have an answer?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T03:24:24+00:00


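The first step of UCB1, the bandit algorithm behind the selection rule in MCTS, is to pull every arm once. A node with played = 0 therefore never needs a score: it is simply tried before any sibling that has already been visited. Doing this at every node all the way down would obviously amount to exhaustive search, so instead you only use MCTS up to a fixed depth and use a roll-out policy for the rest. You can of course use a prior (an initial guess such as score = 0 or score = 0.5), but then you lose the nice theoretical properties of the UCB algorithm, primarily logarithmic regret.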
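As a rough illustration of "pull every arm once", here is a minimal Python sketch that gives an unvisited child an infinite score, so it is always selected before any visited sibling. The names `ucb1` and `select_child` are made up for this example. The sketch also assumes the standard UCB1 form, which uses the visit count (played) in the exploration term rather than wins + played as in the question, and it approximates the parent's visit count by the sum of its children's counts.

```python
import math


def ucb1(wins, played, parent_played, c=math.sqrt(2)):
    """UCB1 score for one child (hypothetical helper, standard UCB1 form)."""
    if played == 0:
        # Never visited: return +inf so this child is tried before any
        # visited sibling, which avoids the division by zero entirely.
        return math.inf
    score = wins / played
    return score + c * math.sqrt(math.log(parent_played) / played)


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children is a list of (wins, played) tuples. The parent's visit count
    is assumed to equal the sum of the children's visit counts.
    """
    parent_played = sum(played for _, played in children)
    scores = [ucb1(w, p, parent_played, c) for w, p in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With children `[(3, 5), (0, 0), (1, 2)]`, `select_child` picks index 1, the unvisited child, regardless of the value of C. Ties between several unvisited children are broken here by list order; picking one at random is an equally reasonable choice.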
