I have an artificial neural network which plays Tic-Tac-Toe – but it is not

Editorial Team

Asked: May 19, 2026 (2026-05-19T16:12:23+00:00)

I have an artificial neural network which plays Tic-Tac-Toe – but it is not complete yet.

What I have so far:

the reward array “R[t]” with integer values for every timestep or move “t” (1=player A wins, 0=draw, -1=player B wins)
forward propagation: the input values are correctly propagated through the network
the formula for adjusting the weights:

[image of the weight-update formula; not preserved]

What is missing:

the TD learning: I still need a procedure which “backpropagates” the network’s errors using the TD(λ) algorithm.

But I don’t really understand this algorithm.

My approach so far …

The trace decay parameter λ should be “0.1” as distal states should not get that much of the reward.

The learning rate is “0.5” in both layers (input and hidden).

It’s a case of delayed reward: The reward remains “0” until the game ends. Then the reward becomes “1” for the first player’s win, “-1” for the second player’s win or “0” in case of a draw.

My questions:

How and when do you calculate the net’s error (TD error)?
How can you implement the “backpropagation” of the error?
How are the weights adjusted using TD(λ)?

Thank you so much in advance 🙂

1 Answer

Editorial Team · Answer 1 · 2026-05-19T16:12:23+00:00

If you’re serious about making this work, then understanding TD-lambda would be very helpful. Sutton and Barto’s book “Reinforcement Learning: An Introduction” is available for free in HTML format and covers this algorithm in detail. Basically, TD-lambda learns a mapping from a game state to the expected reward at the game’s end. As games are played, states that are more likely to lead to a win tend to get higher expected reward values.
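
For reference, the usual backward-view form of TD(λ) with a differentiable value function, as it is commonly presented in that book, can be written as below. The notation is an assumption and does not come from the question: V_w(s) is the network’s output for state s, w is the full weight vector, α is the learning rate, γ is the discount factor (typically 1 for a short episodic game), and e is the eligibility trace, one entry per weight.

```latex
\begin{align*}
\delta_t &= r_{t+1} + \gamma\, V_w(s_{t+1}) - V_w(s_t),
  \qquad V_w(s_{\text{terminal}}) = 0 \\
e_t &= \gamma \lambda\, e_{t-1} + \nabla_w V_w(s_t),
  \qquad e_{-1} = 0 \\
w &\leftarrow w + \alpha\, \delta_t\, e_t
\end{align*}
```

Read against the three questions: the TD error δ is computed after every move, from two successive predictions plus the reward; ordinary backpropagation is used only to obtain the gradient of the output with respect to the weights; and the weights then change by α·δ·e at each step. With the delayed reward described in the question, r is 0 on every move except the last one.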

For a simple game like tic-tac-toe, you’re better off starting with a tabular mapping (just track an expected reward value for every possible game state). Then once you’ve got that working, you can try using a NN for the mapping instead. But I would suggest trying a separate, simpler NN project first…
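
As a rough illustration of that tabular starting point, here is a minimal sketch of TD(λ) with a lookup table. Several things in it are assumptions rather than anything from the question: both players move uniformly at random, γ is 1, the learning rate is 0.1 (not the 0.5 mentioned above), and all function names are made up for this example. The reward convention (1, 0, -1) and λ = 0.1 follow the question.

```python
import random
from collections import defaultdict

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'A' or 'B' if that player has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != ' ' and board[i] == board[j] == board[k]:
            return board[i]
    return None

def final_reward(board):
    """Reward convention from the question: 1 = A wins, 0 = draw, -1 = B wins."""
    return {'A': 1.0, 'B': -1.0, None: 0.0}[winner(board)]

def play_random_game(rng):
    """Play one game with random moves (an assumption of this sketch).

    Returns the list of visited states (as 9-character strings) and the
    reward that is handed out when the game ends.
    """
    board = [' '] * 9
    states = [''.join(board)]
    player = 'A'
    while winner(board) is None and ' ' in board:
        move = rng.choice([i for i, c in enumerate(board) if c == ' '])
        board[move] = player
        states.append(''.join(board))
        player = 'B' if player == 'A' else 'A'
    return states, final_reward(board)

def td_lambda_episode(V, states, reward, alpha=0.1, lam=0.1, gamma=1.0):
    """Update the value table V in place from one finished game."""
    traces = defaultdict(float)
    last = len(states) - 1
    for t in range(last):
        s, s_next = states[t], states[t + 1]
        terminal = (t + 1 == last)
        r = reward if terminal else 0.0            # delayed reward
        v_next = 0.0 if terminal else V[s_next]    # terminal value is 0
        delta = r + gamma * v_next - V[s]          # TD error for this move
        traces[s] += 1.0                           # accumulating trace
        for state in traces:
            V[state] += alpha * delta * traces[state]
            traces[state] *= gamma * lam           # decay older states

def train(n_games=20000, seed=0):
    """Run random self-play games and return the learned value table."""
    rng = random.Random(seed)
    V = defaultdict(float)
    for _ in range(n_games):
        states, reward = play_random_game(rng)
        td_lambda_episode(V, states, reward)
    return V
```

Calling `train()` returns a dictionary from board strings to estimated final rewards under random play. Swapping the table for a network would mean replacing `V[state] += ...` with a gradient-based weight update, which is the harder step this answer suggests postponing.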
