So I was assigned the problem of writing a 5×5×5 tic-tac-toe player using a genetic algorithm. My approach was to start with 3×3, get that working, then extend to 5×5, and finally to 5×5×5.
The way it works is this:
- Simulate a whole bunch of games, and during each turn of each game, look up the current board in the corresponding table (an X table or an O table, each implemented as a C++ standard library map) for a response. If the board is not there, add it to the table. Otherwise, make a random response.
- Once the tables are complete, I initialize a bunch of players (each with a copy of the board table, initialized with random responses) and let them play against each other.
- Using their wins/losses to evaluate fitness, I keep a certain % of the best, and they move on. Rinse and repeat for X generations, and an optimal player should emerge.
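The selection step in the last bullet could be sketched roughly as follows. This is only an illustration, not my actual code: the `Player` struct, the `nextGeneration` function, and the idea of refilling the population with mutated copies of the survivors are assumptions made for the example (the description above only says that the best players move on).

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical player: genome[i] is the move chosen for table entry i.
struct Player {
    std::vector<int> genome;
    int wins = 0;  // fitness from the last round of games
};

// Keep the best `keepFraction` of players (at least one), then refill the
// population with mutated copies of randomly chosen survivors.
// The mutation-based refill is an assumption, not part of the original post.
std::vector<Player> nextGeneration(std::vector<Player> pop, double keepFraction,
                                   double mutationRate, int numMoves,
                                   std::mt19937& rng) {
    const std::size_t size = pop.size();
    if (size == 0) return pop;

    // Sort best-first by number of wins.
    std::sort(pop.begin(), pop.end(),
              [](const Player& a, const Player& b) { return a.wins > b.wins; });

    std::size_t keep = static_cast<std::size_t>(size * keepFraction);
    keep = std::max<std::size_t>(1, std::min(keep, size));
    pop.resize(keep);

    std::uniform_real_distribution<double> chance(0.0, 1.0);
    std::uniform_int_distribution<int> randomMove(0, numMoves - 1);
    std::uniform_int_distribution<std::size_t> pickParent(0, keep - 1);

    while (pop.size() < size) {
        Player child = pop[pickParent(rng)];  // copy a surviving parent
        child.wins = 0;
        for (int& gene : child.genome)
            if (chance(rng) < mutationRate) gene = randomMove(rng);
        pop.push_back(child);
    }
    return pop;
}
```

The caller would play a round of games to fill in `wins`, call `nextGeneration`, reset the survivors' `wins`, and repeat for the chosen number of generations.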
For 3×3, discounting boards that were reflections/rotations of other boards, and boards where the move is either ‘take the win’ or ‘block the win’, the total number of boards I would encounter was either 53 or 38, depending on whether you go first or second. Fantastic! An optimal player was generated in under an hour. Very cool!
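For anyone curious, discounting rotations and reflections can be done by mapping every board to one canonical representative before using it as a map key. Here is a minimal sketch for the 3×3 case; the string encoding and the function names `rotate`, `mirror`, and `canonical` are made up for this example and are not taken from my actual code.

```cpp
#include <algorithm>
#include <string>

// A 3x3 board as 9 characters in row-major order: 'X', 'O', or '.' for empty.
using Board = std::string;

// Rotate the board 90 degrees clockwise: cell (row, col) moves to (col, 2 - row).
Board rotate(const Board& b) {
    Board r(9, '.');
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            r[col * 3 + (2 - row)] = b[row * 3 + col];
    return r;
}

// Mirror the board left-to-right: cell (row, col) moves to (row, 2 - col).
Board mirror(const Board& b) {
    Board m(9, '.');
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            m[row * 3 + (2 - col)] = b[row * 3 + col];
    return m;
}

// Return the lexicographically smallest of the 8 symmetric variants
// (4 rotations, each with and without a mirror).
Board canonical(const Board& b) {
    Board best = b;
    Board cur = b;
    for (int i = 0; i < 4; ++i) {
        cur = rotate(cur);
        best = std::min(best, cur);
        best = std::min(best, mirror(cur));
    }
    return best;
}
```

With this, all four corner openings collapse to a single key, as do all four edge openings. A stored response has to be mapped back through the inverse symmetry before it is played, which the sketch leaves out.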
Using the same strategy for 5×5, I knew the size of the table would increase, but did not realize it would increase so drastically. Even discounting rotations/reflections and mandatory moves, my table is ~3.6 million entries, with no end in sight.
Okay, so that’s clearly not going to work; I need a new plan. What if I don’t enumerate all the boards, but just some of them? Well, it seems like this won’t work either, because if each player knows just a fraction of the boards they might see, then they are going to be making a lot of random moves, clearly steering in the opposite direction of optimality.
What is a realistic way of going about this? Am I going to be stuck using board features? The goal is to hard-code as little game functionality as possible.
I’ve been doing research, but everything I read leads to minimax with alpha-beta pruning as the only viable option. I can certainly do it that way, but the GA is really cool; my current method is just exceeding reality a bit here.
EDIT: The problem has been pretty much solved.
Using a similarity function that combines the Hamming distance of open spaces, the possible win conditions, and a few other measures has brought the table down to a very manageable 2500 possibilities, which a std::map handles in a fraction of a second.
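To give a rough idea of the Hamming distance part only, here is a simplified sketch. It assumes the open cells of a 5×5 board are packed into a bitmask and that an unseen board borrows the response of the closest stored board. The names `OpenMask`, `hammingDistance`, and `nearestResponse` are hypothetical, and the win-condition term and the other measures are not shown.

```cpp
#include <bitset>
#include <cstdint>
#include <limits>
#include <map>

// Bit i is set when cell i of a 5x5 board is open (25 of the 32 bits are used).
using OpenMask = std::uint32_t;

// Number of cells whose open/occupied status differs between two boards.
int hammingDistance(OpenMask a, OpenMask b) {
    return static_cast<int>(std::bitset<32>(a ^ b).count());
}

// Return the stored response of the table entry closest to `query`,
// or -1 when the table is empty. Ties go to the smaller key.
int nearestResponse(const std::map<OpenMask, int>& table, OpenMask query) {
    int bestDistance = std::numeric_limits<int>::max();
    int bestMove = -1;
    for (const auto& entry : table) {
        const int d = hammingDistance(entry.first, query);
        if (d < bestDistance) {
            bestDistance = d;
            bestMove = entry.second;
        }
    }
    return bestMove;
}
```

A borrowed response is not guaranteed to be legal on the queried board, so a real player would still need to check that the cell is open and fall back to something else if it is not.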
My knowledge of GA is pretty limited, but in modeling board configurations, aren’t you asking the wrong question? Your task isn’t to enumerate all the possible winning configurations — what you’re trying to do is to find a sequence of moves that leads to a winning configuration. Maybe the population you should be looking at isn’t a set of boards, but a set of move sequences.
Edit: I wasn’t thinking so much of starting from a particular board as starting from an empty board. It’s obvious on a 3×3 board that move sequences starting with (1,1) work out best for X. The important thing isn’t that the final board has an X in the middle, it’s that the X was placed in the middle first. If there’s one or more best first moves for X, maybe there’s also a best second, third, or fourth move for X, too? After several rounds of fitness testing and recombining, will we find that X’s second move is usually the same, or is one of a small set of values? And what about the third move?
This isn’t minimax because you’re not looking for the best moves one at a time based on the previous state of the board, you’re looking for all the best moves at the same time, hoping to converge on a winning strategy.
I know this doesn’t solve your problem, but if the idea is to evolve a winning strategy then it seems natural that you’d want to look at sequences of moves rather than board states.
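To make the move-sequence idea a bit more concrete, here is a small sketch for the 3×3 case. It rests on assumptions of mine: each individual is an ordered list of cell indices (0 to 8, row-major, so the centre is cell 4), and on each turn a player takes the first cell in its list that is still open. The names `Sequence`, `winner`, and `play` are invented for the example.

```cpp
#include <array>
#include <vector>

// Hypothetical genome: a preference order over the 9 cells of a 3x3 board.
using Sequence = std::vector<int>;

// Return 'X' or 'O' if that side has three in a row, otherwise '.'.
char winner(const std::array<char, 9>& b) {
    static const int lines[8][3] = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
                                    {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}};
    for (const auto& l : lines)
        if (b[l[0]] != '.' && b[l[0]] == b[l[1]] && b[l[1]] == b[l[2]])
            return b[l[0]];
    return '.';
}

// Play two sequences against each other from an empty board, X first.
// Returns 'X', 'O', or '.' for a draw (or when a player has no open cell left
// in its sequence).
char play(const Sequence& x, const Sequence& o) {
    std::array<char, 9> board;
    board.fill('.');
    for (int turn = 0; turn < 9; ++turn) {
        const Sequence& seq = (turn % 2 == 0) ? x : o;
        const char mark = (turn % 2 == 0) ? 'X' : 'O';
        bool moved = false;
        for (int cell : seq) {
            if (cell >= 0 && cell < 9 && board[cell] == '.') {
                board[cell] = mark;
                moved = true;
                break;
            }
        }
        if (!moved) break;
        const char w = winner(board);
        if (w != '.') return w;
    }
    return '.';
}
```

The result of `play` over many pairings would be the fitness signal, and recombination would splice the early parts of good sequences together. A fixed order like this cannot react to the opponent, which is the obvious limitation of the idea.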