The other day my wife got a 8192 tile while playing 2048, total score of 167,848. Because there is no way I’m going beat that myself, I need to write a program to do it for me. There are a number of excellent articles written about doing this. I first saw one done in javascipt by Matt Overlan posted on hacker news. More recently I stumbled across a great article using java by Vasilis Vryniotis. Both articles use the same basic strategy.
- Come up with some a number of metric functions of the board state that you believe correlate with good / bad board states. We will call these functions fi(b), where i is the function index and b is the board state.
- Use a look ahead algorithm to train weighting coefficients, xi, for these functions, such that the heuristic function, H, defined by the linear combinations of metrics functions, Hj = Σ xi * fi(bj), suggest moves that result in the highest possible score while looking ahead j moves.
While this is fairly straightforward it requires that you have some vague idea how to play the game when you construct your metric functions. If we assume there is some sort of plutonic ideal heuristic function, P, such that P(b) always suggests the best possible move when applied to the current board state, then the point of the procedure is to approximate P with a function H. At first glance it seems like we are approximating P with a linear combination of fis but the look ahead part of the algorithm makes it a very non-linear approximation.
To conceptually simplify the situation, I’m starting by implementing the algorithm described in chapter 1 of Tom Mitchell’s book for checkers. This constructs H with a simple linear combination of fis. The algorithm then defines parameter updates by:
xi -> xi + eta * (H(boardj+1) – H(boardj)* xi
Where H(boardj+1) is defined as the games score when the game is lost, I’ve decided to weight each game equally some I’m averaging over all the moves in a game instead of summing all the deltas. While there are some disadvantages to doing it this way it has the advantage of giving some semblance of intuitive meaning to the weights. For the simple case of only a bias parameter of f(b) = 1, so that H = x * 1, then all H(boardj+1) are equal except the final one. The update function simplifes to
xi -> xi + mean( eta * (score – xi) * xi )-> xi + eta *(score – xi) * xi / n_moves
then if xi = score, there is no update. Once we have converged, if xi > score then the game will have been lost earlier then usual, so xi will be decrease but because we are weighting all games equally n_moves will be smaller, so there will be a larger update to xi than for games where xi < score. On the other hand, the distribution isn’t normal, and consequently high scores have the potential to be much higher then low scores.
Instead of trying to explain this in the abstract lets makes some graphs. After implementing 2048 in python I built a simple drive and metrics framework and set about simulating. The code is posted at https://github.com/TristanMatthews/python-2048.
After running 10k games with only a bias parameter, we get a histogram of scores in figure 1.
My very simple algorithm breaks heuristic ties by choosing the first legal move in the order Left, Right, Up, then Down. Because our heuristic always returns 1 (times the scaling value) there is always a 4 way tie, so the AI simply tries to move Left, then Right if that fails and so on. The graph of the training value, x1, is shown in figure 2. A large eta of 0.1 was use, so there is a fair amount of noise in x1 after convergence.
Examining the final 1000 games we see:
- mean(x1) = 640.59
- mean(score) = 831.94
- median(score) = 720.00
While you might expect mean(x1) ~= mean(score) it shouldn’t for the two reasons described above. All games being weighted equally has the result of down weighting high scoring, longer, games, and the distribution seen in figure 1 is clearly not gaussian. In fact its a fairly decent poisson distribution, which clearly has a very different mean than a normal distribution.
While I’m sure I could work through the math to predict the converged value of x1, that doesn’t seem very productive. Our initial take aways are:
- Implemented a simple AI for playing 2048
- The ideal production function P is being modeled by a linear combination of simpler functions.
- The coefficient of the bias function converges to a value that is fairly simply related to the score.
Next we will begin slowly adding more functions and see if we can understand them as clearly