Building an AI to play 2048 in Python. Part 1.

by Tristan Matthews on January 24, 2015

The other day my wife got an 8192 tile while playing 2048, with a total score of 167,848. Because there is no way I’m going to beat that myself, I need to write a program to do it for me. There are a number of excellent articles written about doing this. I first saw one done in JavaScript by Matt Overlan, posted on Hacker News. More recently I stumbled across a great article using Java by Vasilis Vryniotis. Both articles use the same basic strategy.

  1. Come up with a number of metric functions of the board state that you believe correlate with good or bad board states. We will call these functions f_i(b), where i is the function index and b is the board state.
  2. Use a look-ahead algorithm to train weighting coefficients, x_i, for these functions, such that the heuristic function, H, defined by the linear combination of metric functions, H_j = Σ_i x_i * f_i(b_j), suggests moves that result in the highest possible score while looking ahead j moves. (A minimal sketch of evaluating such an H follows below.)
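To make step 2 concrete, here is roughly what evaluating such an H looks like in Python. The metric functions below (a bias term, an empty-tile count, and the max tile) are illustrative placeholders of my own choosing, not the metrics from either article, and the board is assumed to be a 4x4 list of lists with 0 marking empty cells.

```python
# Minimal sketch of H as a linear combination of metric functions.
# The metrics here are illustrative placeholders, not the ones from
# the articles above; boards are 4x4 lists of lists with 0 = empty.

def empty_tiles(board):
    """f: number of empty cells (more usually means a healthier board)."""
    return sum(row.count(0) for row in board)

def max_tile(board):
    """f: the largest tile currently on the board."""
    return max(max(row) for row in board)

METRICS = [lambda b: 1.0, empty_tiles, max_tile]  # f_0 = 1 is the bias term

def heuristic(weights, board):
    """H(b) = sum_i x_i * f_i(b)."""
    return sum(x * f(board) for x, f in zip(weights, METRICS))
```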

While this is fairly straightforward, it requires that you have some vague idea of how to play the game when you construct your metric functions. If we assume there is some sort of platonic ideal heuristic function, P, such that P(b) always suggests the best possible move when applied to the current board state, then the point of the procedure is to approximate P with the function H. At first glance it seems like we are approximating P with a linear combination of the f_i's, but the look-ahead part of the algorithm makes it a very non-linear approximation.

To conceptually simplify the situation, I’m starting by implementing the algorithm for checkers described in chapter 1 of Tom Mitchell’s book Machine Learning. This constructs H as a simple linear combination of the f_i's. The algorithm then defines the parameter updates by:

x_i -> x_i + eta * (H(board_{j+1}) - H(board_j)) * x_i

Where H(board_{j+1}) is defined as the game's score when the game is lost. I’ve decided to weight each game equally, so I’m averaging over all the moves in a game instead of summing all the deltas. While there are some disadvantages to doing it this way, it has the advantage of giving some semblance of intuitive meaning to the weights. For the simple case of only a bias parameter, f(b) = 1, so that H = x * 1, all H(board_{j+1}) are equal except the final one. The update function simplifies to

x_i -> x_i + mean( eta * (score - x_i) * x_i ) -> x_i + eta * (score - x_i) * x_i / n_moves

so if x_i = score, there is no update. Once we have converged, if x_i > score then the game will have been lost earlier than usual, so x_i will be decreased; but because we are weighting all games equally, n_moves will be smaller, so there will be a larger update to x_i than for games where x_i < score. On the other hand, the distribution isn’t normal, and consequently high scores have the potential to be much higher than low scores.
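Here is a minimal sketch of that update for one game, written just as the formula above reads: the per-move deltas are averaged so every game is weighted equally, and the final H(board_{j+1}) is replaced by the game's score on the losing move. It reuses the `heuristic` helper from the earlier sketch; a bias-only run would simply pass a single-element weight list.

```python
# Sketch of the per-game weight update, following the formula above:
# average the per-move deltas so that every game is weighted equally,
# and use the final score as the training target on the losing move.

def update_weights(weights, boards, score, eta=0.1):
    """boards is the sequence of board states b_0 .. b_n from one game."""
    n_moves = len(boards) - 1
    new_weights = []
    for x in weights:
        total = 0.0
        for j in range(n_moves):
            # H(board_{j+1}), with the game's score standing in once lost.
            target = score if j == n_moves - 1 else heuristic(weights, boards[j + 1])
            total += (target - heuristic(weights, boards[j])) * x
        new_weights.append(x + eta * total / n_moves)
    return new_weights
```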

Instead of trying to explain this in the abstract, let’s make some graphs. After implementing 2048 in Python, I built a simple driver and metrics framework and set about simulating. The code is posted at https://github.com/TristanMatthews/python-2048.
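The driver is roughly a loop like the following. `Game2048` and its `board`/`move`/`lost`/`score` methods are stand-ins for the game implementation in the repo (the real names may differ), and `choose_move` is the tie-breaking move selector sketched a little further down.

```python
# Rough shape of the driver: play n_games, recording each final score and
# the trained bias weight after every game. Game2048 and its methods are
# stand-ins for the repo's game implementation; choose_move is sketched below.

def run(n_games=10000, eta=0.1):
    weights = [1.0]                    # bias term only, initial guess x_1 = 1.0
    scores, x1_history = [], []
    for _ in range(n_games):
        game = Game2048()
        boards = [game.board()]
        while not game.lost():
            game.move(choose_move(weights, game))
            boards.append(game.board())
        scores.append(game.score())
        weights = update_weights(weights, boards, game.score(), eta)
        x1_history.append(weights[0])
    return scores, x1_history
```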

After running 10k games with only a bias parameter, we get a histogram of scores in figure 1.

Figure 1. Normalized histogram of scores after 10,000 games. Using 100 bins, so that the height of each bin represents the probability of getting that score.
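For what it's worth, getting bar heights that are probabilities (rather than matplotlib's default density normalization) can be done by weighting each game by 1/N. This is my guess at the normalization used for figure 1, not necessarily the repo's exact plotting code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Weighting each sample by 1/N makes each bar's height the probability of a
# score landing in that bin (density=True would give probability per unit
# score instead). `scores` is the list returned by the driver sketch above.
scores = np.asarray(scores)
plt.hist(scores, bins=100, weights=np.full(len(scores), 1.0 / len(scores)))
plt.xlabel("final score")
plt.ylabel("probability")
plt.show()
```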

My very simple algorithm breaks heuristic ties by choosing the first legal move in the order Left, Right, Up, then Down. Because our heuristic always returns 1 (times the scaling value), there is always a four-way tie, so the AI simply tries to move Left, then Right if that fails, and so on. The graph of the training value, x_1, is shown in figure 2. A large eta of 0.1 was used, so there is a fair amount of noise in x_1 after convergence.
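The move selection is essentially the following sketch. `game.peek(move)` is a hypothetical method returning the board a move would produce (or None if the move is illegal); the strict greater-than comparison is what gives ties to the earliest move in the Left, Right, Up, Down order.

```python
# Sketch of move selection with the tie-break described above. game.peek
# is a hypothetical method returning the board a move would produce, or
# None if the move is illegal. Strict > keeps the first move on a tie.

MOVE_ORDER = ["left", "right", "up", "down"]

def choose_move(weights, game):
    best_move, best_value = None, None
    for move in MOVE_ORDER:
        result = game.peek(move)
        if result is None:
            continue
        value = heuristic(weights, result)
        if best_value is None or value > best_value:
            best_move, best_value = move, value
    return best_move
```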

Figure 2. Value of the x_1 parameter while training over 10,000 games, with an initial guess of x_1 = 1.0 and eta = 0.1. Convergence is reached after ~2.5k games, but the large eta causes the parameter to keep bouncing around with a fair amount of noise.

Examining the final 1000 games we see:

  • mean(x_1) = 640.59
  • mean(score) = 831.94
  • median(score) = 720.00

While you might expect mean(x_1) ~= mean(score), it shouldn’t be, for the two reasons described above. Weighting all games equally has the effect of down-weighting high-scoring, longer games, and the distribution seen in figure 1 is clearly not Gaussian. In fact, it’s a fairly decent Poisson distribution, whose mean sits well away from its peak, unlike a Gaussian’s.

Figure 3. Poisson distribution fitted to the normalized histogram of game scores. Fit parameters are 0.00366, 2.57, and 270.8 for amplitude, lambda, and sigma.
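For reference, a fit like this can be done with scipy. The parameterization below, a Poisson pmf evaluated at the rescaled score s/sigma and multiplied by an overall amplitude, is my assumption about what figure 3's three parameters mean; the actual fit code may differ.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import gammaln

# A Poisson-shaped curve in the rescaled score k = s / sigma, with an
# overall amplitude. Written in log form to avoid overflowing Gamma(k + 1).
def scaled_poisson(s, amplitude, lam, sigma):
    k = s / sigma
    return amplitude * np.exp(k * np.log(lam) - lam - gammaln(k + 1))

counts, edges = np.histogram(scores, bins=100)
probs = counts / counts.sum()                  # bar height = probability
centers = 0.5 * (edges[:-1] + edges[1:])
params, _ = curve_fit(scaled_poisson, centers, probs, p0=[0.01, 3.0, 300.0])
```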

While I’m sure I could work through the math to predict the converged value of x_1, that doesn’t seem very productive. Our initial takeaways are:

  1. We implemented a simple AI for playing 2048.
  2. The platonic ideal heuristic function P is being modeled by a linear combination of simpler functions.
  3. The coefficient of the bias function converges to a value that is fairly simply related to the score.

Next we will begin slowly adding more functions and see if we can understand them as clearly.

-Tristan Matthews
