This is a project which I completed earlier this year, after a previous attempt trying to beat 2048. The play strategy is relatively simple and can be followed and replicated by beginners in machine learning. All the code is at https://github.com/sn6uv/minesweeper.

This post demonstrates how to acheive good human performance on minesweeper using neural networks to predict mine locations.

# Implementing minesweeper

Minesweeper was implemented in Python. Each game state is represented as an instance of a Game object

see game.py for implementation details. The key ideas are to precompute the neighbors of each position and store the list of mines positions as a set for efficient lookup.

The first move is always safe, so mines are initialised after the first guess.

# Learning to play minesweeper

Given an input board the agent predicts the probability of finding a mine at each open location. It selects the lowest probability location for its next guess. This process repeats until until the game is over.

The Agent’s objective is to predict the locations of mines based on a ‘view’ of the game board.

Note that this is not actually an optimal strategy. A higher risk guess may reveal more information about the location of mines, thus leading to a shorter game, and a higher overall win-rate. Winning the game is approximated as not loosing each guess.

This agent (arrow above) is repesented as a Player object

The first move is always safe in minesweeper so the AI chooses at random to remove bias in the data derived from player initialisation.

The player accumulates data on previous games. This will be used for training the model. In order to beat minesweeper we will attempt to implement the predict_mines function using a neural network.

# View representation

Each tile in the game grid can be either guessed or unguessed. If it’s guessed there is either a bomb or a number of adjacent bombs $0, \dots, 8$. In total, each tile has 10 potential states.

If $h$ is the height of the board and $w$ is the width we define

$$n := height \times width$$

as the input size. Each tile was represented as a one-hot input vector, to give a total input size of $10 n$.

# Neural network architecture

The neural network used had 5 fully-connected layers with sizes $[10n, 20n, 10n, 5n, n]$ respectively. Each layer had a ReLU activation except for the final layer which used a sigmoid activation.

This works out to $455n^2 + 46n$ parameters.

The network $f$ takes an input game $x \in \{0,1\}^{10n}$ state and outputs $p \in \mathbb{R}^n$, the probability distribution of mines.

The loss function used was the cross-entropy loss with $L^2$ regularisation,

where $N$ is the total number of samples and $\hat{p}_i \in \{0,1\}^n$ is the true probabilities for training batch $i$.

This architecture was implemented as a Model object using TensorFlow.

# Training

The model was trained in batches. Metrics are stored in a small utility class:

Some care has to be taken when paritioning training and test sets, but otherwise training the model is relatively straightforward.

Metrics are printed at regular intervals to ensure that training is making progress but ultimately sucess is determined by win rate. see model.py for details.

In order to generate training data, the agent plays a series of games and then trains on the resulting game histories.

Old game histories are discarded to prevent overfitting. This process repeats iteratively until the desired win-rate is acheived, see learn_from_scratch.py for an example on how to train beginner difficulty to >50% win rate.

The following hyperparamers were used during training:

 symbol description value $\alpha$ Learning rate 1e-3 $\beta$ L^2 regularisation coefficient 1e-4 batch size 1024 test set fraction 1e-2 training epochs 1

These can be set in config.py.

# Results

 height width number of mines win rate training duration parameters 4 4 2 90% <1min 117216 5 5 3 80% 10min 285525 9 9 10 80% 3 days 2988981

The network was trained on core i7-6600U CPU @ 2.60GHz.

Training progress for beginner board.

This network was tested on larger networks, but the training rate was too slow to get any good results.

# Conclusion

This neural network acheives an 80% win rate on the beginner board (9x9 with 10 mines). Anecdotally, this is on par with a good human player.

Training this network is quite slow. Unsuprisingly the bottleneck is the game simulation in Python. Some attempt was made to optimise this but it could be improved significantly by switching to a faster language like C++ or running multiple simulaitons in parallel on a more powerful machine.

# Future work

CNNs

By flattening the input board to a $10 \times h \times w$ vector the network needs to learn the spatial relationships between input variables. A convolutional neural network (CNN) might generalise better, especially for larger input boards. Moreover, a CNN trained on one board size could also play on other boards, assuming a simlar mine density. A side-effect would be to reduce the model size.

Reinforcement learning

The goal is to optimise win rate; predicting mine locations is only useful towards that goal. A reinforcemnt learning approach, such as Q-learning might be more effective at increasing the win-rate, at the tradeoff of slower training rate.

# Try it yourself

Requires tensorflow and numpy.