AlphaGomoku

Training an AI to play the game of Gomoku using reinforcement learning


Stefan Stanojevic, Kevin Qualls

DATA 2040: Deep Learning and Advanced Topics in Data Science

Hello! You have reached our website for our DATA 2040 Final project!

We are graduate students at Brown University building an AI version of Gomoku.

To view the machine-learning techniques implemented for this project, check out our GitHub repository. The blog posts below describe the implementations.

Understanding the Playing Field

Initial Blog Post - Stefan Stanojevic, Kevin Qualls

For this project, we set out to implement algorithms akin to those of Google's DeepMind, with the goal of devising an AI capable of playing the game of Gomoku at a competitive level.

Gomoku is similar to Connect Four, except that players aim to connect five stones (go means five in Japanese, and moku means pieces [1]) and play on a flat Go board rather than a vertical grid. An illustration is shown below (image adapted from [2]).


Fig. 1: Gomoku is Played on a 15 x 15 Board

There is ample documentation online on how to build a Connect Four AI, as the game is relatively simple and requires little computational power [3]. On the other hand, documentation on building a Gomoku AI is sparse, likely because of the game's more abstract strategy and its larger 15 x 15 board [2]. Rather than seeing this as a setback, we welcomed the challenge to deepen our understanding of neural networks and game-playing AI.

Our starting goal was simply to train a neural network to predict the next move a skilled human player would make in Gomoku. To do so, we started with a dataset of competitive games pieced together from the sources listed at [4], mostly Russian tournament archives. In total, we assembled around 11,000 games. The game entries came in the following form:

1997,382=[marik,om-by,-,88FFFE98798A6A975B4C59999A7BA86C5D5C3C7A4B896BA7B6A99687,?,?]

This entry first specifies the year of the competition, the players, and the winner (- corresponds to the second player winning). The long string of pentadecimal (base-15) digits then specifies the board coordinates of the Gomoku moves. The sequence of moves in this game is shown in the following figure. Note that while neither player has yet connected 5 tokens, White has already won by constructing two unrestricted sequences of 3 tokens (26, 4, 28 and 28, 8, 24), intersecting at token 28.


Fig. 2: Sequence of Moves in an Example Game
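To make the format concrete, here is a minimal Python sketch of how such an entry might be decoded. The field layout is inferred from the example above, and we assume each move is stored as a pair of base-15 digits (1 through F) giving the column and row:

```python
def parse_game_entry(entry):
    """Split a raw game record into metadata and a list of (col, row) moves.

    Assumes the layout "year,id=[player1,player2,result,moves,?,?]" seen above,
    and that each move is two base-15 digits (1-F) giving column and row.
    """
    header, body = entry.split("=[")
    year, game_id = header.split(",")
    fields = body.rstrip("]").split(",")
    player1, player2, result, move_string = fields[0], fields[1], fields[2], fields[3]

    moves = []
    for i in range(0, len(move_string), 2):
        col = int(move_string[i], 16)      # digits 1-F -> coordinates 1-15
        row = int(move_string[i + 1], 16)
        moves.append((col, row))
    return {"year": year, "players": (player1, player2),
            "result": result, "moves": moves}

entry = "1997,382=[marik,om-by,-,88FFFE98798A6A975B4C59999A7BA86C5D5C3C7A4B896BA7B6A99687,?,?]"
game = parse_game_entry(entry)
print(game["result"], game["moves"][:5])
```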

Next, we turned this game string into a sequence of 28 images representing the states of the board at different points during the game. These served as inputs to our neural network. The output was a single number specifying one of 15^2 = 225 possible next moves. Additional preprocessing included removing duplicate board states from the dataset; this was done by sorting the list of board states and then iterating through it, discarding neighboring identical boards. Tokens were one-hot encoded, with (1,0,0) corresponding to the first player's token, (0,1,0) to the second player's token, and (0,0,1) to an empty space.
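Continuing from the parsing sketch above, the preprocessing might look roughly as follows; the helper names are ours, and the channel ordering matches the one-hot scheme just described:

```python
import numpy as np

def moves_to_states(moves, board_size=15):
    """Convert a move list into the board state seen before each move.

    Channels: (1,0,0) = first player's token, (0,1,0) = second player's token,
    (0,0,1) = empty square.
    """
    board = np.zeros((board_size, board_size, 3), dtype=np.float32)
    board[:, :, 2] = 1.0                       # every square starts empty
    states, next_moves = [], []
    for turn, (col, row) in enumerate(moves):
        states.append(board.copy())
        next_moves.append((row - 1) * board_size + (col - 1))  # one of 225 classes
        player = turn % 2                      # players alternate
        board[row - 1, col - 1, player] = 1.0
        board[row - 1, col - 1, 2] = 0.0
    return states, next_moves

def drop_duplicate_states(states, next_moves):
    """Remove duplicate boards by sorting their byte representations and
    comparing neighbours, as described above."""
    order = sorted(range(len(states)), key=lambda i: states[i].tobytes())
    kept = []
    for prev, curr in zip([None] + order[:-1], order):
        if prev is None or states[prev].tobytes() != states[curr].tobytes():
            kept.append(curr)
    return [states[i] for i in kept], [next_moves[i] for i in kept]
```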

Since this is essentially an image classification task, it makes sense to use a convolutional neural network. The architecture we used came from [2] and took the following form:


Fig. 3: Architecture of the Move-Prediction Neural Network
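Since the exact layer configuration appears only in the figure, the following Keras sketch shows the general shape of such a move-prediction CNN; the number of convolutional layers and filters below are illustrative placeholders, not the values from [2]:

```python
from tensorflow.keras import layers, models

def build_policy_network(board_size=15, channels=3, num_filters=64, num_conv_layers=5):
    """Illustrative move-prediction CNN: stacked convolutions over the board,
    followed by a softmax over the 225 possible moves."""
    inputs = layers.Input(shape=(board_size, board_size, channels))
    x = inputs
    for _ in range(num_conv_layers):
        x = layers.Conv2D(num_filters, 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(board_size * board_size, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```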

This neural network achieved a decent validation accuracy of around 55% pretty quickly, as shown in the following graphs:


Fig. 4: Training and Validation Accuracy and Loss Curves

The curves show that the model overfits: the training and validation accuracies do not track one another. Compared to the training set, the validation set is more stable, with its accuracy and loss curves staying relatively flat.

Reinforcement Learning Approach

We have now obtained a neural network capable of imitating human players by predicting their next move, and we plan to use it to jump-start the training of our Gomoku AI with DeepMind's reinforcement learning approach. Let us briefly describe how this works.

In DeepMind's terminology, our model is a "policy head", advising the AI which moves to take under consideration. Another quantity a player needs is the "value" of a board state, roughly a measure of how desirable it is. The AI can then run a number of simulations of possible games guided by its policy and value estimates (a procedure called Monte Carlo Tree Search) and decide which move to make based on the results of those simulations. We can thus play out a number of AI-vs-AI games and train our neural network on the replays. This produces a very large training set of games between progressively stronger players, which, given sufficient computational power, can be used to train the network to an expert level in Gomoku.
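In outline, the self-play loop could be structured as in the sketch below; the MCTS game player and the training routine are passed in as placeholders, since they stand for the bulk of the real implementation:

```python
def self_play_loop(network, play_one_game, train, num_iterations=10, games_per_iter=100):
    """AlphaZero-style outer loop, as described above.

    `play_one_game(network)` is expected to run one MCTS-guided AI-vs-AI game and
    return (board_states, search_policies, outcome); `train(network, data)` retrains
    the policy/value network. Both are placeholders; names and defaults are illustrative.
    """
    for _ in range(num_iterations):
        replay_buffer = []
        for _ in range(games_per_iter):
            states, search_policies, outcome = play_one_game(network)
            for state, pi in zip(states, search_policies):
                # Each state is labelled with the search-improved policy and the
                # final game outcome from the player-to-move's perspective.
                replay_buffer.append((state, pi, outcome))
        train(network, replay_buffer)
    return network
```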

References

[1] Rules for Gomoku - http://www.opengames.com.ar/en/rules/Gomoku

– Gives an overview of origins of Gomoku as well as the rules of the game.

[2] Shao, Kun & Zhao, Dongbin & Tang, Zhentao & Zhu, Yuanheng. (2016). Move prediction in Gomoku using deep learning. 292-297. 10.1109/YAC.2016.7804906. - https://www.researchgate.net/publication/312325842_Move_prediction_in_Gomoku_using_deep_learning

– Describes methodology of how to predict moves in Gomoku, using a convolutional neural network model.

[3] From-scratch implementation of AlphaZero for Connect4 - https://towardsdatascience.com/from-scratch-implementation-of-alphazero-for-connect4-f73d4554002a

– Describes how to implement Google DeepMind’s AlphaZero approach for Connect4. Methodology can be applied to Gomoku.

[4] Gomoku datasets http://mostovlyansky.narod.ru/iedown.html

– Archives datasets of Gomoku games. Data is stored in a .bdt file.

[5] AlphaGomoku: An AlphaGo-based Gomoku Artificial Intelligence using Curriculum Learning - Zheng Xie, Xing Yu Fu, Jin Yuan Yu, Likelihood Lab, https://arxiv.org/pdf/1809.10595.pdf

– Shows how to implement curriculum learning - a technique that builds the AI Gomoku’s strategy and knowledge of the game through progressively difficult tasks.

Some Attempts at Self-Play Reinforcement Learning

Midway Blog Post - Stefan Stanojevic, Kevin Qualls

In our initial blog post, we wrote about training a neural network on the dataset of Gomoku games, in order to predict the next move a human player would make. Since then, we have taken our project a step further and coded an AI Gomoku player that can gradually improve its skill through self-play and reinforcement learning.

Since we were curious whether our neural network had actually learned important elements of the game, we decided to quickly code a self-play module and visualize its performance before fully implementing the AlphaZero algorithm. We used our trained "policy head", which gives a probability distribution over possible moves, to iteratively generate the next move until one of the AI players was in a position to win by connecting 5 tokens. Python's ipywidgets library proved very useful for visualizing the games, specifically its interact and Play objects. You can see one of the sample games below:

Fig. 5: Sample Game of AI Gomoku Playing Against Itself
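A stripped-down sketch of this policy-only self-play (the ipywidgets visualization is omitted, and we pick the most probable legal move at each step; sampling from the distribution is the other obvious choice):

```python
import numpy as np

def has_five_in_a_row(board, player):
    """Return True if `player` (0 or 1) has five connected tokens on `board`,
    a 15x15x3 one-hot array encoded as in the first blog post."""
    grid = board[:, :, player]
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        for r in range(15):
            for c in range(15):
                if all(0 <= r + k * dr < 15 and 0 <= c + k * dc < 15
                       and grid[r + k * dr, c + k * dc] == 1 for k in range(5)):
                    return True
    return False

def policy_self_play(policy_model, max_moves=225):
    """Greedy self-play driven only by the policy head; `policy_model.predict`
    is assumed to return 225 move probabilities for a board."""
    board = np.zeros((15, 15, 3), dtype=np.float32)
    board[:, :, 2] = 1.0
    history = []
    for turn in range(max_moves):
        probs = policy_model.predict(board[None])[0]
        probs = probs * board[:, :, 2].ravel()      # mask occupied squares
        move = int(np.argmax(probs))
        r, c = divmod(move, 15)
        player = turn % 2
        board[r, c, player], board[r, c, 2] = 1.0, 0.0
        history.append(board.copy())
        if has_five_in_a_row(board, player):
            break
    return history
```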

Predictably, our AI is pretty bad at the game, as it is missing key components: the "value head" evaluating the chance of winning from different board states, and the ability to simulate the future. To address the first problem, we went back to our dataset of human games. Keeping track of the winner of each game, we assigned a score of +1, -1, or 0 (in case of a draw) to each board state of a given game and averaged these scores over the dataset. Then a neural network with the same architecture as in the initial blog post (except for the final layer, adapted to the new regression task) was trained to predict the board-state value. We obtained a not great, not terrible performance, as seen in the plot below:


Fig. 6: Performance of the Value Network at Predicting Game Outcomes
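A sketch of how these value targets can be built from the games; the choice of hashing boards by their byte representation is ours:

```python
from collections import defaultdict

def build_value_targets(games):
    """Average game outcomes per board state, as described above.

    `games` is a list of (board_states, outcome) pairs, where outcome is +1 if
    the first player won, -1 if the second player won, and 0 for a draw.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    boards = {}
    for states, outcome in games:
        for state in states:
            key = state.tobytes()
            totals[key] += outcome
            counts[key] += 1
            boards[key] = state
    return [(boards[k], totals[k] / counts[k]) for k in totals]
```

For the adapted final layer itself, the natural choice would be a single linear output trained with mean squared error.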

As we can see, the best validation accuracy is achieved very early in the training process, and there is significant overfitting later on.

At this point, we were in a position to try a slightly more sophisticated algorithm. Our agent can decide which move to make by considering both the values of the resulting board states and the policy's probability distribution over moves. The former relates to "exploitation" of its knowledge, which is more useful in the later stages of the game; the latter relates to "exploration" of possibilities, in theory useful for innovation early in the game.

We wanted to emulate the AlphaZero algorithm, which works in the following way. Prior to making each move, simulations of the remainder of the game are run. Since the space of possible games is generally far too large to cover efficiently with a search algorithm, the search looks at a set of likely games randomly sampled using the probabilities from the "policy head" and the values from the "value head". While AlphaGo runs around 1,000 simulations at each step, even a much smaller number of simulations is too computationally intensive for us, with each simulation taking on the order of a minute to finish.
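For reference, the move-selection rule at the heart of each such simulation can be written very compactly; the exploration constant below is an illustrative value rather than a tuned one:

```python
import math

def select_child(children, c_puct=1.5):
    """PUCT rule used in AlphaZero-style tree search: pick the child maximizing
    Q + U, where U favours moves with a high prior probability and a low visit count.

    `children` maps a move to a dict with keys "Q" (mean value), "N" (visit count)
    and "P" (prior from the policy head); c_puct is an assumed constant.
    """
    total_visits = sum(child["N"] for child in children.values())

    def score(child):
        u = c_puct * child["P"] * math.sqrt(total_visits) / (1 + child["N"])
        return child["Q"] + u

    return max(children, key=lambda move: score(children[move]))
```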

Adjustments to Consider for Reinforcement Learning Approach

We feel that with more computational power, we could use reinforcement learning to train a much stronger Gomoku AI. We would also like to see whether reinforcement learning can lead the AI to discover unconventional strategies that help it succeed. For example, in our course lectures we learned how reinforcement learning helped an agent score roughly 20% more points in the game CoastRunners 7 without ever completing the course [8]. Reinforcement learning similarly helped an agent rack up points in an Atari game to the point where the bottom paddle barely had to move [8]. Ultimately, it would take more time and GPU power for our Gomoku AI to discover patterns like these.

References

[1] Rules for Gomoku - http://www.opengames.com.ar/en/rules/Gomoku

– Gives an overview of origins of Gomoku as well as the rules of the game.

[2] Shao, Kun & Zhao, Dongbin & Tang, Zhentao & Zhu, Yuanheng. (2016). Move prediction in Gomoku using deep learning. 292-297. 10.1109/YAC.2016.7804906. - https://www.researchgate.net/publication/312325842_Move_prediction_in_Gomoku_using_deep_learning

– Describes methodology of how to predict moves in Gomoku, using a convolutional neural network model.

[3] From-scratch implementation of AlphaZero for Connect4 - https://towardsdatascience.com/from-scratch-implementation-of-alphazero-for-connect4-f73d4554002a

– Describes how to implement Google DeepMind’s AlphaZero approach for Connect4. Methodology can be applied to Gomoku.

[4] Gomoku datasets http://mostovlyansky.narod.ru/iedown.html

– Archives datasets of Gomoku games. Data is stored in a .bdt file.

[5] AlphaGomoku: An AlphaGo-based Gomoku Artificial Intelligence using Curriculum Learning - Zheng Xie, Xing Yu Fu, Jin Yuan Yu, Likelihood Lab, https://arxiv.org/pdf/1809.10595.pdf

– Shows how to implement curriculum learning - a technique that builds the AI Gomoku’s strategy and knowledge of the game through progressively difficult tasks.

[6] Wang, Y. (n.d.). Mastering the Game of Gomoku without Human Knowledge. doi: 10.15368/theses.2018.47

– Describes implementation of an AI Gomoku without prior knowledge from humans. Provides explanation of the Monte Carlo Tree Search algorithm, which we plan to use for our model.

[7] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359. doi: 10.1038/nature24270

– Describes reinforcement learning algorithm used for AlphaGo Zero to defeat AlphaGo 100 times to 0.

[8] DATA 2040 Lecture 21 - Lecture 22: Reinforcement Learning

– Describes how reinforcement learning can pick up helpful, unconventional strategies in AI games.

Some Final Exploration

Final Blog Post - Stefan Stanojevic, Kevin Qualls

After training our neural networks on a dataset of Gomoku games and setting up the code to run self-play, we seemed to be in a good place. However, the computational complexity of the task ahead proved to be a significant obstacle. The AlphaZero algorithm runs a number of game simulations (in AlphaGo's case, around 1,000) at each game step in order to determine which move to make. Furthermore, its agent spent the human equivalent of several thousand years playing Go. Our unparallelized Python code, on the other hand, takes on the order of a minute to complete a single simulation. So we decided to take a step back and think of other ways to improve our understanding of Gomoku and related games through deep learning.

Our DATA 2040 professor, Dr. Potter, let us know that since Connect Four is a solved game, there is a great dataset worth exploring. This dataset, due to Dr. John Tromp, contains winner information for a full set of 8-token board states in which neither player has yet won. Curious how well machine learning could capture this information, we trained a neural network to predict the winner given a board state. This is similar to the analysis we did for Gomoku, but this time we had the advantage of starting from a clean dataset for an exactly solved game.
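Assuming the data comes in the UCI "Connect-4" layout of Tromp's dataset (42 cell values of x / o / b, column by column, followed by a win / loss / draw label for the first player), loading and encoding it might look like this:

```python
import numpy as np
import pandas as pd

def load_connect4(path="connect-4.data"):
    """Load the Connect Four dataset and one-hot encode the 6x7 boards.

    Assumes the UCI layout: 42 columns of 'x' / 'o' / 'b' (blank), column-major
    from the bottom of the board, followed by a 'win' / 'loss' / 'draw' label.
    """
    df = pd.read_csv(path, header=None)
    cells = df.iloc[:, :42].to_numpy()
    labels = df.iloc[:, 42].map({"win": 0, "loss": 1, "draw": 2}).to_numpy()

    boards = np.zeros((len(df), 6, 7, 3), dtype=np.float32)
    for i, row in enumerate(cells):
        grid = row.reshape(7, 6).T               # 7 columns of 6 cells -> 6x7 board
        boards[i, :, :, 0] = grid == "x"
        boards[i, :, :, 1] = grid == "o"
        boards[i, :, :, 2] = grid == "b"
    return boards, labels
```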

After applying the same CNN architecture from the previous two blog posts (inspired by [2]), we managed to squeeze some extra accuracy out of our model by going deeper and adding skip connections, in the spirit of the AlphaGo architecture from [9]. Our best-performing model consisted of a stack of 10 residual blocks, each with the following architecture:


Fig. 7: Schematic of One Residual Block in the Best-Performing Model

This stack of residual blocks was preceded by a single convolutional layer and followed by two dense layers with 100 hidden neurons. Every convolutional layer in the network had 16 filters. We found that combining dropout and L1 regularization yielded the best results, and we tuned the dropout rate to 0.3 and the L1 regularization parameter to 0.5 in all convolutional layers.
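A Keras sketch of this model; the internal layout of each residual block (two convolutions plus a skip connection) and the placement of the dropout layers are our reading of Fig. 7 rather than its exact contents:

```python
from tensorflow.keras import layers, models, regularizers

def residual_block(x, filters=16, dropout=0.3, l1=0.5):
    """One residual block: convolutions with dropout and L1 regularization,
    plus a skip connection. No batch normalization, as discussed below."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l1(l1))(x)
    y = layers.Dropout(dropout)(y)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l1(l1))(y)
    return layers.Add()([shortcut, y])

def build_connect4_model(num_blocks=10):
    """Single convolution, a stack of residual blocks, then two dense layers
    (100 hidden neurons) predicting win / loss / draw."""
    inputs = layers.Input(shape=(6, 7, 3))
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    for _ in range(num_blocks):
        x = residual_block(x)
    x = layers.Flatten()(x)
    x = layers.Dense(100, activation="relu")(x)
    outputs = layers.Dense(3, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```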

Batch normalization played a fairly important role in our Gomoku model. During our initial experiments with this Connect Four model, we also used batch normalization inside the residual blocks. However, this produced slightly worse results than our final model, which dropped batch normalization altogether. After doing a bit of research on this phenomenon, we found that it is actually a fairly common theme across several different neural network architectures, as documented in [10]: dropout (when applied before batch normalization) disturbs the batch statistics, causing a "variance shift" that ultimately hurts performance [10].

We got the following accuracy and loss curves:


Fig. 8: Accuracy and Loss Curves of Final Model

Our neural network is able to predict the winner of a given 8-token board state with a validation accuracy of around 87%. Furthermore, the training and validation curves are closer together and more in sync than those of our first model in Fig. 4.

Since tinkering with different CNN architectures did not seem to produce better accuracy on this dataset, it is interesting to contemplate what this might mean for applications of deep learning to games. Algorithms such as AlphaZero do not use neural networks directly to decide on the next move; rather, they perform a version of the minimax algorithm in which the network is used to select the moves on which to base simulations. For this to succeed, the network may not need to calculate the value of a state with extreme precision.

Future Work

Although we didn’t implement reinforcement learning as planned - due to limitations with computational power - we were able to still achieve a validation accuracy of around 87% for prediciting the next winner in Connect-4, given an 8-token board state. This result can be used to efficiently jump start a great Connect-4 player.

More importantly, we plan to continue working on the Gomoku player. The most important task going forward is to speed up self-play by removing inefficiencies in our code and using parallel processing. Learning more about CUDA seems like a good place to start.

Something else to explore is discounting the values of states that are far removed from the end of the game. The authors of AlphaGo assigned +1 to every state in a winning game and -1 to every state in a losing game; with their immense computational power, they could generate huge training datasets, so this was not a problem. With our more limited resources, it may not be the best way to go. Another option, if we cannot achieve significant speedups, is to do away with the "policy head" entirely and focus on the values alone. In that case, every "simulation" of the game could serve as a training entry to further improve the neural network estimating the value of game states.
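As a small illustration of the discounting idea (the discount factor gamma is a hyperparameter we would still need to tune):

```python
def discounted_value_targets(num_states, outcome, gamma=0.95):
    """Assign the final outcome (+1, -1 or 0) to each state in a game, discounted
    by how far the state is from the end of the game: the last state gets the full
    outcome, earlier states get gamma, gamma^2, ... times the outcome."""
    return [outcome * gamma ** (num_states - 1 - t) for t in range(num_states)]

# Example: a 5-move game won by the first player.
print(discounted_value_targets(5, +1, gamma=0.9))
# [0.6561, 0.729, 0.81, 0.9, 1.0]
```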

Another idea to play with is incorporating the symmetries of the problem into our neural network, in order to effectively shrink the game's configuration space, along the lines of [11]. Our 15x15 board has reflection symmetries about a horizontal, a vertical, and two diagonal axes through its center. If we are unable to construct a neural network that encodes these symmetries directly, an alternative is simply to enlarge our training sets by applying the symmetry transformations.
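A simple way to exploit these symmetries without changing the network is data augmentation. The sketch below generates the eight symmetric copies of a board (four rotations and their mirror images); if move labels are augmented as well, they must of course be transformed by the same symmetry:

```python
import numpy as np

def symmetric_copies(board):
    """Return the 8 boards related to `board` by the symmetries of the square:
    four rotations and their mirror images. `board` has shape (15, 15, channels)."""
    copies = []
    for k in range(4):
        rotated = np.rot90(board, k, axes=(0, 1))
        copies.append(rotated)
        copies.append(np.flip(rotated, axis=1))   # mirror reflection
    return copies
```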

We have also encountered several interesting ideas in the literature that we would like to implement. One potentially very useful idea is curriculum learning [5], which builds up the player's skill and knowledge of the game by introducing simpler concepts first. For example, like the authors of that paper, we might generate a dataset of important moves ourselves and feed it to our neural network. Additionally, we could use a non-deep-learning mentor AI to generate more training data and to play against our model.

In any case, we believe that with more computational time and coffee we can eventually make AlphaGomoku work.

References

[1] Rules for Gomoku - http://www.opengames.com.ar/en/rules/Gomoku

– Gives an overview of origins of Gomoku as well as the rules of the game.

[2] Shao, Kun & Zhao, Dongbin & Tang, Zhentao & Zhu, Yuanheng. (2016). Move prediction in Gomoku using deep learning. 292-297. 10.1109/YAC.2016.7804906. - https://www.researchgate.net/publication/312325842_Move_prediction_in_Gomoku_using_deep_learning

– Describes methodology of how to predict moves in Gomoku, using a convolutional neural network model.

[3] From-scratch implementation of AlphaZero for Connect4 - https://towardsdatascience.com/from-scratch-implementation-of-alphazero-for-connect4-f73d4554002a

– Describes how to implement Google DeepMind’s AlphaZero approach for Connect4. Methodology can be applied to Gomoku.

[4] Gomoku datasets http://mostovlyansky.narod.ru/iedown.html

– Archives datasets of Gomoku games. Data is stored in a .bdt file.

[5] AlphaGomoku: An AlphaGo-based Gomoku Artificial Intelligence using Curriculum Learning - Zheng Xie, Xing Yu Fu, Jin Yuan Yu, Likelihood Lab, https://arxiv.org/pdf/1809.10595.pdf

– Shows how to implement curriculum learning - a technique that builds the AI Gomoku’s strategy and knowledge of the game through progressively difficult tasks.

[6] Wang, Y. (n.d.). Mastering the Game of Gomoku without Human Knowledge. doi: 10.15368/theses.2018.47

– Describes implementation of an AI Gomoku without prior knowledge from humans. Provides explanation of the Monte Carlo Tree Search algorithm, which we plan to use for our model.

[7] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359. doi: 10.1038/nature24270

– Describes reinforcement learning algorithm used for AlphaGo Zero to defeat AlphaGo 100 times to 0.

[8] DATA 2040 Lecture 21 - Lecture 22: Reinforcement Learning

– Describes how reinforcement learning can pick up helpful, unconventional strategies in AI games.

[9] Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961

– Describes how their AlphaGo program considers which moves are advantageous by evaluating different moves.

[10] Li, Xiang, et al. “Understanding the disharmony between dropout and batch normalization by variance shift.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. https://arxiv.org/pdf/1801.05134.pdf

– Justifies why the performance of our model could be poor, due to dropouts being used before applying batch normalization.

[11] Dieleman, S., De Fauw, J., Kavukcuoglu, K. (2016). Exploiting Cyclic Symmetry in Convolutional Neural Networks, https://arxiv.org/pdf/1602.02660.pdf

– Describes how to train a neural network model for images that demonstrate symmetry. This is helpful, since the Gomoku Board is 15x15 and has symmetric properties.