<p><a href="http://jmoudrik.github.io/">JM's blog</a></p>
<h1 id="predicting-go-players-strength-with-convolutional-neural-nets">Predicting Go players' strength with Convolutional Neural Nets</h1>
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<p>OK, as this is my first blog post here, I should probably say why I
started. I want to use this platform to share what I do and maybe get some
feedback on my work as well (I am working on comments; until then, e-mails
are most welcome).</p>
<h2 id="introduction">Introduction</h2>
<p>Now to the actual subject. Like probably everyone nowadays,
I’ve been toying with deep learning recently. The domain of my interest
is <a href="https://en.wikipedia.org/wiki/Computer_Go">computer Go</a>,
and since the Go board is essentially a 19x19
bitmap with spatial structure, convolutional neural nets are a clear choice.
Most researchers in computer Go try to make strong programs
(which is a hard problem). Making strong programs means knowing the
strongest move in a given position, which translates naturally into
the language of convolutional networks. Since we also have a lot of <a href="http://gokifu.com">Go game records</a>
around, there is plenty of material for research papers and fun.</p>
<p>Obviously, this idea has been around for some time now, but recently it
has become really hot; see <a href="http://arxiv.org/abs/1412.3409">[Clark, Storkey 2014]</a> and
<a href="http://arxiv.org/abs/1412.6564">[Maddison et al. 2015]</a> for starters. Just recently,
a paper from FB research <a href="http://arxiv.org/abs/1511.06410">[Tian, Zhu 2015]</a> made a great
deal of hype in the news, improving on the previous results. FB really
seems to have people working hard on this, as their bot, which combines Monte Carlo
Tree Search (the prevalent technology in strong bots nowadays) with CNN priors,
<a href="http://www.weddslist.com/kgs/past/119/index.html">ended up third</a> in the first
tournament it played. Moreover, it lost on time in both games, so let’s see
if they can actually win once they improve the time management.</p>
<h2 id="strength-prediction">Strength Prediction</h2>
<p>In the past, I’ve worked on predicting Go players’ strength and playing
style (<a href="http://gostyle.j2m.cz">GoStyle Project</a>), so I wanted to see how
well CNNs do there. The idea is to give the network a position and teach it
the players’ strengths, instead of the best move.
In Go, strength is measured in
<a href="https://en.wikipedia.org/wiki/Go_ranks_and_ratings">kyu/dan ranks</a>;
for us, imagine we have a scale of, say, 24 ranks. The ranks are ordered
(a rank-1 player is stronger than a rank-2 player), so regression seems like a good choice
to begin the experiments with.</p>
<h3 id="dataset">Dataset</h3>
<p>I used some 71,119 games from the <a href="https://www.gokgs.com/archives.jsp">KGS Archives</a>.
Each game has about 190 moves on average, so in the end we have almost 14 million
pairs <script type="math/tex">(X, y)</script> for training. First we need to build the dataset. The <script type="math/tex">y</script>’s are clear;
we only need to rescale black’s and white’s strength. For the <script type="math/tex">X</script>’s we need to define
and extract data planes from the games; here I used the following 13 planes:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">plane</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_our_liberties</span> <span class="o">==</span> <span class="mi">1</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_our_liberties</span> <span class="o">==</span> <span class="mi">2</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_our_liberties</span> <span class="o">==</span> <span class="mi">3</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_our_liberties</span> <span class="o">>=</span> <span class="mi">4</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_enemy_liberties</span> <span class="o">==</span> <span class="mi">1</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_enemy_liberties</span> <span class="o">==</span> <span class="mi">2</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_enemy_liberties</span> <span class="o">==</span> <span class="mi">3</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="n">num_enemy_liberties</span> <span class="o">>=</span> <span class="mi">4</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="n">empty_points_on_board</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span> <span class="o">=</span> <span class="n">history_played_before</span> <span class="o">==</span> <span class="mi">1</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">=</span> <span class="n">history_played_before</span> <span class="o">==</span> <span class="mi">2</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">11</span><span class="p">]</span> <span class="o">=</span> <span class="n">history_played_before</span> <span class="o">==</span> <span class="mi">3</span>
<span class="n">plane</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="n">history_played_before</span> <span class="o">==</span> <span class="mi">4</span></code></pre></div>
<p>This is almost verbatim the source code of a tool I made (see below).
The right-hand sides are numpy arrays encoding some simple domain knowledge (for
instance, planes 9 to 12 mark the last 4 moves).
The planes are essentially a simple extension of the
<a href="http://arxiv.org/abs/1412.3409">Clark, Storkey 2014</a> planes with the history moves, and were proposed by
<a href="http://computer-go.org/pipermail/computer-go/2015-December/008324.html">Detlef Schmicker</a>.</p>
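<p>For illustration, here is a minimal numpy sketch of how such planes could be stacked into the network input. This is my own sketch, not the actual deep-go-wrap code; the input encodings (per-point liberty counts, an empty-point mask, and a move-history array) are assumptions.</p>

```python
import numpy as np

def make_planes(our_libs, enemy_libs, empty, history):
    """Stack the 13 feature planes described above.

    our_libs, enemy_libs -- 19x19 int arrays: liberties of the chain
        occupying each point (0 on empty points)
    empty -- 19x19 bool array marking empty points
    history -- 19x19 int array: k at the point of the k-th last move
        (k = 1..4), 0 elsewhere
    """
    planes = np.zeros((13, 19, 19), dtype=bool)
    for side, libs in enumerate((our_libs, enemy_libs)):
        base = 4 * side                    # planes 0-3 ours, 4-7 enemy
        for i in range(3):
            planes[base + i] = (libs == i + 1)
        planes[base + 3] = (libs >= 4)
    planes[8] = empty                      # plane 8: empty points
    for k in range(1, 5):                  # planes 9-12: last 4 moves
        planes[8 + k] = (history == k)
    return planes
```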
<p>I used my GitHub project <a href="https://github.com/jmoudrik/deep-go-wrap/">deep-go-wrap</a>,
which has a tool for making HDF5 datasets from Go game records, with all the
necessary planes. Making a dataset is no harder than running:</p>
<div class="highlight"><pre><code class="language-bash" data-lang="bash">cat game_filenames <span class="p">|</span> sort -R <span class="p">|</span> ./make_dataset.py -l ranks -p detlef out.hdf</code></pre></div>
<h3 id="network">Network</h3>
<p>Now, the network. Since this is more of a proof-of-concept experiment, I went
with a fairly simple network to keep training fast. I used a few convolutional layers with
a small dense layer (with dropout) on top, and 2 output neurons for the strengths of the two players.</p>
<ol>
<li>convolutional layer 128 times 5x5 filters, ReLU activation</li>
<li>convolutional layer 64 times 3x3 filters, ReLU activation</li>
<li>convolutional layer 32 times 3x3 filters, ReLU activation</li>
<li>convolutional layer 8 times 3x3 filters, ReLU activation</li>
<li>dense layer with 64 neurons, ReLU activation, 0.5 dropout</li>
<li>2 output neurons, linear activation</li>
</ol>
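<p>To get a feel for how small this network is, we can count its trainable parameters. The sketch below is my own back-of-the-envelope calculation; it assumes 13 input planes and unpadded (“valid”) convolutions, so the exact totals are assumptions, not numbers taken from the linked model file.</p>

```python
def conv_params(in_ch, out_ch, k):
    """Weights (one k*k kernel per input channel per filter) plus one bias per filter."""
    return in_ch * k * k * out_ch + out_ch

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# (input channels, filters, kernel size) per the list above; 13 input planes assumed
convs = [(13, 128, 5), (128, 64, 3), (64, 32, 3), (32, 8, 3)]
total = sum(conv_params(i, o, k) for i, o, k in convs)

# with "valid" convolutions the 19x19 board shrinks: 19 -> 15 -> 13 -> 11 -> 9
total += dense_params(8 * 9 * 9, 64)   # dense layer, 64 neurons
total += dense_params(64, 2)           # 2 linear output neurons
print(total)  # 177962, i.e. roughly 178k parameters
```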
<p>I used the <a href="http://keras.io/">keras.io</a> library to implement the model,
which <a href="/static/20160114/keras_model.py">can be found here</a>.
The network was trained for just 2 epochs with the <a href="http://arxiv.org/abs/1412.6980v8">Adam</a>
optimizer, because it converged pretty quickly. I have a bigger network
in training (using RMSProp), but it will take some time, so let’s have
a look at the results in the meantime ;-)</p>
<h3 id="results">Results</h3>
<p>Basically, we now have a network which predicts the strength of both players
from a single position (plus a history of the 4 last moves). This would be really
cool if it worked, as we’ve previously recommended a sample of at least 10 games
in our <a href="http://gostyle.j2m.cz/webapp.html">GoStyle webapp</a> to predict strength somewhat reliably.</p>
<p>So, how well does this really simple first-shot network, trained for quite a short time
and without any fine-tuning, perform? And what would a good result be?
Let’s have a look at how the error (the difference between the desired and the predicted
strength) depends on the move number.</p>
<p><img src="/static/20160114/err_by_move.png" /></p>
<p>We can clearly see that the error is highest at the beginning and at the end.
Since games of beginners usually look the same as games of strong players at the
beginning (the first few moves are usually very similar), the first half of this
observation is not really surprising.
On the other hand, the steep growth of the error at the end is probably caused by the fact
that there are only a few very long games in the dataset (a usual game takes about 250–300 moves).
Before discussing whether these numbers are any good, let’s have a look at another graph,
error by rank:</p>
<p><img src="/static/20160114/err_by_rank.png" /></p>
<p>This graph basically shows that the hardest ranks to predict are those of very strong and
very weak players. A predictor which always predicted just the middle class would produce
a graph shaped like the letter “V”, with its minimum in the middle. This graph has more of a “U” shape,
which is good, because it means that the network is not merely exploiting the statistical
distribution of the target <script type="math/tex">y</script>’s, but has some understanding of the data.
Comparing with the naive V predictor
is also interesting in terms of error. Were the 24 values of <script type="math/tex">y</script> distributed
<a href="https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)">uniformly at random</a>,
the standard deviation of the always-the-middle V predictor would be</p>
<script type="math/tex; mode=display">\sigma = \sqrt{Var(U(0,24))} = \sqrt{\frac{24^2}{12}} = 6.93</script>
<p>On average, the network has an <script type="math/tex">RMSE</script> (root of the mean square error) of <strong>4.66</strong>. The <script type="math/tex">RMSE</script> has
the nice property that (under certain assumptions) it is an estimate of <script type="math/tex">\sigma</script>.
So by comparing 4.66 and 6.93, we can say that the network actually does something useful.
This is of course a weak baseline, and one would hope for the network to be much better
than the simplest possible reference.</p>
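<p>As a quick sanity check, the 6.93 figure is easy to reproduce by simulation. The sketch below assumes ranks drawn uniformly at random from the interval [0, 24) and measures the error of the always-the-middle predictor:</p>

```python
import random

random.seed(0)
N = 100_000
# ranks drawn uniformly at random from [0, 24)
ys = [random.uniform(0, 24) for _ in range(N)]
mid = 12.0  # the always-the-middle "V" predictor
rmse = (sum((y - mid) ** 2 for y in ys) / N) ** 0.5
print(round(rmse, 2))  # close to sqrt(24**2 / 12) = 6.93
```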
<h3 id="comparison-with-prior-work">Comparison With Prior Work</h3>
<p>In my recent paper <a href="http://arxiv.org/abs/1512.08969">Evaluating Go game records for prediction of player attributes</a>,
different features were extracted from games (samples of 10–40 of them), and given
a good predictive model, it was possible to predict strength with the following <script type="math/tex">RMSE</script>:</p>
<ul>
<li>2.788 Pattern feature</li>
<li>5.765 Local sequences</li>
<li>5.818 Border distance</li>
<li>5.904 Captured stones</li>
<li>6.792 Win/Loss statistics</li>
<li>5.116 Win/Loss points</li>
</ul>
<p>The results in the paper used a slightly bigger domain of 26 ranks instead of 24, but
the numbers are still roughly comparable. So the <strong>4.66</strong> our brave new simplistic deep network
achieves is better than all but the dominating feature (extracted from at least 10 games), and this from just
one game position with a history of size 4. Given that the average game in the dataset is about 190
moves long, 10 games contain 380 times more information than the network gets.</p>
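<p>The 380× figure follows directly from the numbers above:</p>

```python
games = 10               # minimum sample size recommended by the paper
moves_per_game = 190     # average game length in the dataset
network_input = 1 + 4    # one position plus a history of 4 moves
ratio = games * moves_per_game // network_input
print(ratio)  # 380
```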
<p><strong>Cool indeed!</strong></p>
<h3 id="what-next">What next?</h3>
<ul>
<li>You have some ideas? I do. Stay tuned!</li>
</ul>
Fri, 15 Jan 2016 17:09:18 +0000
http://jmoudrik.github.io/post/2016/01/15/convolutional_neural_net_for_Go_strength_prediction.html