Saturday, April 6, 2013

How fit is the fitness function

So far I have tuned my engine by solving EPD positions. The final versions all solved more positions than the original one, but solving positions is not the ultimate goal: in the end the engine should play better chess.

To test that, I ran a small tournament between four different versions of iCE.

ice.04 is the base version with manually picked weights (reasonable ones, but not tuned)
ice.4109 was tuned against a training set of 4109 filtered positions
ice.175000 was tuned against a training set of 175000 filtered positions
ice.match was tuned with a fitness function that plays games among the engines within a generation


Here are the results of a 6000-game round-robin tournament.

Rank Name         Elo    +    - games score oppo. draws
   1 ice.match     52   11   11  3000   60%   -17   29%
   2 ice.04        14   11   11  3000   53%    -5   25%
   3 ice.4109      -9   11   11  3000   48%     3   27%
   4 ice.175000   -57   11   11  3000   39%    19   24%
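
As a sanity check on the table, the Elo numbers follow from the usual logistic rating model; this little snippet is only an illustration of that model, not engine code:

```python
import math

# Rating difference implied by a score fraction under the standard
# logistic Elo model: diff = 400 * log10(score / (1 - score)).
def elo_diff(score: float) -> float:
    return 400.0 * math.log10(score / (1.0 - score))

# Performance rating ~= average opponent rating + elo_diff(score).
# For ice.match: -17 + elo_diff(0.60) lands close to the +52 listed above.
print(round(-17 + elo_diff(0.60)))
```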


Hmm!

Using an EPD solver as fitness function does not seem to be the best choice. That is not a problem of the genetic algorithm itself; it solved what it was told to solve.

The final version of ice.4109 solved 2811 of the 4109 positions, while the base version ice.04 solved only 2437. The algorithm definitely found a better solution, just unfortunately not to the problem I actually have.
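
As a sketch, the EPD fitness boils down to counting solved positions; `search_fn` stands in for the engine's own search configured with a candidate weight set (all names here are illustrative assumptions, not iCE's actual code):

```python
# Count positions where a shallow search finds the labeled best move ("bm").
# search_fn stands in for the engine's search with candidate weights plugged in.
def epd_fitness(search_fn, epd_positions, depth=2):
    return sum(1 for fen, bm in epd_positions
               if search_fn(fen, depth) == bm)

# Tiny demo with a stub "search" that just looks moves up in a table.
answers = {"fen1": "e2e4", "fen2": "d7d5"}
stub_search = lambda fen, depth: answers.get(fen)
positions = [("fen1", "e2e4"), ("fen2", "g8f6"), ("fen3", "e2e4")]
print(epd_fitness(stub_search, positions))  # solves only the first -> 1
```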

On the other hand, the more complex fitness function that plays actual games really seems to get the job done, even though in this case it was only a quick and dirty run meant as a reference.

So either my approach with EPDs is not good, a 2-ply search per position is not deep enough (in real games, depending on time control, iCE searches between 8 and 18 plies in the middlegame), or the correlation between single-position performance and game play is not strong enough.

It also shows that using more positions doesn't help; the version tuned on 175000 positions actually finished last. This is really surprising.

I still think the EPD approach has potential if it is modified a bit. Things that come to mind here are:
  • Using a higher search depth for each position, in the hope of picking up more positional concepts in the then larger tree. This slows down the algorithm, so the population size may have to be reduced.
  • Modifying the fitness function. Currently I award 1 point for a solved position and 0 points otherwise. I could award more points for a quiet solution, because those are harder to find.
  • Splitting the training data into different sets (instead of 1 file with 100k positions, 50 files with 2k positions each) and giving each generation a different file. This might prevent the algorithm from converging towards a very specific weight set that solves one file and lead to a more general solution instead, one that is stronger in game play.
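
The last two ideas can be sketched together; the quiet bonus scheme, the `is_quiet` flag, and all names below are assumptions for illustration, not the engine's actual code:

```python
def weighted_epd_fitness(search_fn, positions, depth=2, quiet_bonus=2):
    # Award more points for quiet solutions than for tactical ones;
    # how a position is classified as quiet is left open here.
    score = 0
    for fen, bm, is_quiet in positions:
        if search_fn(fen, depth) == bm:
            score += quiet_bonus if is_quiet else 1
    return score

def pick_training_file(files, generation):
    # Rotate through the split training files, one file per generation,
    # so the population cannot overfit a single file.
    return files[generation % len(files)]

# Demo with a stub search that always answers "m1".
stub = lambda fen, depth: "m1"
pos = [("a", "m1", True), ("b", "m1", False), ("c", "m2", True)]
print(weighted_epd_fitness(stub, pos))            # 2 (quiet) + 1 = 3
print(pick_training_file(["f0", "f1", "f2"], 4))  # generation 4 -> f1
```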
So this area is far from fully explored. For the moment I will put more focus on the tournament fitness to generate a stronger base version with a serious run. I will then add new terms to my evaluation, which requires retuning anyway.

So more to come ...