To test this, I ran a small tournament between four different versions of iCE.
- ice.04 is the base version with manually picked weights (reasonable ones, but not tuned)
- ice.4109 was tuned on a training set of 4109 filtered positions
- ice.175000 was tuned on a training set of 175000 filtered positions
- ice.match was tuned with a fitness function that plays games among the engines within a generation
Here are the results of a 6000-game round-robin tournament.
| Rank | Name       | Elo | +  | -  | Games | Score | Oppo. | Draws |
|------|------------|-----|----|----|-------|-------|-------|-------|
| 1    | ice.match  | 52  | 11 | 11 | 3000  | 60%   | -17   | 29%   |
| 2    | ice.04     | 14  | 11 | 11 | 3000  | 53%   | -5    | 25%   |
| 3    | ice.4109   | -9  | 11 | 11 | 3000  | 48%   | 3     | 27%   |
| 4    | ice.175000 | -57 | 11 | 11 | 3000  | 39%   | 19    | 24%   |
Using an EPD solver as the fitness function does not seem to be the best choice. That is not a problem of the genetic algorithm itself; it solved exactly what it was told to solve.
The final version of ice.4109 solved 2811 of its 4109 positions, while the base version ice.04 solved only 2437. The algorithm definitely found a better solution, unfortunately not for the problem I actually have.
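For reference, this EPD fitness is just a count of solved positions. A minimal sketch, assuming a simple `(fen, best_move)` position format and an `engine.search` interface (these names are illustrative, not iCE's actual API):

```python
def epd_fitness(engine, positions, depth=2):
    """Fitness = number of EPD test positions 'solved': the move the
    engine picks in a shallow search matches the position's best move.

    positions: iterable of (fen, best_move) pairs.
    engine.search(fen, depth): assumed to return the chosen move.
    """
    solved = 0
    for fen, best_move in positions:
        if engine.search(fen, depth) == best_move:
            solved += 1
    return solved
```

With this fitness the algorithm maximizes exactly the number of solved positions, which is how ice.4109 could climb from 2437 to 2811 solved positions without getting stronger in actual games.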
On the other hand, the more complex fitness function that actually plays games really seems to get the job done, even though this was only a quick and dirty run meant as a reference.
So either my EPD approach is flawed, a 2 ply search per position is not deep enough (in real games, depending on the time control, iCE searches between 8 and 18 plies in the middle game), or the correlation between single-position performance and game play is just not strong enough.
It also shows that using more positions doesn't help; ice.175000 actually performed worse than ice.4109. This is really surprising.
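For comparison, the game-playing fitness behind ice.match can be sketched as a round robin within one generation, where each individual's score is its wins plus half its draws. The `play_game` callback here is an assumption for illustration, not iCE's actual code:

```python
def match_fitness(population, play_game):
    """Round-robin fitness within one generation: every individual
    plays every other one once.

    play_game(a, b): assumed to return 1.0 if a wins, 0.5 for a
    draw, and 0.0 if b wins.
    """
    n = len(population)
    scores = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            result = play_game(population[i], population[j])
            scores[i] += result        # a's share of the point
            scores[j] += 1.0 - result  # b's share
    return scores
```

This is far more expensive than scoring EPD positions, but it measures the quantity we actually care about: strength in game play.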
I still think the EPD approach has potential if it is modified a bit. Things that come to mind:
- Use a higher search depth to solve each position, in the hope of picking up more positional concepts in the then larger tree. This slows down the algorithm, so the population size may have to be reduced.
- Modify the fitness function. Currently I award 1 point for a solved position and 0 points otherwise. I could award more points for a quiet solution, because quiet solutions are harder to find.
- Split the training data into different sets (instead of one file with 100k positions, use 50 files with 2k positions each) and give each generation a different file. This might prevent the algorithm from converging on a very specific weight set that solves one file, and lead instead to a more general solution that is stronger in game play.
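The last two ideas can be sketched roughly as follows. All names and interfaces here are assumptions for illustration (in particular the `is_quiet` predicate and the file naming scheme), not iCE's actual code:

```python
def graded_fitness(engine, positions, is_quiet, depth=2, quiet_bonus=1):
    """Variant of the EPD fitness: 1 point per solved position, plus
    a bonus when the solution move is quiet (no capture or check),
    since quiet solutions are harder to find.

    is_quiet(fen, move): hypothetical predicate supplied by the caller.
    """
    score = 0
    for fen, best_move in positions:
        if engine.search(fen, depth) == best_move:
            score += 1
            if is_quiet(fen, best_move):
                score += quiet_bonus
    return score


def training_file_for(generation, n_files=50):
    """Rotate through the split training files so each generation is
    scored on a different subset (e.g. 50 files of 2k positions)."""
    return "epd_set_%02d.epd" % (generation % n_files)
```

Rotating the files means no single weight set can win by overfitting one file; only weights that generalize across subsets survive many generations.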
So more to come ...