This is fundamentally different from my former approach, where I measured position-solving performance:
- It is computationally much more expensive.
- It is not reproducible. If a best individual is determined in a generation, chances are high that a different individual would be selected if the calculation were repeated. This is caused by the high degree of randomness involved when playing only a small number of games.
- There is no guarantee that the truly best individual wins a generation tournament, but chances are that at least a good one does.
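The reproducibility problem can be made concrete: with only a handful of games per pairing, the statistical error on the observed score swamps the small strength differences between individuals. A minimal sketch (the function name and the game counts are illustrative, not taken from the actual tuning run):

```python
import math

def winrate_confidence(wins, draws, losses):
    """Return the observed score and a ~95% confidence half-width.

    With a binomial-style standard error of sqrt(p*(1-p)/n), the
    interval shrinks only with the square root of the game count,
    so short matches are dominated by noise.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n           # chess score in [0, 1]
    stderr = math.sqrt(score * (1.0 - score) / n)
    return score, 1.96 * stderr                # ~95% half-width

# 20 games per pairing: a 55% score is indistinguishable from 50%
score, half = winrate_confidence(9, 4, 7)
print(f"score {score:.2f} +/- {half:.2f}")     # score 0.55 +/- 0.22
```

So an individual scoring 55% over 20 games may easily be no better than its opponent, which is why a repeated generation tournament can pick a different winner.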
I call the two resulting versions evol-1 and evol-2.
| | evol-1 | evol-2 |
|---|---|---|
| Runtime (hrs) | 185 | 363 |
| Generations | 1,100 | 1,200 |
| Total games | 184,800 | 362,345 |
| EPDs solved* | 2,646 | 2,652 |
*The number of solved positions out of a test set of 4,109 positions. This number is given to relate this GA run to the previous experiment, where the number of solved positions was the fitness criterion. Both solutions are better than the un-tuned version, which scored only 2,437 points, but worse than the version optimized directly towards solving this set, which scored 2,811 points.
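For readers unfamiliar with the EPD test: each record carries a `bm` (best move) opcode, and a position counts as solved when the engine's chosen move matches one of the listed moves. A hypothetical sketch of the counting loop, where the `engine_best_move` callable stands in for the actual engine search (which is not shown here):

```python
def parse_epd_bm(line):
    """Extract the 'bm' (best move) opcode from an EPD record.

    An EPD line has 4 FEN-like fields followed by ';'-separated
    opcodes such as 'bm Nf3;' or 'id "WAC.001";'.
    """
    fields = line.strip().split(None, 4)
    if len(fields) < 5:
        return None
    for op in fields[4].split(";"):
        op = op.strip()
        if op.startswith("bm "):
            return op[3:].split()        # may list several best moves
    return None

def count_solved(epd_lines, engine_best_move):
    """engine_best_move: callable mapping an EPD line to a SAN move
    (a stand-in here for the real engine search)."""
    solved = 0
    for line in epd_lines:
        best = parse_epd_bm(line)
        if best and engine_best_move(line) in best:
            solved += 1
    return solved

sample = ['r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R'
          ' w KQkq - bm O-O;']
print(count_solved(sample, lambda line: "O-O"))   # prints 1
```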
[Figure: Entropy development of evol-1 and evol-2]
[Figure: Comparison of the ability to find correct positions in an EPD file]
And finally the real test: a direct round-robin comparison between the two and the base version.
And the OSCAR goes to: evol-2
| Rank | Name | Elo | + | - | Games | Score | Oppo. | Draws |
|---|---|---|---|---|---|---|---|---|
| 1 | ice.evol-2 | 89 | 5 | 5 | 12000 | 58% | 35 | 32% |
| 2 | ice.evol-1 | 69 | 5 | 5 | 12000 | 54% | 44 | 33% |
| 3 | ice.04 | 0 | 5 | 5 | 12000 | 39% | 79 | 28% |
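As a sanity check on the numbers, the logistic Elo model maps a score s into a rating difference via Elo = -400 * log10(1/s - 1). A small sketch (note that rating tools of this kind also adjust for opponent strength, the "oppo." column, which is why the tabulated 89 differs from this raw conversion):

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# evol-2 scored 58% overall; against an equal opponent pool that
# would correspond to roughly +56 Elo
print(round(elo_from_score(0.58)))   # prints 56
```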
Looks like all the effort finally paid off, especially considering that the base version is not really a weak engine either. It is a bit stronger than the official iCE 0.3, which is rated 2475 Elo in the CCRL.
Next I may manually tweak some of the weights from the final set, because some look suspicious. I wonder whether I'm able to score half a point back against the evolution ...