Sunday, May 12, 2013

Population Based Incremental Learning and Chess - Pt. II

Population Based Incremental Learning (PBIL) is a variant of  a genetic algorithm. In the recent past I've used it a lot to improve my chess engine iCE.

The two main factors responsible for personality and strength within a chess engine are evaluation and search.

Search is responsible to analyze the game tree of possible future chess positions and selecting the most promising path. As the game tree is huge and exponentially growing with increasing depth it has to decide which parts of the tree should get more focus than others.

Evaluation is responsible for assigning a numerical value to a chess position that is expresses the winning probabilities of both sides. Search will select a path that has the best winning probability as determined by the evaluation.

Both features (especially evaluation) are heavily parametrized and selecting the right parameters is the one million dollar question. Intuition, game play performance, chess theory knowledge, crystal balls and sometimes voodoo are used to select the parameters. Every change (especially the small ones) requires extensive testing. Often 30.000 games and more are played just to see whether a small change in one of the parameters made the engine stronger or weaker. To make things worse most of the parameters are connected somehow. Changing one parameter might require to change other parameters as well to keep the whole thing in balance.

As I consider playing millions of games boring and I really don't have the resources to do that anyway I came up with the idea to just hand my engine over to the evolution and let nature take its course. PBIL is excellent in optimizing large parameter sets. Instead of maintaining a population it maintains a probability vector of population properties. This is much simpler but at least as effective as classic GAs.

Here are the results of my first approach to optimize the parameter sets used by Search.

 My search framework is controlled by 35 parameters that require 176 bits. Those parameters control the search extensions (e.g. check extension), late move reductions, null move, internal iterative deepening, lazy evaluation and futility pruning.

The current search parameter set of iCE that should be optimized through PBIL looks like

 ID | PARAMETER NAME                     | VALUE |   MAX
----+------------------------------------+-------+-------
  0 | EXTEND_BY_CAPTURE_TO_PAWN_EG       |    30 |     31
  1 | EXTEND_IN_CHECK                    |    10 |     15
  2 | EXTEND_BY_CHECK_SNGL_RESPONSE      |     5 |     15
  3 | EXTEND_BY_CHECK_DBL_RESPONSE       |     1 |     15
  4 | EXTEND_BY_PAWN_PUSH_7              |    10 |     15
  5 | S0MP_DEPTH                         |     3 |      7
  6 | S0MP_MARGIN_1                      |     0 |  1.023
  7 | S0MP_MARGIN_2                      |     0 |  1.023
  8 | NULL_MOVE_DEPTH                    |     1 |      3
  9 | NULL_MOVE_IN_PV                    |     0 |      1
 10 | NULL_MOVE_MARGIN                   |    63 |    127
 11 | NULL_MOVE_THRESHOLD_R3             |     7 |     15
 12 | F_PRUNING_DEPTH                    |     5 |      7
 13 | F_PRUNING_1                        |   120 |    511
 14 | F_PRUNING_2                        |     0 |    511
 15 | F_PRUNING_3                        |   255 |    511
 16 | F_PRUNING_4                        |     0 |    511
 17 | F_PRUNING_5                        |     0 |    511
 18 | F_PRUNING_6                        |     0 |    511
 19 | LMR_DEPTH                          |     2 |      7
 20 | LMR_IN_PV                          |     0 |      1
 21 | LMR_GOOD_MOVE_REDUCTION            |     1 |      3
 22 | LMR_BAD_MOVE_REDUCTION             |     2 |      3
 23 | LMR_ADDITIONAL_REDUCTION_NT        |     1 |      3
 24 | LMR_MOVES_SEARCHED_PV              |     0 |     31
 25 | LMR_MOVES_SEARCHED_NON_PV          |     0 |     31
 26 | LMR_QUIET_MOVES_SEARCHED_PV        |     0 |     31
 27 | LMR_QUIET_MOVES_SEARCHED_NON_PV    |     0 |     31
 28 | IID_DEPTH                          |     5 |      7
 29 | IID_ONLY_PV                        |     1 |      1
 30 | IID_REDUCTION_PV                   |     2 |      3
 31 | IID_REDUCTION_NON_PV               |     2 |      7
 32 | QS_DELTA_CUTOFF_MARGIN             |   200 |  1.023
 33 | LE_USE                             |     0 |      1
 34 | LE_MARGIN                          | 2.047 |  2.047
----+------------------------------------+-------+-------


PBIL settings:

Population Size:    32
Generations:        1100
Learning Rate:      0.018 - 0.040 (depending on the progress)
Neg. Learning Rate: 0.000
Elitism:            Yes
Mutational Prob.:   0.01 - 0.05 (depending on the progress)
Fitness Function:   Knock-Out Tournament (0.25 sec per move)
Run Time:           365 hrs

The starting entropy with 176 bits is 61. Here all bits have a 50/50 chance to flip either towards 0 or 1. In the course of the evolution the entropy dropped. Bits increased their chances to flip either towards 0 or 1. At the end of the evolution about half of the bits were fully converged. So their chance of being 0 or 1 was close to 100%.


To monitor the progress of the evolution I measured from time to time the behavior and performance of the generated population . One interesting but not surprising fact is that the generated engines in later generations reached higher search depths given a constant time control.


Average Search Depths reached for Mid Game positions for moves 10. - 25.

Another fact is that the diversity within the population decreased. This could be seen with an increasing probabilities for draws and also that the games played in the tournament took longer.


Non-Draw Percentage of Games

Finally I measured the real playing strength of the engines after each 100 generations. Here I played an engine with the current state of the evolution against the base version with the original weights. Both engines used an identical evaluation and only differed in the values of the search parameters. The evolution improved the playing strength very rapidly at first. In later generations it went in and out of local optima.


It looks like the generation #1100 were I stopped the evolution is not the strongest one. The global peak was probably reached around generation #900. ("probably" because those data points have an error bar of about -20/+20 ELO).

Next steps: I will optimize the parameter set using the results from the first run. Some features are over determined and some others are under determined, so this requires some adjustments. I will further try to remove some randomness by modifying the fitness function a bit.

And then give it another run.