Saturday, December 21, 2013

Queen vs 3 Minors part 2

I did some more research in the case of a material imbalance of 3 minors vs a queen. In my previous post I determined the imbalance to be worth 0,82 pawns.

Unfortunately this seems to be too simple. The above imbalance figure includes the bonus for the bishop pair which the side with the minors usually enjoys. To remove this effect I repeated the test with positions where the 3 minors did not include the bishop pair.

An additional factor not covered so far is the effect coming from the number of rooks still on the board. In my previous test all starting positions included 2 rook pairs. The theory says that the bonus for the minors should be bigger if the rooks are still on the board. The bonus might be different if some rooks are removed. So I performed several tests to test this hypothesis. 

Here are my results

QRR vs RRBNN
In those positions all rooks were still on the board

Queen side scores 43,0% with equal number of pawns
Queen side scores 23,8% with Minors having an extra pawn

Value of Imbalance for Queen side: -0,37 pawns






QRvsRBNN
In those positions 1 rook for each side was removed

Queen side scores 50,5% with equal number of pawns
Queen side scores 28,9% with Minors having an extra pawn

Value of Imbalance for Queen side: +0,02 pawns






QvsBNN
 In those positions all rooks are removed

Queen side scores 70,8% with equal number of pawns
Queen side scores 44,5% with Minors having an extra pawn

Value of Imbalance for Queen side: +0,79 pawns






It seems the number of rooks has huge impact on the value of the imbalance. Its value changes by more than a pawn. 

 Probably the imbalances of queen and pawns vs a rook and two minors are to hard to be calculated or require a bigger effort. The missing pawns give the side with the minors some development compensation and also half open files, so the value of the imbalance gets polluted with other stuff. So this remains maybe as a future but not immediate exercise.

Wednesday, December 18, 2013

Queen vs 3 minors

Recently there was an interesting "Lonely queen" discussion at Talkchess where it was claimed that stockfish seems to have trouble to understand positions where a queen is exchanged vs 3 pieces.

iCE also does not have special code to handle this imbalance. I relied here on the mobility (3 pieces have more mobility than a queen, whose mobility in iCE is cut at 14 squares). This imbalance is difficult to tune as it appears in less than 1% of the games.






H.G. Muller suggested a cool method to not tune but to measure the value of the imbalance by playing games that enforce the imbalance and measure the winning chances for both sides. This is what I did.

I created a set of starting positions where the imbalance is present. To make it simpler to count I always gave the 3 minors to Black, but awarded the first move in halve of the starting positions to Black.

I then played a iCE vs iCE tournament. After 1000 games, Black scored 63,75%. So the imbalance has a significant influence. I now have to translate this winning percentage into centipawns as this is the score unit in iCE.

Here I removed for Black also the f7 pawn in the starting positions and played again. Black now scored 46,96%. This made the pawn worth 16,79%.


Now I could calculate the value of the imbalance as 13,75% / 16,79% * 1 pawn = 0,82 pawns.

At Talkchess the value of the imbalance was guessed to be 0,70 pawns. It seems this was an excellent guess.

Next step is to include a material imbalance term in the evaluation. It can go with the material signature and material hash, so it provides almost no additional computational costs. The impact on the playing strength will be tiny as it occurs so seldom.

But if your engine plays in 1% of its games really awkward people will notice. The stockfish case is a good example for that.

Next steps will be the measurement of some more imbalances that also involve a rook.

Saturday, December 14, 2013

iCE and the "wrong" bishop

iCE vs Gaviota    55. Rxe7   1/2 : 1/2
After my failed pawn structure experiment I was looking for a small task that has a high chance of success in making my engine better ... just to overcome my disappointment.

I remembered that one of the nice folks that tested my engine after release wrote iCE doesn't know about the wrong bishop. iCE is reporting large winning scores in dead drawn games.

I got a reference of a game against gaviota 0.86. Where iCE exchanged into a endgame with the wrong bishop (diagram).

After the exchange 55. Rxe7 Kxe7 56. Kxe5 iCE found itself in this position and evaluated it with +7. So it thought it is winning,
Draw (wrong bishop)

Of course it is not.

56. ...     Kf8
57. Kf6  Kg8

and the black king controls the corner.






So this is a sign of missing chess knowledge. Humans will see the draw rather easily and if an engine scores such a position as winning it makes a bit a fool out of himself (and its author). The funny thing is, that iCE actually knows about the wrong bishop in king, bishop and pawn vs. king endgames. It did not apply this knowledge because black still owns a pawn. So the endgame module did not kick in (this requires a bare king).

In the above position iCE would not have captured the pawn at f7 (although it could). Its knowledge tells him that without the black pawn the position is a draw. So it tried to keep the black pawn alive and made silly looking moves instead.

To overcome that I coded a few lines to teach iCE about the wrong bishop in a KBPKP endgame. It is a bit more difficult than the original KBPK endgame. In some positions black can still win and in some positions having still a pawn is the losing fact for black. But finally I came up with a good set of positions that are recognized as drawn.

Now this knowledge helps in the above position

iCE 2.0 v1001 x32 [2013.12.2]
position fen 8/3Rbp2/5k2/4n3/2B1K2P/8/8/8 w - - 3 55
go depth 8
info depth 7 seldepth 10 time 0 nodes 2448 pv d7e7 e5c4 e7c7 c4d2 e4d5 f6g6 d5e5  nps 2447999 score cp 586

info depth 8 seldepth 11 time 15 nodes 17773 pv d7c7 e5c4 c7c4 f6g6 e4f3 g6h5 f3g3 f7f5 nps 1184866 score cp 477

At depth 8 iCE realizes that the exchange leads to a wrong bishop endgame and selects a different move.

Probably this not worth a single ELO point, but as I'm not belonging to the only ELO matters fraction it is worth keeping.

Saturday, November 30, 2013

Drawish pawn structures

Many moves later the games ends 1/2 : 1/2
One of the things I had on my todo list was related to pawn structures. I thought that some pawn structures have a more drawish character than others.

Especially blocked or symmetrical positions tend to be more drawish than open ones. 

My idea was to build up a statistics over a lot of games and calculate the probability that a game ends as draw depending on the pawn structure that's on the board. If my engine later encounters a pawn structure with a high probability for a draw the score is pulled towards 0.

So if my engine is ahead it tries to avoid such structures (e.g. not lock its pawns) and if behind it tries to play towards them.

The first step was to collect a lot of positions from games and categorize them depending on the game outcome of the game they were taken from. I assembled a collection of games that lasted at least 60 moves as a base.
  • I saved the position at move number 50 into either a WIN or DRAW file.
  • I then processed the positions from the WIN and DRAW files. 
  • In all positions I only considered the specific pawn structure for the files B - G
Index: 111101 111011 = 3.963
  • For a white pawn at the file I recorded a 1 in the lower 6 bits and for a black pawn a 1 in the higher 6 bits. This number served as a index where I stored the number of games that ended as DRAW or as WIN when this structure was present.
Example: In my statistics the above pawn structure was found in 2.286 games. 857 were drawn and 1.429 were won. This resulted in a draw probability of 37% pct. The overall draw probability over all games and structures was 36%. So this pawn structure is a tiny bit more drawish than the average.

I did this for every possible combination. If there is no white and black pawn present on the B - G files the structure is very drawish (62%), symmetrical combinations had values of about 45%.

Finally I modified my evaluation function to scale the score up or down depending on the pawn structure. It used a lookup table where it stored all the 4096 possible scaling values.

Unfortunately it did not work. The tests showed the modified engine as equal or even slightly weaker then the original engine. I used also different indexing schemes like only counting files with blocked pawns etc... Nothing worked. So the patch was removed from the code again.

It was worth a try.

Hopefully I have more luck with the next idea.
 


  

Sunday, November 10, 2013

Is it a draw ?

While working on my knowledge module for the king and two pawns vs. king and knight endgame I had to deal with a lot of questions like this.

Given the following position.

Black to move offers a DRAW. Should White accept it ?
White has two passed pawns including a rook pawn where the knight has often trouble with. He also controls the promotion square of the pawn.
But the pawns have still a long way to go and the black king and knight are also placed not so bad.

Black is convinced he can stop the pawns and offers a Draw.

Should White accept it ?

PS: Don't cheat be looking it up in a table base like most of the engines do ...

Friday, November 1, 2013

Knight vs two pawns endgame

In a recent chess tournament where the best Italian engines played against an international selection my iCE engine managed to get into the quarter finals. It was paired against Booot, a very strong chess program rated more than 200 ELO higher than iCE in the rating lists.

It was a 6 round match and iCE scored 2 out of 5 points in the first 5 games. It needed a win in the last round playing White to get into the tie break. It looked promising but at the end it only got a Draw and was out.

In this last game iCE exchanged into an endgame where it plays with two pawns against a lonely knight. iCE wasn't able to see that its pawns won't be able to queen until it was to late.

iCE played 1. Kxf2 Ne4+ 2. Ke3 Nxd6 with a very good score

It had to search 29 half moves (plies) deep to understand the position.

info depth 29 seldepth 53 nodes 1388226533 pv g1f2 f6e4 f2e3 e4d6 e3d4 d6b7 d4e5 h7g6 e5e6 g6g5 e6d7 b7a5 score cp 0

Most humans will see the draw without problems. One pawn is stopped by the knight and the other be the king. So not realizing this is a clear defect.

So I decided to put some effort in coding a few rules for a KPPKN endgame just for fun.

In a KPPKN endgame there are 855.071.008 legal positions and 439.033.368 can be won (usually but not always for the side with the pawns of course). This means almost half of the possible positions are draws. Nevertheless it is really difficult to find some patterns that spot the draws.

I was able to find rules that detect 30.371.460 draws, so less than 10%. This is not to impressive, luckily combined with search it already helps a lot.

The new iCE performs much better in the above position

position fen 8/7k/1P1R1n2/8/6P1/8/5r2/6K1 w - - 2 68
go depth 10
info depth 1 seldepth 0 time 0 nodes 23 pv g1f2 f6g4 f2f3 nps 22999 score cp 1086 

info depth 2 seldepth 3 time 16 nodes 142 pv g1f2 f6e4 f2e3 e4d6 nps 8874 score cp 521 
info depth 3 seldepth 4 time 0 nodes 200 pv g1f2 f6e4 f2e3 e4d6 nps 199999 score cp 521
info depth 4 seldepth 5 time 0 nodes 181 pv g1f2 f6e4 f2e3 e4d6 nps 180999 score cp 521
info depth 5 seldepth 7 time 0 nodes 281 pv g1f2 f6e4 f2e3 e4d6 e3d4 nps 280999 score cp 567 
info depth 6 seldepth 11 time 15 nodes 12448 pv d6e6 f6g4 b6b7 f2b2 e6e7 h7g6 g1f1 nps 829866 score cp 251
info depth 7 seldepth 11 time 15 nodes 17080 pv g4g5 f6e4 g5g6 h7g7 d6d7 g7g6 b6b7 f2b2 nps 1138666 score cp 218 
info depth 8 seldepth 14 time 0 nodes 12371 pv g4g5 f6e4 d6h6 h7g7 h6h2 f2f7 h2b2 f7b7 nps 12370999 score cp 153 
info depth 9 seldepth 16 time 16 nodes 26088 pv g4g5 f6g4 b6b7 f2b2 g5g6 h7g7 d6d7 g7g6 nps 1630499 score cp 53 
info depth 10 seldepth 18 time 31 nodes 55953 pv g4g5 f6e4 d6e6 f2b2 g5g6 h7g7 e6e4 nps 1804935 score cp 0 
bestmove g4g5 ponder f6e4
info time 172 nodes 124767 nps 725389


Already at depth 6 iCE realizes that Kxf2 will only draw and at depth 10 it realizes that the alternatives are no better.

Saturday, October 19, 2013

King safety tuning

An important term in the evaluation of an engine is King Safety. This addresses the fact that the game is lost when the own king is mated no matter what. This also allows an engine to consider material sacrifices when the enemy king comes under pressure by them. So without a decent king safety an engine will provide their opponents good chances for some spectacular mate combinations.

A simple approach is just to count the enemy pieces near the own king. The more pieces the more dangerous it is. This ignores the fact that sliders can be very dangerous from a distance. It is better than nothing but usually not good enough.

A more sophisticated implementation counts the attackers and their attacks against the area of the own king. It also considers defenders. Finally the counter is used as a look-up index into a non linear table. So a lone queen or rook attacking the king is not so bad, but if a knight and a rook join into the attack the king safety score jumps up exponentially.  




iCE uses the later approach. So it maintains a counter that is increased if an enemy piece attacks a
square near the king and decreased again if the square is defended by a own piece. The amount of increase and decrease depends on the piece type. In the current release of iCE those values are not tuned. So I just thought of some meaning full values, eg. if the queen attacks increase by 3, if the knight attacks increase by 2, if the pawn defends decrease by 1 etc...

I thought it might make sense to verify that the values I picked here are really good ones. Intuition is usually not the best adviser in chess programming.

I revived my genetic algorithm tuning framework that was responsible for most of the ELO gain of iCE 1.0 and modified it a bit to just tune the king safety counter values and not all the rest of the weights.

I then started an evolution of 1000 generations with a population size of 64, where the fittest individual is determined by a knock-out tournament within the generation. It ran for almost two weeks.

Here are some impressions:

The ranking table of the individuals of the final (1000.) generation. The 32 genoms that did not survive the initial 2-games round are removed from the list. 

 Rank Name     Elo    +    - games score oppo. draws
   1 ice.05   226   59   59   100   64%   131   23%
   3 ice.49   200  116  116    20   53%   191   35%
   4 ice.36   163   95   95    32   53%   154   31%
   5 ice.60   156   56   56   100   50%   164   31%
   6 ice.32   151  169  169     8   56%   127   38%
   7 ice.17   144   72   72    56   54%   123   36%
  10 ice.25   123  125  125    20   58%    96   15%
  12 ice.26   119  169  169     8   44%   152   38%
  13 ice.50   118  174  174     8   56%    96   38%
  14 ice.33   117  169  169     8   44%   158   38%
  16 ice.59   113  115  115    20   48%   131   35%
  17 ice.29   111  197  197     8   50%   146    0%
  18 ice.04    92  121  121    20   50%    97   20%
  20 ice.47    81   93   93    32   53%    65   38%
  21 ice.13    81  168  168     8   56%    61   38%
  24 ice.43    75  167  167     8   44%   103   38%
  26 ice.52    69   90   90    32   47%    89   44%
  28 ice.10    66  113  113    20   55%    35   40%
  30 ice.40    41  175  175     8   38%   103   25%
  32 ice.16    39   78   78    56   51%    47   27%
  34 ice.62    31  163  163     8   50%    38   50%
  36 ice.08     4  178  178     8   50%    19   25%
  37 ice.27   -11  173  173     8   56%   -24   38%
  39 ice.35   -20  118  118    20   45%    14   30%
  42 ice.02   -44  182  182     8   44%    20   13%
  48 ice.41  -102   93   93    32   45%   -73   41%
  49 ice.22  -108  169  169     8   44%   -61   38%
  50 ice.53  -118  120  120    20   45%   -53   30%
  52 ice.00  -127  114  114    20   55%  -147   40%
  54 ice.19  -149  168  168     8   44%  -114   38%
  55 ice.51  -170  162  162     8   50%  -159   50%
  58 ice.48  -187  164  164     8   38%  -135   50%


Here the winner was ice0.5. The final round was played between the individuals ice.05 and ice.60 and it ended 20 - 11 - 13.

Probability Vector after 100 generations. Most bits haven't decided yet whether to converge towards 0 or 1.

Probability Vector after 1000 generations. Most bits are fully converged towards 0 or 1.

The evolution made slow but steady progress. The entropy of the system dropped from 41 to about 10.


Some observations:

My original handpicked values were not so far off in general.

The knight seems more dangerous in attacking the squares in front of the king than the bishop, where as the bishop is a better defender. The most powerful attacking piece is the rook. 

Rook and queen are bad defenders.

The situation becomes more dangerous if the enemy has safe checks available, especially from the sliders (bishop, rook and queen). A safe check from a knight is almost worthless.

The threat of a back rank mate is about as dangerous as an additional attacker.


Result:

I played a final 16k games match between the tuned and the original version. It shows that with a very high statistical confidence the new iCE is stronger. Only 10 (selfplay) ELO, but most patches score muss less and I actually haven't added code. I just modified some values. So I decided to count my little project as a success.

Rank Name      Elo    +    - games score oppo. draws
   1 ice.tuned   5    4    4 16000   52%   -5   36%
   3 ice.base   -5    4    4 16000   48%    5   36%



Saturday, October 12, 2013

Batch processing in windows

For a little analyses project for my chess engine evaluation I was looking for a collection of chess positions to perform some statistics with that I can use later in my eval.

First I looked for some games collections. Ed Schroeder is hosting a large one (2.2 Mio games)  from human players at his site. After a first look I realized it is useless for my purpose. There are games in it where the players agreed to a draw after 17 moves or less. The outcome of chess games by human players is to noisy for statistics. It is influenced to much by things unrelated to the actual game (e.g. is the player happy with a draw because it is enough to secure its position in the tournament).

So I used chess engines games from the CCRL collection. There are also a lot available. I filtered them to remove the short games and to split them into won and drawn games with pgn-extract.

Then I used pgn2fen to serialize the pgn games into a list of consecutive FEN positions. This resulted in a long long list (like the one below)

...
2q5/4Q1bk/5pp1/2B1p2p/p2p3P/P2P1PP1/1P2P1K1/8 w - - 4 47
8/1q4bk/3Q1pp1/2B1p2p/p2p3P/P2P1PP1/1P2P1K1/8 w - - 6 48
2q5/6bk/3Q1pp1/4p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 8 49
8/1q4bk/3Q1pp1/2B1p2p/p2p3P/P2P1PP1/1P2P1K1/8 w - - 10 50
2q5/6bk/3Q1pp1/4p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 12 51
8/6bk/5pp1/2q1p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 0 52
...

From all those positions I was only interested in the ones at move number 50 with White to move.

In UNIX this would be pretty easy with just a cat and grep combination, but I don't have a UNIX system and so I looked for a solution in Windows.

It is hard to believe but it can be done in a small batch file

FOR %%X in (*.fen) do (
    FOR /F "tokens=1,2,3,4,6* delims=; " %%i in (%%X) do (
        if /I %%m EQU 50 (
            if /I %%j EQU w @echo %%i %%j %%k %%l %%m >> 50-%%X
        )
    )
)


The first for statement loops over all my input files (*.fen).
The second for loop processes every line in each file. It splits the line into different parts and compares the part the represents the move number with 50. If it maches the line is copied to the result file.

The 3rd test verifies that the side to move is White. This could be skipped as I told pgn2fen already to output only positions with White to move.

Now I have something to analyze. Maybe I can detect some interesting patterns in the positions.





Saturday, September 28, 2013

Regression testing

Currently I rework my evaluation a bit. Nothing major.

I just add a simple pattern that might be missing or remove a pattern that might overlap with already existing ones.





This is usually a 5 minutes job of coding. The tricky part is to find out whether the change improves the engine or makes it weaker. As the changes are tiny the original and the patched engine are almost similar in strength, probably within a 5 ELO range.

The only way to get a feeling about the quality of the patch is to play an huge number of games.

I currently use the following test methodology.

  1. Code the change and test it for functional correctness. For evaluation I have a automated unit test that must be passed.
  2. Compile a release version of the engine.
  3. Setup a match using cutechess-cli as tournament manager. I use 4 parallel threads on my 4 core i7 and a huge balanced opening book where start postions are randomly selected. Each opening is played twice (each engine gets the chance to be white and black).
  4. I run 16.000 games and don't stop early.
  5. Feed the resulting match file into bayeselo to calculate the strength difference and statistical error bars.
  6. Draw the conclusion whether to keep or discard the change.
For most of the patches even 16.000 games are not enough to be sure about an improvement. After 16k games the error bars for both versions are -4 / +4. So in the worst case the stronger engine might be 4 ELO weaker than displayed, the weaker engine 4 ELO stronger than displayed. I need a 9 ELO difference to be sure the suspected stronger is really stronger.


If a patch shows 9 ELO or more improvement the decision is easy. In some case I might even keep a 5 ELO patch if it simplifies the evaluation. This is a bit risky of course.

On the other hand changing something in the evaluation brings the evaluation a bit out of its former balance. So a change that gives -2 ELO might still be very good if the balance of the evaluation is restored again.

With that reasoning I accept also some changes that are not an improvement beyond doubt. They might show their real potential if I re-tune the evaluation in a later phase.

When I'm done with my todo list of small changes I will run the final version vs the first version (iCE 1.0). Hopefully here I see then an improvement beyond doubt. Fingers crossed.



Tuesday, September 17, 2013

Collecting the Principle Variation (PV)

Since the early days of iCE I did not spent to much thought about collecting the principle variation (PV) for a given search. The PV represents the moves sequence from both sides in a given position.
The evaluation of the last position reached when the moves are played is the final resulting score of the just performed search.

My approach was just to collect the PV from the main transposition table. The way it works enables as a side effect to collect the PV by following the exact entries in it.

This is an easy and fast way without overhead. Unfortunately especially on long searches the PV is incomplete because
  • necessary entries were overwritten
  • the PV extends into the quiescence search (where I don't store hash entries)
This did not bother me to much. I considered displaying the PV a rather cosmetic thing anyway.

However for debugging reasons it is helpful to have the complete PV so it is possible to really track down the position whose eval is responsible for the final score. So recently I decided to change my implementation.

Now I collect the PV during search. This involves some storing and copying moves between arrays when a new best move is found.

// update the pv
pv[0] = ml->moves[i];
for (int j = 0; j < STATIC_QSEARCH_MAX_DEPTH; j++)
{    

    // copy the moves from the subtree that this move generated to the pv until we hit an invalid move = end of list    
     if (INVALID_MOVE == (pv[j + 1] = subtree_pv[j])) break; 
}


I thought this to be really slow but the performance impact was almost not measurable.

Now iCE will display the correct and full PV much more often. There are still some rare cases left (Hash cutoffs in PV Nodes) where the PV is truncated. But this is rather rare.


Wednesday, August 21, 2013

Mission accomplished


As stated earlier my goal in the development of iCE was an engine strong enough to make it into the Top100 of the CCRL. iCE 1.0 has now played enough games (200 are required) to be officially listed and it moved up to rank #62 right away with a rating of 2663 ELO.

The previous version iCE 0.3 was at rank #129 with 2459 ELO.

Looks like a need a new goal for further development. Unfortunately it will now become increasingly difficult to improve iCE further. Anyway without a mission iCE is going nowhere.

Those are my long term goals for the future versions, not necessarily already the next one.

1. Break through the crafty-threshold (single core)
2. Get strong enough to beat Lucas' engine DiscoCheck
3. Enter the Top 40 of the CCRL
4. Reach 2800 ELO

We'll see how long it takes this time until I reach at least 1 of those 4.

Thursday, August 15, 2013

Bahrs rule

One thing that makes iCE special compared to many other engines is the possession of endgame knowledge. It does not access external tablebases. It uses heuristics, rules and special evaluators to deal with specific endgames.

This is of course a wild field and there is always stuff that is still missing. One of the missing piece of chess knowledge is a draw rule in a KPPKP endgame with a blocked rook pawn and an outside passed pawn. Under certain conditions the defender can get a draw. It would be nice to see that statically in the evaluation without relying on search for that.

Bahrs rule can be tried when
  • the attackers locked rook pawn has not reached the 5th rank yet
  • the attackers king is next to its passed pawn
  • the defenders king is in front of the passed pawn or king
Draw as the white pawn has crossed the borderline
 
There is an additional condition. In order for the defender to draw the white passed pawn must have crossed the borderline already.

The borderline can be implemented with bitboards very easily so the conditions of Bahrs rule are fast to check. I extended Bahrs rule a bit so a few more draws can now be recognized.

This is the output (shortend) from iCE 1.0 for the above position. So until depth 16 it assigns a huge bonus to White because of the passed pawn. It needs 17 ply to realizes it is a draw and the score drops to 0.

iCE 1.0 v1619 x32 [2013.6.17]
position fen 8/6k1/8/p4KP1/P7/8/8/8 w - - 0 1
go
info depth 16 seldepth 18 time 0 nodes 1030 pv g5g6 g7g8 f5f6 g8h8 f6g5 h8g7 nps 1029999 score cp 408
info depth 17 seldepth 21 time 0 nodes 1655 pv g5g6 g7g8 f5f6 g8h8 f6g5 h8g7 nps 1654999 score cp 0  


The engine that knows about Bahrs rule sees this much earlier

iCE 2.0 v143 x32 [2013.7.23]
position fen 8/6k1/8/p4KP1/P7/8/8/8 w - - 0 1
go
info depth 1 seldepth 0 time 0 nodes 6 pv f5e5 nps 5999 score cp 319
info depth 2 seldepth 2 time 0 nodes 14 pv f5e5 g7g6 nps 13999 score cp 263
info depth 3 seldepth 3 time 0 nodes 47 pv f5e5 g7g6 e5e6 nps 46999 score cp 0


Now it only takes 3 ply to see it.

So iCE is now a bit smarter in deciding whether it should exchange into a KPPKP endgame or better not.
 

Thursday, July 25, 2013

Fifty-move rule

When you ask an average chess player what the 50 move rule in chess is about he will probably say something like
A game is draw if no capture has been made and no pawn has been moved in the last fifty consecutive moves
I would have said the same and so I implemented it in iCE. But actually this is not correct. If you look the rule up in the FIDE laws they state

The game is drawn, upon a correct claim by the player having the move, if
(a) he writes on his scoresheet, and declares to the arbiter his intention to make a move which shall result in the last 50 moves having been made by each player without the movement of any pawn and without the capture of any piece, or
(b) the last 50 consecutive moves have been made by each player without the movement of any pawn and without the capture of any piece.
Of course it is difficult for a UCI chess engine to write on a scoresheet or even to claim a draw. This is usually done by the GUI automatically. But one important point remains. The game is not automatically a draw after 50 moves, there must exist a side having the right to move.

Why is that important ?

If one side mates the opponent with its 50th move the result is mate and not draw because after a mate there is no side to move that can claim a draw anymore.

Well, how likely is this having any relevance?

It is not as unlikely as one might think. The difference will be suppressed because often games are adjudicated very early by the GUI to save time. But if games are played until the end it will show up sooner or later.

One tester of iCE made me aware of a game where iCE just gave away half a point.

6R1/8/8/8/4K3/4N3/4r2k/8 b - - 96 147
Already 48 moves (96 plies) have been played. The draw is in sight. But what did iCE as Black play?

147.  ..  Kh3??
148. Kf3  Rh2
149. Rh8#

iCE was just assuming that Rh8 will finish the 50 move sequence and draw so he walked into the mate. An analyses shows this clearly. It sees Kh3 as a drawing move.

position fen 6R1/8/8/8/4K3/4N3/4r2k/8 b - - 96 147
analyze depth 8
e2e3 : -32.463  e2e3 e4e3
e2a2 :       0  e2a2 e3f5 a2a1 f5g7
e2e1 :       0  e2e1 g8h8 h2g3 h8g8
h2h3 :       0  h2h3 g8h8 h3g3 h8g8
e2g2 : -32.461  e2g2 g8g2 h2h1 g2c2 h1g1 e4f3 g1h1 c2c1 h1h2 e3g4 h2h3 c1h1
e2f2 :  -1.372  e2f2 e3g4 h2h3 g4f2 h3h2
e2d2 :  -1.346  e2d2 e3f1 h2h3 f1d2 h3h4 e4f4 h4h5 g8g5 h5h4 d2e4
e2c2 :  -1.373  e2c2 e3c2 h2h3 c2d4 h3h4 e4e5 h4h5 d4f5
e2b2 :       0  e2b2 e3f1 h2h3 g8h8
h2h1 :       0  h2h1 g8h8 e2h2 h8h2 h1h2


After knowing there is a problem it was easy to fix. I only had to remove a single character in the source.

inline bool TBoardStateHistory::is50MoveDraw()
{
    return hmc > 100;    // was hmc >= 100
}


Now it solves the above position correctly

position fen 6R1/8/8/8/4K3/4N3/4r2k/8 b - - 96 147
analyze depth 6
e2e3 : -32.463  e2e3 e4e3
e2a2 :       0  e2a2 e3f5 h2h3 f5g7
e2e1 :       0  e2e1 g8h8 h2g3 h8f8 e1h1
h2h3 :  -1.345  h2h3 e4f3 h3h4 f3e2 h4h5 e3f5
e2g2 :  -1.401  e2g2 g8g2 h2h1 g2f2 h1g1 f2f1 g1h2 e3d5
e2f2 :  -1.181  e2f2 e3g4 h2g2 g4f2 g2f1 g8h8 f1g1
e2d2 :  -1.378  e2d2 e3f1 h2h3 f1d2 h3h4 d2f3 h4h5 f3e5
e2c2 :  -1.378  e2c2 e3c2 h2h3 c2d4 h3h4 d4f3 h4h5 f3e5
e2b2 :       0  e2b2 e3f1 h2h3 e4f5 h3h4
h2h1 :       0  h2h1 g8h8 e2h2 h8g8 h2g2


It sees now that h2h3 will lose the rook to avoid an instant mate and select a different move.

Thanks to jack who reported the issue.

Saturday, July 13, 2013

Pondering about king positions in pawn hash

iCE 1.0 calculates a hash for the pawn structure and caches it. This is very efficient because different somehow related chess positions share the same pawn structure and evaluation terms related only to the positions of the pawns (e.g. isolated pawns) must not be evaluated again.

I'm now thinking about adding more pawn structure related terms to my evaluation, however terms that only depend on the pawn structure are limited. The king pawn shelter could be a pawn structure related term but it depends also on the king position. The same pawn structure will produce a different results for the shelter depending on the location of the king  (after king side or queen side castling). Only if I add the king position to the hash I can evaluate the pawn shelter. But there is a downside. The amount of hash hits and so the efficiency of the cache will decrease.

In order to visualize the effect I selected a number of positions and recorded the number of hash hits with and without including the king positions in the hash.

Successful hash hits with and without king positions in the hash
So it shows that for typical mid game positions there is an about 4% drop in the hit rate. All the features I'm now able to put into the hash additionally must compensate this 4% drop.

I have to try it out.

Thursday, July 4, 2013

The MSVCP100 mess hits again






I received a message today from someone trying out iCE that the 32 bit version of iCE 1.0 has the dependency on the well known msvcp100.dll. I usually build my executable without that dependency  (it really sucks). But obviously this time in the 32 bit branch I selected the wrong runtime library.

Most of the guys running chess engines have the Microsoft Visual C++ 2010 SP1 Redistributable Package (x86) installed and won't notice. But a few might have not.

I went back to my 1.0 code base and recompiled the 32 bit version and updated my website. So the error should not appear any longer, hopefully.

Thomas...


Saturday, June 29, 2013

How strong is the new iCE


iCE 1.0 has not played enough games in CCRL yet to be included in the official ranking there.

But I already got some feedback from other testers that the new version is a significant improvement of more than 200 ELO.




This little tournament was run with the 32 bit version and a time control of 5 minutes a game. Lars Hallerström was so kind to give me permission to post them. Thanks again.

78 Ice 1.0 : 2703  840 (+367,=226,-247), 57.1 %


Alaric 707                    :  40 (+ 16,= 10,- 14), 52.5 %
Amyan 1.7a                    :  40 (+ 18,=  9,- 13), 56.2 %
Chepla 0.63                   :  40 (+ 28,=  8,-  4), 80.0 %
ET 8.01                       :  40 (+ 23,=  8,-  9), 67.5 %
Tao 5.6                       :  40 (+ 29,=  4,-  7), 77.5 %
Ufim 8.01                     :  40 (+ 27,=  9,-  4), 78.8 %
Cyrano 0.6b17                 :  40 (+ 15,= 12,- 13), 52.5 %
Fruit 2.3.1                   :  40 (+ 10,= 10,- 20), 37.5 %
Ktulu 9                       :  40 (+ 10,= 16,- 14), 45.0 %
Ruffian 1.0.0                 :  40 (+ 15,= 12,- 13), 52.5 %
SOS 5.1                       :  40 (+ 19,= 11,- 10), 61.3 %
WildCat 8 beta 5              :  40 (+ 18,=  8,- 14), 55.0 %
Cheng3 1.06b                  :  40 (+ 17,= 11,- 12), 56.2 %
Chronos 1.9.9 beta            :  40 (+ 11,= 17,- 12), 48.8 %
Deuterium 12.01               :  40 (+ 15,= 13,- 12), 53.8 %
Dirty Sep  5 2012             :  40 (+ 16,= 12,- 12), 55.0 %
Pharaon 3.5.1                 :  40 (+ 16,= 13,- 11), 56.2 %
Tornado 4.78                  :  40 (+ 20,= 10,- 10), 62.5 %
Texel 1.01                    :  40 (+ 19,= 10,- 11), 60.0 %
BobCat 3.25                   :  40 (+ 11,= 11,- 18), 41.2 %
Bug2 1.9                      :  40 (+ 14,= 12,- 14), 50.0 %


So here iCE gained 243 ELO points compared with iCE 0.3. This is a huge jump and I guess iCE 1.0 will now enter also the Top 100 in CCRL. So I probably need a new goal.

 78 Ice 1.0 : 2703
183 Ice 0.3 : 2460

Tuesday, June 25, 2013

It's getting hotter

After the recent release of my engine it now starts playing in tournaments replacing its previous version. Those games are usually played at much longer time controls that I had used in the development. If a chess engine has something like a personalty or style it will show itself in such games.

It looks it has a rather speculative style. It doesn't fear to give away material to improve its position. And as the positional factors in its evaluation can reach rather big values it might sacrifice a lot. Probably this is not always sound but it creates interesting games.

Here is a position after move 15 from a game it played against Delphil 2.9g 64 bit (2529 ELO in CCRL). Time Control was 40 moves in 25 minutes and its a game from the latest Division 5 tournament of Graham Banks.

iCE 1.0 vs Delphil 2.9g, position after 15. .. Bd4

  iCE is already a pawn down
   but
  iCE has its rook at the open d-file opposing blacks queen

  Black has an weak isolated double pawn at the c5 file
  Blacks knight lacks some mobility
  Blacks king safety is a bit damaged









So if down in material give away some more. 16. Rxd4 (giving the rook for a bishop and a pawn)

16. Rxd4 cxd4  17. Qxd4+ Kg8  18. Qc4 e6  19. e5 g5  20. Ne4 Rf5  21. Bd2 Rxe5

iCE 1.0 vs Delphil 2.9g, position after 21. .. Rxe5

  iCE is now down a rook and a pawn vs. a bishop

  but

  it has the bishop pair

  the white knight has good destinations
  the white king is safe

  the black bishop at d7 is not more than a big pawn
  the c6 and e6 pawns are weak
  the black knight is still at the rim
  the king is exposed

Wednesday, June 19, 2013

iCE 1.0 sees the light of the day

It seems the news are spreading already so I don't wait till the weekend and give some details about the new release 1.0 this morning.

http://www.fam-petzke.de/cp_ice_en.shtml

The complete evaluation has been rewritten.

The old version used constant weights to evaluate certain features in a position. This helped the compiler to generate efficient code however every change to a weight required a recompliation of the program. For my later tuning efforts this was to inflexible.

Added endgame knowledge.

Knowledge that helps iCE to understand drawn positions in KRKP and KQKP endgames.

Evaluation of passed pawns

The potential of passed pawns is now better evaluated. iCE 0.3 was here to optimistic and often traded its pieces to get a passed pawn that had little chances of promoting. Additionally a recognizer to spot unstoppable passers is now part of the evaluation.

Evaluation Tuning
All earlier prototypes used evaluation weights that were not tuned. If a new term was not able to improve the engine with an untuned weight it was removed again. This was against the principle of only adding features if you can't improve the engine with existing ones anymore. For the tuning itself I implemented my own tuning framework using a genetic algorithm that worked on all weights in parallel. Some tests with positional EPD testsets showed that the positional understanding of iCE was improved by this.

Search Parameter Tuning
The same framework was used to tune the search control parameter. The framework itself was not changed. However as most of the control parameters were set to new values it behaves now completely different.

64 bit Versions
All prototype versions were only available as 32 bit versions. Some code changes were necessary to implement 64 bit versions of basic bit operations. The 64 bit version (/w popcount) runs about 70% faster than the 32 bit version on the same HW.

Unchanged
iCE is still a CLOP free zone (who trusts negatives Hessians) and does not access endgame tablebases in play. However the new endgame knowledge was verified against Gaviota table bases. Tribut and thanks for providing them goes to Miguel Ballicora. Some won endgames take a bit longer especially if the opponent has access to table bases and knows the optimal defense. But the moves played are probably more natural.

Everything related to opening books is unchanged and still works the same with 1.0

Still Missing
As already noticed iCE still uses hashes of fixed size (4 * 2^21 slots for the main hash + some for additional hashes - in total about 200 MB). I will change that in the next version.

In self tests and at short time controls the new version plays significantly stronger than the old one. Its LOS was 100%. And it did not took a lot of games to see that.

So comments, bug reports or suggestions for improvement are highly welcome.

Thomas...

Sunday, June 2, 2013

Move ordering competition

At Talkchess Ed Schroeder launched a not to serious move ordering competition. The idea is to find out which program is most successful in ordering a move that gives a cutoff early in the list.

Sunday, May 26, 2013

Chess programmers and harpists

Harpists spend 90 percent of their lives tuning their harps and 10 percent playing out of tune.
Igor Stravinsky

My computer just finished a second tuning run trying to optimize the search control parameters in my engine. This time it run for another 364 hours. I optimized the parameter set a bit, removed dead parameters (parameters belonging to a feature that the first run decided to disable) and added a few new ones. I also lowered the learning rate and modified the fitness function a bit.













Sunday, May 12, 2013

Population Based Incremental Learning and Chess - Pt. II

Population Based Incremental Learning (PBIL) is a variant of  a genetic algorithm. In the recent past I've used it a lot to improve my chess engine iCE.

The two main factors responsible for personality and strength within a chess engine are evaluation and search.

Search is responsible to analyze the game tree of possible future chess positions and selecting the most promising path. As the game tree is huge and exponentially growing with increasing depth it has to decide which parts of the tree should get more focus than others.

Evaluation is responsible for assigning a numerical value to a chess position that is expresses the winning probabilities of both sides. Search will select a path that has the best winning probability as determined by the evaluation.

Both features (especially evaluation) are heavily parametrized and selecting the right parameters is the one million dollar question. Intuition, game play performance, chess theory knowledge, crystal balls and sometimes voodoo are used to select the parameters. Every change (especially the small ones) requires extensive testing. Often 30.000 games and more are played just to see whether a small change in one of the parameters made the engine stronger or weaker. To make things worse most of the parameters are connected somehow. Changing one parameter might require to change other parameters as well to keep the whole thing in balance.

As I consider playing millions of games boring and I really don't have the resources to do that anyway I came up with the idea to just hand my engine over to the evolution and let nature take its course. PBIL is excellent in optimizing large parameter sets. Instead of maintaining a population it maintains a probability vector of population properties. This is much simpler but at least as effective as classic GAs.

Here are the results of my first approach to optimize the parameter sets used by Search.

 My search framework is controlled by 35 parameters that require 176 bits. Those parameters control the search extensions (e.g. check extension), late move reductions, null move, internal iterative deepening, lazy evaluation and futility pruning.

The current search parameter set of iCE that should be optimized through PBIL looks like

 ID | PARAMETER NAME                     | VALUE |   MAX
----+------------------------------------+-------+-------
  0 | EXTEND_BY_CAPTURE_TO_PAWN_EG       |    30 |     31
  1 | EXTEND_IN_CHECK                    |    10 |     15
  2 | EXTEND_BY_CHECK_SNGL_RESPONSE      |     5 |     15
  3 | EXTEND_BY_CHECK_DBL_RESPONSE       |     1 |     15
  4 | EXTEND_BY_PAWN_PUSH_7              |    10 |     15
  5 | S0MP_DEPTH                         |     3 |      7
  6 | S0MP_MARGIN_1                      |     0 |  1.023
  7 | S0MP_MARGIN_2                      |     0 |  1.023
  8 | NULL_MOVE_DEPTH                    |     1 |      3
  9 | NULL_MOVE_IN_PV                    |     0 |      1
 10 | NULL_MOVE_MARGIN                   |    63 |    127
 11 | NULL_MOVE_THRESHOLD_R3             |     7 |     15
 12 | F_PRUNING_DEPTH                    |     5 |      7
 13 | F_PRUNING_1                        |   120 |    511
 14 | F_PRUNING_2                        |     0 |    511
 15 | F_PRUNING_3                        |   255 |    511
 16 | F_PRUNING_4                        |     0 |    511
 17 | F_PRUNING_5                        |     0 |    511
 18 | F_PRUNING_6                        |     0 |    511
 19 | LMR_DEPTH                          |     2 |      7
 20 | LMR_IN_PV                          |     0 |      1
 21 | LMR_GOOD_MOVE_REDUCTION            |     1 |      3
 22 | LMR_BAD_MOVE_REDUCTION             |     2 |      3
 23 | LMR_ADDITIONAL_REDUCTION_NT        |     1 |      3
 24 | LMR_MOVES_SEARCHED_PV              |     0 |     31
 25 | LMR_MOVES_SEARCHED_NON_PV          |     0 |     31
 26 | LMR_QUIET_MOVES_SEARCHED_PV        |     0 |     31
 27 | LMR_QUIET_MOVES_SEARCHED_NON_PV    |     0 |     31
 28 | IID_DEPTH                          |     5 |      7
 29 | IID_ONLY_PV                        |     1 |      1
 30 | IID_REDUCTION_PV                   |     2 |      3
 31 | IID_REDUCTION_NON_PV               |     2 |      7
 32 | QS_DELTA_CUTOFF_MARGIN             |   200 |  1.023
 33 | LE_USE                             |     0 |      1
 34 | LE_MARGIN                          | 2.047 |  2.047
----+------------------------------------+-------+-------


PBIL settings:

Population Size:    32
Generations:        1100
Learning Rate:      0.018 - 0.040 (depending on the progress)
Neg. Learning Rate: 0.000
Elitism:            Yes
Mutational Prob.:   0.01 - 0.05 (depending on the progress)
Fitness Function:   Knock-Out Tournament (0.25 sec per move)
Run Time:           365 hrs

The starting entropy with 176 bits is 61. Here all bits have a 50/50 chance to flip either towards 0 or 1. In the course of the evolution the entropy dropped. Bits increased their chances to flip either towards 0 or 1. At the end of the evolution about half of the bits were fully converged. So their chance of being 0 or 1 was close to 100%.


To monitor the progress of the evolution I measured from time to time the behavior and performance of the generated population . One interesting but not surprising fact is that the generated engines in later generations reached higher search depths given a constant time control.


Average Search Depths reached for Mid Game positions for moves 10. - 25.

Another fact is that the diversity within the population decreased. This could be seen with an increasing probabilities for draws and also that the games played in the tournament took longer.


Non-Draw Percentage of Games

Finally I measured the real playing strength of the engines after each 100 generations. Here I played an engine with the current state of the evolution against the base version with the original weights. Both engines used an identical evaluation and only differed in the values of the search parameters. The evolution improved the playing strength very rapidly at first. In later generations it went in and out of local optima.


It looks like the generation #1100 were I stopped the evolution is not the strongest one. The global peak was probably reached around generation #900. ("probably" because those data points have an error bar of about -20/+20 ELO).

Next steps: I will optimize the parameter set using the results from the first run. Some features are over determined and some others are under determined, so this requires some adjustments. I will further try to remove some randomness by modifying the fitness function a bit.

And then give it another run. 

Saturday, May 4, 2013

To Null Move or not to Null Move

 
After tuning the evaluation weights I'm currently using a PBIL based evolution to tune the parameters within the search framework of iCE. It includes control parameters for Extensions, Futility Pruning, LMR, IID, Delta Pruning, Lazy Eval and of course Null Move.

Part of the genom is a bit for most features that allows the evolution to disable a feature or restrict it to PV or None PV nodes.




The evolution is still running but is already pretty converged. During the course of the run I realized that the evolution is unsure whether the Null Move should be performed always or restricted to Non PV nodes. This comes a bit as a surprise because I never considered null moving in a PV node even theoretical sound (I just included the bit in the genom for completeness).

I checked a few other engines including stockfish and crafty and although crafty is using null move extensively it excludes it in PV nodes. The only engine that seems to do it is Texel


In a PVS framework PV nodes are rather rare, so it might make little difference whether Null Moves are used here or not. Maybe this is the reason the evolution has a hard time to converge this bit. But I'm still wondering whether it breaks something if used. A PV that contains illegal moves is surely questionable.

If the final genom has it on, I will probably have to spend some additional asserts in the framework.

How are others approaching that ?

Followup: 

I performed a test of the final genom with and without null moving in PV and it made almost no difference. Therefore the evolution was not able to converge that. The one that used null move in PV nodes performed 4 ELO better but well within the error bar. So for the moment I will disable it.

Sunday, April 28, 2013

64 bits for 64 squares ?

iCE is a bitboard engine, which means it sees the board not as a 8x8 array but as a set of 64 bit integers (bitboards) representing the piece locations. So far I only compiled 32 bit code because I was missing a 64 bit OS. This forced the compiler to emulate 64 bit operations by splitting them into 32 bit ones.

I'm now owning a powerful 64 bit monster but I did not compile a 64 bit target yet. iCE is using some inline assembler to speed up some time critical bit operations and this is not portable to 64 bit out of the box. This code must be rewritten and I haven't done this yet.

But as tuning my engine is so CPU intensive I really liked to speed up the engine a bit so I can run the same amount of games in less time.  So I rewrote those parts lately.

In order to measure the outcome I produced different 64 bit targets using the Intel and the Microsoft compiler to find the fastest combination. The speed is measured solving a set of 25 positions searched 12 ply deep.

pgo - Profile Guided Optimization is used
popcnt - The popcount processor instruction available on the Nehalem CPUs is used (instead of a software algorithm)


Looks like it is really worth it. The fastest build is about 75% faster than my 32 bit build.



Sunday, April 21, 2013

Me vs evolution: 0 - 2

I'm fighting a nasty cold for a week now so I tried to relax a bit over the weekend meaning no sports and no programming but at least my computer can do some calculations. 

The evolved weight set from the last PBIL run was much stronger than my hand selected set, but some of the weights looked suspicious.

The penalty for a double pawn was almost 0, even for a triple pawn very low, a weight related to the pawn shelter was even 0. So I thought I manually correct those values. Just a little bit.

I created a modified set, called ice.tom and let it play another 6000 games against evol-2. But I was not able to improve the set this way. The one coming out from evolution seemed stronger.

Considering the error bars still a tiny chance exists my set is stronger, but I doubt it.  

Rank Name         Elo    +    - games score oppo. draws
   1 ice.evol-2    89    4    4 18000   55%    51   36%
   2 ice.tom       84    7    7  6000   49%    89   43%
   3 ice.evol-1    69    5    5 12000   54%    44   33%
   4 ice.04         0    5    5 12000   39%    79   28%


For the moment I'm convinced that PBIL is better in tuning than myself.

Saturday, April 13, 2013

Me vs. Evolution: 0 - 1

Now the results are in from some serious genetic evolutions. Here I used knockout tournaments between the individuals within one generation as fitness function.

This is fundamentally different than my former approach were I measured position solving performance
  • It is computationally much more expensive
  • It is not reproducible. If the best individual is determined in a generation chances are very high a different individual would be selected if the calculation would be repeated. This is caused by the high degree of randomness involved when playing small amount of games.  
  • There is no guarantee that the real best individual is winning a generation tournament. But chances are that at least a good one wins. 
As the computational effort is so big I only performed two runs so far. In one I played a fixed amount of games each generation. In the other I scaled the number of games up with each generation. As the algorithm converges the generated individuals are closer to each other so it might be a good idea to play more games to have a better chance to still spot the strongest one among them.

I call them evol-1 and evol-2.

                     Evol-1      Evol-2
Runtime in hrs          185         363
Generations           1.100       1.200
Total Games         184.800     362.345
EPDs solved*          2.646       2.652

*The number of solved positions out of a test set of 4.109 positions. This number is given to set this GA in relation to the previous test where the number of solved positions was the fitness criteria. Both solutions are better than the un-tuned version that scored only 2.437 points but worse than the version optimized towards solving this set that scored 2.811 points.


Entropy development of evol-1 and evol-2

Comparison of the ability to find correct positions in an EPD file
So from this data it looks like both versions performed very similar. The fact that version 2 played much more games does not really show up. One explanation could be that the error bar for decisions in both versions is huge, a bit smaller for evol-2 but still huge. So twice as much games doesn't already start to make a big difference.

And finally the real test: A direct round robin comparison between the two and the base version.

And the OSCAR goes to: evol-2

Rank Name         Elo    +    - games score oppo. draws
   1 ice.evol-2    89    5    5 12000   58%    35   32%
   2 ice.evol-1    69    5    5 12000   54%    44   33%
   3 ice.04         0    5    5 12000   39%    79   28%


Looks like all the effort finally paid off, considering also the fact that the base version is also not really a weak engine. It is a bit stronger than the offical iCE 0.3 which is rated 2475 ELO in the CCRL. 

Next I maybe tweak manually some of the weights from the final set because some look suspicious. I wonder whether I'm able to score half a point back against the evolution ...

Saturday, April 6, 2013

How fit is the fitness function

So far I tuned my engine by solving EPD positions. The final versions all solved more positions than the original one. But solving more positions is not the ultimate goal. Finally the engine shall play better chess.

To test that I ran a small tournament between four different versions of iCE.   

ice.04 is the base version that uses manual picked weights (reasonable ones but not tuned)
ice.4109 used a training set of 4109 filtered positions  
ice.175000 used a training set of 175000 filtered positions
ice.match plays games among the engines within a generation


Here are the results of a 6000 games round robin tournament.

Rank Name         Elo    +    - games score oppo. draws
   1 ice.match     52   11   11  3000   60%   -17   29%
   2 ice.04        14   11   11  3000   53%    -5   25%
   3 ice.4109      -9   11   11  3000   48%     3   27%
   4 ice.175000   -57   11   11  3000   39%    19   24%


Hmm !

Using an EPD solver as fitness function seems not the best choice. That is not a problem of the genetic algorithm itself. It solved what it was told to solve.

The final version of ice.4109 solved 2811 positions out of 4109. The base version of ice.04 only solved 2437 positions. The algorithm definitely found a better solution, unfortunately not for the problem I have.

On the other hand the more complex fitness function that plays games really seems to get the job done, even with a quick and dirty run that was only meant as reference in this case.

So either my approach with EPDs is not good, maybe a 2 ply search per position is not enough (in real games depending on time control iCE searches between 8 and 18 plies in the mid game) or the correlation between single position performance and game play is not strong enough.

It also shows that using more positions doesn't help. This is really surprising.

I still think the EPD approach has potential if it is maybe changed a bit. Things that come into mind here are
  • Using a higher search depth to solve each position in the hope to pickup more positional concepts in the then larger tree. But this slows down the algorithm so maybe the population size must be lowered then.   
  • Modify the fitness function. Currently I award 1 point for a position solved and 0 points otherwise. I could try to award more points for a quiet solution because they are harder to find.
  • Splitting the training data into different sets (instead of 1 file with 100k positions make it 50 files with 2k positions) and  give each generation a different file. This might prevent the algorithm to converge towards a very specific weight set to solve one file and leads to a more general solution instead, which is stronger in game play.
So this area is far from being fully exploited. For the moment I will put more focus on the tournament fitness to generate a stronger base version with a serious run. I will then add new terms to my evaluation which requires retuning anyway.

So more to come ...