Saturday, December 22, 2012

Getting used to some C++ techniques

In the new release of my engine I plan to rewrite its board evaluation (again). This is necessary because the old evaluation uses weights declared as compile time constants.

e.g. const int BISHOP_PAIR_BONUS = 50;

This is easy to maintain and also very fast because the compiler just inserts the value in the evaluation code and no memory access is later required to find out what the bonus for having two bishops is. The problem starts when you want to change that value. It requires a recompilation of the whole program. As I want to be able to change that value at run-time, maybe even while the engine is running using constant values is not an option anymore. I have to replace them with variables even if it might slowdown execution a bit, probably not much because those values are accessed so frequently that they will stay in the L1 cache of the CPU.

But it means that I have to touch everything in my board evaluation code and if I have to touch everything anyway I can just as well clean it up a bit.

One thing that bothered me a bit is code duplication. Every feature is evaluated first for white and then again for black. The implementation of the features is not identical, e.g. white pawns move up black pawns move down, white king is safe at G1, black king is safe at G8. But the source code is very similar anyway and differs only in a few parameters. It also makes the code harder to maintain because every time I change something I must remember to change it in the WHITE and in the BLACK part of the code.

Fortunately C++ offers a language construct called function template. It's main purpose is to uses the same code to work with different types of variables e.g. a sort function that sorts integers uses the same logic as a function that sorts floats, it uses just a different declaration. C++ templates allow the programmer now to write the source code only once and the compiler will create the different code for ints and floats automatically.

This is awesome and just what I need.

While rewriting the evaluation I extract those parts where templates might be helpful and clean up my code.

Instead of a function

void TEvaluator::evaluateThreats()
     // do something for white threats against black pieces

    // now do the same for black threats against white pieces

I have now a template function

template< EColor side > void TEvaluator::evaluateThreats()
    const EColor enemySide = (side == WHITE ? BLACK : WHITE);

    // do something for threats of "side" against the pieces of the "enemy side"

which is then called via

evaluateThreats< WHITE >();
evaluateThreats< BLACK >();

This works so well that I might rewrite other parts of the engine as well at some point in the future. The move generator currently duplicates code like hell and could definitely benefit from this technique a lot.

Saturday, October 6, 2012

A simultaneous over the board experience

I usually don't play over the board. I find it interesting but I don't have the time for that. I never played in clubs or tournaments. My chess efforts are rather programming related. But today I was visiting a local gaming fair and part of it was IM Cliff Wichmann playing 16 boards simultaneous chess. My wife convinced me to play one of the boards. I did and no surprise I lost but not as badly as I thought.

I played black and had a rather bad opening. White played a Queens Gambit and I actually don't know much theory of it. My chess engine plays openings from its opening database so I did not put much efforts in learning all the variations while programming it. So my little knight was chased around and White had the control of the center after 1. d4 d5 2. c4 Nf6 3. cd Nxd5 4. e4

However I survived the opening without plundering away any material and had a rather strong middle game. Our play led later to this position.

 IM Cliff Wichmann vs. Thomas Petzke

  White to Move

So I had a knight and rook vs. the bishop pair which I considered Ok. I analyzed the position at home with iCE and also stockfish. Both agree that black has a slight advantage of a bit more than a pawn here. But due to my lack of routine I wasn't able to hold that advantage. I had played already 90 minutes and my board was one of only 3 boards that were still played in the tournament. So my thinking time dropped because with only 3 boards left my opponent appeared very rapidly after his move at my board again. 

He stormed with his h and g pawn, I was later forced to trade my rook against a bishop and a pawn and he finally promoted 1 move before I could where I resigned. 

It was quite an interesting experience anyway.

Thursday, September 6, 2012

Mastering the King a Rook vs King and Pawn endgame

In iCE 0.3 I added a lot of endgame knowledge especially for drawish pawn less endgames with material in-balance (like KRNKR). However a few fundamental ones are still missing. One of them is King and Rook vs. King and Pawn which is really tricky to evaluate nevertheless very important.

Consider the following position with white to move

1. Rc8 is very tempting but only draws, after

1. ... Ra7 2. Ke8 Rxd7 3. Kxd7 we reach the following KRKP endgame, which is a draw

iCE did not know that. In fact iCE selected the move 1. Rd1+, but this is also only a draw after 1. ... Ke4. So while the endgame is not even on the board the lack of knowledge let play iCE the wrong move. I run a search for several hours, iCE was still happy with 1. Rd1+.  And of course it makes no sense to optimize towards a single position, the underlying knowledge problem must be solved.

There are a lot of winning positions for the pawn side if the pawn is already on the 6th or 7th rank. Here the rook is not able to stop the pawn from promoting and where the pawn side then wins the Queen vs Rook endgame.  But sometimes the pawn safely promotes but the rook side  is able to force a stalemate or give permanent checks. So finding out the pawn will promote is not enough for classifying it as a win.

If the pawn is not advanced the rook side might not be able to stop it without sacrificing the rook which leads to a draw.

In general one can state for this endgame
  • Usually the rook side is able to stop the pawn
  • If the attacker king is in front of the pawn the attacker will win
  • If the pawn gets support from its own king and the attacker king is away and behind the pawn the pawn side might be able to draw.
  • If the pawn is already on the 7th and sometimes on the 6th (with the pawn side to move) it might be able to promote and win 
  • There are hard to spot exceptions for both sides.
Those exceptions make it very hard to assemble a rule set to classify a position correctly. After several tries I decided on the following implementation for my KRKP module.

First I implemented a method to detect the wins for the pawn side. This method detects the trivial cases where the rook is immediately captured and the resulting KPK endgame is won, but also the non trivial ones where the pawn is able to promote and the queen survives (without stalemating the other king). The detector must handle the 3 non trivial cases
  • the pawn is on the 7th and the pawn side moves
  • the pawn is on the 7th and the rook side moves
  • the pawn is on the 6th and the pawn side moves
In all other non trivial positions the pawn side will not win. Most won positions obviously are found in the first case. I came up with a very complex rule that detects 99% of the wins in that situation. To detect the remaining 1% (about 1300 positions) I decided to just store them explicitly.

Only about 6000 positions make up the 2nd case. I tried hard to assemble some rules for that but at the end I gave up and also store them explicitly. The 3rd case is easy then. To win the pawn must move and we can look whether we have stored the resulting position for case number 2.

So this method detects 100% the pawn wins with a small internal database and a few complex rules.

Next I require a 2nd method that detects the drawn positions which is called only if we see that the position is not a win for the pawn obviously.

Theory states to count the tempi for the pawn side to promote with king support and the rook side to control the promotion square with both rook and king. If the pawn side requires less tempi than the rook side it is a draw. This is maybe a good rule of thumb for a human but it is not a correct rule for a computer. It will announce a lot of false draws. So I tried another approach.

Whether a position is drawn depends mainly on the pawn and king positions and less on the position of the rook. I assembled a small internal database of about 2600 records (each 2 byte) that contain those king and pawn positions that are drawn no matter where the rook stands or whose side to move it is. With those 2600 records I'm able to detect 700k out of 1.2M drawn positions.

I decided to stop here, this is good enough.

So after all that efforts I wanted to see whether it helped iCE to find a better move in the above position and voila it actually did. It sticks to the winning move realizing that the alternatives will draw. It is even able to announce the correct distance to mate.

depth move score  nodes
move score  nodes
1 d8=Q 379 56
d8=Q 277 72
2 d8=Q 379 275
d8=Q 277 234
3 d8=Q 379 386
d8=Q 267 336
4 d8=Q 367 1648
d8=Q 267 1220
5 d8=Q 344 2367
d8=Q 277 1993
6 d8=Q 356 6264
d8=Q 307 11897
7 Rc2 356 22k
d8=Q 317 20k
8 Ra1 341 123k
d8=Q 327 50k
9 Ra1 341 142k
d8=Q 327 57k
10 Rc8 356 5948k
d8=Q 327 285k
11 Rd1+ 341 2337k
d8=Q 367 363k
12 Rd1+ 341 805k
d8=Q 317 9M
13 Rd1+ 341 20M
d8=Q 317 14M
14 Rd1+ 336 8M d8=Q 367 3M
15 Rd1+ 341 13M d8=Q 417 8M
16 Rd1+ 338 54M d8=Q Mate 25 34M
17 Rd1+ 341 94M d8=Q Mate 24 73M

I like it when something works !

    Sunday, August 12, 2012

    mACE GUI Update

    I decided to publish a small update to the mACE GUI that can be used to play against my iCE engine. Unlike iCE it is written in Free Pascal, so I had a bit of a hard time to switch back to Pascal coding syntax ( := instead of =, = instead of ==, no ; before an else statement and this kind of stuff).

    The mACE GUI allowed 3 strength settings (low, medium and high) which allowed the engine 3, 5 and 10 minutes thinking time for 40 moves. It turned out that this is to strong, even in low it was unbeatable for an average amateur player. I decided to weaken it further by putting search limits to its search. I call that skill level and I implemented the skill level 0 - 10.

    The skill level specifies the max main search depth for the engine, so in skill level 5 the engine searches 5 ply deep. The engine performs still a quiescence search (it plays out all winning captures when the max search depth is reached, so it does not hang a piece there) and uses some extensions (like the check extension).

    I was surprised that even in skill level 0, the engine plays quiet reasonable and it takes some effort to beat it. So to have some real weak levels I also introduced some randomness. In skill level 0 and 1 there is a 10% - 20% chance that the GUI discards the move sent from the engine and picks a random move from the legal move list. This now gives me a real good chance to beat it and hopefully increases the fun for others as well.

    Monday, July 30, 2012

    iCE 0.2 last and final fight

    As I published now the new iCE in version 0.3 there will probably no more matches where iCE 0.2 participates in. Very likely the Division 5 of WBEC Ridderkerk 19 was it's final fight. Here almost 100 engines played, first in 4 groups and the best 5 in each group went into a playoff. The best 7 in the playoff qualify for Division 4 then.

    Little iCE did excellent and finished the playoff at number 2. So it would have earned iCE a spot in Division 4, which is not played anymore. Leo Dijksman has unfortunately decided to stop its tournaments as he lost his interest because of the many engine clones that appear. Very sad!

    Here is the final cross table

    WBEC Ridderkerk, 5th division FINAL.

    AMD-PHENOM-3100, 2012.07.04 - 2012.07.21
                                   Score     Di iC At If Be TJ Me Sj Ev Ay 
     1: DiscoCheck 3.61-x64      29.5 / 36   XX 01 10 11 11 10 10 01 01 11 
     2: iCE 0.2-b1092            26.5 / 36   10 XX =0 0= 10 10 11 =1 01 =1 
     3: Atlas 3.20-x64           25.5 / 36   01 =1 XX 10 01 01 =1 11 =1 11 
     4: Ifrit m1.8-x64-JA        25.0 / 36   00 1= 01 XX 00 == 1= =1 1= 11 
     5: Bearded Neural 44.5-x64  24.5 / 36   00 01 10 11 XX 0= 10 00 11 11 
     6: TJchess 1.1-x64          23.5 / 36   01 01 10 == 1= XX 01 =0 =1 01 
     7: Mediocre 0.4-JA          22.0 / 36   01 00 =0 0= 01 10 XX =0 0= 10 
     8: Sjakk 1.1.9              21.5 / 36   10 =0 00 =0 11 =1 =1 XX 01 00 
     9: EveAnn 1.67-b11          20.5 / 36   10 10 =0 0= 00 =0 1= 10 XX 01 
    10: Ayito 0.2.994            17.5 / 36   00 =0 00 00 00 10 01 11 10 XX 

    Thursday, July 12, 2012

    My Chess Engine iCE 0.3 is out

    Quite some time has passed since I released the last version of my little chess engine project. So I'm happy to announce that finally iCE 0.3 is seeing the light of the day. It's available from my homepage.

    In the development of iCE 0.3 a lot of ideas have been tried, most of them failed, some worked. I introduced new functionality and added tons of endgame chess knowledge. My initial tests indicate some ELO gain compared with iCE 0.2. But it is to early to tell how big it is. I'll know when it has played its first official 200 tournament games.

    Major changes in iCE 0.3
    • some new evaluation terms (e.g. pawns islands)
    • Understands the draw by fifty move rule and tries to avoid it when leading
    • Improved endgame knowledge. Better understanding of drawn positions especially if one side is ahead in material like king and 2 minors vs king and 1 minor or king and rook vs king and 2 minors.
    • Implements a small internal opening book if no external book is supplied.
    • Support for external opening books in a proprietary format.
    • Changed node counting rule (horizon nodes not counted twice anymore)
    • Algorithm improvements to speed up the code.
    • Draw by Repetition detection bug fix (was not really working)
    • New smarter time management, varies time for move depending on position and search progress. Leaves a safety buffer on the last move before time control to avoid time losses. Engine is now able to play very fast games with less than 100 ms per move without losing on time anymore. 
    • Implements a special move generator for "getting out of check" moves. Got some speedup.
    The engine is still only available as 32 bit version, as I don't own a 64 bit system yet. Most of its evaluation weights are not tuned at all, because I don't have the computing resources available to tune them. I used my intuition to pick a hopefully not so bad number. So tuning is definitely on the todo list for version 0.4 but this will required also some significant changes to the engine to make it tunable.

    Friday, June 29, 2012

    The fifty-move rule

    Most of my recent attempts to improve my engine failed, any possible improvement stayed within the error bar, so no breakthrough yet.

    As I'm running out of ideas for the moment to improve iCE without any major effort (e.g. major changes to eval and tuning the evaluation weights) I think I will release the current version of the code soon. It has at least functional enhancements compared with iCE 0.2 like using an opening book or better endgame knowledge.

    One of the missing functional features so far was the implementation of the DRAW by the 50 move rule. It is no big deal to implement but as I did not see any major benefit for playing strength in it I just thought I will implement it later when the time is right.

    It is right now and iCE does now know it.

    To demonstrate it consider the following position: 8/5k2/8/8/1R6/R6K/7P/8 w - - 96 200

    48 moves have been played without pawn move or capture. The old version of iCE here announces a Mate in 3, trying to mate with the rooks right away. This would fail because the game would be adjudicated as draw by the 50 move rule before Black is Mated.

    The new version of iCE is now seeing the way out.

    info depth 14 seldepth 24 time 13797 nodes 38389600 pv h3g2 f7e6 h2h3 e6e5 a3a5 e5d6 b4b6 d6c7 b6h6 c7b7 a5g5 b7c7 g5g7 c7d8 h6h8 nps 2782459 score mate 8

    So it is first moving the king and the pawn and mating with the rooks later.

    So one more source of possible embarrassment removed.

    Sunday, May 6, 2012

    Getting stronger is really tough

    All my ideas lately to improve the playing strength of my chess engine did not work out. Sometimes it looked promising but something that seemed like a small progress vanished if more games were played.

    I have not the resources to run hundreds of thousand of games. Playing 1000 games is already quite a commitment and the error bar after 1000 games is still very big. So my latest version v2589 shows something like a small improvement but actually it can even be worse than the last one in the list. They are all very close together and I kept the code changes not for a proven ELO gain but because they fixed some small bug or simplified the code a bit.  

    Rank Name                 Elo    +    - games score oppo. draws
       1 gaviota v0.85.1      199   35   35   368   78%   -20   17%
       2 cheng3 1.07 JA        52   14   14  1860   60%   -21   17%
       3 Daydreamer 1.75 JA    30   12   12  2358   57%   -19   29%
       4 iCE 0.3 v2589        -20   16   16  1295   52%   -31   37%
       5 iCE 0.3 v2459        -24   20   20   766   50%   -25   36%
       6 iCE 0.3 v2441        -25    9    9  3768   47%    -1   33%
       7 iCE 0.3 v2489        -25   18   18  1014   50%   -25   32%
       8 iCE 0.3 v2457        -25    8    8  5160   50%   -22   33%
       9 iCE 0.3 v2559        -27   10   10  3493   49%   -17   31%
      10 iCE 0.3 v2574        -31    9    9  4162   48%   -16   33%
      11 iCE 0.3 v2413        -34   12   12  2161   49%   -28   33%
      12 iCE 0.3 v2571        -35   11   11  2768   48%   -18   32%
      13 iCE 0.3 v2437        -35    9    9  4249   47%   -15   31%

    So I'm still looking for something that adds some ELO. But it seems most of the low hanging fruits have been picked.

    Thursday, April 12, 2012

    Evaluation Output

    The latest changes to iCE did not gain any ELO or made things worse so I threw them away. To see a bit of progress I decided to spend some time into something that does not affect playing strength so I can keep it for sure. I had a reformatting of the evaluation output on my todo list for some time already and I now decided to get it done. The eval output of the stockfish engine looks really nice and helpful so I decided to mirror that format just using the evaluation terms of iCE instead of the stockfish ones of course.

    So it now looks like this and I hope it will me help to fine tune eval in the later development of iCE.

    position fen r2q1rk1/pp2np2/5P2/3bp2Q/3pN2b/1B1P4/PPP3PP/R4RK1 w
    Total eval of the position : -52

                Eval term |    White    |    Black    |     Total
                          |   MG    EG  |   MG    EG  |   MG     EG
          Static Material |   ---   --- |   ---   --- |   141    141
         Dynamic Material |    36    36 |     2     2 |    38     38
                 Mobility |   -73   -73 |    66    66 |    -7     -7
       Pattern Recognizer |     0     0 |     0     0 |     0      0
            King Pressure |   -20   -20 |     1     1 |   -19    -19
       Static Pawn Struct |   ---   --- |   ---   --- |     6      0
     Dynamic. Pawn Struct |   -33   -38 |    -4    -4 |   -37    -42
                   Pieces |   -10   -10 |    -8     0 |   -18    -10
                  Threats |     0     0 |   -52   -87 |   -52    -87
                    Total |   ---   --- |   ---   --- |    52     14

    Scaling : 100% MG (52)  0% EG (0)  = 52
    NegaMax Adjustment for Side To Move (White): -52

    In order to be able to gather the sub terms of eval I had to rewrite parts of it and a minor functionality change related to the sub scores rounding (which is not done anymore) was unavoidable. But tests show it does not seem to hurt. In theory it should even help.

    But it now shows also a weakness of iCE. In this position Black is in big trouble but static eval does not indicated that. The king pressure evaluation definitely needs some adjustment. iCE finds its way through anyway once it reaches that position (... but it might never get so far)

    info depth 11 seldepth 26 time 2734 nodes 7865064 pv h5h6 e7f5 f1f5 h4f6 e4f6 d8f6 h6f6 f8d8 f5h5 d5b3 h5h8  nps 2876760 score mate 6 hashfull 133 tbhits 0
    bestmove h5h6 ponder e7f5

    Saturday, March 17, 2012

    Murphys law

    As I now do some code fine tuning in iCE I run long test series to see that the changes does not make the engine weaker. I started to see iCE losing some games because of illegal moves.

    I was absolutely dumbfounded by that. The move generator of iCE passes all perft test, I use a 64 bit hash signature to make undetected collisions very unlikely and as an additional safety belt I verify each move from the hash again for legality. So illegal moves should just be not possible.

    After some investigation I found that is was always the same illegal move in the same position and it actually came from the new opening book of iCE. So when I assembled the book I generated an illegal move and put it in the book.

    In this position the move to play was specified as 5. bxa6. The iCE book creator translated that move into Pawn from b2 to a6. So when it looked what pawn actually captures on a6 it picked the wrong one. The reason was an uninitialized bitboard and this was a quick fix.

    But it still did not explain why iCE actually used that move and did not detect it as illegal. After debugging I found that the move validator had also a small bug (a bracket was closed at the wrong spot) that just effected white pawn captures and so the move went ok through validation.

    I fixed that bug too and corrected the books but it showed "whenever something can go wrong it will".

    Sunday, March 4, 2012

    Another reference match

    In my efforts to establish a performance baseline I ran another reference match against a strong engine. This time I picked gaviota and decided to run a few more games at a slightly longer time control to get a more accurate result.

    600 games at a TC of 100 moves in 10 sec + 0.5 sec increment per move, ponder off, own book

    Score of gaviota v0.85 vs iCE 0.3 v2394 : 392 - 111 - 97  [0.73] 600
    ELO difference: 176

    which seems about petty correct. So this is the level I want to improve my engine from.

    Saturday, March 3, 2012

    Windows TimeResolution Trouble

    In recent tournaments iCE 0.2 sometimes lost on time which is a very bad thing. Its time control is implemented aggressively so it is using all available time on the last move before the time control but it should not overstep it from the way it was implemented, but it did. As I wanted to improve the time allocation anyway I introduced a soft and a hard time limit. Under certain conditions the engine might continue to search beyond the soft but never beyond the hard limit. The hard limit is verified to always be less than the available remaining time. On the last move before the time control a small safety buffer is used in addition. So I was pretty sure time loss are now a thing of the past.

    I was wrong. In a little private reference tournament at very fast time controls (1 sec + 0.1 sec increment per move) iCE was losing almost every game on time. After debugging the GUI and the Engine logs it showed that the engine was assuming to spend less time on a move as the GUI did. So the engine thought it has calculated for 90 ms where the GUI recorded 120 ms for a move. So the initial buffer was getting smaller and smaller with each move until there was almost no buffer anymore and the engine overstepped it.

    I investigated the issue a bit and found 2 root causes for this behavior
    • it seems that the Windows XP time resolution has a default quantum of about 10 ms, so the time related functions have a smallest resolution of 10 ms. So for 9 ms the engine thinks it did spend no time at all and 1 ms later it spent already 10.
    • the c++ Sleep(int wait) function is only specifying the minimum time the OS scheduler does not schedule the thread that called Sleep. The actual time the thread is suspended can be much greater
    If the engine is idle iCE is sending the worker thread to sleep until the IO thread is signaling new input. Otherwise and idle engine would consume a whole CPU core just by looping over a hasNewInput flag. The Sleep interval is actually pretty small, I was using 10 ms but as this is not reliable the GUI might have sent a search command and started the GUI clock while the engine was still sleeping for some ms. When the engine finally woke up realized the new command and started its clock the GUI clock was already ticking for some ms.

    Those 2 facts introduced a significant error margin in short TC games. I investigated some alternatives to either lower the time resolution of the OS (there is an API call for that) or eliminate the Sleep call in the main thread but all had its Cons. At the end I decided to introduce a TIME_RESOLUTION_ERROR_MARGIN of 25 ms. All time limits assume that the engine in reality is using 25 ms more than it thinks and the internal limits are adjusted for that. This works pretty well and eliminates almost all time losses even in very fast TC games. The downside is that the engine is sometimes using some ms less time than it could. But nothing is perfect.

    All this is only relevant in very short TC games anyway. In real tournaments the engine has much more time and 25 ms don't make a difference at all. But for engine tuning and change evaluation a lot of games are required and to finish 1000+ games in a reasonable amount of time you have to limit the time per move very much.

    With the new TC I ran a quick match against a pair of engines to establish a baseline which I measure engine changes against. iCE was actually doing not so bad, but I think this is TC related. At longer TCs the other engines would probably be stronger. For instance I wasn't able to include the engine Aristarch 4.50 at all because it is very weak at short TCs. It was mated in 9 out of 10 games by iCE (no time losses but real mates). At longer TCs Aristarch seems much stronger.

    Those are the results of iCE against 5 engines with 200 games each at a TC of 100 moves in 3 sec + 0.3 sec per move. All engines 32 bits and 1 core, PonderOff, own book where existing.

    Rank Name                        ELO   Games   Score   Draws
       1 Quazar 0.4 w32              282     200     84%     16%
       2 cheng3 1.07 JA               96     200     64%     12%
       3 iCE 0.3 v2394                19    1000     53%     17%
       4 Abrok 5.0                   -92     200     37%     24%
       5 Ufim 8.02                  -177     200     26%     18%
       6 Eeyore 1.52 UCI            -186     200     26%     15%
    Finished match

    Congratulations to Martin Sedlak. It's cheng engine is really very strong and greatly improved in Version 1.07.

    Saturday, February 4, 2012

    Speeding up the popcount stuff

    A recent profile of my little engine revealed that one of the biggest single hotspots in the entire engine is the "popcount" instruction . This instruction returns the number of set "1" bits in a 64 bit integer. Especially the position evaluation is using this instruction a lot. Questions like
    • How many pawns are on a file (double or triple pawn detection)
    • How many squares can the queen attack next move
    • How mobile are the bishops
    • How many pawns are protecting the king in its king shelter
    • How many squares has the knight to move that are not attacked by pawns of the enemy
    are all finally answered by counting the bits in a 64 bit integer. As iCE evaluates about 1 million positions per second this instruction is called many many million times. So even a tiny speed improvement can make a big difference.

    Intel introduces with its Nehalem architecture in i7 cores a popcount processor instruction but the new CPUs are not so wide spread yet and I don't own one either so for the immediate future popcount still has to be calculated in software. So far I used a stable algorithm that uses shift and multiply operations to do that. This algorithm is here called C-shift (as it is my old shifting implementation in C)

    int popCount_C_shift(int64 B)
            B = B - ((B >> 1) & 0x5555555555555555ULL);
            B = (B & 0x3333333333333333ULL) +
                ((B >> 2) & 0x3333333333333333ULL);
            B = (B + (B >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
            return (B * 0x0101010101010101ull) >> 56;

    A new approach implements a loop that deletes 1 bit at at time and counts how often it loops. This is very fast when only a few bits are set (C-sparse)

    int popCount_C_sparse(int64 b)
       int count = 0;
       while (b != 0)
           b &= b - 1; // reset LS1B
       return count;

    And as 3rd alternative I decided to implement the above C algorithm directly in Assembler using 32 bit instructions to maybe reduce some overhead introduced by the compiler.

                xor ecx,ecx
                mov edx, dword ptr [b]
                and edx, edx
                jz L1
            L0: inc ecx   
                lea eax, [edx-1]
                and edx, eax
                jnz L0
            L1: mov edx, dword ptr [b + 04h]
                and edx, edx
                jz L3

            L2: inc ecx   
                lea eax, [edx-1]
                and edx, eax
                jnz L2

            L3: mov eax, ecx

    I compiled the code using the Microsoft C compiler, the Intel C Compiler and the gnu C compiler and let each of the algorithms count the 1 bits in a predefined large set of random numbers with an increasing population count and measured the execution time.

    Here are the results

    The old C shift based implementation is stable, it runs always the same time independent of the number of set bits. Both looping algorithms outperform it with lesser bits. The break even is with about 7 bits. The MSVC compiler code was slightly faster as my handcrafted assembler.

    A similar picture is achieved using the Intel compiler

    The shift C code is stable and runs slightly faster that the same algorithm compiled with MSVC. The looping algorithms outperform it but are a bit slower than the MSVC code. No profile guided optimization has been used (PGO). Maye that would level the execution speed compared with MSVC but was not tested.

    Finally I compiled the source with the gnu C compiler using the gnu C builtin popcount instruction instead of my assembler code as the gnu C compiler does not understand inline assembler in Intel Syntax.

    It shows that the native gcc popcount instruction is probably implemented as a shifting algorithm and surprisingly it is the worst performing of all 3. The looping algorithm gets a performance hit between 2 and 3 bits set but it still outperforms the other 2 until 6 bits are set.

    So I took the architectural decision to drop my handcrafted assembler code and introduce the C version of the looping algorithm into my engine. I replace all popcount calls where the number of set bits is expected to be smaller than 7 (like number of pawns on the 7th rank) with the new popcount code and keep the old one for the remaining calls (e.g. number of squares a queen can attack).

    I got a nice speedup of more than 10% in execution speed, which was well worth the exercise.

    Sunday, January 22, 2012

    Fixing the repetition detection hash table

    Currently I try to speed up iCE a bit and investigate where iCE spends its time, yeah I'm profiling it a bit. One of the functions that used more than it should was the calculateRepetitions() function.

    This function is used to determine whether this position was encountered before so a 3fold repetition is likely. The function has to parts, whenever a new position is entered it increments for this position (zobrist key & size_of_hash -1) a counter in a small hash table by 1 and then checks the value of the counter. If it is now 1 this means it was 0 before and we have never seen this position before. We can be sure this position is not repeated and exit. If the counter is 2 or more we might have a repetition or a hash key collision (we don't know). We now scan through the list of previous positions up to the latest non reversible move. This takes a bit longer.

    When we unmake a move we decrement the counter in the hash table. This means the hash table is always almost empty. It only takes 1 slot for every real move on the board + 1 slot for every ply of search.

    To see why this function takes so long despite its hash early exit I measured how many early exits I take and how often I scan through the list of keys after all. To my surprise I only had an early exit rate of at most 20% and in longer searches it went down to 10%.

    How could that be ?

    I looked at the content of the table and it contained negative values, it was full of them. This should not have happened. It is of course not possible to encounter a position less than never.

    It turned out that I introduced an inconsistency when I added Null Move. I did not increment the hash slot when doing a Null Move but I decremented it when I undid the Null Move. So after a while most of the slots contained negative values and provided no value anymore.

    After fixing that my early exit rate went up to 99,95% and calculateRepetitions() went almost into non existence in my function profile list.

    Saturday, January 21, 2012

    iCE is a winner in the local newspaper chess study challenge

    Our local newspaper organizes once a year a local chess challenge with 1 chess study a week (5 in total) of increasing difficulty. Me and my engine were taking part this year and with a bit of luck we were the lucky winners of the 3rd challenge, a really nice Mate in 4 problem.

    Mate in 4, Study by Frank Fiedler

    White must take care not to stalemate the black king, so the rook must be blocked in its mobility for a moment. The correct solution is 1. Rh4 !

    iCE 0.3 v2058 by Thomas Petzke
    position fen 2b5/1p1p4/1P1P4/ppBP1B2/k2p1R2/N2P4/K1P1P3/8 w - - 0 1
    go depth 10
    info depth 1 seldepth 7 time 0 nodes 72 pv c5d4  nps 71999 score cp 1526
    info depth 2 seldepth 8 time 0 nodes 186 pv c5d4 b5b4  nps 185999 score cp 1510
    info depth 3 seldepth 10 time 0 nodes 813 pv c5d4 b5b4 nps 812999 score cp 1518
    info depth 4 seldepth 12 time 16 nodes 1894 pv c5d4 b5b4 nps 118375 score cp 1507 

    info depth 5 seldepth 18 time 15 nodes 6381 pv c5d4 b5b4  nps 425400 score cp 1531 
    info depth 6 seldepth 20 time 63 nodes 13244 pv c5d4 b5b4 nps 210222 score cp 1629 
    info depth 7 seldepth 28 time 94 nodes 53632 pv c2c4 b5b4 nps 570553 score cp 1660 
    info depth 8 seldepth 41 time 156 nodes 289838 pv f5g4 b5b4 nps 1857935 score mate 5 
    info depth 9 seldepth 9 time 125 nodes 375331 pv f4h4 b5b4 f5g4 b4a3 c2c4 d4c3 g4d7  nps 3002648 score mate 4
    bestmove f4h4 ponder b5b4

    So as iCE really needs a hardware upgrade, I'm running and developing it on a 6 year old desktop PC which must be replaced urgently the prize money of 30€ might earn it an additional GB of RAM in the new system.

    Sunday, January 15, 2012

    New compiler and an old bug

    I just installed gcc 4.6.2 and gave it a try on my sources to see whether it performs better than old gcc 4.5.2 I'm currently using. The sources compiled without any problems (I did not include the Gaviota TBS support however) and to my disappointment it did not gave any speed improvement. It was also somewhat slower than the MS Visual C++ compiler, so probably slower than the Intel ICC as well.

    However I noticed in one of the test positions a tiny node difference at higher plys between the MSVC compiled code and the code from gcc 4.6.2. This is always a bad sign because it indicates a bug.

    So I did some research to spot the exact node where both editions start behaving differently and it turned out to be a problem in the KBKP evaluator. In one version the position was announced as DRAW and in the other not so from here on both versions produced different results.

    So after I knew where to look for the problem, it was rather easy to find. I mixed a square value with a bitboard in a comparison operation and the result of that was random. So I fixed it and the search of gcc and the MSVC was identical again, but unfortunately identically wrong.

    The old bug disguised a logical problem in the evaluator that became visible now, so I had to recode the KBKP module and I had to do a complete verification cycle against a table base again which took several hours. But now it is fixed and I improved the performance of the module significantly. I does a much better job in spotting the draws than it did before. It is now spotting 90% of all draws using a few known patterns

    KBKP Recognizer Performance
    Incorrect Evals :         0
    Spotted   Draws : 6.355.575
    Missed    Draws :   607.948
    Spotted   Wins  : 1.248.570

    Saturday, January 7, 2012

    The Lucena position in a King Rook and Pawn vs King and Rook endgame

    After spotting the draws in the Philidor position I now wanted to enable to spot the Wins of the Lucena position. The Lucena position has the attacker king on the 8th rank in front of its pawn, which is not a rook pawn. The defender king is cut off from the pawn by own file or more by the attacker rook. The defender rook prevents the attacker king to leave the promotion square at one side.

    The winning strategy from the Lucena position involves building a bridge with the attacker rook that shields the king from checks of the defender so the pawn can advance.

    To teach the engine the rules of that positions is not so difficult. The tricky thing is the score that is awarded when this position is recognized.  When the score is to high the engine will play towards that position but then will try to stay in it. In order to win the attacker king must of course leave the promotion square which is then not the Lucena position anymore.

    I decided to award a bonus worth the material of a pawn when the Lucena position or a position close to it is recognized. This seems to do the trick.

    I tested the KRPKR module in a winning position with iCE 0.2 and iCE 0.3.

    This is a typical position that you might encounter in a KRPKR endgame. White moves and this position is won for White. The only winning move is Rd4!

     White moves and Mates in 29 moves. 1. Rd4 !

    I searched this position allowing both engines 60 seconds for the move. Both engines found the winning move almost instantly however when analyzing the output iCE 0.3 understands the position as winning where iCE 0.2 has no clue and just awards a better score for the more advanced pawn.

    iCE 0.2 (reaching 18 ply in 60 sec)
    info depth 2 seldepth 3 time 0 nodes 150 pv b4d4 c6h6  nps 150000 score cp 44 
    info depth 18 seldepth 34 time 16094 nodes 25335132 pv b4d4 c6e6 ...  nps 1574197 score cp 94

    iCE 0.3 (reaching 19 ply in 60 sec)
    info depth 2 seldepth 3 time 0 nodes 160 pv b4d4 c6c1  nps 159999 score cp 47
    info depth 17 seldepth 32 time 8391 nodes 12690308 pv b4d4 c6h6  nps 1512371 score cp 97
    info depth 18 seldepth 32 time 9921 nodes 13093163 pv b4d4 c6h6  nps 1319742 score cp 205

    info depth 19 seldepth 35 time 16219 nodes 21335820 pv b4d4 c6h6  nps 1315483 score cp 205

    At ply 18 iCE 0.3 spotted the possibility to reach the Lucena position and awarded the bonus of 100 cp therefore the sudden rise in its score. When looking at the moves iCE thinks that will be played it reaches this position

    White moves and Mates in 22. e7 !

    The line of play iCE calculated is not optimal but forcing towards a closer win. This position is the position from the 18 ply search that was already awarded with the bonus. The Lucena position is not yet reached but the engine understands that Black is not able to prevent it anymore.

    After 1. e7 Re1 2. Kf7 Rf1 3.Ke8  we are finally there

    The Lucena position ! Black moves and loses in 19. (Ra1)

    After 3. ... Ra1 iCE has no problem calculating the remaining moves

    info depth 8 time 93 nodes 134444 pv d4d2 a1b1 d2d8 b1f1 d8c8 c6b5 nps 1445634 score cp 215
    info depth 9 time 422 nodes 574486 pv d4c4 c6d6 c4f4 a1a7 f4f6 d6e5  nps 1361341 score cp 413
    info depth 10 time 313 nodes 421256 pv d4c4 c6d6 c4c2 a1b1 c2f2 b1h1 e8d8 h1d1 e7e8q nps 1345865 score cp 990

    In less than 1 sec it sees the queening of its pawn and raising the score.

    So it seems I was able to teach the engine a fundamental chess lesson. Giving it access to a table base would probably be easier but also less fun and so I did learn my chess lesson too as I had no clue about the Lucena position either till I started programming it.

    Wednesday, January 4, 2012

    The Philidor Position in a King Rook and Pawn vs King and Rook endgame

    I just worked on fixing the errors in the King Rook and Pawn vs King and Rook endgame evaluation. This endgame is very common and is dominated by two fundamental positions. The Philidor position where the defender can draw and the Lucena position where the attacker can win.

    As this KRP vs KR endgame contains more than 50% won positions it is very difficult to find patterns that rule out all the winning positions to be left with the draws. Here instead I try to find positive rules that mark the draw and announce a UNKNOWN if no rules match.  

    To find those patterns the chess theory helps. The characteristics of the Philidor position are
    •     the defending king is on the queening square of the pawn or adjacent to it
    •     the opposing pawn has not yet reached its sixth rank
    •     the opposing king is beyond the defender's third rank
    •     the defender's rook is on the third rank


    My first unverified version of the patterns was very buggy. I corrected the patterns and added also some rules when the attacker pawn is not very advanced. The engine has now a good understanding of the Philidor position and will try to reach it when losing or try to prevent it when winning.

    This is the final performance of the fixed code compared with the first version

                          iCE 0.2          iCE 0.3
    Incorrect Draws :      83.061                0
    Spotted   Draws :   1.551.216        9.012.914
    Missed    Draws : 210.404.306      202.942.608
    Spotted   Wins  : 271.313.885      271.396.946        

    I now have to have a look at the evaluation of this endgame to spot the Lucena position and give it a higher score so the engine will play / exchange pieces towards it when winning.

    Monday, January 2, 2012

    KBNKN improvement and next step

    I thought before I move on to the next step I try an early exit idea in the KBNKN recognizer. As the position is more likely a DRAW if defender king and knight have good mobility I added 2 rules (1 for black and 1 for white) that classify a position as DRAW if King and Knight have 4 or more safe squares to move to.

    Those rules are on top of all the other rules, so if their conditions are met I get an early exit. Measurement shows that about 16% of recognizable draws are already recognized by the early exit rule. An additional benefit is that I get a slight improvement in the number of recognized draws as a small number of positions is now correctly announced as draw that would otherwise be announced as unknown because they are covered by one of the "not draw" patterns.

                              Old           New
    Incorrect Draws :           0             0                                     
    Spotted   Draws :  64.403.306    65.560.677
    Missed    Draws : 117.746.021   116.588.650
    Spotted   Wins  :  37.532.061    37.532.061

    So this is enough now for KBNKN and I move on.

    When I started with 5 man endgames I implemented a simple recognizer for KRPKR because I read this is the most common endgame of all and I wanted iCE to recognize the Philidor and Lucena positions. As I had no 5 man tablebase I was not able to verify whether the recognizer works correctly I just had some sample positions where it did.

    I now checked it against the table base and its performance is terrible. It misses most of the DRAWs and has a 5% error rate where it announces a DRAW which is in fact not a DRAW. This requires definitely some immediate fixing. I hope it is not as difficult as the the previous one, having no knights certainly helps but then again having a pawn now on the board makes it more difficult.

    Sunday, January 1, 2012

    Endgame King Bishop and Knight vs. King and Knight

    All my latest efforts to improve the play of iCE went into mastering the endgame of King Knight and Bishop vs King and Knight. The theory does not say much about this endgame, obviously it is not interesting enough for human play and chess engine play it via table bases.

    However I want to build some endgame knowledge into iCE directly so I tried to figure out some basic properties about it.

    It seems overall it is drawish, so there are a lot of drawn positions. But there is also a significant amount positions that look drawish but are winnable by the stronger side. The line of play from those positions is sometimes not clear. So an engine without any knowledge of this endgame might easily move from a drawn into a lost position as it cannot tell the difference.

    White moves and Mates in 101 !

    Black moves and loses in 95 !

    So it is very hard to tell whether a position is drawn or won. Some general principles for winnable positions are
    • the attacker king is actively placed
    • the distance between defender knight and defender king is big
    • the defender knight is restricted in its mobility
    • the attacker king is closer than the defender king to the defender knight
    I then coded a module that detects dangerous patterns in a position. If a position does not contain any dangerous pattern it is a draw. A knight on the board makes patterns difficult and in this endgame 2 knights are on the board. So the pattern module is now a very comprehensive chunk of code, it took a lot of time to build it and it is far from perfect. It never announces a won/lost position as draw but does miss a lot of drawn positions.

    Incorrect Draws : 0
    Spotted   Draws : 64.403.306
    Missed    Draws : 117.746.021
    Spotted   Wins  : 37.532.061

    So about a third of the drawn positions are recognized, two thirds are missed but this is as good as I could make it. The effort is still worth it in my opinion because if a draw is recognized search stops at this node cutting of potential huge sub trees that otherwise would have been searched. So there is still a significant savings potential.

    I now have a neat little number of special endgame modules for different kinds of endgames. The code that forks the evaluation into one of this modules is a bit messy. I will clean that up and then I have to do some extensive testing whether those modules integrate correctly into search and evaluation.

    More fun to come...