Friday, June 29, 2012

The fifty-move rule

Most of my recent attempts to improve my engine failed, any possible improvement stayed within the error bar, so no breakthrough yet.

As I'm running out of ideas for the moment to improve iCE without any major effort (e.g. major changes to eval and tuning the evaluation weights) I think I will release the current version of the code soon. It has at least functional enhancements compared with iCE 0.2 like using an opening book or better endgame knowledge.

One of the missing functional features so far was the implementation of the DRAW by the 50 move rule. It is no big deal to implement but as I did not see any major benefit for playing strength in it I just thought I will implement it later when the time is right.

It is right now and iCE does now know it.

To demonstrate it consider the following position: 8/5k2/8/8/1R6/R6K/7P/8 w - - 96 200

48 moves have been played without pawn move or capture. The old version of iCE here announces a Mate in 3, trying to mate with the rooks right away. This would fail because the game would be adjudicated as draw by the 50 move rule before Black is Mated.


The new version of iCE is now seeing the way out.

info depth 14 seldepth 24 time 13797 nodes 38389600 pv h3g2 f7e6 h2h3 e6e5 a3a5 e5d6 b4b6 d6c7 b6h6 c7b7 a5g5 b7c7 g5g7 c7d8 h6h8 nps 2782459 score mate 8

So it is first moving the king and the pawn and mating with the rooks later.

So one more source of possible embarrassment removed.

Sunday, May 6, 2012

Getting stronger is really tough

All my ideas lately to improve the playing strength of my chess engine did not work out. Sometimes it looked promising but something that seemed like a small progress vanished if more games were played.

I have not the resources to run hundreds of thousand of games. Playing 1000 games is already quite a commitment and the error bar after 1000 games is still very big. So my latest version v2589 shows something like a small improvement but actually it can even be worse than the last one in the list. They are all very close together and I kept the code changes not for a proven ELO gain but because they fixed some small bug or simplified the code a bit.  

Rank Name                 Elo    +    - games score oppo. draws
   1 gaviota v0.85.1      199   35   35   368   78%   -20   17%
   2 cheng3 1.07 JA        52   14   14  1860   60%   -21   17%
   3 Daydreamer 1.75 JA    30   12   12  2358   57%   -19   29%
   4 iCE 0.3 v2589        -20   16   16  1295   52%   -31   37%
   5 iCE 0.3 v2459        -24   20   20   766   50%   -25   36%
   6 iCE 0.3 v2441        -25    9    9  3768   47%    -1   33%
   7 iCE 0.3 v2489        -25   18   18  1014   50%   -25   32%
   8 iCE 0.3 v2457        -25    8    8  5160   50%   -22   33%
   9 iCE 0.3 v2559        -27   10   10  3493   49%   -17   31%
  10 iCE 0.3 v2574        -31    9    9  4162   48%   -16   33%
  11 iCE 0.3 v2413        -34   12   12  2161   49%   -28   33%
  12 iCE 0.3 v2571        -35   11   11  2768   48%   -18   32%
  13 iCE 0.3 v2437        -35    9    9  4249   47%   -15   31%


So I'm still looking for something that adds some ELO. But it seems most of the low hanging fruits have been picked.


Thursday, April 12, 2012

Evaluation Output

The latest changes to iCE did not gain any ELO or made things worse so I threw them away. To see a bit of progress I decided to spend some time into something that does not affect playing strength so I can keep it for sure. I had a reformatting of the evaluation output on my todo list for some time already and I now decided to get it done. The eval output of the stockfish engine looks really nice and helpful so I decided to mirror that format just using the evaluation terms of iCE instead of the stockfish ones of course.

So it now looks like this and I hope it will me help to fine tune eval in the later development of iCE.


position fen r2q1rk1/pp2np2/5P2/3bp2Q/3pN2b/1B1P4/PPP3PP/R4RK1 w
Total eval of the position : -52
Evaluation:
===========

            Eval term |    White    |    Black    |     Total
                      |   MG    EG  |   MG    EG  |   MG     EG
 ---------------------+-------------+-------------+--------------
      Static Material |   ---   --- |   ---   --- |   141    141
     Dynamic Material |    36    36 |     2     2 |    38     38
             Mobility |   -73   -73 |    66    66 |    -7     -7
   Pattern Recognizer |     0     0 |     0     0 |     0      0
        King Pressure |   -20   -20 |     1     1 |   -19    -19
   Static Pawn Struct |   ---   --- |   ---   --- |     6      0
 Dynamic. Pawn Struct |   -33   -38 |    -4    -4 |   -37    -42
               Pieces |   -10   -10 |    -8     0 |   -18    -10
              Threats |     0     0 |   -52   -87 |   -52    -87
 ---------------------+-------------+-------------+--------------
                Total |   ---   --- |   ---   --- |    52     14

Scaling : 100% MG (52)  0% EG (0)  = 52
NegaMax Adjustment for Side To Move (White): -52


In order to be able to gather the sub terms of eval I had to rewrite parts of it and a minor functionality change related to the sub scores rounding (which is not done anymore) was unavoidable. But tests show it does not seem to hurt. In theory it should even help.

But it now shows also a weakness of iCE. In this position Black is in big trouble but static eval does not indicated that. The king pressure evaluation definitely needs some adjustment. iCE finds its way through anyway once it reaches that position (... but it might never get so far)

info depth 11 seldepth 26 time 2734 nodes 7865064 pv h5h6 e7f5 f1f5 h4f6 e4f6 d8f6 h6f6 f8d8 f5h5 d5b3 h5h8  nps 2876760 score mate 6 hashfull 133 tbhits 0
bestmove h5h6 ponder e7f5

Saturday, March 17, 2012

Murphys law

As I now do some code fine tuning in iCE I run long test series to see that the changes does not make the engine weaker. I started to see iCE losing some games because of illegal moves.

I was absolutely dumbfounded by that. The move generator of iCE passes all perft test, I use a 64 bit hash signature to make undetected collisions very unlikely and as an additional safety belt I verify each move from the hash again for legality. So illegal moves should just be not possible.

After some investigation I found that is was always the same illegal move in the same position and it actually came from the new opening book of iCE. So when I assembled the book I generated an illegal move and put it in the book.


In this position the move to play was specified as 5. bxa6. The iCE book creator translated that move into Pawn from b2 to a6. So when it looked what pawn actually captures on a6 it picked the wrong one. The reason was an uninitialized bitboard and this was a quick fix.

But it still did not explain why iCE actually used that move and did not detect it as illegal. After debugging I found that the move validator had also a small bug (a bracket was closed at the wrong spot) that just effected white pawn captures and so the move went ok through validation.

I fixed that bug too and corrected the books but it showed "whenever something can go wrong it will".

Sunday, March 4, 2012

Another reference match

In my efforts to establish a performance baseline I ran another reference match against a strong engine. This time I picked gaviota and decided to run a few more games at a slightly longer time control to get a more accurate result.

600 games at a TC of 100 moves in 10 sec + 0.5 sec increment per move, ponder off, own book

Score of gaviota v0.85 vs iCE 0.3 v2394 : 392 - 111 - 97  [0.73] 600
ELO difference: 176

which seems about petty correct. So this is the level I want to improve my engine from.

Saturday, March 3, 2012

Windows TimeResolution Trouble

In recent tournaments iCE 0.2 sometimes lost on time which is a very bad thing. Its time control is implemented aggressively so it is using all available time on the last move before the time control but it should not overstep it from the way it was implemented, but it did. As I wanted to improve the time allocation anyway I introduced a soft and a hard time limit. Under certain conditions the engine might continue to search beyond the soft but never beyond the hard limit. The hard limit is verified to always be less than the available remaining time. On the last move before the time control a small safety buffer is used in addition. So I was pretty sure time loss are now a thing of the past.

I was wrong. In a little private reference tournament at very fast time controls (1 sec + 0.1 sec increment per move) iCE was losing almost every game on time. After debugging the GUI and the Engine logs it showed that the engine was assuming to spend less time on a move as the GUI did. So the engine thought it has calculated for 90 ms where the GUI recorded 120 ms for a move. So the initial buffer was getting smaller and smaller with each move until there was almost no buffer anymore and the engine overstepped it.

I investigated the issue a bit and found 2 root causes for this behavior
  • it seems that the Windows XP time resolution has a default quantum of about 10 ms, so the time related functions have a smallest resolution of 10 ms. So for 9 ms the engine thinks it did spend no time at all and 1 ms later it spent already 10.
  • the c++ Sleep(int wait) function is only specifying the minimum time the OS scheduler does not schedule the thread that called Sleep. The actual time the thread is suspended can be much greater
If the engine is idle iCE is sending the worker thread to sleep until the IO thread is signaling new input. Otherwise and idle engine would consume a whole CPU core just by looping over a hasNewInput flag. The Sleep interval is actually pretty small, I was using 10 ms but as this is not reliable the GUI might have sent a search command and started the GUI clock while the engine was still sleeping for some ms. When the engine finally woke up realized the new command and started its clock the GUI clock was already ticking for some ms.

Those 2 facts introduced a significant error margin in short TC games. I investigated some alternatives to either lower the time resolution of the OS (there is an API call for that) or eliminate the Sleep call in the main thread but all had its Cons. At the end I decided to introduce a TIME_RESOLUTION_ERROR_MARGIN of 25 ms. All time limits assume that the engine in reality is using 25 ms more than it thinks and the internal limits are adjusted for that. This works pretty well and eliminates almost all time losses even in very fast TC games. The downside is that the engine is sometimes using some ms less time than it could. But nothing is perfect.

All this is only relevant in very short TC games anyway. In real tournaments the engine has much more time and 25 ms don't make a difference at all. But for engine tuning and change evaluation a lot of games are required and to finish 1000+ games in a reasonable amount of time you have to limit the time per move very much.

With the new TC I ran a quick match against a pair of engines to establish a baseline which I measure engine changes against. iCE was actually doing not so bad, but I think this is TC related. At longer TCs the other engines would probably be stronger. For instance I wasn't able to include the engine Aristarch 4.50 at all because it is very weak at short TCs. It was mated in 9 out of 10 games by iCE (no time losses but real mates). At longer TCs Aristarch seems much stronger.

Those are the results of iCE against 5 engines with 200 games each at a TC of 100 moves in 3 sec + 0.3 sec per move. All engines 32 bits and 1 core, PonderOff, own book where existing.

Rank Name                        ELO   Games   Score   Draws
   1 Quazar 0.4 w32              282     200     84%     16%
   2 cheng3 1.07 JA               96     200     64%     12%
   3 iCE 0.3 v2394                19    1000     53%     17%
   4 Abrok 5.0                   -92     200     37%     24%
   5 Ufim 8.02                  -177     200     26%     18%
   6 Eeyore 1.52 UCI            -186     200     26%     15%
Finished match


Congratulations to Martin Sedlak. It's cheng engine is really very strong and greatly improved in Version 1.07.

Saturday, February 4, 2012

Speeding up the popcount stuff

A recent profile of my little engine revealed that one of the biggest single hotspots in the entire engine is the "popcount" instruction . This instruction returns the number of set "1" bits in a 64 bit integer. Especially the position evaluation is using this instruction a lot. Questions like
  • How many pawns are on a file (double or triple pawn detection)
  • How many squares can the queen attack next move
  • How mobile are the bishops
  • How many pawns are protecting the king in its king shelter
  • How many squares has the knight to move that are not attacked by pawns of the enemy
are all finally answered by counting the bits in a 64 bit integer. As iCE evaluates about 1 million positions per second this instruction is called many many million times. So even a tiny speed improvement can make a big difference.

Intel introduces with its Nehalem architecture in i7 cores a popcount processor instruction but the new CPUs are not so wide spread yet and I don't own one either so for the immediate future popcount still has to be calculated in software. So far I used a stable algorithm that uses shift and multiply operations to do that. This algorithm is here called C-shift (as it is my old shifting implementation in C)

int popCount_C_shift(int64 B)
{
        B = B - ((B >> 1) & 0x5555555555555555ULL);
        B = (B & 0x3333333333333333ULL) +
            ((B >> 2) & 0x3333333333333333ULL);
        B = (B + (B >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
        return (B * 0x0101010101010101ull) >> 56;
}

A new approach implements a loop that deletes 1 bit at at time and counts how often it loops. This is very fast when only a few bits are set (C-sparse)

int popCount_C_sparse(int64 b)
{
   int count = 0;
   while (b != 0)
   {
       count++;
       b &= b - 1; // reset LS1B
   }
   return count;
}


And as 3rd alternative I decided to implement the above C algorithm directly in Assembler using 32 bit instructions to maybe reduce some overhead introduced by the compiler.

__asm
    {
            xor ecx,ecx
            mov edx, dword ptr [b]
            and edx, edx
            jz L1
       
        L0: inc ecx   
            lea eax, [edx-1]
            and edx, eax
            jnz L0
       
        L1: mov edx, dword ptr [b + 04h]
            and edx, edx
            jz L3

        L2: inc ecx   
            lea eax, [edx-1]
            and edx, eax
            jnz L2

        L3: mov eax, ecx
    } 

I compiled the code using the Microsoft C compiler, the Intel C Compiler and the gnu C compiler and let each of the algorithms count the 1 bits in a predefined large set of random numbers with an increasing population count and measured the execution time.

Here are the results














The old C shift based implementation is stable, it runs always the same time independent of the number of set bits. Both looping algorithms outperform it with lesser bits. The break even is with about 7 bits. The MSVC compiler code was slightly faster as my handcrafted assembler.

A similar picture is achieved using the Intel compiler














The shift C code is stable and runs slightly faster that the same algorithm compiled with MSVC. The looping algorithms outperform it but are a bit slower than the MSVC code. No profile guided optimization has been used (PGO). Maye that would level the execution speed compared with MSVC but was not tested.

Finally I compiled the source with the gnu C compiler using the gnu C builtin popcount instruction instead of my assembler code as the gnu C compiler does not understand inline assembler in Intel Syntax.












It shows that the native gcc popcount instruction is probably implemented as a shifting algorithm and surprisingly it is the worst performing of all 3. The looping algorithm gets a performance hit between 2 and 3 bits set but it still outperforms the other 2 until 6 bits are set.

So I took the architectural decision to drop my handcrafted assembler code and introduce the C version of the looping algorithm into my engine. I replace all popcount calls where the number of set bits is expected to be smaller than 7 (like number of pawns on the 7th rank) with the new popcount code and keep the old one for the remaining calls (e.g. number of squares a queen can attack).

I got a nice speedup of more than 10% in execution speed, which was well worth the exercise.