Most of my recent attempts to improve my engine failed, any possible improvement stayed within the error bar, so no breakthrough yet.
As I'm running out of ideas for the moment to improve iCE without any major effort (e.g. major changes to eval and tuning the evaluation weights) I think I will release the current version of the code soon. It has at least functional enhancements compared with iCE 0.2 like using an opening book or better endgame knowledge.
One of the missing functional features so far was the implementation of the DRAW by the 50 move rule. It is no big deal to implement but as I did not see any major benefit for playing strength in it I just thought I will implement it later when the time is right.
It is right now and iCE does now know it.
To demonstrate it consider the following position: 8/5k2/8/8/1R6/R6K/7P/8 w - - 96 200
48 moves have been played without pawn move or capture. The old version of iCE here announces a Mate in 3, trying to mate with the rooks right away. This would fail because the game would be adjudicated as draw by the 50 move rule before Black is Mated.
The new version of iCE is now seeing the way out.
info depth 14 seldepth 24 time 13797 nodes 38389600 pv h3g2 f7e6 h2h3 e6e5 a3a5 e5d6 b4b6 d6c7 b6h6 c7b7 a5g5 b7c7 g5g7 c7d8 h6h8 nps 2782459 score mate 8
So it is first moving the king and the pawn and mating with the rooks later.
So one more source of possible embarrassment removed.
Friday, June 29, 2012
Sunday, May 6, 2012
Getting stronger is really tough
All my ideas lately to improve the playing strength of my chess engine did not work out. Sometimes it looked promising but something that seemed like a small progress vanished if more games were played.
I have not the resources to run hundreds of thousand of games. Playing 1000 games is already quite a commitment and the error bar after 1000 games is still very big. So my latest version v2589 shows something like a small improvement but actually it can even be worse than the last one in the list. They are all very close together and I kept the code changes not for a proven ELO gain but because they fixed some small bug or simplified the code a bit.
Rank Name Elo + - games score oppo. draws
1 gaviota v0.85.1 199 35 35 368 78% -20 17%
2 cheng3 1.07 JA 52 14 14 1860 60% -21 17%
3 Daydreamer 1.75 JA 30 12 12 2358 57% -19 29%
4 iCE 0.3 v2589 -20 16 16 1295 52% -31 37%
5 iCE 0.3 v2459 -24 20 20 766 50% -25 36%
6 iCE 0.3 v2441 -25 9 9 3768 47% -1 33%
7 iCE 0.3 v2489 -25 18 18 1014 50% -25 32%
8 iCE 0.3 v2457 -25 8 8 5160 50% -22 33%
9 iCE 0.3 v2559 -27 10 10 3493 49% -17 31%
10 iCE 0.3 v2574 -31 9 9 4162 48% -16 33%
11 iCE 0.3 v2413 -34 12 12 2161 49% -28 33%
12 iCE 0.3 v2571 -35 11 11 2768 48% -18 32%
13 iCE 0.3 v2437 -35 9 9 4249 47% -15 31%
So I'm still looking for something that adds some ELO. But it seems most of the low hanging fruits have been picked.
I have not the resources to run hundreds of thousand of games. Playing 1000 games is already quite a commitment and the error bar after 1000 games is still very big. So my latest version v2589 shows something like a small improvement but actually it can even be worse than the last one in the list. They are all very close together and I kept the code changes not for a proven ELO gain but because they fixed some small bug or simplified the code a bit.
Rank Name Elo + - games score oppo. draws
1 gaviota v0.85.1 199 35 35 368 78% -20 17%
2 cheng3 1.07 JA 52 14 14 1860 60% -21 17%
3 Daydreamer 1.75 JA 30 12 12 2358 57% -19 29%
4 iCE 0.3 v2589 -20 16 16 1295 52% -31 37%
5 iCE 0.3 v2459 -24 20 20 766 50% -25 36%
6 iCE 0.3 v2441 -25 9 9 3768 47% -1 33%
7 iCE 0.3 v2489 -25 18 18 1014 50% -25 32%
8 iCE 0.3 v2457 -25 8 8 5160 50% -22 33%
9 iCE 0.3 v2559 -27 10 10 3493 49% -17 31%
10 iCE 0.3 v2574 -31 9 9 4162 48% -16 33%
11 iCE 0.3 v2413 -34 12 12 2161 49% -28 33%
12 iCE 0.3 v2571 -35 11 11 2768 48% -18 32%
13 iCE 0.3 v2437 -35 9 9 4249 47% -15 31%
So I'm still looking for something that adds some ELO. But it seems most of the low hanging fruits have been picked.
Thursday, April 12, 2012
Evaluation Output
The latest changes to iCE did not gain any ELO or made things worse so I threw them away. To see a bit of progress I decided to spend some time into something that does not affect playing strength so I can keep it for sure. I had a reformatting of the evaluation output on my todo list for some time already and I now decided to get it done. The eval output of the stockfish engine looks really nice and helpful so I decided to mirror that format just using the evaluation terms of iCE instead of the stockfish ones of course.
So it now looks like this and I hope it will me help to fine tune eval in the later development of iCE.
position fen r2q1rk1/pp2np2/5P2/3bp2Q/3pN2b/1B1P4/PPP3PP/R4RK1 w
Total eval of the position : -52
Evaluation:
===========
Eval term | White | Black | Total
| MG EG | MG EG | MG EG
---------------------+-------------+-------------+--------------
Static Material | --- --- | --- --- | 141 141
Dynamic Material | 36 36 | 2 2 | 38 38
Mobility | -73 -73 | 66 66 | -7 -7
Pattern Recognizer | 0 0 | 0 0 | 0 0
King Pressure | -20 -20 | 1 1 | -19 -19
Static Pawn Struct | --- --- | --- --- | 6 0
Dynamic. Pawn Struct | -33 -38 | -4 -4 | -37 -42
Pieces | -10 -10 | -8 0 | -18 -10
Threats | 0 0 | -52 -87 | -52 -87
---------------------+-------------+-------------+--------------
Total | --- --- | --- --- | 52 14
Scaling : 100% MG (52) 0% EG (0) = 52
NegaMax Adjustment for Side To Move (White): -52
In order to be able to gather the sub terms of eval I had to rewrite parts of it and a minor functionality change related to the sub scores rounding (which is not done anymore) was unavoidable. But tests show it does not seem to hurt. In theory it should even help.
But it now shows also a weakness of iCE. In this position Black is in big trouble but static eval does not indicated that. The king pressure evaluation definitely needs some adjustment. iCE finds its way through anyway once it reaches that position (... but it might never get so far)
info depth 11 seldepth 26 time 2734 nodes 7865064 pv h5h6 e7f5 f1f5 h4f6 e4f6 d8f6 h6f6 f8d8 f5h5 d5b3 h5h8 nps 2876760 score mate 6 hashfull 133 tbhits 0
bestmove h5h6 ponder e7f5
So it now looks like this and I hope it will me help to fine tune eval in the later development of iCE.
position fen r2q1rk1/pp2np2/5P2/3bp2Q/3pN2b/1B1P4/PPP3PP/R4RK1 w
Total eval of the position : -52
Evaluation:
===========
Eval term | White | Black | Total
| MG EG | MG EG | MG EG
---------------------+-------------+-------------+--------------
Static Material | --- --- | --- --- | 141 141
Dynamic Material | 36 36 | 2 2 | 38 38
Mobility | -73 -73 | 66 66 | -7 -7
Pattern Recognizer | 0 0 | 0 0 | 0 0
King Pressure | -20 -20 | 1 1 | -19 -19
Static Pawn Struct | --- --- | --- --- | 6 0
Dynamic. Pawn Struct | -33 -38 | -4 -4 | -37 -42
Pieces | -10 -10 | -8 0 | -18 -10
Threats | 0 0 | -52 -87 | -52 -87
---------------------+-------------+-------------+--------------
Total | --- --- | --- --- | 52 14
Scaling : 100% MG (52) 0% EG (0) = 52
NegaMax Adjustment for Side To Move (White): -52
In order to be able to gather the sub terms of eval I had to rewrite parts of it and a minor functionality change related to the sub scores rounding (which is not done anymore) was unavoidable. But tests show it does not seem to hurt. In theory it should even help.
But it now shows also a weakness of iCE. In this position Black is in big trouble but static eval does not indicated that. The king pressure evaluation definitely needs some adjustment. iCE finds its way through anyway once it reaches that position (... but it might never get so far)
info depth 11 seldepth 26 time 2734 nodes 7865064 pv h5h6 e7f5 f1f5 h4f6 e4f6 d8f6 h6f6 f8d8 f5h5 d5b3 h5h8 nps 2876760 score mate 6 hashfull 133 tbhits 0
bestmove h5h6 ponder e7f5
Saturday, March 17, 2012
Murphys law
As I now do some code fine tuning in iCE I run long test series to see that the changes does not make the engine weaker. I started to see iCE losing some games because of illegal moves.
I was absolutely dumbfounded by that. The move generator of iCE passes all perft test, I use a 64 bit hash signature to make undetected collisions very unlikely and as an additional safety belt I verify each move from the hash again for legality. So illegal moves should just be not possible.
After some investigation I found that is was always the same illegal move in the same position and it actually came from the new opening book of iCE. So when I assembled the book I generated an illegal move and put it in the book.
In this position the move to play was specified as 5. bxa6. The iCE book creator translated that move into Pawn from b2 to a6. So when it looked what pawn actually captures on a6 it picked the wrong one. The reason was an uninitialized bitboard and this was a quick fix.
But it still did not explain why iCE actually used that move and did not detect it as illegal. After debugging I found that the move validator had also a small bug (a bracket was closed at the wrong spot) that just effected white pawn captures and so the move went ok through validation.
I fixed that bug too and corrected the books but it showed "whenever something can go wrong it will".
I was absolutely dumbfounded by that. The move generator of iCE passes all perft test, I use a 64 bit hash signature to make undetected collisions very unlikely and as an additional safety belt I verify each move from the hash again for legality. So illegal moves should just be not possible.
After some investigation I found that is was always the same illegal move in the same position and it actually came from the new opening book of iCE. So when I assembled the book I generated an illegal move and put it in the book.
In this position the move to play was specified as 5. bxa6. The iCE book creator translated that move into Pawn from b2 to a6. So when it looked what pawn actually captures on a6 it picked the wrong one. The reason was an uninitialized bitboard and this was a quick fix.
But it still did not explain why iCE actually used that move and did not detect it as illegal. After debugging I found that the move validator had also a small bug (a bracket was closed at the wrong spot) that just effected white pawn captures and so the move went ok through validation.
I fixed that bug too and corrected the books but it showed "whenever something can go wrong it will".
Sunday, March 4, 2012
Another reference match
In my efforts to establish a performance baseline I ran another reference match against a strong engine. This time I picked gaviota and decided to run a few more games at a slightly longer time control to get a more accurate result.
600 games at a TC of 100 moves in 10 sec + 0.5 sec increment per move, ponder off, own book
Score of gaviota v0.85 vs iCE 0.3 v2394 : 392 - 111 - 97 [0.73] 600
ELO difference: 176
which seems about petty correct. So this is the level I want to improve my engine from.
600 games at a TC of 100 moves in 10 sec + 0.5 sec increment per move, ponder off, own book
Score of gaviota v0.85 vs iCE 0.3 v2394 : 392 - 111 - 97 [0.73] 600
ELO difference: 176
which seems about petty correct. So this is the level I want to improve my engine from.
Saturday, March 3, 2012
Windows TimeResolution Trouble
In recent tournaments iCE 0.2 sometimes lost on time which is a very bad thing. Its time control is implemented aggressively so it is using all available time on the last move before the time control but it should not overstep it from the way it was implemented, but it did. As I wanted to improve the time allocation anyway I introduced a soft and a hard time limit. Under certain conditions the engine might continue to search beyond the soft but never beyond the hard limit. The hard limit is verified to always be less than the available remaining time. On the last move before the time control a small safety buffer is used in addition. So I was pretty sure time loss are now a thing of the past.
I was wrong. In a little private reference tournament at very fast time controls (1 sec + 0.1 sec increment per move) iCE was losing almost every game on time. After debugging the GUI and the Engine logs it showed that the engine was assuming to spend less time on a move as the GUI did. So the engine thought it has calculated for 90 ms where the GUI recorded 120 ms for a move. So the initial buffer was getting smaller and smaller with each move until there was almost no buffer anymore and the engine overstepped it.
I investigated the issue a bit and found 2 root causes for this behavior
Those 2 facts introduced a significant error margin in short TC games. I investigated some alternatives to either lower the time resolution of the OS (there is an API call for that) or eliminate the Sleep call in the main thread but all had its Cons. At the end I decided to introduce a TIME_RESOLUTION_ERROR_MARGIN of 25 ms. All time limits assume that the engine in reality is using 25 ms more than it thinks and the internal limits are adjusted for that. This works pretty well and eliminates almost all time losses even in very fast TC games. The downside is that the engine is sometimes using some ms less time than it could. But nothing is perfect.
All this is only relevant in very short TC games anyway. In real tournaments the engine has much more time and 25 ms don't make a difference at all. But for engine tuning and change evaluation a lot of games are required and to finish 1000+ games in a reasonable amount of time you have to limit the time per move very much.
With the new TC I ran a quick match against a pair of engines to establish a baseline which I measure engine changes against. iCE was actually doing not so bad, but I think this is TC related. At longer TCs the other engines would probably be stronger. For instance I wasn't able to include the engine Aristarch 4.50 at all because it is very weak at short TCs. It was mated in 9 out of 10 games by iCE (no time losses but real mates). At longer TCs Aristarch seems much stronger.
Those are the results of iCE against 5 engines with 200 games each at a TC of 100 moves in 3 sec + 0.3 sec per move. All engines 32 bits and 1 core, PonderOff, own book where existing.
Rank Name ELO Games Score Draws
1 Quazar 0.4 w32 282 200 84% 16%
2 cheng3 1.07 JA 96 200 64% 12%
3 iCE 0.3 v2394 19 1000 53% 17%
4 Abrok 5.0 -92 200 37% 24%
5 Ufim 8.02 -177 200 26% 18%
6 Eeyore 1.52 UCI -186 200 26% 15%
Finished match
Congratulations to Martin Sedlak. It's cheng engine is really very strong and greatly improved in Version 1.07.
I was wrong. In a little private reference tournament at very fast time controls (1 sec + 0.1 sec increment per move) iCE was losing almost every game on time. After debugging the GUI and the Engine logs it showed that the engine was assuming to spend less time on a move as the GUI did. So the engine thought it has calculated for 90 ms where the GUI recorded 120 ms for a move. So the initial buffer was getting smaller and smaller with each move until there was almost no buffer anymore and the engine overstepped it.
I investigated the issue a bit and found 2 root causes for this behavior
- it seems that the Windows XP time resolution has a default quantum of about 10 ms, so the time related functions have a smallest resolution of 10 ms. So for 9 ms the engine thinks it did spend no time at all and 1 ms later it spent already 10.
- the c++ Sleep(int wait) function is only specifying the minimum time the OS scheduler does not schedule the thread that called Sleep. The actual time the thread is suspended can be much greater
Those 2 facts introduced a significant error margin in short TC games. I investigated some alternatives to either lower the time resolution of the OS (there is an API call for that) or eliminate the Sleep call in the main thread but all had its Cons. At the end I decided to introduce a TIME_RESOLUTION_ERROR_MARGIN of 25 ms. All time limits assume that the engine in reality is using 25 ms more than it thinks and the internal limits are adjusted for that. This works pretty well and eliminates almost all time losses even in very fast TC games. The downside is that the engine is sometimes using some ms less time than it could. But nothing is perfect.
All this is only relevant in very short TC games anyway. In real tournaments the engine has much more time and 25 ms don't make a difference at all. But for engine tuning and change evaluation a lot of games are required and to finish 1000+ games in a reasonable amount of time you have to limit the time per move very much.
With the new TC I ran a quick match against a pair of engines to establish a baseline which I measure engine changes against. iCE was actually doing not so bad, but I think this is TC related. At longer TCs the other engines would probably be stronger. For instance I wasn't able to include the engine Aristarch 4.50 at all because it is very weak at short TCs. It was mated in 9 out of 10 games by iCE (no time losses but real mates). At longer TCs Aristarch seems much stronger.
Those are the results of iCE against 5 engines with 200 games each at a TC of 100 moves in 3 sec + 0.3 sec per move. All engines 32 bits and 1 core, PonderOff, own book where existing.
1 Quazar 0.4 w32 282 200 84% 16%
2 cheng3 1.07 JA 96 200 64% 12%
3 iCE 0.3 v2394 19 1000 53% 17%
4 Abrok 5.0 -92 200 37% 24%
5 Ufim 8.02 -177 200 26% 18%
6 Eeyore 1.52 UCI -186 200 26% 15%
Finished match
Congratulations to Martin Sedlak. It's cheng engine is really very strong and greatly improved in Version 1.07.
Saturday, February 4, 2012
Speeding up the popcount stuff
A recent profile of my little engine revealed that one of the biggest single hotspots in the entire engine is the "popcount" instruction . This instruction returns the number of set "1" bits in a 64 bit integer. Especially the position evaluation is using this instruction a lot. Questions like
Intel introduces with its Nehalem architecture in i7 cores a popcount processor instruction but the new CPUs are not so wide spread yet and I don't own one either so for the immediate future popcount still has to be calculated in software. So far I used a stable algorithm that uses shift and multiply operations to do that. This algorithm is here called C-shift (as it is my old shifting implementation in C)
A new approach implements a loop that deletes 1 bit at at time and counts how often it loops. This is very fast when only a few bits are set (C-sparse)
int popCount_C_sparse(int64 b)
{
int count = 0;
while (b != 0)
{
count++;
b &= b - 1; // reset LS1B
}
return count;
}
And as 3rd alternative I decided to implement the above C algorithm directly in Assembler using 32 bit instructions to maybe reduce some overhead introduced by the compiler.
I compiled the code using the Microsoft C compiler, the Intel C Compiler and the gnu C compiler and let each of the algorithms count the 1 bits in a predefined large set of random numbers with an increasing population count and measured the execution time.
Here are the results
The old C shift based implementation is stable, it runs always the same time independent of the number of set bits. Both looping algorithms outperform it with lesser bits. The break even is with about 7 bits. The MSVC compiler code was slightly faster as my handcrafted assembler.
A similar picture is achieved using the Intel compiler
The shift C code is stable and runs slightly faster that the same algorithm compiled with MSVC. The looping algorithms outperform it but are a bit slower than the MSVC code. No profile guided optimization has been used (PGO). Maye that would level the execution speed compared with MSVC but was not tested.
Finally I compiled the source with the gnu C compiler using the gnu C builtin popcount instruction instead of my assembler code as the gnu C compiler does not understand inline assembler in Intel Syntax.
It shows that the native gcc popcount instruction is probably implemented as a shifting algorithm and surprisingly it is the worst performing of all 3. The looping algorithm gets a performance hit between 2 and 3 bits set but it still outperforms the other 2 until 6 bits are set.
So I took the architectural decision to drop my handcrafted assembler code and introduce the C version of the looping algorithm into my engine. I replace all popcount calls where the number of set bits is expected to be smaller than 7 (like number of pawns on the 7th rank) with the new popcount code and keep the old one for the remaining calls (e.g. number of squares a queen can attack).
I got a nice speedup of more than 10% in execution speed, which was well worth the exercise.
- How many pawns are on a file (double or triple pawn detection)
- How many squares can the queen attack next move
- How mobile are the bishops
- How many pawns are protecting the king in its king shelter
- How many squares has the knight to move that are not attacked by pawns of the enemy
Intel introduces with its Nehalem architecture in i7 cores a popcount processor instruction but the new CPUs are not so wide spread yet and I don't own one either so for the immediate future popcount still has to be calculated in software. So far I used a stable algorithm that uses shift and multiply operations to do that. This algorithm is here called C-shift (as it is my old shifting implementation in C)
int popCount_C_shift(int64 B)
{
B = B - ((B >> 1) & 0x5555555555555555ULL);
B = (B & 0x3333333333333333ULL) +
((B >> 2) & 0x3333333333333333ULL);
B = (B + (B >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
return (B * 0x0101010101010101ull) >> 56;
}
{
B = B - ((B >> 1) & 0x5555555555555555ULL);
B = (B & 0x3333333333333333ULL) +
((B >> 2) & 0x3333333333333333ULL);
B = (B + (B >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
return (B * 0x0101010101010101ull) >> 56;
}
int popCount_C_sparse(int64 b)
{
int count = 0;
while (b != 0)
{
count++;
b &= b - 1; // reset LS1B
}
return count;
}
And as 3rd alternative I decided to implement the above C algorithm directly in Assembler using 32 bit instructions to maybe reduce some overhead introduced by the compiler.
__asm
{
xor ecx,ecx
mov edx, dword ptr [b]
and edx, edx
jz L1
L0: inc ecx
lea eax, [edx-1]
and edx, eax
jnz L0
L1: mov edx, dword ptr [b + 04h]
and edx, edx
jz L3
L2: inc ecx
lea eax, [edx-1]
and edx, eax
jnz L2
L3: mov eax, ecx
}
{
xor ecx,ecx
mov edx, dword ptr [b]
and edx, edx
jz L1
L0: inc ecx
lea eax, [edx-1]
and edx, eax
jnz L0
L1: mov edx, dword ptr [b + 04h]
and edx, edx
jz L3
L2: inc ecx
lea eax, [edx-1]
and edx, eax
jnz L2
L3: mov eax, ecx
}
I compiled the code using the Microsoft C compiler, the Intel C Compiler and the gnu C compiler and let each of the algorithms count the 1 bits in a predefined large set of random numbers with an increasing population count and measured the execution time.
Here are the results
The old C shift based implementation is stable, it runs always the same time independent of the number of set bits. Both looping algorithms outperform it with lesser bits. The break even is with about 7 bits. The MSVC compiler code was slightly faster as my handcrafted assembler.
A similar picture is achieved using the Intel compiler
The shift C code is stable and runs slightly faster that the same algorithm compiled with MSVC. The looping algorithms outperform it but are a bit slower than the MSVC code. No profile guided optimization has been used (PGO). Maye that would level the execution speed compared with MSVC but was not tested.
Finally I compiled the source with the gnu C compiler using the gnu C builtin popcount instruction instead of my assembler code as the gnu C compiler does not understand inline assembler in Intel Syntax.
It shows that the native gcc popcount instruction is probably implemented as a shifting algorithm and surprisingly it is the worst performing of all 3. The looping algorithm gets a performance hit between 2 and 3 bits set but it still outperforms the other 2 until 6 bits are set.
So I took the architectural decision to drop my handcrafted assembler code and introduce the C version of the looping algorithm into my engine. I replace all popcount calls where the number of set bits is expected to be smaller than 7 (like number of pawns on the 7th rank) with the new popcount code and keep the old one for the remaining calls (e.g. number of squares a queen can attack).
I got a nice speedup of more than 10% in execution speed, which was well worth the exercise.
Subscribe to:
Posts (Atom)


