As I now do some code fine tuning in iCE I run long test series to see that the changes does not make the engine weaker. I started to see iCE losing some games because of illegal moves.
I was absolutely dumbfounded by that. The move generator of iCE passes all perft test, I use a 64 bit hash signature to make undetected collisions very unlikely and as an additional safety belt I verify each move from the hash again for legality. So illegal moves should just be not possible.
After some investigation I found that is was always the same illegal move in the same position and it actually came from the new opening book of iCE. So when I assembled the book I generated an illegal move and put it in the book.
In this position the move to play was specified as 5. bxa6. The iCE book creator translated that move into Pawn from b2 to a6. So when it looked what pawn actually captures on a6 it picked the wrong one. The reason was an uninitialized bitboard and this was a quick fix.
But it still did not explain why iCE actually used that move and did not detect it as illegal. After debugging I found that the move validator had also a small bug (a bracket was closed at the wrong spot) that just effected white pawn captures and so the move went ok through validation.
I fixed that bug too and corrected the books but it showed "whenever something can go wrong it will".
Saturday, March 17, 2012
Sunday, March 4, 2012
Another reference match
In my efforts to establish a performance baseline I ran another reference match against a strong engine. This time I picked gaviota and decided to run a few more games at a slightly longer time control to get a more accurate result.
600 games at a TC of 100 moves in 10 sec + 0.5 sec increment per move, ponder off, own book
Score of gaviota v0.85 vs iCE 0.3 v2394 : 392 - 111 - 97 [0.73] 600
ELO difference: 176
which seems about petty correct. So this is the level I want to improve my engine from.
600 games at a TC of 100 moves in 10 sec + 0.5 sec increment per move, ponder off, own book
Score of gaviota v0.85 vs iCE 0.3 v2394 : 392 - 111 - 97 [0.73] 600
ELO difference: 176
which seems about petty correct. So this is the level I want to improve my engine from.
Saturday, March 3, 2012
Windows TimeResolution Trouble
In recent tournaments iCE 0.2 sometimes lost on time which is a very bad thing. Its time control is implemented aggressively so it is using all available time on the last move before the time control but it should not overstep it from the way it was implemented, but it did. As I wanted to improve the time allocation anyway I introduced a soft and a hard time limit. Under certain conditions the engine might continue to search beyond the soft but never beyond the hard limit. The hard limit is verified to always be less than the available remaining time. On the last move before the time control a small safety buffer is used in addition. So I was pretty sure time loss are now a thing of the past.
I was wrong. In a little private reference tournament at very fast time controls (1 sec + 0.1 sec increment per move) iCE was losing almost every game on time. After debugging the GUI and the Engine logs it showed that the engine was assuming to spend less time on a move as the GUI did. So the engine thought it has calculated for 90 ms where the GUI recorded 120 ms for a move. So the initial buffer was getting smaller and smaller with each move until there was almost no buffer anymore and the engine overstepped it.
I investigated the issue a bit and found 2 root causes for this behavior
Those 2 facts introduced a significant error margin in short TC games. I investigated some alternatives to either lower the time resolution of the OS (there is an API call for that) or eliminate the Sleep call in the main thread but all had its Cons. At the end I decided to introduce a TIME_RESOLUTION_ERROR_MARGIN of 25 ms. All time limits assume that the engine in reality is using 25 ms more than it thinks and the internal limits are adjusted for that. This works pretty well and eliminates almost all time losses even in very fast TC games. The downside is that the engine is sometimes using some ms less time than it could. But nothing is perfect.
All this is only relevant in very short TC games anyway. In real tournaments the engine has much more time and 25 ms don't make a difference at all. But for engine tuning and change evaluation a lot of games are required and to finish 1000+ games in a reasonable amount of time you have to limit the time per move very much.
With the new TC I ran a quick match against a pair of engines to establish a baseline which I measure engine changes against. iCE was actually doing not so bad, but I think this is TC related. At longer TCs the other engines would probably be stronger. For instance I wasn't able to include the engine Aristarch 4.50 at all because it is very weak at short TCs. It was mated in 9 out of 10 games by iCE (no time losses but real mates). At longer TCs Aristarch seems much stronger.
Those are the results of iCE against 5 engines with 200 games each at a TC of 100 moves in 3 sec + 0.3 sec per move. All engines 32 bits and 1 core, PonderOff, own book where existing.
Rank Name ELO Games Score Draws
1 Quazar 0.4 w32 282 200 84% 16%
2 cheng3 1.07 JA 96 200 64% 12%
3 iCE 0.3 v2394 19 1000 53% 17%
4 Abrok 5.0 -92 200 37% 24%
5 Ufim 8.02 -177 200 26% 18%
6 Eeyore 1.52 UCI -186 200 26% 15%
Finished match
Congratulations to Martin Sedlak. It's cheng engine is really very strong and greatly improved in Version 1.07.
I was wrong. In a little private reference tournament at very fast time controls (1 sec + 0.1 sec increment per move) iCE was losing almost every game on time. After debugging the GUI and the Engine logs it showed that the engine was assuming to spend less time on a move as the GUI did. So the engine thought it has calculated for 90 ms where the GUI recorded 120 ms for a move. So the initial buffer was getting smaller and smaller with each move until there was almost no buffer anymore and the engine overstepped it.
I investigated the issue a bit and found 2 root causes for this behavior
- it seems that the Windows XP time resolution has a default quantum of about 10 ms, so the time related functions have a smallest resolution of 10 ms. So for 9 ms the engine thinks it did spend no time at all and 1 ms later it spent already 10.
- the c++ Sleep(int wait) function is only specifying the minimum time the OS scheduler does not schedule the thread that called Sleep. The actual time the thread is suspended can be much greater
Those 2 facts introduced a significant error margin in short TC games. I investigated some alternatives to either lower the time resolution of the OS (there is an API call for that) or eliminate the Sleep call in the main thread but all had its Cons. At the end I decided to introduce a TIME_RESOLUTION_ERROR_MARGIN of 25 ms. All time limits assume that the engine in reality is using 25 ms more than it thinks and the internal limits are adjusted for that. This works pretty well and eliminates almost all time losses even in very fast TC games. The downside is that the engine is sometimes using some ms less time than it could. But nothing is perfect.
All this is only relevant in very short TC games anyway. In real tournaments the engine has much more time and 25 ms don't make a difference at all. But for engine tuning and change evaluation a lot of games are required and to finish 1000+ games in a reasonable amount of time you have to limit the time per move very much.
With the new TC I ran a quick match against a pair of engines to establish a baseline which I measure engine changes against. iCE was actually doing not so bad, but I think this is TC related. At longer TCs the other engines would probably be stronger. For instance I wasn't able to include the engine Aristarch 4.50 at all because it is very weak at short TCs. It was mated in 9 out of 10 games by iCE (no time losses but real mates). At longer TCs Aristarch seems much stronger.
Those are the results of iCE against 5 engines with 200 games each at a TC of 100 moves in 3 sec + 0.3 sec per move. All engines 32 bits and 1 core, PonderOff, own book where existing.
1 Quazar 0.4 w32 282 200 84% 16%
2 cheng3 1.07 JA 96 200 64% 12%
3 iCE 0.3 v2394 19 1000 53% 17%
4 Abrok 5.0 -92 200 37% 24%
5 Ufim 8.02 -177 200 26% 18%
6 Eeyore 1.52 UCI -186 200 26% 15%
Finished match
Congratulations to Martin Sedlak. It's cheng engine is really very strong and greatly improved in Version 1.07.
Subscribe to:
Posts (Atom)