Thursday, December 25, 2014

Revisiting king safety

In order to improve the evaluation function in my engine I decided to revisit my king safety evaluation.

King safety is an important concept: when the king gets mated, the game is over. King safety takes care of good king locations, intact pawn shelters, the number of attackers and defenders around the king, material mating potential etc. Overall this is a fairly complex sub-evaluation, and having it fairly accurate is important for the playing strength. The decision whether an engine should sacrifice material in order to launch a mating attack against the opponent's king comes mainly from this concept. And a wrong decision is usually fatal.

To get a better understanding of which terms are important I first ran some statistics. I analyzed a few million positions where the king safety term kicked in and recorded the locations of the kings.

Midgame probability that the king is on a given square (black side is mirrored)
In the huge majority of cases the king resides on the target square after castling short. In some positions it has not yet castled, and in even fewer positions it has castled long. So obviously king safety evaluation around a king on G1 is the most important.

In my evaluation I increase or decrease a counter for attacking or defending pieces. Each piece type is handled individually: a defending queen gets no bonus, a defending bishop does. The valid counter range after adding all attackers and subtracting the defenders is between 0 and 50. Now I wanted to know how this counter is distributed. I expected a uniform distribution, but it looked quite different.
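
As a rough sketch of this kind of scheme (the per-piece weights here are invented for illustration, not iCE's actual values):

#include <algorithm>

// Illustrative attacker/defender counter, clamped to the 0..50 range
// described above. The weights per piece type are made up.
int kingSafetyCounter(int attackingKnights, int attackingBishops,
                      int attackingRooks,   int attackingQueens,
                      int defendingKnights, int defendingBishops,
                      int defendingRooks)
{
    int counter = 0;
    counter += 2 * attackingKnights + 2 * attackingBishops;
    counter += 3 * attackingRooks   + 5 * attackingQueens;
    counter -= 2 * defendingKnights + 2 * defendingBishops;
    counter -= 1 * defendingRooks;   // a defending queen gets no bonus
    return std::max(0, std::min(50, counter));
}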


Obviously some values are much more likely than others, probably because some attack / defend combinations are much more likely than others. Overall the range between 10 and 30 is the most interesting one. I intend to later give tuned values a higher focus if they relate to a counter value that occurs frequently.

Finally I tuned the king safety evaluation, where I assigned an evaluation value to each counter value. Plotted as a curve, the evaluation values form a somewhat exponential function.

Evaluation per Counter
The tuning did not result in a smooth curve; I really would have liked that. I thought about modelling the curve with an exponential function, minimizing the error in the important data points, but so far I have not done it. I just hard coded all 50 tuned values for now. For a serious attack (counter >= 40) the evaluation goes beyond the value of a rook. This might encourage some nice sacrifices, I hope.
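
A minimal sketch of that smoothing idea (not something iCE currently does): fit eval(c) ~ a * exp(b * c) by weighted log-linear least squares, weighting each counter value by how often it occurs so the frequent range between 10 and 30 dominates the fit.

#include <cmath>

// Fit eval(c) ~ a * exp(b * c) to the tuned values, weighting each
// counter value by its observed frequency. Taking logarithms makes it
// a weighted linear regression: log(eval) = log(a) + b * c.
void fitExponential(const double eval[51], const double freq[51],
                    double& a, double& b)
{
    double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
    for (int c = 0; c <= 50; ++c)
    {
        if (eval[c] <= 0) continue;            // log() needs positive values
        double w = freq[c], x = c, y = std::log(eval[c]);
        sw   += w;          swx  += w * x;     swy += w * y;
        swxx += w * x * x;  swxy += w * x * y;
    }
    b = (sw * swxy - swx * swy) / (sw * swxx - swx * swx);
    a = std::exp((swy - b * swx) / sw);
}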

The final regression test got me a somewhat nice improvement in strength over iCE 2. So at least it was worth it.



Saturday, November 29, 2014

http://www.amateurschach.de

Frank Quisinsky runs a nice engine tournament with amateur chess engines, and the qualification for the next round has just finished. iCE did really well in this field and ended up first. As each program has played 1000 games, the list is already a good reflection of the relative strength of the engines.

So far it has not happened too often that iCE ended up first, so I'm enjoying this moment a lot.
 

    Program                           Score     1-0   =  0-1      %   Elo
01. iCE 2.0 v2240 POP x64         649.5/1000  433 433 134  65.0%  2750 
02. Spark 1.0 x64                 628.5/1000  410 437 153  62.8%  2735  
03. Deuterium 14.2.33.276 x64     608.0/1000  401 414 185  60.8%  2721 
04. Gaviota 1.0 AVX x64           600.0/1000  410 380 210  60.0%  2715 
05. Tornado 5.0 SSE4 x64          596.0/1000  400 392 208  59.6%  2713  
06. Vajolet2 1.45 POP x64         578.0/1000  360 436 204  57.8%  2700
07. Nirvanachess 1.7 x64          568.0/1000  369 398 233  56.8%  2694
08. Fizbo 1.2 x64                 545.0/1000  339 412 249  54.5%  2678 
09. Arasan 17.4 POP x64           537.5/1000  326 423 251  53.8%  2673
10. Cheng4 0.36c x64              507.5/1000  295 425 280  50.7%  2653
11. Crafty 24.1 SSE42 x64         493.0/1000  314 358 328  49.3%  2644
12. EXchess 7.31b x64             477.0/1000  277 400 323  47.7%  2633
13. Glaurung 2.2 JA x64           471.5/1000  267 409 324  47.1%  2629
14. Rodent 1.4 POP Build 2 x64    466.0/1000  264 404 332  46.6%  2626
 --- first 14 places are qualified (without Glaurung and with Andscacs 0.70)
15. Atlas 3.70em x64              465.0/1000  264 402 334  46.5%  2625
16. OctoChess r5190 SSE4 x64      458.5/1000  231 455 314  45.9%  2621
17. Andscacs 0.64 POP x64         419.0/1000  220 398 382  41.9%  2594
18. Rhetoric 1.4.1 x64            413.0/1000  205 416 379  41.3%  2590
19. Godel 3.4.9 x64               403.5/1000  207 393 400  40.4%  2584
20. Djinn 1.021 POP x64           312.0/1000  129 366 505  31.2%  2518
21. ProDeo 1.87 w32               303.5/1000  147 313 540  30.3%  2511


Wednesday, November 19, 2014

iCE 2 and the chess engine rating lists

CCRL 40/40



iCE 2 has now played enough games and is strong enough to be listed in the major chess engine rating lists. My main reference source is the CCRL, as it has listed iCE since its very first prototype version. With some 400 games played it is now listed at #39 in the "games at long time control" section. Its ELO will probably improve a bit over time with more games played; I saw this with iCE 1.0 too, which after 1000 played games ended up even above the error bar from the early games. In the short time control list iCE 2 is listed at #28 with 2921 ELO (after far more games were played), and I don't expect a 30 ELO difference to come just from a different time control.
 
CEGT 40/20
CEGT lists iCE 2 at #26. iCE was also tested for a "ponder ON" list, where the engine keeps thinking while it is the opponent's turn. This revealed an ugly bug in iCE that had so far gone undetected and has been fixed by now.

So when I reflect on my goals to pass the 2800 ELO mark and to enter the top 40 of the CCRL, I think I'm done. I also passed the best single CPU version of Crafty. Only passing the latest version of DiscoCheck is still open; currently they are too close together to tell which engine is stronger.

I also might fall out of the Top40 again, so I keep my goals open for the next version of iCE.
   




Friday, October 24, 2014

Some more GUI fun

Shaker GUI prototype

To relax a bit from the engine programming madness I worked a bit further on my GUI project. Here the challenges are completely different.

So far I have

  • A somewhat working PGN parser. I have not tested it with a large collection of PGNs, so there might be some it cannot process (especially illegal ones).
  • A game recorder that creates a PGN from manually played moves on the board, including annotations.
  • A simple tournament manager that is able to run some engine vs engine matches. It still needs a lot of configuration options; everything is currently hard coded.
  • An interface to play against an engine as a human.
  • A working time control.
  • A simple analysis interface that analyzes a given position.
Especially keeping track of the GUI / engine status is mind boggling. Certain commands only work when the GUI and the loaded engine are in a certain state, so the menus must be enabled or disabled depending on those states.
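
The pattern boils down to a small state machine. A sketch of the idea (the states and commands are invented for illustration, and the sketch is in C++ although the GUI itself is written in Free Pascal, see the August post below):

#include <string>

// Invented states and commands for illustration.
enum class EngineState { NotLoaded, Idle, Thinking, Pondering };

bool commandAllowed(EngineState s, const std::string& cmd)
{
    if (cmd == "Load Engine") return s == EngineState::NotLoaded;
    if (cmd == "Analyze")     return s == EngineState::Idle;
    if (cmd == "Stop")        return s == EngineState::Thinking ||
                                     s == EngineState::Pondering;
    return false;
}

// Whenever the state changes, every menu item is enabled or disabled
// from this single predicate instead of ad hoc checks all over the GUI.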

So I think this will keep me entertained for a while, and then I will probably very much welcome a bit of simple engine programming again.

Saturday, September 20, 2014

iCE 2 first ratings are being established


After an official release one is always curious whether one was able to improve the engine, and if so, by how much. I was pretty sure that the new iCE was better; my test against the old version at very short time controls showed a real improvement. But I was unsure how this scales to other opponents and much longer time controls.

As with iCE 1, Lars Hallerström performed an initial rating test and the results look encouraging. In his private rating list iCE 2 gained 200 ELO and climbed 66 spots. Considering that it now already faces some 4-core engines while iCE runs on only one core, this is nice progress.


 53 Ice 2.0 (64) : 2954   19  19  1000    57.2 %   2904   26.2 %
119 Ice 1.0 (64) : 2754   19  19   900    52.3 %   2738   26.7 %


53 Ice 2.0 (64)              : 2954  1000 (+441,=262,-297), 57.2 %

Fruit 2.3.1                   :  40 (+ 26,= 10,-  4), 77.5 %
Ktulu 9                       :  40 (+ 28,=  7,-  5), 78.8 %
Spike 1.4                     :  40 (+ 14,= 12,- 14), 50.0 %
Komodo CCT (64)               :  40 (+  3,=  9,- 28), 18.8 %
Junior 12.5.0.3 (4cores)      :  40 (+ 14,=  9,- 17), 46.2 %
Deuterium 14.01 (64)          :  40 (+ 14,= 11,- 15), 48.8 %
Gaviota 0.86 (64)             :  40 (+ 21,= 13,-  6), 68.8 %
Dirty Feb 26 2014 (64)        :  40 (+ 27,=  7,-  6), 76.2 %
Sjeng WC2008 (64)             :  40 (+ 21,=  8,- 11), 62.5 %
Crafty-23.8 (64)              :  40 (+ 16,= 13,- 11), 56.2 %
Texel 1.03 (64)               :  40 (+ 14,= 18,-  8), 57.5 %
Bug2 1.9 (64)                 :  40 (+ 27,=  8,-  5), 77.5 %
Scorpio 2.7.6 JA (64)         :  40 (+ 10,= 14,- 16), 42.5 %
EX 7.11b (64)                 :  40 (+ 25,= 12,-  3), 77.5 %
Djinn 1.010 (64)              :  60 (+ 39,= 19,-  2), 80.8 %
Cheng 4.36 (64-4cores)        :  40 (+ 14,= 15,- 11), 53.8 %
Rybka 4.1 SSE42 (64-4cores)   :  40 (+  3,= 13,- 24), 23.8 %
Hannibal 1.4a (64-4cores)     :  40 (+  5,=  9,- 26), 23.8 %
Chiron 2 (64-4cores)          :  40 (+  3,= 10,- 27), 20.0 %
Senpai 1.0 sse42 (64-4cores)  :  40 (+  1,= 12,- 27), 17.5 %
BobCat 3.25 (64)              :  40 (+ 20,= 11,-  9), 63.7 %
Brutus 8.05 JA (64)           :  40 (+ 34,=  6,-  0), 92.5 %
GreKo 12.0 JV (64)            :  60 (+ 55,=  4,-  1), 95.0 %
Spark 1.0 (64-4cores)         :  40 (+  7,= 12,- 21), 32.5 %


Thanks, Lars, for providing the data.

Sunday, September 7, 2014

iCE 2 has been released



Yesterday I uploaded the new release of iCE to my website and released it into the wild. Again more than 1 year of development and tuning went into this release. The last release was dominated by changes to the evaluation and weight tuning; most of the committed changes in this release relate to the search framework.


Of course it has no established rating yet, but I'm confident that it is stronger than iCE 1. I don't really have an idea by how much. I tried to pair it with some free engines but got contradicting results: against some engines iCE ended up only as crushed iCE, while it was able to really freeze some other engines, despite the fact that all those engines are close together in their CCRL rating. Probably the time control I test with or my test setup is just bad, so I leave the actual testing to the experts.

Here a summary of the changes for this release.   

Changes:

Bugfixes: Bugs related to recognizing a trapped rook and to the 50 move rule were fixed

Pawn Hash Handling: The king positions are now stored in the pawn hash, which allows hashing additional eval terms while the hit rate goes down a bit.
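
A sketch of the general idea (the entry layout is invented; this sketch stores the king squares in the entry rather than mixing them into the key):

#include <cstdint>

// Invented pawn hash entry: the cached scores are only used if the
// pawn key and both king squares match the current position. King
// dependent terms (e.g. shelter) can then be cached too, but distinct
// king placements now occupy distinct entries, lowering the hit rate.
struct PawnHashEntry
{
    uint64_t pawnKey;           // Zobrist key over pawns only
    uint8_t  wKingSq, bKingSq;  // king squares the entry was built for
    int16_t  mgScore, egScore;  // cached midgame / endgame scores
};

bool matches(const PawnHashEntry& e, uint64_t pawnKey, int wKsq, int bKsq)
{
    return e.pawnKey == pawnKey && e.wKingSq == wKsq && e.bKingSq == bKsq;
}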

Endgame Knowledge: Some special code to handle certain 5-piece endgames was added. iCE now uses the material hash to recognize the endgame types; the ugly list of if ... then ... statements that checked for piece combinations is now gone.

Evaluation: Some changes to the evaluation to understand certain positions better, for example positions with an isolated queen's pawn. Tests showed only limited impact on the strength. I kept them because I hope the overall playing style of iCE now looks more natural in those positions. Some minor terms have been dropped or replaced by others. My pawn structure evaluation is still clumsy; all attempts to improve it failed.

Table Bases: iCE is not using external table bases during play. iCE 1 had the DTM data for all 3-men TBs built into the executable. Those tables were removed in iCE 2; they are calculated on the fly when the engine starts. The calculation is fairly optimized and takes less than 100 ms on a decent 1-core system.

Search Changes: Added history heuristics, late move pruning, razoring and counter move heuristics. Changed LMR to be less aggressive. Removed lazy eval. Added LMR at the root node. And tuning, tuning, tuning ...

Code Cleanup: A lot of code was refactored and simplified. Especially similar code for WHITE and BLACK was merged using C++ templates. Overall, from iCE 1 to iCE 2, 5600 lines of code were removed while 3800 lines were added.

Unchanged: iCE is still a CLOP free zone and does not access endgame tablebases in play. However, the new endgame knowledge was verified against the Gaviota table bases; tribute and thanks for providing them go to Miguel Ballicora. Also everything related to opening books is unchanged and still works the same with iCE 2.

I recorded my work on iCE over the last years in this blog. It contains all the dirty details, so someone interested in learning about my approach to automated parameter tuning using a GA might find something interesting here.

Tuesday, August 26, 2014

Leisure time and GUI programming

My main computer is busy again running thousands of games trying to find some ELO by tuning engine parameters so I have some time to do some actual programming.

When I started chess programming I coupled my chess logic with a GUI, so I finally had a chess program but not an engine. Only humans could play it, through a GUI interface.

This is of course not the way to do it, and I then came up with a real engine. Nevertheless the GUI part was also fun to write. This comes mainly from the fact that when you write some code you see an improvement of your work right away. You don't have to throw away complete git branches because the changes worked but did not improve the engine; my last post about changing the pawn structure evaluation is such an example.

So while my computer is blocked anyway, I decided to do a little bit of GUI programming. I spent some time finding the right programming language. All my recent stuff was written in C++ with Visual Studio. It has the MFC framework, but I got some advice not to use it. So finally I decided to give Lazarus (Free Pascal) another shot. It has a nice coupling of presentation and programming logic and wraps some of the nasty low level stuff very nicely.

A few days later I now have my first GUI prototype. It is already capable of running fixed depth matches between two engines and displaying their moves on the screen. Even if this probably sounds simple, it took quite some code to get to this point.

So next on the list is an improvement of the tournament logic e.g. some real tournament time control stuff.

Wednesday, July 30, 2014

Pawn structure evaluation

Lately I spent some effort trying to improve the evaluation of pawn structures. A pawn structure is a rather long lasting feature of a position. A good understanding of the strengths and weaknesses of those structures helps the engine to better understand the position.

I'm not so happy with the current evaluation of iCE in that area, so I tried to improve it. I studied some of the theory, e.g. Pawn Power in Chess by Hans Kmoch, and tried to deduce meaningful patterns like different kinds of doubled pawns (the closely shielded central doubled pawn is not necessarily a weakness), isolated and backward pawns, levers etc.

I modified my evaluation with those patterns, but unfortunately none of the regression tests succeeded. The best version was still a minor regression of about -4 ELO, although I noticed that the playing style of the engine changed a lot. Usually when I test a new version against the old one I have a draw ratio of more than 50%; here I experienced a draw ratio of only about 30%, which is the usual draw ratio of unrelated engines. So the changed pawn structure code had a significant impact. Too bad it was not stronger.

So I went back to my old evaluation.

Nevertheless it was not a complete waste of time. My engine did not learn anything, but I did learn something along the way about pawn structures, and I got an impression of which terms to target next. But overall, of course, I had hoped for a better outcome.


Saturday, June 28, 2014

Not being lazy anymore


Lazy evaluation describes a technique where an engine does not do a full evaluation of a certain position but only a rough estimate.

Let's say the engine already knows it has a line where, even with best play by the opponent, it will end up a pawn ahead (+1.00). Now it considers a different line where already the basic material evaluation shows that it is a rook down (-5.00). Here the engine can save time, because it does not really need to know the influence of the other factors in that position, like mobility, space, open file coverage, weak pawns etc. All those will not compensate for the loss of a rook, so the engine will not play that line anyway.

This sounds like a simple and smart concept, and some well known engines use it. It really speeds up the evaluation, so the nps (evaluated nodes per second) goes up. But there is no free lunch. The problem is picking the right safety margin. In the above example the safety margin was more than a rook, which is pretty safe. But with such a big margin you have only few cases where you can apply lazy eval. If you shrink the margin you have more and more cases for lazy eval, but the danger that you will miss stuff also increases.
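
A minimal sketch of the technique (the margin, the helper names and the split between cheap and expensive terms are all invented, not iCE's actual code):

struct Position;                           // engine internals omitted
int materialBalance(const Position& pos);  // the cheap part
int mobility(const Position& pos);         // the expensive parts ...
int pawnStructure(const Position& pos);
int kingSafety(const Position& pos);

const int LAZY_MARGIN = 500;  // about a rook in centipawns, illustrative

int evaluate(const Position& pos, int alpha, int beta)
{
    int score = materialBalance(pos);
    // If the cheap estimate is already hopelessly outside the search
    // window, the remaining terms cannot change the outcome of the node.
    if (score + LAZY_MARGIN < alpha || score - LAZY_MARGIN > beta)
        return score;
    return score + mobility(pos) + pawnStructure(pos) + kingSafety(pos);
}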

My experience was that while lazy eval increases the raw speed of the engine, it also increases the amount of work the engine has to do to reach the same quality. In terms of time it was about even.

So far iCE had only a very limited form of lazy eval; the safety margin was between a rook and a queen. This seemed to help a little overall, but I always disliked it.

So I recently tested the complete removal again, and after 16.000 games both versions were equal. Considering the error margin, removing the feature can still lose about 5 ELO in the worst case, but I'm willing to take that risk.

Safety first.

Sunday, June 22, 2014

Next Generation chess players

My success in chess relates rather to improving my little engine. Although I enjoy playing over the board, I'm not really doing great in that area. Maybe the next generation is more successful here.

Last Saturday my elder son (8 years old) played his first chess tournament, and he did surprisingly well. He managed to win his first 4 games and only lost the last one, the game for the tournament victory. So he got a remarkable 2nd place in his group.

Congratulations, Jonas.




Tuesday, June 10, 2014

Razoring

In chess programming, razoring describes a forward pruning algorithm. In order to process the huge search tree, some "uninteresting" branches are not searched as deeply as others. This somewhat simulates human behavior, where obviously bad lines are not considered as carefully as promising lines.

When thinking ahead and the engine finds itself in a position that is worse than a position it already knows it can get to, it will stop searching moves in that direction and consider other moves.

Of course this means the engine can also miss some good moves that it would have found if it had searched just a bit further in that direction. So this is a bit of a gamble: if the engine gains more from skipping bad moves than it loses from missing some good ones, overall it is a gain.
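
A sketch of one common formulation (the depth limit and margin are invented, not necessarily the conditions iCE uses): at shallow depth, when the static eval is far below alpha, drop into the quiescence search and prune the node if it still fails low.

struct Position;                        // engine internals omitted
int staticEval(const Position& pos);
int qsearch(Position& pos, int alpha, int beta);

const int RAZOR_MARGIN = 300;           // centipawns, illustrative

// Called near the top of the main search, before the move loop.
bool tryRazoring(Position& pos, int depth, int alpha, int& score)
{
    if (depth <= 2 && staticEval(pos) + RAZOR_MARGIN <= alpha)
    {
        score = qsearch(pos, alpha, alpha + 1);   // cheap verification
        if (score <= alpha)
            return true;                          // prune this node
    }
    return false;
}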

In the past iCE did not use razoring; I tried it and it did not work. I also read that Bob Hyatt tried it extensively and it did not work for him either.

Anyway, from time to time I retest some features. Recently I got history heuristics working, which also failed in the past. So now I retested razoring, and it seems that it works now too. Even without tuning the razor conditions, iCE got another small improvement.

It seems the time for the big jumps is gone, so progress means piling up small gains. But in total a lot of small steps get you as far as a big jump. It is just more work.

Tuesday, May 27, 2014

iCE 1.0 breaks through the 2700 ELO barrier


It's about 9 months since I released iCE 1.0, and it has now played well over 800 games for the CCRL. Its rating is stabilizing with more games played, and it has now reached the 2700 ELO mark.

I thought I'd make a screenshot, just in case it does not last for long. Currently iCE is playing in the already very strong division 3 of the CCRL, where it is one of the weakest engines of that division. Probably it will score some bad losses and lose some points again.

But for now time for a little celebration.

Sunday, May 25, 2014

Older versions of iCE

A fellow programmer asked me about older, weaker versions of iCE to measure progress against. I think this is a nice idea. I remember when I started, most of the programs I found were so strong that iCE lost almost every game. I measured progress by the number of moves until checkmate was delivered.

So I looked up my old files. I did not have an ice01 executable anymore, but I still had a collection of the sources and they still compiled. Yeah!

So here is a link to a fresh compile of the old ice01. This version is from 2011. It is not yet able to use a book.

http://www.fam-petzke.de/downloads/ice01.exe

I have no package of ice02 anymore; this was before I was using git. The sources were modified to become ice03, and I don't have the ones that compile to ice02 anymore.

But I still have an iCE 0.3 available. You find it here:

http://www.fam-petzke.de/downloads/iCE_v03_2750.zip

Both ice01 and ice03 are only available as 32-bit versions. They don't compile as native x64 applications due to some inline assembler that I used. However, they run fine on an x64 system, just a bit slower.

iCE 1.0 is the first iCE that is available as a native x64 application.

Have fun with them. I hope they help.

Thursday, May 1, 2014

Move ordering

The way chess engines search demands a very good ordering of moves. To be able to handle the huge search space it is important to search the best move in a certain position first. Often it turns out that this move is so good that the opponent won't play moves leading into that position, and then you don't have to look at the remaining moves at all.

There are several techniques available to deal with that, and iCE uses most of them. The last time I measured it, it searched the best move first in more than 90% of the positions.

For that I tried to use some knowledge in the function that orders moves, e.g. ordering moves with an attacked piece early and moves to squares attacked by opponent pawns late in the list, so the function got a bit complex.

One technique I did not use (I tried it once but it failed) is the history heuristic. This is a simple statistic: moves that were good in other, earlier searched positions are tried before moves that have no history or a bad one. I wondered whether I could now replace a bit of my complex knowledge with such a statistic to simplify the engine a bit.

And it turned out I can.

I stripped all knowledge from my move ordering functions that handle quiet moves and replaced it with history. The now simpler engine even seems to play a bit better, not by a huge margin but statistically significantly.
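
For reference, the textbook form of the history heuristic is tiny (the table layout and the depth * depth increment are the classic version, not necessarily what iCE does):

// history[piece][toSquare], bumped whenever a quiet move causes a
// beta cutoff; deeper cutoffs weigh more.
int history[16][64];

void onQuietBetaCutoff(int piece, int toSquare, int depth)
{
    history[piece][toSquare] += depth * depth;
}

// Quiet moves are then simply ordered by their history score.
int quietOrderScore(int piece, int toSquare)
{
    return history[piece][toSquare];
}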

After a lot of failed attempts I'm happy I found a change that works and even makes things simpler. However, the idea is well known and widely applied, so nothing to get too excited about.

Saturday, March 22, 2014

Some more tuning results

I just realized that I had some flaws in my tuning setup using the Texel method. A +2 score in iCE can be compared to a +1 score in other engines. I must take this difference into account in order for the tuning to work.

I adjusted the scaling constant C in my fitness function.
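
Written out, the sigmoid inside E (see the formula in the March 10 post below) is

Sigmoid(s) = 1 / (1 + 10^(C * s))

with the score s measured in pawns. This form matches the percentages quoted in these posts: with C = -0.58 a +1 score maps to 79% and +2 to 93%, while with C = -0.20 a +2 score maps to about 72%.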


The best value for C to produce a minimal E given the evaluation function in iCE is -0.20. In my former runs I used a C of -0.58.

With this new function I repeated the tuning process following the Texel method with a slight modification: instead of using a test set of 8 million positions I only used 7 million. With the remaining positions I performed a verification. I wanted to know whether the tuning produces an evaluation that only handles the positions used in the tuning well, or whether the new evaluation also handles positions better that it has not seen before.

Score E of the fitness function by generation - lower E indicates higher fitness

The upper lines show the development of E against the 7M positions used in the actual tuning. The horizontal line is the E that the current development version of iCE produces for this set.

The lower lines show the development of E against the verification set of positions.

It nicely shows that the evaluation is not over-fitted to the positions it sees but also translates well to other unrelated positions.    

Finally I played a 16.000 games match between the new version and my previous one, and it came out very close. The new one showed an improvement of 2 ELO, which is still within the error bar. But anyway, this time the tuning only took 2 days. This is a huge speedup compared to the 4 weeks I spent with a game playing fitness function.

There is only one small thing that still annoys me. I have an earlier dev version of iCE that still scores 12 ELO better than the one produced by this tuning. This earlier version has the advantage that its search parameters were tuned to fit the evaluation weights; the Texel method does not take this into account. This might explain the difference.

I will address that when I now move my focus a bit away from eval and onto search. The search framework is still the one from iCE 0.3 (just a bit better tuned), so it's worth spending a bit of time here too.

Sunday, March 16, 2014

Pawn Advantage in iCE

The Texel tuning method tries to minimize the evaluation error for a set of positions. The error is minimal if the score of a position is a good prediction of the outcome of the game the position is from.

There is a formula in the chess programming wiki that helps to translate the score (usually given in fractions of a pawn) into a winning probability. Plotted as a graph it looks similar to the one below.
A score of +1 relates to a winning percentage of 79% and +2 to 93%.


I verified whether this distribution applies to the evaluation of iCE and found out that it does not. I used several sets of a million positions each and calculated the function that fits the winning percentage distribution best. The calculated optimal function coefficients differed depending on the set of test positions, but plotted as graphs they were all almost identical.


The blue line is the original graph from above, the three others represent the winning probability by score in iCE for three different sets of positions. 

A +2 score only indicates a 72% winning percentage.


I should take that into account in future tuning exercises.


Monday, March 10, 2014

The texel way of tuning

In a recent discussion at Talkchess, Peter Österlund, the author of the chess engine "Texel", explained his way of auto tuning his engine. He was able to improve his engine significantly with this method, so I thought it was worth giving it a try with my engine too.

My reference engine to measure progress against is the latest development version of iCE, whose weights have just been massively auto tuned by game play. It played more than 700k games in that process, which took almost 4 weeks. So it is rather well tuned.

For the new method I collected a large set of games from the internet (CCRL, CEGT and others) and serialized them into FEN positions from won, lost or drawn games, from White's point of view. I then filtered the positions: I removed positions from the opening, positions with a score > 600 cp and non-quiet positions. From the remaining FENs I selected a random sample of 8.000.000 positions.

The fitness of a certain engine is now the evaluation error against this set. The smaller the error, the fitter a certain weight set.

E = 1/N * sum(i=1,N, (result[i] - Sigmoid(qScore(pos[i])))^2)
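
A sketch of computing this error over the sample (whether qScore is in pawns or centipawns is not stated here; this sketch assumes centipawns and converts; C is the scaling constant discussed in the March 22 post above):

#include <cmath>
#include <vector>

struct Sample { double result; int qScoreCp; };  // result: 1, 0.5 or 0

double sigmoid(double scorePawns, double C)
{
    return 1.0 / (1.0 + std::pow(10.0, C * scorePawns));
}

double evalError(const std::vector<Sample>& samples, double C)
{
    double e = 0;
    for (const Sample& s : samples)
    {
        double d = s.result - sigmoid(s.qScoreCp / 100.0, C);
        e += d * d;
    }
    return e / samples.size();
}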

I used my usual GA framework to optimize the weight set with this fitness function, and it converged rather nicely. The entropy dropped steadily.

Entropy of the system


Also the evaluation error decreased with each additional generation, so the evolution produced fitter individuals each generation.

Evaluation error (fitness) - Lower values indicate a fitter population

It took less than 100 generations to drive the error below the error of the reference engine.

I noticed that the optimization heavily favored 0s. In my former GA runs with game play, the bit convergence towards 0 or 1 was about even: from my set of 515 bits, half of the bits converged towards 1 and the other half towards 0.

In this case the evolution tried to eliminate evaluation features (setting the complete weight to 0), so the final evaluation is then dominated only by material and piece mobility. 

     Convergence:  79.35% (0.088%)     Entropy:  57.93 (-0.175)
     ^
     |█
 155 +█
     |█
     |█                                                █
     |█                                                █
     |█▄▄                                         ▄    █
   0 +-----+----+----+----+----+----+----+----+----+----+
      0   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9   1


This can be caused by weight redundancy, but it is unlikely that when two terms overlap one is set to 0 and the other to full power; it is more likely that each term gets part of the weight. Also the magnitude of weights being tossed out is suspicious. Maybe if material and mobility are even, it is a good strategy to minimize the evaluation error by just betting on a draw as the final game outcome. Eliminating the smaller terms does just that: it draws the score towards 0.

So I was curious how this version performs in a direct match against my reference engine. The bad news: it is a regression. The new version is about 20 ELO weaker than my reference engine. This is not so bad considering that the terms were derived without a single played game and in only 48 hours. Against a not fully tuned engine the process would probably do reasonably well.

Rank Name                       Elo    +    - games score oppo. draws
   1 iCE 2.0 v1543 x64/popcnt    11    5    5 14117   53%   -11   31%
   2 ice-g741                   -11    5    5 14117   47%    11   31%


What worries me more is the trend in the process to eliminate the less important terms that contribute only small parts to the score. They are an essential part of the personality of the engine, and in total they are worth at least 20 ELO, so they should stay.

Maybe I will re-run the experiment later with a different training set. It certainly has potential.


Sunday, March 2, 2014

Tablebase generation on the fly - Part2

I worked a bit more on the generation algorithm and was able to speed it up a bit more. The generation time for the KPK table was reduced from 171 ms to 31 ms. This is now fast enough to generate all 3-men tables at engine startup.

I integrated the code into iCE and removed the built-in tables. On my i7 the complete generation takes 78 ms, which I consider acceptable. On a slower system, and as 32-bit code, it takes a still acceptable 125 ms.


Surprisingly, the difference between the debug and the release version of the code is about a factor of 40. I cannot really explain why it is so big.

TableBase: KPK   Generation Time:  0.031 sec   Iterations: 29

STM           WHITE     BLACK                    WHITE     BLACK
----------------------------------------------------------------
ILLEGAL      49.279    32.944      UNKNOWN           0         0
MATED             0        42      DRAWS        19.184    36.206

MATE IN   1      40         0      MATED IN  1       0        64
MATE IN   2      97         0      MATED IN  2       0       163
MATE IN   3     219         0      MATED IN  3       0       406
MATE IN   4     422         0      MATED IN  4       0       821
MATE IN   5     915         0      MATED IN  5       0     1.698
MATE IN   6   1.636         0      MATED IN  6       0     3.124
MATE IN   7   3.121         0      MATED IN  7       0     5.841
MATE IN   8   5.647         0      MATED IN  8       0     7.603
MATE IN   9   7.541         0      MATED IN  9       0     8.110
MATE IN  10   8.395         0      MATED IN 10       0     7.857
MATE IN  11   8.601         0      MATED IN 11       0     7.215
MATE IN  12   8.178         0      MATED IN 12       0     5.843
MATE IN  13   6.719         0      MATED IN 13       0     4.185
MATE IN  14   3.829         0      MATED IN 14       0     2.501
MATE IN  15   1.065         0      MATED IN 15       0     1.026
MATE IN  16   1.154         0      MATED IN 16       0     1.194
MATE IN  17   1.066         0      MATED IN 17       0       902
MATE IN  18     871         0      MATED IN 18       0       711
MATE IN  19     658         0      MATED IN 19       0       597
MATE IN  20     558         0      MATED IN 20       0       436
MATE IN  21     606         0      MATED IN 21       0       565
MATE IN  22     562         0      MATED IN 22       0       430
MATE IN  23     343         0      MATED IN 23       0       292
MATE IN  24     144         0      MATED IN 24       0       109
MATE IN  25      64         0      MATED IN 25       0        31
MATE IN  26      19         0      MATED IN 26       0        14
MATE IN  27       7         0      MATED IN 27       0         4
MATE IN  28       3         0      MATED IN 28       0         2



Sunday, February 9, 2014

TableBase Generation on the fly

8/1k6/8/8/8/8/1K4P1/8 w - - 0 1
Most chess engines access external table bases for perfect play in endgames with 6 or fewer pieces. This also helps them to decide whether they should exchange into such an endgame or not.

I'm not a huge fan of those databases. They eliminate an important part of the knowledge that a chess engine should possess and replace it with a database access call. However, most engines do it, and maybe in a future version I will also turn my chess engine into a database client, but not yet.

Saying that iCE is not using table bases is also not fully true: since the very early versions iCE has had some tables built directly into its binaries that tell it the score of all positions with 3 pieces or less (like a 3-men table base).

I calculated the values for those tables myself, so it's at least my own contribution. I had my own little table base generator. It was awfully slow, as I was reusing a lot of code that I already had for move generation, board setup and such. It was not optimized as a TB generator, and it took several seconds to generate a single 3-men table. But this was OK, as I was just using its output; my engine was not executing the generation itself.

As an intellectual challenge I thought it was now time to come back to that. Writing a table base generator requires some real mind bending thinking. This is a nice change from the usual "change 1 value in the program and wait for a day to see whether it worked". If I get the generator fast enough, I might be able to throw the ugly large internal tables out of the source code and generate them on the fly when the program starts.

Table Base look up for 8/1k6/8/8/8/8/1K4P1/8 w - - 0 1

So the 1st priority are the 3-men tables (1 piece in addition to both kings). I introduced a new board data type where the whole board is represented as a single packed integer. A move is also represented as an integer: it is the resulting board. A move execution is then just replacing the current board int with the move int. This enables some really fast board operations now.
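
A sketch of such a packing for 3-men positions (the exact bit layout iCE uses is not given here, so this one is invented):

#include <cstdint>

// Invented layout, 22 bits of a 32 bit integer:
//   bits  0..5   white king square      bits 12..17  third piece square
//   bits  6..11  black king square      bits 18..20  third piece type
//   bit  21      side to move
using PackedBoard = uint32_t;

PackedBoard pack(int wk, int bk, int pieceSq, int pieceType, int stm)
{
    return wk | (bk << 6) | (pieceSq << 12) | (pieceType << 18) | (stm << 21);
}

int whiteKingSq(PackedBoard b) { return b & 63; }
int blackKingSq(PackedBoard b) { return (b >> 6) & 63; }

// "Making a move" is just replacing the current packed board with the
// packed board of the successor position, no make/unmake needed.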

Then I went to the actual TB generator code. This was really tricky to get right. I experienced inflated mate scores: in the king and queen against king endgame the longest mate is a "mate in 10", but I saw a lot of funny "mate in 30"s generated.

The reason for that is the retrograde generation of the data. A predecessor of a "mated in 1" position is not necessarily a "mate in 2" position; it can also be a "mate in 1" position. But finally I was able to correct those scores into a consistent data set.
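
One standard way to avoid the inflated scores is to assign exactly one mate distance per iteration, so a position only becomes "mate in n" after all "mated in n-1" positions are final. A simplified sketch of such a level-by-level scheme (the board indexing and move helpers are left abstract, this is not necessarily my exact code):

#include <vector>
#include <cstdint>

// Hypothetical helpers the real generator would supply.
std::vector<uint32_t> successors(uint32_t pos);  // legal successor positions
bool isCheckmate(uint32_t pos);

enum EScoreType { UNKNOWN, MATE, MATED };
struct TbScore { EScoreType type; int dtm; };

void solveRetrograde(std::vector<TbScore>& tb)
{
    for (uint32_t p = 0; p < tb.size(); ++p)
        tb[p] = isCheckmate(p) ? TbScore{ MATED, 0 } : TbScore{ UNKNOWN, 0 };

    for (int n = 1; ; ++n)                        // one mate level per pass
    {
        bool changed = false;
        for (uint32_t p = 0; p < tb.size(); ++p)  // find all "mate in n"
        {
            if (tb[p].type != UNKNOWN) continue;
            for (uint32_t s : successors(p))
                if (tb[s].type == MATED && tb[s].dtm == n - 1)
                { tb[p] = { MATE, n }; changed = true; break; }
        }
        for (uint32_t p = 0; p < tb.size(); ++p)  // find all "mated in n"
        {
            if (tb[p].type != UNKNOWN) continue;
            std::vector<uint32_t> succ = successors(p);
            bool allMate = !succ.empty();         // stalemates stay UNKNOWN
            for (uint32_t s : succ)
                if (tb[s].type != MATE) { allMate = false; break; }
            if (allMate) { tb[p] = { MATED, n }; changed = true; }
        }
        if (!changed) break;   // whatever is still UNKNOWN is a draw
    }
}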

For a verification run I compared my newly generated data against the old internal values from iCE, and to my surprise 0.6% of the KPK positions deviated from each other. A quick check revealed that iCE currently has some errors in its table and my new set is the correct one. The errors were not fatal: a win was a win, the mate distance was just off by 1 and sometimes even 2 moves. Probably no one would ever have noticed. But for me the whole exercise already had a benefit.

Whether I will really include the on-the-fly generation code in iCE I have not yet decided. It depends on how fast I can get it with some profiling or improvements in the algorithms. Currently I'm not fully happy with the speed yet.

Here is the data for my KPK set, run as a 32-bit binary on an older machine (which somewhat marks the lower limit of the hardware spectrum I target). The generator already exploits board symmetries to reduce the number of positions it has to process and store.

TableBase: KPK   Generation Time:  0.172 sec   Iterations: 20/8

STM           WHITE     BLACK                    WHITE     BLACK
----------------------------------------------------------------
ILLEGAL      49.279    32.944      UNKNOWN           0         0
MATED             0        42      DRAWS        19.184    36.206


Saturday, February 1, 2014

Balanced endgames

Is the position balanced?
Almost all evaluation terms have an endgame component; some evaluation terms only exist for the endgame. In order to tune them it is very helpful not to start games from the starting position.

A lot of games might be decided without even reaching the endgame, so the outcome of the game tells very little about whether the evaluation values of the winner were good or bad. So it is better to start games from a late midgame position. But here the challenge starts.

What are good starting positions ?

If the position is too drawish and always leads to a draw it does not help. If it favors one side too much it is also rather useless. Finding good starting positions is rather tricky.

I searched the web, and some helpful fellows had posted positions that they thought were balanced. They had checked them with an engine, and if the score was within a range around 0 they used them.

I used those as a starting point and ran a 16000 games tournament between Stockfish and Komodo using 248 of those positions. I then analyzed the resulting games file and got mixed results. Some of those "balanced" positions were not balanced at all, and some were dead drawn. I then assigned points for usefulness to them. The positions below are the ones from the tested 248 that seem the most useful.

Maybe they are of some help to others. The numbers to the right give the number of games played with this position, the wins by White and Black, and the number of draws.

PS: The position posted above is not balanced. White scores about 80% (41 - 2 - 23).

  
  #  FEN                                                     Gm   W    B    D
---+--------------------------------------------------------+-----------------   
  1. 5rk1/N4p1p/1p6/2p2p2/4p3/P4bP1/1PP2P1P/2K1R3 b - -      64   21 - 27 - 16
  2. 5k2/p1p2pp1/2p4p/2N5/1P2R2P/P1P5/5PPK/3r1b2 b - -       64   24 - 19 - 21
  3. 1B4k1/p3r1p1/1n3p1p/4p3/1Rp5/P1P2P2/2P2P1P/5K2 w - -    64   16 - 25 - 23
  4. 2k1r3/2p4p/pp6/2p1p3/b3P2B/P1R2P2/P5PP/6K1 w - -        66   23 - 14 - 29
  5. r4b2/1R6/4k1pp/2ppp3/Pp6/1P3P1P/1BP1K1P1/8 b - -        66   21 - 14 - 31
  6. 2kr4/1p2R1p1/2p4p/Pp6/3b1P1P/6P1/2P3K1/4B3 w - -        64   16 - 16 - 32
  7. 3k4/p1p2p1R/1p2p1p1/3bn3/8/PP4P1/2P3P1/2K2B2 b - -      64   22 - 13 - 29
  8. 3r1b2/5k2/pp3p2/4pp1p/P7/2P2PP1/1PKN3P/4R3 b - -        66   28 - 11 - 27
  9. r2k4/p1p2pN1/1p5p/2p1P1bP/8/1P4P1/PBP2P2/6K1 w - -      64   31 - 11 - 22
 10. 5r2/p3k2p/3pn3/3B1p2/PPPp4/6PP/2P4K/B7 w - -            66   15 - 15 - 36
 11. 3B1b2/p2n1p2/5P2/1k6/1p6/6P1/1p3P1P/1R4K1 b - -         64   10 - 33 - 21
 12. 5nk1/pp3ppp/4p1b1/3pP3/P2P4/3B4/1P3PPP/2B3K1 w - -      64   17 - 13 - 34
 13. 4b1k1/2p1r1p1/1p1p3p/p2P3P/3PP1P1/5K2/P2N4/6R1 b - -    64   15 - 13 - 36
 14. 2k5/2p2p2/p1p2rp1/3n3p/1P2KB1P/P5P1/1P3P2/3R4 w - -     64   17 - 12 - 35
 15. 2k2b1r/3n4/5P2/8/1pp5/4B1P1/1P3P1P/5RK1 w - -           64    9 - 31 - 24
 16. 3r2k1/p4ppp/1p2pn2/8/P2N4/2P3P1/1P3PP1/4R1K1 b - -      64   16 - 12 - 36
 17. 1B4k1/1p4pp/p3pb2/5p2/3P1P2/4PP2/P3K2P/1n1N4 w - -      64   18 - 11 - 35
 18. r7/pp3pkp/1n1p2p1/3P4/5PN1/1P6/P5PP/4R1K1 w - -         64   15 - 12 - 37
 19. r7/5p2/2p2k1p/ppNp1b2/3R4/1PP2P2/1P4PP/4K3 w - -        66   19 - 10 - 37
 20. 4kr2/1pp3pp/p1p5/4b3/4P3/3PB3/PP3P1P/R5K1 w - -         66   16 - 11 - 39
 21. 2r3k1/3b1pp1/4p3/1p2P1p1/1N6/P1R5/1PP4P/2K5 w - -       64   12 - 14 - 38
 22. r3k1B1/1p1b4/p6p/5p2/3p1p2/5P1P/PPP2P2/2K4R b q -       64   23 -  9 - 32
 23. 2n2k2/n4pp1/pp2p2p/3pP3/1P1P2PP/P2KB3/5P2/4N3 w - -     64   18 - 10 - 36
 24. 2r3k1/3b4/p2p2pp/3P1p2/1PpN4/P3P2P/5KP1/2R5 w - -       64   18 - 10 - 36
 25. r7/p4pk1/1p6/4n1p1/4PpP1/P7/1P2B1PP/3K3R b - -          64   13 - 12 - 39
 26. 3kb3/pp2p2p/3p2p1/4n1P1/4PK1P/1PP5/P1N1B3/8 b - -       64   13 - 12 - 39
 27. 2nr1k2/p4p2/4pp1p/2p5/2P3P1/P3P3/2K1BP1P/1R6 w - -      66   25 -  8 - 33
 28. 3k3r/1p2b2p/p3pp2/2p5/P4P2/2NKP3/1P4PP/7R b - -         66   10 - 16 - 40
 29. 2rk4/p4ppp/1pn5/2p5/4B3/P1P3P1/4PP1P/1R4K1 w - -        64   25 -  8 - 31
 30. 3b4/3k2p1/3p4/pp3p2/4bP2/4N1PP/P3K3/3R4 b - -           64   24 -  8 - 32
 31. r5k1/p7/2p3p1/4R2p/7P/2P2P1N/PP4n1/3K4 b - -            64   13 - 11 - 40
 32. 1r4k1/4b2p/2np4/8/1p2R3/1P1R4/P5PP/1K6 b - -            64   15 - 10 - 39
 33. 8/2pk3p/1p4p1/p5P1/2bPnB1P/P7/6K1/1R6 b - -             64   15 - 10 - 39
 34. 3r2k1/pp2p2p/4pnp1/4R3/8/7P/PPP2PP1/3N2K1 w - -         64   21 -  8 - 35
 35. 2r5/4k1p1/p3p1P1/6b1/2p5/7B/PPP5/1K1R4 w - -            64   21 -  8 - 35
 36. 3r1k2/3b1pp1/7p/1p1PpP2/2p1P3/7P/1PB3P1/R5K1 b - -      64   12 - 11 - 41

Saturday, January 18, 2014

Material signatures and std::map


iCE uses material signatures to classify material combinations for which it has special knowledge. It uses a simple enumeration as the id for certain material combinations:

// ENDGAME IDENTIFIER FOR KNOWN ENDGAMES WITH SPECIAL MODULES
enum EEndGameId { EG_UNKNOWN, EG_KK,
                  EG_KNK, EG_KBK, EG_KRK, EG_KQK, EG_KPK, ...,  EG_MAXCOUNT };


Simple stuff so far. The interesting part is the code that assigns the correct id to a position. So far iCE was using the trivial approach: it just performed a long list of comparisons.

switch (bb.pcsCount)
{
case 2:
    return EG_KK;
case 3:
    if (bb.pcsTypeCnt[W_KNIGHT] == 1 || bb.pcsTypeCnt[B_KNIGHT] == 1) return EG_KNK;
    if (bb.pcsTypeCnt[W_BISHOP] == 1 || bb.pcsTypeCnt[B_BISHOP] == 1) return EG_KBK;
    if (bb.pcsTypeCnt[W_ROOK] == 1   || bb.pcsTypeCnt[B_ROOK] == 1)   return EG_KRK;
    if (bb.pcsTypeCnt[W_QUEEN] == 1  || bb.pcsTypeCnt[B_QUEEN] == 1)  return EG_KQK;
    if (bb.pcsTypeCnt[W_PAWN] == 1   || bb.pcsTypeCnt[B_PAWN] == 1)   return EG_KPK;
    break;
// ... and so on for all 40 special endgames
}



This was OK as long as the number of modules was rather small, but over time I added more and more knowledge to iCE and this list grew. Currently iCE has special code for 40 endgames, so the above list was already rather long. It was not really a performance problem; the code is executed only when the material on the board changes, and often it just stays as it is. But such a long list is not really elegant.

I considered three alternatives.

1.)
Using a table that stores the endgame ids, indexed by the lower part of the material hash key. This should be very fast as it is just a look-up. I determined that for my kind of material hash keys I need a table of 8192 slots to allow a collision free perfect hash, as long as I restrict the signatures to positions with 5 men or less. This is currently the case but might change in the future; when adding 6-men positions the size of the required table for a perfect hash will be much bigger.

2.)
Storing the material hash keys of the interesting positions along with the id in a sorted list, and then doing a binary search in that list. This is a very compact data structure; the search will take a bit longer.

3.)
Using the material hash key directly as the underlying value in the enumeration. But this is ugly.

The implementation for 1 is straightforward. For the second I used a std::map container. This is pretty cool: you just store pairs and the map gives you easy access to your values. It is a bit more expensive than the lookup table but should not make a practical difference. If it turns out to be a problem in profiling later, I can still change it, but I doubt it.

At program start I calculate the material hashes of the combinations that I'm interested in and store them in the map. During search, whenever the material on the board changes, I have a look there:

EEndGameId TBoard::calculateEndGameId()
{
    if (bb.pcsCount > 5 || endgames.count(getMaterialHash()) == 0) return EG_UNKNOWN;
    return endgames[getMaterialHash()];
}
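
The initialization at program start might look like this (the helper for computing a material hash from a piece signature is invented for the sketch):

#include <map>
#include <cstdint>

std::map<uint64_t, EEndGameId> endgames;

// Hypothetical helper: material hash key for a given piece signature.
uint64_t materialHashFor(const char* signature);

void initEndgameMap()
{
    endgames[materialHashFor("KNK")] = EG_KNK;
    endgames[materialHashFor("KBK")] = EG_KBK;
    endgames[materialHashFor("KRK")] = EG_KRK;
    endgames[materialHashFor("KQK")] = EG_KQK;
    endgames[materialHashFor("KPK")] = EG_KPK;
    // ... one entry for each of the 40 special endgames
}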


Very elegant and I was able to get rid of an ugly module.

Sunday, January 5, 2014

1659









The Battle of the Lines of Elvas was fought on 14 January 1659, in Elvas, between Portugal and Spain. It ended in a decisive Portuguese victory.

By coincidence, 1659 is also the exact number of lines of code that I removed from the iCE source code while reworking its move generators. The move generation code belongs to the oldest modules in iCE; it was created for version 0.1 several years ago. It was not slow or buggy, but my coding style changed over time and it did not really fit in anymore.

iCE has several move generators for quiet moves, tactical moves and check evasions. They all followed the scheme

if (sideToMove == WHITE)
{
    // generate white moves
} else
{
    // do something similar for black
}

This is not really elegant.

Fortunately C++ provides function templates that solve that really nicely. It required some code rework, including the definition of some new data types, but overall the code now looks much nicer.
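
A minimal sketch of the pattern (the generator internals and the TMoveList type are placeholders):

enum EColor { WHITE, BLACK };
struct TBoard;      // engine types, details omitted
struct TMoveList;

// One template yields two specialized functions at compile time, so
// color dependent values become constants instead of run time branches.
template <EColor color>
void generateQuietMoves(const TBoard& board, TMoveList& list)
{
    constexpr int pawnPush = (color == WHITE) ? 8 : -8;
    // ... generate moves using pawnPush and other color specific data ...
}

// At the call site:
//   if (sideToMove == WHITE) generateQuietMoves<WHITE>(board, list);
//   else                     generateQuietMoves<BLACK>(board, list);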

As far as I could measure there was no impact on speed, and of course the new move generators pass the perft unit test again.

iCE 2.0 v1352 x32 [2014.1.5]
perft        Nodes    Time
  1             20    0 sec
  2            400    0 sec
  3          8.902    0 sec
  4        197.281    0.015 sec
  5      4.865.609    0.109 sec
  6    119.060.324    2.699 sec
 

Perft 1 - 6 executed in 2.823 sec  (43.96 Mnps)

position fen 8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - -
perft        Nodes    Time
  1             14    0 sec
  2            191    0 sec
  3          2.812    0.015 sec
  4         43.238    0 sec
  5        674.624    0.047 sec
  6     11.030.083    0.312 sec
  7    178.633.661    4.867 sec


Perft 1 - 7 executed in 5.241 sec  (36.32 Mnps)


Thursday, January 2, 2014

A bit of New Years cleanup

I'm planning to spend some time on my move ordering scheme again. So far iCE is not using a history heuristic scheme; it only relies on the hash table move and the killer moves.


 
For this I have to change my internal move representation a bit so I get a few more bits to store the history score. So far I have 7 bits to hold a move ordering score, which allows values from 0 to 127.




My TMove looks like this

/**********************************************************************
*  A Move is stored as a compact 32bit integer
*     off    bits   name           description
*      0      4     movedPiece     the id of the moved piece
*      4      6     fromSq         the start square of the move
*     10      6     toSq           the target square of the move
*     16      4     capturedPiece  which piece is captured
*     20      4     promoteTo      which piece do we promote to
*     24      1     enPassent      this move is an en passant capture;
*                                  this influences the square where the
*                                  piece is captured
*     25      7     order          an ordering score for move ordering
*                                  range 0 .. 127
***********************************************************************/


This includes some redundancy. The piece we promote to is stored in 4 bits because the actual piece type is stored here directly. In reality I can only promote to a knight, bishop, rook or queen, and the color of the promoted piece is the same as that of the moved pawn. So I only need to code 4 states, which requires 2 bits.

This means with some conversion work I can save 2 bits here, which I can reassign to the ordering value.

I changed and debugged my code and now I'm ready for some more move ordering exercises.