Thursday, January 6, 2011

There is always one more bug

Today I ran into an extremely annoying bug. I found it when I was doing a stress test against my transposition table. I do that once in a while to see that recent changes did not break anything.

Usually I run a ply 26 search against the position known as FINE70 (or the Lasker-Reichhelm Position)
8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - -
The only winning moves is Kb1!. All other moves lead to draw. With a correct implemented transposition table the position is solved in about 1 sec or less, without a transposition table or with errors in it it takes much longer or is not solved at all.

The last test revealed that the ice engine was not able to solve it anymore. It ran quite fast through the plys but at ply 26 it showed Kb2 as best move.

So the fun began, trying to find an error in the transposition table. I found several bugs where I used & in a logical expression where it should read && but this did not solve the problem.

The last thing I added was considering piece square values in the evaluation and indeed when I removed that code the position was solved. But the code is really simple, I could not imagine I did anything totally wrong here, I debugged it anyway but to no result. It worked as it should just when it added a small value to the score less than 1/10 of a pawn the position was not solved anymore.
At the end I turned the hash table off and let the engine run without it (it calculated over an hour) and to my surprise the position wasn't solved either. So obviously it was not a hash table problem. So I went to the PVS search, turned off extensions, quiescence search, LMR, zero window search in PVS, everything I could think off. Trying to debug a 26 ply search in a recursive algorithm is really mind boggeling. I wasn't able to spot any error, with every feature alone or all together turned off, the position was still not solved.

I finally started to collect the principle variations at ply 25 to see what happens to the winning move line and to my surprise at ply 25 no principle variation with an exact score that contained the winning move Kb1 was encountered. It seemed that this move and so no line starting with it was existing in the search tree, which was quite odd. But this revealed the most likely source of the error to me. It had to be in the code where the root position is handled. Here for every legal move in the root position a PVS search is performed. (The root node is outside the normal search, because at the root I need more than the score of the position, here I need the best move also).

It showed that I did not run the search for all moves at the root, I skipped the last one, the code was
for (i=0; i < ml->moveCnt -1 ; i++)
It did not show as a problem before because the move list is usually well ordered and the last move will very unlikely turn out as the best. Adding the piece square evaluation to the the score changed the order for the moves in the list, Kb2 was considered better than Kb1 on lower plys (king closer in the center), this changed the order of the moves in the root position and Kb1 was moved to the end for searches from ply 4 onwards. Here it was not considered anymore because of the -1 bug.

So my head hurts from recent head banging but I'm glad I found that nasty little bug. It really bothers me knowing to have a bug somewhere in the code and no be able to find it.

So this one is fixed now, but I already encountered the next one. There seems to be a problem with the zero window search in PVS. There is always one more bug ...