## Monday, August 30, 2010

### Endgame Tablebases

At the moment I take a bit of rest from working on the engine and play around with a different aspect of chess, the endgame.
My engine posses a bit of endgame knowledge, like

force the downside king to the edge
keep the upside king close to the downside king

This is usually enough for the engine to determine a quick mating sequence through search, even if the mate is not within its search depth yet.

A slight enhancement for an engine would be the usage of a endgame database where it stores for a variety of positions its correct value, so if the engine encounters such a position it knows its value instantly without searching further. Obviously the number of possible positions growths very rapidly with the number of remaining pieces on the board. In a 3 piece endgame (2 kings and 1 additional piece) there are 2^18 possible positions per remaining piece type and side to move ( a good amount of it is illegal, but anyway a huge number).

If the remaining piece is a knight or a bishop the position is a draw (insufficient material), so the interesting combination are king and queen or rook or pawn vs king, with pawn being probably the most interesting as it is not as trivial to determine whether the position is a win or a draw as with rook or queen. But it is also more difficult as the pawn must promote first turning the board into a king and queen vs king endgame.

So I started with the creation of a database of a king and rook or queen vs king endgame. The algorithm works as follows.

1. Create all positions, evaluate them and mark those where the downside is mated as "Mated"
2. Look through all the positions marked as "Mate" and mark all those positions the upside can reach in 1 move as "Mate in 1"
3. Now generate all positions again and the moves in that position (it is downsides turn). Simulate the moves and look at the resulting positions. If all positions found are marked as "Mate in 1" the actual position gets "Mated in 1" score.
4. Now generate all positions again and generate the moves for the upside, simulate the move and if you encounter 1 position marked "Mated in 1" the actual position gets the score of "Mate in 2".
5. Go back to 3. but now look for positions where the downside cannot avoid a "Mate in 2" position. This position becomes "Mated in 2".
6. Alternate between upside and downside processing until you make no longer progress (finding at least one score of a previously unknown position). Then you're done.
Here is sample of how it look at the end. This position is number 5.202 of 262.144

Its score in the database was determined as
• when it is upsides move = Mate in 2 (1. Rc2 Ka1 2. Rc1#)
• when its downsides turn = Mated in 1 (1. ... Ka1 2. Rc1#)
• and if the rook was not a rook but a queen, then it is a "Mate in 1" (1. Qe1#) or a Draw (Stalemate)
So far I have only played around with the basic stuff, but it is fun so I will now work on the king and pawn vs king table.

Again, I don't expect the engine to play stronger with that knowledge, but I think it is fun to enable the engine to announce deep mates, like "Mate in 16". However I have not decided how to make that knowledge available to the engine, at the moment it is just written to a file in the file system.

## Wednesday, August 25, 2010

### Implementing Late Move Reductions

After the unpleasant fixing of the transposition table design flaw I thought it is time to move to something more interesting to actually maybe improve the engine a bit.

I thought that "Late Move Reductions" is worth giving a try.

In a normal Alpha Beta tree you have 3 types of nodes

PV-nodes = nodes that have a score that ends up being inside the alpha beta window. These nodes have all moves searched, and the value returned is exact, which may propagate up to the root.

All-nodes = nodes in which no move's score exceeded alpha. Every move at an All-Node is searched, and the score returned is an upper bound, the exact score might be less.

Cut-nodes = nodes in which a beta cutoff  is performed. The score returned is a lower bound (might be greater) on the exact score of the node.

Statistics show that in Cut Nodes with reasonable move ordering the cut off happens very early, within the first 3 to 4 moves. In "All Nodes" you search every move just to find out no one can improve your situation. Move ordering in All Nodes has no effect so a lot of search time the engine spends in handling the All Nodes. Unfortunatly you don't know that you are searching an All Node until you are done.

Late Move Reduction (LMR) tries to save some of the time here. The idea is that a move that is probably not a Cut Node (you searched the first 4 moves and are still there) and not a PV Node (you searched the probably best 3-4 moves after move ordering and no move was good enough to improve the score you already secured) is most likely an All Node. So the node is searched with a reduced depth to verify that it is really an All Node. This is where the name Late Move Reduction comes from, you reduce the late moves in such a node. If this reduced search reveals that you're guess was wrong you do a research of the moves otherwise you saved a lot of nodes.

However you should not reduce all moves just because they are late in the list. You should not reduce moves that give check, moves that respond to check, captures, promotions ... everything that is interesting or you miss something important.

After implementing this the engine searched a lot deeper but was actually playing weaker than the one without LMR. So I added another restriction that in nodes below a PV node no reduction takes place, as a wrong score here might more likely propagate up to the root. This seemed to do the trick.

I played 10 games of the engine with LMR against the engine without it, and the engine with LMR scored 7:3. One loss of the LMR engine was due to a time control issue where it lost on time in a winning position, so the loss is not a result of LMR. I find this a rather convincing improvement of the engine.

Arena tournament
RankEngineScoreEnEnS-B
1Engine127,0/10· ·· ·· ·· ··11===01=11 21,00
2Engine113,0/1000===10=00· ·· ·· ·· ·· 21,00

10 games played / Tournament is finished

Level: Tournament 40/10

The end of the tournament, the engine with LMR as black will Mate in 2

8/1p3kp1/3P3p/p1p1B3/P7/RP3qP1/5r2/3R2K1 w - - 0 40

## Monday, August 23, 2010

### Struggeling with a design flaw

When I was creating my engine I had to decide where to place the transposition tables. I decided to make them part of the board class as I wanted the verify its correct implementation first with the move generator. Let the perft calculation use the transposition table and if not the same number of moves is generated as before you know there is an error with it (the table itself, the hash key generation or make/unmake move). So the transposition table became a property of the board class in order to be available for the perft method.

I then developed my engine and my GUI further and it seemed to work very nicely this way. But when I finally sized the transposition table to a serious size of a couple of million entries I realized that the memory consumption of the GUI went also up very much. The reason is that the code for the board class is used by both the GUI and the engine. The GUI uses board properties like which square holds which piece and the move generation methods to verify the user input but by using the same code it also allocates the size of the transposition table.

So my GUI was suddenly consuming several hundreds of MB RAM.

I first tried to work around that by conditional compiling the GUI and engine

#IFDEF ENGINE
TTSIZE = 2097152
#ELSE
TTSIZE = 100
#ENDIF

It worked somehow (not really good because I had to explicitly save the unit with the constants to force a recompile of it). But I was not really happy with that work around anyway. I rather fix problems at the root.

So I decided to remove the transposition table from the board class and put it in the engine class although this requires a lot of code to be changed. I will spend some unpleasant hours on fixing my design. I hope the engine will not run slower after that and I do not introduce many new bugs.

Lessons learnt: Don't mess up your code with something that seems easier at first sight if it involves implementing something in a place where it does not belong to.

## Wednesday, August 18, 2010

### Does my engine need an opening book

When programming a chess engine having an opening book is a nice feature. In my opinion it addresses to aspects

1. The search in the engine is deterministic, in the same position with the same amount of time, search will always respond with the same move. Having an opening book where the engine can chose between multiple good moves for a position introduces a bit of randomness. The engine will not always play the same game.
2. The engine saves time when playing the first moves as it doesn't have to search for a move. Without an opening book the moves from the engine are also not stupid but it took time to search for them. Engines with an opening book have a slight advantage against engines without one.
For those reasons I decided to implement an opening book in mACE. I will not use an existing one or rely on a GUI to play the opening moves. I consider this cheating. I rather have a weaker engine that is mine than having a stronger one with code or intellectual properties from others.

The reason to program a chess engine after all is to test my ability to create one not to test my ability to copy code or rely on information in existing databases created by others.

So lets think about how to do it myself ...

## Friday, August 13, 2010

### [BUG] UCI protocol implementation error

Today I realized an error in my implementation of the UCI protocol. It's been there for ages and I didn't noticed because I implemented the bug in the GUI as well as in the engine so they both could communicate perfectly.

Today I plugged the engine into another GUI called Arena (available freely at http://www.playwitharena.com) and here it became obvious. The engine was not able to initialize a board position when it is not the start position.

211.172-->2: position fen 4k3/p1p3p1/1p5p/8/8/1P5P/P1P3P1/4K3 w - - 0 1
211.172-->2: go wtime 600000 btime 600000 winc 0 binc 0 movestogo 40
211.218<--2: Invalid position or FEN string !

The reason was simple. The UCI position command is followed by either the keyword 'startpos' or 'fen'. My engine just wasn't aware of the 'fen' keyword. If the GUI did not send 'startpos' the engine tried to read in the fen string and failed because 'fen' are no valid FEN characters. It responded with invalid position or FEN string.

So fixing the error was now simple, tell the engine to expect the fen keyword and start reading the actual fen input with substring 3 instead of 2. Tell the GUI to send the fen keyword before the actual FEN.

And all is well again...

## Thursday, August 12, 2010

### [BUG] Silly bug in the board evaluation

At some point in time every chess engine must be able to evaluate a given board position. The easiest way to do that is to just score the material. But then the engine moves around rather silly so you try to evaluate a position a bit better.

One of the things I consider in evaluation is the existence of castling rights. Whenever a side has preserved the right to castle it get a small bonus for it (one for king side and one for queen side). This way the engine gives not easily away the rights to castle by making a king move. It also tries now to take away the castling rights from the opponent.

So far so good, but this alone is not enough as the engine will now not castle because this means losing the rights to castle (after castling the engine can't do it again) and the bonuses associated with the rights. To get around this I rewarded a bigger bonus if the engine has made a castling move outweighing the loss of the castling rights.

Engine can castle king side +10
Engine can castle queen side +10
Engine has castled + 60

So if the engine castles it gets a overall bonus of +40  (60-10-10). This actually seemed to work pretty well. until recently I discovered a problem with that approach.

rnbq1r2/pppppk1p/5n1b/8/3P4/P7/1PP1PPPP/RNBQKB1R w KQ -

Consider the following position. This is one of the positions that search might encounter as a leaf node.

This position can be reached via various combination of moves (transpositions). If in the sequence of moves black has castled king side the position is evaluated with the +40 bonus for black. If the sequence of moves just involves black moving seperately with the rook and king it gets a penalty of -20 (black has given away both castling rights).

So the very same position gets 2 different scores depending on the move history, which I consider a bug in this case. I will try to fix it by removing the bonus for castling and giving a bonus for the king when located at the castling target positions (a good piece-square value for the king).

Another possibility when 2 positions get different scores is the issue of position repetition. Then we might approach a draw by repetition so the position is worse for the one with an advantage if it repeats itself. But this I actually consider a feature and not a bug.

## Wednesday, August 11, 2010

### Coupling the Engine with the GUI - Part 2

In part 1 I described the engine interface on the GUI side. This is simpler as the GUI starts the engine as process and therefore has a handle to it, knows its process id and so on.

Programming the interface on the engine side is more difficult. The engine is started as a process and all it knows is, that it reads input from Standard In and writes output to Standard Out.

In Free Pascal you use the classes TIOStream for both purposes.

inputStream:=TIOStream.Create(iosInput);
outputStream:=TIOStream.Create(iosOutput);

Now letting the engine write some output to Standard Out is performed by a simple call. As the UCI protocol demands that lines end with a CRLF I wrapped a procedure around the output that appends #13#10 to the string and write to to the output stream.

procedure TEngine.writeOutput(anOutputStr : String);
begin
anOutputStr:=anOutputStr+chr(13)+chr(10);
outputStream.Write(anOutputStr[1],length(anOutputStr));
end;

Now to the really tricky part, Input reading from Standard In. There are several challenges

1. You can't expect to receive the input line by line. All you receive is a bunch of bytes. You have to scan the received bytes for New Line characters which mark the end of one line and the start of another.
2. You might receive input without any New Line characters at all. This happens when you try to read when the sending side was not yet finished sending. You can't just return that half completed line. You have to store it and complete it when new input becomes available.
3. New Line characters are not always #13#10. The UCI protocol allows different combination for that, so you have to handle all possible New Line character variants.
4. As far as I know there is no way to check for new input before you actually try to read it. Good old Pascal had this "keypressed" function, which returned true if the user pressed a key and you could use "readkey" after wards to actually read it. This does not exist anymore and actually Standard In is a stream, input might be available without any key pressed at all. The nasty thing with this is that reading from an empty stream blocks your execution until input becomes available.
Points 1-3 are just things you have to be aware of and at the end it is just a bit more programming effort to handle them. Point 4 causes some headache.

The UCI protocol demands that the engine is responsive even when calculating. In order to do that you have to poll for new input from time to time when doing a longer search for a best move. If you do that and no input is available the engine hangs and does not proceed with the search.

After trying and failing a lot I finally decided to run a different thread in my engine that does nothing than polling for new input. So it tries to read, blocks until something becomes available and if it receives input it sets a flag to let the engine know, something was received. All the engine does then is polling the flag from time to time.

The execution loop of the thread looks like this. (streamReader.Readln implements a blocking read from the input stream handling all the New Line variations)

while (not Terminated) do
begin
if newInputAvailable=False then
begin
if S<>'' then
begin
newInput:=S;
newInputAvailable:=True;
end;
end;
end;

The tread implements also a read method the engine uses to retrieve the input. Whenever this method is called the tread knows it is safe to try to read again from the stream and maybe overwrite what he has read so far.

function TWatchStdIn.getNewInput:String;
begin
Result:='';

if newInputAvailable = False then Exit;
Result:=newInput;

newInputAvailable:=False;
newInput:='';
end;

This works very nicely but there is a problem with it. Using threads slows down the engine a lot. It is not the work the thread does, it is the pure existence of a thread. The performance is hit even if the thread is just declared and not even told to start executing. The reason is that the compiler builds a different program when he realizes that it runs more than one thread. Memory management within the program is more complex to make it thread safe and this costs performance.

To just outline how big the impact is I list the numbers for the perft calculation with threads and without it.

5               4.865.609             1,265 sec                             2,343 sec
6           119.060.324           29,687 sec                           56,781 sec

So using threads almost doubles the time for the same work. For a chess engine this is a problem. Fortunately the impact seems much smaller when considering the complete search and evaluation process and as I see no real alternative to threads my engine is using them for the IO handling.

## Tuesday, August 10, 2010

### Coupling the GUI with the Engine - Part 1

In my new engine programming approach I separated the GUI from the engine into two standalone executables.  So now I have to find a way that they communicate with each other. The UCI (Universal Chess Interface) refers to the usage of Standard In and Standard Out of a process. So it was clear what to do, remains only the question how to do that in Free Pascal.

From the GUI perspective the engine is a process. Free Pascal defines a class TProcess, where you can specify something like the command line to the executable, what pipes are used and so on. When the GUI lanches it calles TProcess.Execute which starts the engine. It is running as process in the background. You see no engine window but it is listed in the Windows Task Manager process list.

To send something to the engine you add a CRLF (#13#10) to the string (defined in the UCI protocol) and call engine.Input.Write

ngCmd:=aCommand+chr(13)+chr(10);
procEngine.Input.Write(ngCmd[1], length(ngCmd));

To look for input and receive it from the engine you use

Process.Output.NumBytesAvailable is very useful because it allows you to check whether there is any input at all and if not just return. In pipe communication trying to read from a pipe that is empty might block you execution until input becomes available in the pipe.

On the engine side of the communication Process.Output.NumBytesAvailable does not exist and this makes the interface on the engine side much more difficult and brings some nasty consequences.

More to that in another post

## Saturday, August 7, 2010

### Bit Twiddling Fundamentals - Finding the least significant bit in a BitBoard

As explained in the post about board representation BitBoards are very efficient for applying certain conditions onto the board. The result is a BitBoard where those bits are set for which all applied conditions are true.

In the example for the move generation of white pawns we advanced all pawns one rank and filtered out those pawns reaching the 8th rank and those hitting another piece. The result was a BitBoard with valid target squares for simple pawn moves.

In order to make use of that result we must now translate the set bit positions back into squares. The naive way is just looping through the bitboard and looking at each single bit whether it is 1.

Something like

bb : BitBoard;
i : integer;
pos:=0;
while (pos<=63) and ((bb AND (2^pos))<>0) do inc(pos);
if pos<64 then ...

This finds the least significant bit in bb. You can now do something with this information. Then you delete this bit in bb via  bb := bb and NOT (2^pos) and start over again to find the next bit until bb is 0. Then all bits are processed.

This works but is slow. As finding a bit in a BitBoard is called millions of times during search for a best move you want that operation to be very fast. So this is not a reasonable approach.

There are better ways to do that. They might involve a look up tables or magic constants (search for DeBruijn).

The approach I took finally was relying on a processor instruction to do that. That makes the code less portable but it is most likely the fastest way to do that. Free Pascal supports in line assembler and my function looks like this

function getLSB(const Input: Int64): Byte;  Assembler;
// Returns the least significant bit of a 64 bits number
// or ERROR if no bit is set
asm
bsf eax, dword ptr Input
jz @@0
jnz @@2
@@0:
bsf eax, dword ptr Input + 04h
jz @@1
jnz @@2
@@1:
mov eax, ERROR
@@2:
end;

You must process the input in 2 steps as bsf (bit scan forward) works on 32 bit arguments.

I have similar functions for finding the most significant bit or counting the number of set bits in a BitBoard.

## Friday, August 6, 2010

### New board representation

I learnt with my first approach with a chess engine the way how you store the chess board is an important (if not the most important) decision when you start programming. The way you store the content of the board determines how efficient can you generate moves or evaluate the current situation of the board. When finding a move the engine generates hundreds of thousands of moves and evaluates millions of positions so you want to do that very efficiently.

In my first approach I used a 12x12 board with two rings of invalid squares around the actual board. The advantage of this approach is that you know instantly for any given square whether it is empty or what type of piece it holds. I also stored the positions of both kings. So if you want to know whether one side is in check you don't have to loop over all the squares just to find out where the kings are.

But even with this information at hand generating moves is very slow. You have an outer loop that loops over all squares, if it hits a piece of the side to move it jumps to piece specific code for generating the moves for that piece. For sliding pieces like bishop, rook or queen this involves inner loops that travel along the lines until they hit a piece or invalid square (the edge of the board).

If you research a bit how other engines represent the board you find a method called BitBoards for storing the board information. This is actually a very clever way to do that but you should be comfortable with Boolean algebra (bitwise AND, OR, NOT and XOR).

The main idea is that you store for every piece type a 64 bit Integer in which a 1 bit on a certain position in that integer stands for a piece of that type on that square. So you will need in total 12 of that Int64 for all types of pieces (6 for white and 6 for black).   It makes also sense to have some helper BitBoards like OccupiedSquares, EmptySquares, WhitePieces and BlackPieces. It is possible to generate them any time by bitwise OR of the piece BitBoards but as they are required very often it is faster to maintain them when executing moves.

To give a glimpse what BitBoards can do here the method for generating standard white pawn moves.

The pawn positions on the board are in a BitBoard called w_Pawns. In the bitboard bit 0 stands for square A1,  bit 1 for B1 and bit 63 for square H8. To generate the moves we do the following

// we generate the target squares by shifting all pawns 8 bits left and remove those
// who hit another piece or reach the 8th rank (those are special as they promote
// and handled differently)
// BB_RANKS is a array of BitBoard constants, that have bits set that specify
// a certain rank e.g. BB_RANKS[8]  is set to  \$FF00000000000000
// TMove.Create accepts start square, target square, moved piece and captured
// piece as arguments

targetSq_bb := (w_Pawns shl 8) AND EmptySquares AND NOT(BB_RANKS[8]);
for each bit=1 in targetSq_bb do
targetSq := bitPosition;
move := TMove.Create(targetSq-8, targetSq, W_Pawn, NONE);
next

We have now generated all simple pawn moves in one go. Very efficient. How to efficiently determine the position of a 1 bit in a bitboard is explained in a later post.

Although BitBoards are sufficient to know what piece is located on what square, it requires some effort to answer the question what piece stands on a certain square. I have to loop through all the piece BitBoards until I find the one with the 1 bit for that square set. So I decided to have in addition to BitBoards a traditional array where I maintain the content of each square.

So for whatever information I need, I either look at the BitBoards or the square array. Of course there is an overhead in move execution as both data structures must be maintained, but I decided it is worth it.

## Thursday, August 5, 2010

### Getting started

I'm starting this blog to document the steps for creating my own engine. I kicked off this project in December 2009 and dedicate since then a part of my spare time to it. I always liked programming and as a student I created Checkers and Othello engines just for fun. Chess I considered to difficult in regards of the ability of hardware back then and my favor for more complex programming languages like Pascal or C instead of Assembler.

Now hardware is much more powerful and I guessed probably powerful enough to allow me to create an engine that plays a decent game of chess. So I started this project. I decided to use Free Pascal for it. Most of the engines out there are written in C or C++ but I wanted I nice little GUI too. For that I always liked Delphi but I only have a 16 bit version of it available (Delphi 1.0) which is not suitable for that purpose and I didn't want to invest into a current version for my fun project either, so I took Free Pascal (Lazarus).

The first version of my engine was no engine at all. It was a GUI able to play chess. It did not use any protocol as I was not aware that something like chess protocols exist.

It used an array structure of 12x12 as board representation (12x12 not 8x8 as I thought it might be easier to implement an "off the board" detection when I place 2 rings of invalid squares around the 8x8 board and do not generate a move when the target square is a invalid square). Moves where generated by looping through all the pieces and jumping, moving or sliding them until they hit a piece or invalid square. It worked by it was awfully slow.

To generate the 119.060.324 nodes that are 6 half moves (plys) deep from the initial position it took 96 sec. For search it used a plain Alpha-Beta Mini-Max algorithm with a short quiescence search (stops after 4 plys). This version was able to calculate 4-5 plys deep to get a move in a reasonable amount of time, which is enough to play not a totally stupid game of chess but not good enough to satisfy me.

I decided to start the project all over again and doing some research first. I decided to use only some of the old user interface code for the new GUI and create a real engine as standalone executable using those things I leaned while doing my chess engine programming research.