Wednesday, August 11, 2010

Coupling the Engine with the GUI - Part 2

In part 1 I described the engine interface on the GUI side. This is simpler as the GUI starts the engine as process and therefore has a handle to it, knows its process id and so on.

Programming the interface on the engine side is more difficult. The engine is started as a process and all it knows is, that it reads input from Standard In and writes output to Standard Out.

In Free Pascal you use the classes TIOStream for both purposes.


Now letting the engine write some output to Standard Out is performed by a simple call. As the UCI protocol demands that lines end with a CRLF I wrapped a procedure around the output that appends #13#10 to the string and write to to the output stream.

  procedure TEngine.writeOutput(anOutputStr : String);

Now to the really tricky part, Input reading from Standard In. There are several challenges

  1. You can't expect to receive the input line by line. All you receive is a bunch of bytes. You have to scan the received bytes for New Line characters which mark the end of one line and the start of another.
  2. You might receive input without any New Line characters at all. This happens when you try to read when the sending side was not yet finished sending. You can't just return that half completed line. You have to store it and complete it when new input becomes available.
  3. New Line characters are not always #13#10. The UCI protocol allows different combination for that, so you have to handle all possible New Line character variants.
  4. As far as I know there is no way to check for new input before you actually try to read it. Good old Pascal had this "keypressed" function, which returned true if the user pressed a key and you could use "readkey" after wards to actually read it. This does not exist anymore and actually Standard In is a stream, input might be available without any key pressed at all. The nasty thing with this is that reading from an empty stream blocks your execution until input becomes available.
Points 1-3 are just things you have to be aware of and at the end it is just a bit more programming effort to handle them. Point 4 causes some headache.

The UCI protocol demands that the engine is responsive even when calculating. In order to do that you have to poll for new input from time to time when doing a longer search for a best move. If you do that and no input is available the engine hangs and does not proceed with the search.

After trying and failing a lot I finally decided to run a different thread in my engine that does nothing than polling for new input. So it tries to read, blocks until something becomes available and if it receives input it sets a flag to let the engine know, something was received. All the engine does then is polling the flag from time to time.

The execution loop of the thread looks like this. (streamReader.Readln implements a blocking read from the input stream handling all the New Line variations) 

    while (not Terminated) do
      if newInputAvailable=False then
        if S<>'' then

The tread implements also a read method the engine uses to retrieve the input. Whenever this method is called the tread knows it is safe to try to read again from the stream and maybe overwrite what he has read so far.

  function TWatchStdIn.getNewInput:String;

    if newInputAvailable = False then Exit;


This works very nicely but there is a problem with it. Using threads slows down the engine a lot. It is not the work the thread does, it is the pure existence of a thread. The performance is hit even if the thread is just declared and not even told to start executing. The reason is that the compiler builds a different program when he realizes that it runs more than one thread. Memory management within the program is more complex to make it thread safe and this costs performance.

To just outline how big the impact is I list the numbers for the perft calculation with threads and without it.

Depth       Perft                Time w/o Threads           Time with Threads
5               4.865.609             1,265 sec                             2,343 sec
6           119.060.324           29,687 sec                           56,781 sec                      

So using threads almost doubles the time for the same work. For a chess engine this is a problem. Fortunately the impact seems much smaller when considering the complete search and evaluation process and as I see no real alternative to threads my engine is using them for the IO handling.