To see whether the Intel compiler (ICC) could produce better code, I installed a demo version and ran it. It compiled without errors, but the resulting executable behaved very strangely. It reached a high search depth very fast but examined only a small fraction of the nodes that the gcc-compiled executable did.
Wow, what an optimizer! Unfortunately not. The resulting engine played much worse despite its high search depths, so the ICC build contained a bug.
This was nasty because I debug with MSVC, and there the bug does not show up. The gcc build also worked fine, and to my surprise ICC produced a correct but slow engine when I used the /Od (no optimization) compiler flag.
I then added debug code to see where the different versions make different decisions. It showed that the ICC-optimized code messed up the generated move lists when sorting them (good moves first ...). Most of the moves just vanished, so the search had a low branching factor and reached high depths, but because it skipped a lot of moves the engine played terrible chess.
The root cause of this error was a single line in the sort mechanism. I use a variation of insertion sort to sort the moves, and this algorithm has to make room once it finds the right spot for an element. To do that quickly I used memcpy instead of a dedicated loop that moves every element one slot to the right.
memcpy (&moves[j + 1], &moves[j], (i - j) * sizeof (TMove));
This worked great until the Intel compiler optimized that line. I looked the function up in the C++ documentation and was surprised to read that the behavior of memcpy is undefined when source and destination overlap (which they do here whenever more than one element is shifted).
So I replaced memcpy with memmove, which is specified to handle overlapping regions. The error was gone, and I learned an interesting lesson.
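
For illustration, here is a minimal sketch of what such an insertion sort with memmove might look like. It is not the engine's actual code: the TMove layout, the score field and the sortMoves name are assumptions made up for this example. Only the shifting line mirrors the one shown above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical move record; the engine's real TMove is not shown in the post.
struct TMove {
    int from;
    int to;
    int score;  // assumed: a higher score means a more promising move
};

// Sorts moves[0..count) by descending score with an insertion sort.
// Room for the inserted element is made with one block move instead of a loop.
void sortMoves(TMove* moves, std::size_t count) {
    for (std::size_t i = 1; i < count; ++i) {
        TMove current = moves[i];

        // Find the first slot j whose move scores lower than the current one.
        std::size_t j = 0;
        while (j < i && moves[j].score >= current.score) {
            ++j;
        }

        if (j < i) {
            // Shift moves[j..i) one slot to the right. Source and destination
            // overlap whenever i - j > 1, so this must be memmove, not memcpy.
            std::memmove(&moves[j + 1], &moves[j], (i - j) * sizeof(TMove));
            moves[j] = current;
        }
    }
}
```

The block move is only valid here because the assumed TMove is a plain struct that can be copied byte by byte. For a type with constructors or owned resources, std::move_backward from the algorithm header would probably be the safer choice.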