For a small analysis project on my chess engine's evaluation, I was looking for a collection of chess positions to gather some statistics that I can use later in my eval.
First I looked for some game collections. Ed Schroeder is hosting a large one (2.2 million games) from human players at his site. After a first look I realized it is useless for my purpose. There are games in it where the players agreed to a draw after 17 moves or fewer. The outcome of chess games between human players is too noisy for statistics. It is influenced too much by things unrelated to the actual game (e.g. whether a player is happy with a draw because it is enough to secure their position in the tournament).
So I used chess engine games from the CCRL collection instead. Plenty of those are available as well. I filtered them with pgn-extract to remove the short games and to split them into won and drawn games.
Then I used pgn2fen to serialize the PGN games into a list of consecutive FEN positions. This resulted in a very long list, like the one below:
2q5/4Q1bk/5pp1/2B1p2p/p2p3P/P2P1PP1/1P2P1K1/8 w - - 4 47
8/1q4bk/3Q1pp1/2B1p2p/p2p3P/P2P1PP1/1P2P1K1/8 w - - 6 48
2q5/6bk/3Q1pp1/4p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 8 49
8/1q4bk/3Q1pp1/2B1p2p/p2p3P/P2P1PP1/1P2P1K1/8 w - - 10 50
2q5/6bk/3Q1pp1/4p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 12 51
8/6bk/5pp1/2q1p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 0 52
From all those positions I was only interested in the ones at move number 50 with White to move.
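To make the selection criterion concrete: a FEN record has six space-separated fields (piece placement, side to move, castling rights, en passant square, halfmove clock, fullmove number), so the check only needs the second and the sixth. Here is a minimal Python sketch of that check. The function name is made up for illustration and is not part of pgn2fen or any other tool.

```python
def is_white_to_move_at(fen: str, move_number: int) -> bool:
    """Return True if the FEN has White to move at the given fullmove number."""
    fields = fen.split()
    if len(fields) != 6:
        return False  # not a complete FEN record
    side_to_move = fields[1]
    fullmove = fields[5]
    return side_to_move == "w" and fullmove == str(move_number)


# Three of the positions from the list above.
positions = [
    "2q5/6bk/3Q1pp1/4p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 8 49",
    "8/1q4bk/3Q1pp1/2B1p2p/p2p3P/P2P1PP1/1P2P1K1/8 w - - 10 50",
    "2q5/6bk/3Q1pp1/4p2p/pB1p3P/P2P1PP1/1P2P1K1/8 w - - 12 51",
]
matches = [p for p in positions if is_white_to_move_at(p, 50)]
print(matches)
```

Only the middle position survives the filter, because it is the only one whose last field is 50.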
On UNIX this would be pretty easy with a simple cat and grep combination, but I don't have a UNIX system, so I looked for a solution on Windows.
It is hard to believe, but it can be done in a small batch file:
FOR %%X in (*.fen) do (
  FOR /F "tokens=1,2,3,4,6* delims=; " %%i in (%%X) do (
    if /I %%m EQU 50 (
      if /I %%j EQU w @echo %%i %%j %%k %%l %%m >> 50-%%X
    )
  )
)
The first for statement loops over all my input files (*.fen).
The second for loop processes every line in each file. It splits the line into tokens at spaces and semicolons and compares the token that represents the move number (the sixth one, stored in %%m) with 50. If it matches, the position is appended to the result file. The fifth token, the halfmove clock, is not selected and therefore does not appear in the output.
The inner if statement verifies that the side to move is White. It could be skipped, as I had already told pgn2fen to output only positions with White to move.
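For anyone not on Windows, the same logic can be sketched in Python. This is only a rough equivalent of the batch file above, not a tested replacement. The function names are my own, and it assumes one FEN per line with the six standard fields. Unlike the batch file, it skips files that already carry the output prefix, so results from an earlier run are not filtered again.

```python
from pathlib import Path


def filter_fen_file(src: Path, dst: Path, move_number: int = 50) -> int:
    """Append matching positions from src to dst; return how many were written."""
    written = 0
    with src.open() as fin, dst.open("a") as fout:
        for line in fin:
            # Same delimiters as the batch file: space and semicolon.
            tokens = line.replace(";", " ").split()
            if len(tokens) < 6:
                continue  # skip blank or incomplete lines
            board, side, castling, en_passant, _halfmove, fullmove = tokens[:6]
            if not fullmove.isdigit() or int(fullmove) != move_number:
                continue
            if side.lower() != "w":
                continue
            # Like the batch file, the halfmove clock is left out of the output.
            fout.write(f"{board} {side} {castling} {en_passant} {fullmove}\n")
            written += 1
    return written


def filter_directory(directory: Path, move_number: int = 50) -> None:
    """Filter every *.fen file in directory into a '<move_number>-' prefixed copy."""
    prefix = f"{move_number}-"
    # sorted() builds the full input list before any output file is created.
    for src in sorted(directory.glob("*.fen")):
        if src.name.startswith(prefix):
            continue  # output of an earlier run, not an input file
        filter_fen_file(src, src.with_name(prefix + src.name), move_number)
```

Calling `filter_directory(Path("."))` in the folder with the .fen files should produce the same kind of 50-*.fen files as the batch script.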
Now I have something to analyze. Maybe I can detect some interesting patterns in the positions.