Learning Chess Openings from GMs’ Speedruns

Mon 20 January 2025

Table of contents:

Introduction
Methodology
Technical Implementation
Results
- Side quest 1
- Side quest 2
Conclusion
Code

Introduction

Have you ever spent hours memorizing chess opening lines, only to fall apart when your opponent deviates from the book? You’re not alone.

Chess books teach openings assuming perfect play from both sides. Reality is different, and chess books are boring anyway.

I’ve stumbled upon GMs such as Hansen and Aman (chessbrah) doing speedruns on chess.com. They create new accounts starting at lower ratings (400 or 800 Elo) and climb their way up to 2000 Elo while streaming their games to Twitch / YouTube.

Since these games are still available on chess.com, we’ve got a goldmine of data we can use to practice and speed up our improvement.

Now you might be wondering two things:

1. What’s the point of checking out these moves instead of just using the Game Review?
2. How do you practice from a PGN such as this anyway?

And my answers are:

1. These are human moves with understandable ideas behind them.
2. Either import it into a Lichess study, or stay tuned for my mobile app that is coming out (soon^tm) :D.

Indeed, the moves we’ll be practicing against will have a higher likelihood of being relevant to our games since they are moves played intuitively by players at our level.

Besides, playing the “best” move isn’t worth much if you don’t understand it: depending on your Elo, you’ll probably end up in a sharp position with only 1 or 2 good moves that keep the advantage, and you may not have the skill / time to find them.

So, the goal of this blog post is to show how I downloaded all the games from the account vonamiat (GM Aman Hambleton’s speedrun account for the Taimanov Sicilian) and filtered them for the games where he played as Black and actually played the Taimanov.

In addition, I’ve analyzed the first 15 moves with Stockfish to check whether Aman’s moves were actually good. The reason is that I want to avoid building a repertoire with mistakes in it. While Aman is perfectly able to win against a 1500 Elo player while down a rook, I am not.

Methodology

Download games from chess.com Public API
Cache them to disk in case I have to fix stuff in my script and re-run it.
Filter games by opening
Run Stockfish analysis on the player’s moves
Build a repertoire in PGN format
Handle move conflicts
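
The last step, handling move conflicts, is the least obvious one, so here is a minimal sketch of the idea. It is not the code from the script (which groups lines by their final position): it assumes each game is a plain list of UCI moves and treats two games as conflicting when our player answered the same sequence of moves in two different ways. The name find_conflicts and the sample lines are made up for illustration.

```python
from collections import defaultdict


def find_conflicts(lines, player_is_white=False):
    """Return {move prefix: sorted replies} where our player chose 2+ different replies."""
    replies = defaultdict(set)
    for moves in lines:
        for ply, move in enumerate(moves):
            # Even plies (0, 2, ...) are White's moves, odd plies are Black's.
            our_turn = (ply % 2 == 0) == player_is_white
            if our_turn:
                replies[tuple(moves[:ply])].add(move)
    return {prefix: sorted(found) for prefix, found in replies.items() if len(found) > 1}


# Hypothetical sample: same first three plies, two different replies by Black.
sample_lines = [
    ["e2e4", "c7c5", "g1f3", "e7e6"],
    ["e2e4", "c7c5", "g1f3", "b8c6"],
]
for prefix, found in find_conflicts(sample_lines).items():
    print(" ".join(prefix), "->", found)
```

Keying on the move prefix keeps the sketch dependency-free, but it misses transpositions; keying on the board position would catch those.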

Technical Implementation

The script takes a username and color as input. For each game:

Check if the target player played the target color
Extract moves up to move 15 (this can be configured)
Run engine analysis on the player’s moves
Store CPL (centipawn loss) and better moves if any
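
To make the CPL step concrete, here is a small standalone sketch of the arithmetic. It mirrors the logic of calculate_centipawn_loss in the full script, but the function name centipawn_loss and the sample numbers are illustrative only. It assumes both evaluations are in centipawns from White's point of view.

```python
def centipawn_loss(score_before, score_after, white_to_move):
    """Centipawns given up by the side that just moved.

    Both scores are engine evaluations from White's point of view.
    """
    if white_to_move:
        # White wants the score to stay high, so a drop is a loss.
        return max(0, score_before - score_after)
    # Black wants the score to stay low, so a rise is a loss.
    return max(0, score_after - score_before)


# Black to move: the evaluation goes from -30 (Black slightly better)
# to +20 (White slightly better), so Black gave up 50 centipawns.
print(centipawn_loss(-30, 20, white_to_move=False))

# A move that improves the evaluation for the mover is clamped to 0.
print(centipawn_loss(10, 40, white_to_move=True))
```

The clamp to zero matters because the two evaluations come from separate searches, so the score after the move can end up slightly better than the score before it.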

Example of analysis output:

Processed 182 games
Found 106 Sicilian games

Non-Sicilian games (74)

PGN:
1. e2e4 c7c5 {CPL: 0.0} {Better was e7e5} 2. g1f3 e7e6 {CPL: 5.0} 3. f1c4 a7a6 {CPL: 14.0} {Better was g8f6} 4. d2d4 b7b5 {CPL: 0.0} 5. c4e2 c5d4 {CPL: 17.0} 6. d1d4 b8c6 {CPL: 0.0} 7. d4d1 c8b7 {CPL: 3.0} {Better was g8f6} 8. c1g5 d8c7 {CPL: 32.0} {Better was g8f6} 9. b1d2 a8c8 {CPL: 41.0} {Better was g8f6} 10. e1g1 g8f6 {CPL: 0.0} 11. g2g3 f8e7 {CPL: 5.0} {Better was h7h6} 12. g5f4 d7d6 {CPL: 18.0} {Better was c7d8} 13. a2a4 b5b4 {CPL: 0.0} 14. a4a5 e6e5 {CPL: 50.0} {Better was c6a5} 15. f4e3 c6a5 {CPL: 4.0}
...(105 more lines)

To be honest, I had a brain fart and forgot to filter games before analyzing them:

python3 repertoire.py --username vonamiat --color black --max-moves 15 

INFO:root:Loaded 360 games from cache
Processing archives: 100%|██████████| 5/5 [00:03<00:00,  1.36it/s]
Building repertoire:  15%|█▌        | 54/360 [02:06<15:25,  3.02s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire:  38%|███▊      | 136/360 [04:48<09:01,  2.42s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire:  40%|███▉      | 143/360 [05:00<06:40,  1.85s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire:  59%|█████▉    | 214/360 [07:43<05:22,  2.21s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire: 100%|██████████| 360/360 [13:26<00:00,  2.24s/it]

Unique lines in repertoire:
...

Analyzing the 360 games played by Aman took 13 minutes and I didn’t want to wait again, so I asked Claude to write a parser for the resulting PGN that filters out the games where Aman did not play c7c5 (a quick heuristic to find the Taimanov games). For more realistic scenarios, you might want to write a proper filter to classify openings, since many games start with c7c5 without being a Sicilian Taimanov at all.
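
As a rough idea of what a stricter filter could look like, the sketch below checks the first four moves of each side against the usual Taimanov move order (1. e4 c5 2. Nf3 e6 3. d4 cxd4 4. Nxd4 Nc6, or the same with ...Nc6 and ...e6 swapped). This is not part of the original script: the function name is_taimanov is hypothetical, the input is assumed to be a list of UCI moves, and rarer transpositions are deliberately ignored.

```python
# White's first four moves in the Open Sicilian move order.
TAIMANOV_WHITE = ["e2e4", "g1f3", "d2d4", "f3d4"]

# Black can play ...e6 and ...Nc6 in either order.
TAIMANOV_BLACK_ORDERS = [
    ["c7c5", "e7e6", "c5d4", "b8c6"],
    ["c7c5", "b8c6", "c5d4", "e7e6"],
]


def is_taimanov(uci_moves):
    """True if the game reaches the Taimanov via one of the two move orders above."""
    if len(uci_moves) < 8:
        return False
    white = uci_moves[0:8:2]  # plies 0, 2, 4, 6
    black = uci_moves[1:8:2]  # plies 1, 3, 5, 7
    return white == TAIMANOV_WHITE and black in TAIMANOV_BLACK_ORDERS


# First eight plies of the two annotated games shown in this post.
anti_sicilian = "e2e4 c7c5 g1f3 e7e6 f1c4 a7a6 d2d4 b7b5".split()
taimanov = "e2e4 c7c5 g1f3 e7e6 d2d4 c5d4 f3d4 b8c6".split()
print(is_taimanov(anti_sicilian), is_taimanov(taimanov))
```

Running this kind of check before the engine analysis would also avoid spending time on games that end up being discarded.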

Results

For each position, we get:

The move played
Centipawn loss
Better alternatives if the move wasn’t optimal

In total, I got 106 lines ready to be memorized. The usefulness of memorizing 106 lines 15 moves deep is questionable, but hey, everything in the script is configurable. A more useful approach is probably to proceed in intermediate steps:

Analyze your own games and find out after which move number, on average, you start to lose in your favorite opening.
Build a repertoire that many moves deep and only go deeper once you’re familiar enough with it.
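
A possible way to find that move number is sketched below. It assumes you have run the script on your own games as Black and have one annotated line per game in the same format as the output above. The threshold of 50 CPL and the names first_mistake_move and average_first_mistake are arbitrary choices for illustration, not something the script provides.

```python
import re
from statistics import mean

# Matches "<number>. <white move> <black move> {CPL: <value>}".
# Assumes only Black's moves carry annotations.
CPL_AFTER_BLACK_MOVE = re.compile(r"(\d+)\.\s+\S+\s+\S+\s+\{CPL:\s*([0-9.]+)\}")


def first_mistake_move(line, threshold=50.0):
    """Move number of the first Black move losing at least `threshold` centipawns."""
    for match in CPL_AFTER_BLACK_MOVE.finditer(line):
        if float(match.group(2)) >= threshold:
            return int(match.group(1))
    return None  # no move in this line reached the threshold


def average_first_mistake(lines, threshold=50.0):
    """Average first-mistake move number over the lines that contain one."""
    found = [m for m in (first_mistake_move(l, threshold) for l in lines) if m is not None]
    return mean(found) if found else None


# Hypothetical annotated lines in the script's output format.
games = [
    "1. e2e4 c7c5 {CPL: 0.0} 2. g1f3 e7e6 {CPL: 5.0} 3. f1c4 a7a6 {CPL: 64.0} {Better was g8f6}",
    "1. e2e4 c7c5 {CPL: 80.0} {Better was e7e5}",
]
print(average_first_mistake(games))
```

The result can then be passed to the script as --max-moves.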

Side quest 1

Out of curiosity, I checked the mistakes Aman made in the first 15 moves across his 106 Taimanov games. I used uniq to collapse duplicate values and get a quick overview of the distinct CPL scores, highest first:

grep -o "CPL: [0-9.]*" output.pgn | cut -d' ' -f2 | sort -nr | uniq
425.0
276.0
264.0
251.0
228.0
161.0
158.0
154.0
153.0
149.0
138.0
135.0
126.0
122.0
120.0
119.0
115.0
113.0
111.0
110.0
108.0

And some more interesting stats below:

grep -o "CPL: [0-9.]*" output.pgn | cut -d' ' -f2 | sort -n | awk '
BEGIN { print "CPL Statistics:" }
{
    sum += $1
    sumsq += $1 * $1
    values[NR] = $1
}
END {
    print "Count:", NR
    print "Average:", sum/NR
    print "Max:", values[NR]
    print "Min:", values[1]
    print "StdDev:", sqrt(sumsq/NR - (sum/NR)^2)
}'
CPL Statistics:
Count: 1545
Average: 15.1676
Max: 425.0
Min: 0.0
StdDev: 27.8667

An average of 15 CPL is quite an achievement. That’s a grandmaster for you!
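
If you would rather stay in Python than chain grep and awk, the same numbers can be computed with the standard library. This is only a sketch: the function name cpl_stats is made up, and it assumes the annotated output is available as a single string. It uses the population standard deviation, which is what the awk formula above computes.

```python
import re
from statistics import mean, pstdev


def cpl_stats(pgn_text):
    """Summary statistics over every {CPL: x} annotation found in the text."""
    values = [float(v) for v in re.findall(r"CPL: ([0-9.]+)", pgn_text)]
    if not values:
        return None
    return {
        "count": len(values),
        "average": mean(values),
        "max": max(values),
        "min": min(values),
        "stddev": pstdev(values),  # population std dev, like the awk one-liner
    }


# Hypothetical three-move excerpt; in practice, pass the contents of output.pgn.
sample = "1. e2e4 c7c5 {CPL: 0.0} 2. g1f3 e7e6 {CPL: 10.0} 3. d2d4 c5d4 {CPL: 20.0}"
for name, value in cpl_stats(sample).items():
    print(f"{name}: {value}")
```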

Side quest 2

I was also curious to analyze the game with the 425 CPL. It’s the following one:

1. e2e4 c7c5 {CPL: 9.0} {Better was e7e5} 2. g1f3 e7e6 {CPL: 2.0} {Better was d7d6} 3. d2d4 c5d4 {CPL: 0.0} 4. f3d4 b8c6 {CPL: 9.0} {Better was g8f6} 5. b1c3 d8c7 {CPL: 0.0} {Better was g8f6} 6. c1e3 a7a6 {CPL: 0.0} 7. f1d3 b7b5 {CPL: 2.0} {Better was g8f6} 8. a2a3 c8b7 {CPL: 6.0} 9. d1e2 a8c8 {CPL: 24.0} {Better was c6e5} 10. e1g1 c6e5 {CPL: 27.0} {Better was g8f6} 11. a1c1 e5c4 {CPL: 3.0} {Better was g8f6} 12. d4b5 a6b5 {CPL: 29.0} 13. c3b5 c7c6 {CPL: 425.0} {Better was c7e5} 14. b5a7 c6c7 {CPL: 0.0} 15. a7c8 c4e3 {CPL: 6.0}

Now that I see it like that, I actually remember this game and the chuckle I had at Aman’s reaction. I had to find it on YouTube again. Aman makes the mistake here and then discusses it in more detail after the game, including his surprise at the engine’s funny solution: “The old engine move heh…it’s kind of gross”.

Before the move Qc6 (c7c6 in the line above), the evaluation was +2.61 for Aman; after it, it was 1.75 in favor of his opponent. That swing of roughly 4.4 pawns is in line with the 425 CPL computed locally, which suggests our local game analysis is reliable and matches the conclusion of the Game Review. In the game, Aman managed to find a nice move and equalize not long after, before winning the game.

Conclusion

This approach bridges the gap between theory and practice. Instead of memorizing perfect lines, we learn:

Practical responses to common moves below grandmaster level
How to punish inaccuracies
Real game examples from 800 to 2300
Moves from a player whose style matches ours

Code

The complete script is shown below.

import time

import chess
import chess.engine  # used explicitly below (SimpleEngine, Limit)
import chess.pgn
from chessdotcom import get_player_game_archives, get_player_games_by_month, Client
from io import StringIO
from collections import defaultdict
import argparse
import logging
from tqdm import tqdm
import json
import os
from datetime import datetime

logging.basicConfig(level=logging.INFO)


class RepertoireBuilder:
    def __init__(self, username, color, max_moves=15, cache_dir="cache", engine_path="/opt/homebrew/bin/stockfish", depth=16):
        self.username = username
        self.color = color.lower()
        self.max_moves = max_moves
        self.variations = defaultdict(list)
        self.positions = defaultdict(list)
        self.cache_dir = cache_dir
        self.cache_file = os.path.join(cache_dir, f"{username}_games.json")

        # Create cache directory if it doesn't exist
        os.makedirs(cache_dir, exist_ok=True)

        self.engine_path = engine_path
        self.depth = depth
        self.engine = None

    def analyze_position(self, board):
        """Analyze a position and return best move and evaluation."""
        try:
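            result = self.engine.analyse(board, chess.engine.Limit(depth=self.depth))
            score = result['score'].white().score(mate_score=1000)
            # 'pv' can be missing (e.g. when the game is already over), which
            # caused the "Engine analysis error: 'pv'" messages; keep the score
            # and report no best move in that case.
            pv = result.get('pv')
            best_move = pv[0] if pv else None
            return score, best_move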
        except Exception as e:
            logging.error(f"Engine analysis error: {e}")
            return None, None

    def calculate_centipawn_loss(self, board, move_made, prev_score):
        """Calculate centipawn loss for a move."""
        if prev_score is None:
            return 0, None

        # Analyze position after the move
        board.push(move_made)
        after_score, _ = self.analyze_position(board)
        board.pop()

        if after_score is None:
            return 0, None

        # Calculate loss based on player's color
        is_white = board.turn
        if is_white:
            loss = max(0, prev_score - after_score)
        else:
            loss = max(0, after_score - prev_score)

        return loss, after_score

    def process_game(self, game):
        """Process a single game with engine analysis."""
        pgn = StringIO(game['pgn'])
        chess_game = chess.pgn.read_game(pgn)

        # Determine if our player is white or black
        is_white = game['white']['username'].lower() == self.username.lower()
        if (is_white and self.color == 'black') or (not is_white and self.color == 'white'):
            return

        board = chess.Board()
        moves = []
        annotations = []
        prev_score = None

        for move_num, move in enumerate(chess_game.mainline_moves()):
            if move_num >= self.max_moves * 2:  # Both players' moves
                break

            # Only analyze our player's moves
            is_our_move = (is_white and board.turn) or (not is_white and not board.turn)

            if is_our_move:
                # Analyze position before move
                curr_score, best_move = self.analyze_position(board)

                # Calculate loss
                loss, new_score = self.calculate_centipawn_loss(board, move, curr_score)

                annotation = f" {{CPL: {loss:.1f}}}"
                if best_move is not None and best_move != move:
                    annotation += f" {{Better was {best_move}}}"
                annotations.append((move_num // 2 + 1, annotation))

                prev_score = new_score

            moves.append(move)
            board.push(move)

        # Store the variation with annotations
        move_str = self.moves_to_pgn(moves, annotations)
        position_key = board.fen()
        self.variations[position_key].append({
            'moves': move_str,
            'url': game['url'],
            'date': game['end_time'],
            'eval': prev_score
        })
        self.positions[len(moves)].append(move_str)

    def load_cached_games(self):
        """Load games from cache if available."""
        if os.path.exists(self.cache_file):
            try:
                with open(self.cache_file, 'r') as f:
                    cached_data = json.load(f)
                logging.info(f"Loaded {len(cached_data['games'])} games from cache")
                return cached_data
            except Exception as e:
                logging.error(f"Error loading cache: {e}")
                return {'last_update': None, 'games': []}
        return {'last_update': None, 'games': []}

    def save_games_to_cache(self, games):
        """Save games to cache file."""
        cache_data = {
            'last_update': datetime.now().isoformat(),
            'games': games
        }
        try:
            with open(self.cache_file, 'w') as f:
                json.dump(cache_data, f)
            logging.info(f"Saved {len(games)} games to cache")
        except Exception as e:
            logging.error(f"Error saving to cache: {e}")

    def get_player_archives(self):
        """Fetch all game archives for the player."""
        try:
            archives = get_player_game_archives(self.username).json['archives']
            return archives
        except Exception as e:
            logging.error(f"Error fetching archives: {e}")
            return []

    def download_new_games(self, cached_data):
        """Download new games and merge with cached games."""
        archives = self.get_player_archives()
        all_games = cached_data['games']

        # Create a set of existing game IDs for quick lookup
        existing_game_ids = {game['url'] for game in all_games}
        new_games = []

        for archive in tqdm(archives, desc="Processing archives"):
            try:
                year_month = archive.split('/')[-2:]
                games = get_player_games_by_month(self.username,
                                                  int(year_month[0]),
                                                  int(year_month[1])).json['games']

                for game in games:
                    if game['url'] not in existing_game_ids:
                        new_games.append(game)
                        all_games.append(game)
                        existing_game_ids.add(game['url'])
                time.sleep(0.5)


            except Exception as e:
                logging.error(f"Error processing archive {archive}: {e}")
                continue


        if new_games:
            logging.info(f"Downloaded {len(new_games)} new games")
            self.save_games_to_cache(all_games)

        return all_games

    def moves_to_pgn(self, moves, annotations):
        """Convert moves to PGN with annotations."""
        pgn_moves = []
        for i, move in enumerate(moves):
            if i % 2 == 0:
                pgn_moves.append(f"{i // 2 + 1}. {move.uci()}")
            else:
                pgn_moves.append(move.uci())

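            # Add annotation if it exists for this move. Annotations are keyed
            # by full-move number only, so attach them to our player's ply and
            # not to the opponent's move that shares the same number.
            is_our_ply = (i % 2 == 0) == (self.color == 'white')
            for move_num, annotation in annotations:
                if is_our_ply and move_num == (i // 2 + 1):
                    pgn_moves[-1] += annotation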

        return " ".join(pgn_moves)

    def print_repertoire(self):
        """Print repertoire with analysis."""
        print("\nUnique lines in repertoire:")
        for move_number in sorted(self.positions.keys()):
            unique_lines = set(self.positions[move_number])
            for line in unique_lines:
                print(f"{line}")

        print("\nConflicts found:")
        for position, variations in self.variations.items():
            if len(variations) > 1:
                # Group variations by next move
                moves_dict = defaultdict(list)
                for var in variations:
                    moves = var['moves'].split()
                    last_move = moves[-1]
                    moves_dict[last_move].append({
                        'full_line': var['moves'],
                        'url': var['url'],
                        'date': var['date'],
                        'eval': var.get('eval')
                    })

                if len(moves_dict) > 1:
                    print(f"\nPosition after: {list(moves_dict.values())[0][0]['full_line'].rsplit(' ', 1)[0]}")
                    print("Different moves played in this position:")

                    # Sort variations by best known evaluation; moves whose
                    # games have no evaluation at all sort last.
                    def best_eval(games):
                        evals = [g['eval'] for g in games if g['eval'] is not None]
                        return max(evals) if evals else float('-inf')

                    sorted_moves = sorted(moves_dict.items(),
                                          key=lambda x: best_eval(x[1]),
                                          reverse=True)

                    for move, games in sorted_moves:
                        print(f"\n  Move: {move} (Evaluation: {best_eval(games):.2f})")
                        print("  Games:")
                        for game in games:
                            date_str = datetime.fromtimestamp(int(game['date'])).strftime('%Y-%m-%d') if game[
                                'date'] else 'Unknown date'
                            print(f"    - {game['url']} ({date_str})")

    def build_repertoire(self):
        """Build repertoire with engine analysis."""
        try:
            self.engine = chess.engine.SimpleEngine.popen_uci(self.engine_path)
            cached_data = self.load_cached_games()
            all_games = self.download_new_games(cached_data)

            for game in tqdm(all_games, desc="Building repertoire"):
                self.process_game(game)
        finally:
            if self.engine:
                self.engine.quit()


def main():
    parser = argparse.ArgumentParser(description='Build a chess repertoire from a player\'s games')
    parser.add_argument('--username', required=True, help='Chess.com username')
    parser.add_argument('--color', choices=['white', 'black'], required=True,
                        help='Color to build repertoire for')
    parser.add_argument('--max-moves', type=int, default=15,
                        help='Maximum number of moves to include (default: 15)')
    parser.add_argument('--cache-dir', default='cache',
                        help='Directory to store cached games (default: cache)')

    args = parser.parse_args()

    """Client.request_config["headers"]["User-Agent"] = (
        "xxx "
        "xxx
    )"""

    builder = RepertoireBuilder(args.username, args.color, args.max_moves, args.cache_dir)
    builder.build_repertoire()
    builder.print_repertoire()


if __name__ == "__main__":
    main()

And the hacky PGN filter:

import argparse
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Move:
    move: str
    cpl: Optional[str] = None
    better_move: Optional[str] = None


@dataclass
class MovePair:
    number: int
    white: Move
    black: Move


class PGNParser:
    def __init__(self, color='black'):
        self.color = color

    def tokenize(self, pgn_str: str) -> List[str]:
        """Split into tokens while preserving annotations."""
        # First, normalize the string
        pgn_str = pgn_str.replace('}}.', '} }.')

        # Split on spaces while preserving annotation structure
        tokens = []
        current_token = []
        in_annotation = False

        for char in pgn_str:
            if char.isspace() and not in_annotation:
                if current_token:
                    tokens.append(''.join(current_token))
                    current_token = []
            else:
                current_token.append(char)
                if char == '{':
                    in_annotation = True
                elif char == '}':
                    in_annotation = False

        if current_token:
            tokens.append(''.join(current_token))

        return tokens

    def parse_move_with_annotations(self, tokens: List[str], start_idx: int) -> tuple[Move, int]:
        """Parse a move and its annotations."""
        move = tokens[start_idx]
        idx = start_idx + 1
        cpl = None
        better_move = None

        while idx < len(tokens):
            token = tokens[idx]
            if token.startswith('{CPL:'):
                cpl = token.split('CPL:')[1].rstrip('}').strip()
                idx += 1
            elif token.startswith('{Better'):
                better_move = token.split('was')[1].rstrip('}').strip()
                idx += 1
            else:
                break

        return Move(move, cpl, better_move), idx

    def parse(self, pgn_str: str) -> str:
        tokens = self.tokenize(pgn_str)
        result = []
        i = 0

        while i < len(tokens):
            # Handle move numbers
            if '.' in tokens[i]:
                move_num = tokens[i].rstrip('.')
                result.append(f"{move_num}.")
                i += 1

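                # Parse white's move; keep its annotations only when the
                # repertoire is built for white
                if i < len(tokens):
                    white_move, i = self.parse_move_with_annotations(tokens, i)
                    move_str = white_move.move
                    if self.color == 'white':
                        if white_move.cpl:
                            move_str += f" {{CPL: {white_move.cpl}}}"
                        if white_move.better_move:
                            move_str += f" {{Better was {white_move.better_move}}}"
                    result.append(move_str)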

                # Parse black's move
                if i < len(tokens):
                    black_move, i = self.parse_move_with_annotations(tokens, i)
                    move_str = black_move.move
                    if self.color == 'black':
                        if black_move.cpl:
                            move_str += f" {{CPL: {black_move.cpl}}}"
                        if black_move.better_move:
                            move_str += f" {{Better was {black_move.better_move}}}"
                    result.append(move_str)
            else:
                i += 1

        return ' '.join(result)


def clean_annotations(pgn_str: str, color: str = 'black') -> str:
    parser = PGNParser(color)
    return parser.parse(pgn_str)


class PGNFilter:
    def __init__(self, color='black'):
        self.color = color.lower()

    def is_sicilian(self, game_str):
        """Check if our first move was c5 (if Black) or opponent's first move was c5 (if White)"""
        # Remove all annotations first
        clean_str = re.sub(r'\{[^}]*\}', '', game_str)

        if self.color == 'black':
            # Look for Black's first move being c5
            pattern = r'1\.\s*\S+\s+(c7c5|c5)'
        else:
            # Look for White's first move being e4 and Black responding with c5
            pattern = r'1\.\s*(e2e4|e4)\s+(c7c5|c5)'

        return bool(re.search(pattern, clean_str))

def main():
    parser = argparse.ArgumentParser(description='Filter PGN file for Sicilian Defense games')
    parser.add_argument('input_file', help='Input PGN file')
    parser.add_argument('output_file', help='Output PGN file')
    parser.add_argument('--color', choices=['white', 'black'], default='black',
                        help='Color to preserve annotations for')

    args = parser.parse_args()

    pgn_filter = PGNFilter(args.color)

    with open(args.input_file, 'r') as f:
        content = f.read()

    # Split into individual games if multiple games exist
    games = content.split('\n')
    sicilian_games = []
    non_sicilian_games = []

    for game in games:
        if game.strip():  # Only process non-empty games
            if pgn_filter.is_sicilian(game):
                # Keep annotations only for the requested color
                cleaned_game = clean_annotations(game, args.color)
                sicilian_games.append(cleaned_game)
            else:
                non_sicilian_games.append(game)

    # Write filtered games to output file
    with open(args.output_file, 'w') as f:
        f.write('\n\n'.join(sicilian_games))

    print(f"Processed {len(games)} games")
    print(f"Found {len(sicilian_games)} Sicilian games")
    print(f"\nNon-Sicilian games ({len(non_sicilian_games)}):")
    """for i, game in enumerate(non_sicilian_games, 1):
        print(f"\nGame {i}:")
        print(game)
        print("-" * 80)
    """


if __name__ == "__main__":
    main()

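Note: This is a proof of concept. The script requires a local Stockfish installation (the engine_path default in RepertoireBuilder points to /opt/homebrew/bin/stockfish) as well as the chess, chessdotcom, and tqdm Python modules it imports.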