Learning Chess Openings from GM Speedruns
Introduction
Have you ever spent hours memorizing chess opening lines, only to fall apart when your opponent deviates from the book? You’re not alone.
Chess books teach openings assuming perfect play from both sides. Reality is different, and chess books are boring anyway.
I’ve stumbled upon GMs such as Eric Hansen and Aman Hambleton (chessbrah) doing speedruns on chess.com. They create new accounts starting at lower ratings (400 or 800 Elo) and climb their way up to 2000 Elo while streaming their games on Twitch / YouTube.
Since these games are still available on chess.com, we’ve got a goldmine of data we can use to practice and speed up our improvement.
Now you might be wondering two things:
- What’s the point of checking out these moves instead of just using the Game Review?
- How do you practice from a PGN such as this anyway?
And my answers are:
- These are human moves with understandable ideas behind them.
- Either import it into a Lichess study, or stay tuned for my mobile app that is coming out (soon^tm) :D.
Indeed, the moves we’ll be practicing against will have a higher likelihood of being relevant to our games since they are moves played intuitively by players at our level.
Besides, playing the “best” move isn’t worth much if you don’t understand it: depending on your Elo, you’ll probably end up in a sharp position with only 1 or 2 good moves that keep the advantage, and you may not have the skill / time to find them.
So, the goal of this blog post is to show how I downloaded all the games from the account vonamiat (GM Aman Hambleton’s speedrun account for the Taimanov Sicilian) and filtered them down to the games where he played as Black and actually played the Taimanov.
In addition, I’ve analyzed the first 15 moves with Stockfish to check whether Aman actually played good moves. The reason is that I want to avoid building a repertoire with mistakes in it. While Aman is perfectly able to win against a 1500 Elo player down a rook, I am not.
Methodology
- Download games from chess.com Public API
- Cache them to disk in case I have to fix stuff in my script and re-run it.
- Filter games by opening
- Run Stockfish analysis on the player’s moves
- Build a repertoire in PGN format
- Handle move conflicts
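The last step, handling move conflicts, can be sketched with nothing but the standard library. The helper below is a hypothetical illustration: the name `find_conflicts` and the list-of-UCI-moves input are assumptions, not part of the script at the end of this post. It groups games by the moves played so far and reports every point where our player answered the same move sequence in different ways.

```python
from collections import defaultdict


def find_conflicts(games, color="black"):
    """Return {moves_so_far: set_of_replies} where our player varied.

    `games` is a list of games, each a list of UCI moves such as "e2e4".
    Keys are move sequences (not positions), so transpositions are
    treated as different lines.
    """
    # White moves on even plies (0, 2, ...), Black on odd plies.
    our_parity = 0 if color == "white" else 1
    replies = defaultdict(set)
    for moves in games:
        for ply, move in enumerate(moves):
            if ply % 2 == our_parity:
                replies[tuple(moves[:ply])].add(move)
    # Keep only the prefixes where more than one reply was played.
    return {prefix: played for prefix, played in replies.items() if len(played) > 1}


# Made-up sample games, only here to show the shape of the result.
games = [
    ["e2e4", "c7c5", "g1f3", "e7e6"],
    ["e2e4", "c7c5", "g1f3", "d7d6"],
    ["e2e4", "c7c5", "b1c3", "e7e6"],
]
for prefix, played in find_conflicts(games).items():
    print(" ".join(prefix), "->", sorted(played))
# e2e4 c7c5 g1f3 -> ['d7d6', 'e7e6']
```

Once a conflict is found, one reasonable way to resolve it is to keep the reply with the lower centipawn loss.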
Technical Implementation
The script takes a username and color as input. For each game:
- Check if the target player played the target color
- Extract moves up to move 15 (this can be configured)
- Run engine analysis on the player’s moves
- Store CPL (centipawn loss) and better moves if any
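The centipawn-loss step boils down to comparing two evaluations: the one before the move and the one after it. Here is a minimal sketch, assuming both scores are in centipawns from White's point of view (the full script asks the engine for scores that way). The function name `centipawn_loss` is made up for this example.

```python
def centipawn_loss(score_before, score_after, white_moved):
    """Centipawns lost by the side that just moved.

    Both scores are assumed to be in centipawns from White's point of
    view: positive means White is better, negative means Black is better.
    """
    if white_moved:
        # White wants the score to stay high.
        loss = score_before - score_after
    else:
        # Black wants the score to stay low.
        loss = score_after - score_before
    # A move that beats the engine's earlier estimate counts as zero loss.
    return max(0, loss)


print(centipawn_loss(30, 10, white_moved=True))     # 20
print(centipawn_loss(-40, 10, white_moved=False))   # 50
print(centipawn_loss(-40, -60, white_moved=False))  # 0
```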
Example of analysis output:
Processed 182 games
Found 106 Sicilian games
Non-Sicilian games (74)
PGN:
1. e2e4 c7c5 {CPL: 0.0} {Better was e7e5} 2. g1f3 e7e6 {CPL: 5.0} 3. f1c4 a7a6 {CPL: 14.0} {Better was g8f6} 4. d2d4 b7b5 {CPL: 0.0} 5. c4e2 c5d4 {CPL: 17.0} 6. d1d4 b8c6 {CPL: 0.0} 7. d4d1 c8b7 {CPL: 3.0} {Better was g8f6} 8. c1g5 d8c7 {CPL: 32.0} {Better was g8f6} 9. b1d2 a8c8 {CPL: 41.0} {Better was g8f6} 10. e1g1 g8f6 {CPL: 0.0} 11. g2g3 f8e7 {CPL: 5.0} {Better was h7h6} 12. g5f4 d7d6 {CPL: 18.0} {Better was c7d8} 13. a2a4 b5b4 {CPL: 0.0} 14. a4a5 e6e5 {CPL: 50.0} {Better was c6a5} 15. f4e3 c6a5 {CPL: 4.0}
...(105 more lines)
To be honest, I had a brain fart and forgot to filter games before analyzing them:
python3 repertoire.py --username vonamiat --color black --max-moves 15
INFO:root:Loaded 360 games from cache
Processing archives: 100%|██████████| 5/5 [00:03<00:00, 1.36it/s]
Building repertoire: 15%|█▌ | 54/360 [02:06<15:25, 3.02s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire: 38%|███▊ | 136/360 [04:48<09:01, 2.42s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire: 40%|███▉ | 143/360 [05:00<06:40, 1.85s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire: 59%|█████▉ | 214/360 [07:43<05:22, 2.21s/it]ERROR:root:Engine analysis error: 'pv'
Building repertoire: 100%|██████████| 360/360 [13:26<00:00, 2.24s/it]
Unique lines in repertoire:
...
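A side note on the `Engine analysis error: 'pv'` lines in the log above. A likely cause (an assumption, not something verified against these games) is that the analysis result had no `pv` entry for those positions, which can happen when the engine has no move to suggest, for instance right after a checkmating move. The sketch below uses plain dicts in place of the engine result, and `best_move_or_none` is a hypothetical helper:

```python
def best_move_or_none(info):
    """Return the first move of the principal variation, if there is one.

    `info` stands in for the dict-like analysis result; using .get()
    avoids a KeyError when the "pv" entry is missing.
    """
    pv = info.get("pv") or []
    return pv[0] if pv else None


print(best_move_or_none({"score": 35, "pv": ["g8f6", "b1c3"]}))  # g8f6
print(best_move_or_none({"score": 1000}))                        # None
```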
Since it took 13 minutes to analyze the 360 games played by Aman and I didn’t want to wait again, I decided to ask Claude to write a parser for the resulting PGN and filter out games where Aman did not play c7c5 (a quick heuristic to find the Taimanov games). However, for more realistic scenarios, you might want to write a proper filter to classify openings, since many games can start with c7c5 and not be a Sicilian Taimanov at all.
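A slightly stricter filter could look like the sketch below. It is only an illustration: the helper names and the list of move orders are assumptions, and the list is not exhaustive. It checks whether a line starts with one of two move orders that reach the Taimanov starting position (1. e4 c5 2. Nf3 e6 3. d4 cxd4 4. Nxd4 Nc6, or the same position via 2... Nc6 and 4... e6).

```python
import re

# Two common move orders reaching the Taimanov starting position,
# written as UCI moves. An assumption for illustration, not exhaustive.
TAIMANOV_MOVE_ORDERS = [
    ["e2e4", "c7c5", "g1f3", "e7e6", "d2d4", "c5d4", "f3d4", "b8c6"],
    ["e2e4", "c7c5", "g1f3", "b8c6", "d2d4", "c5d4", "f3d4", "e7e6"],
]


def uci_moves(line):
    """Extract the UCI moves from one annotated line of the output."""
    # Drop {...} comments first so "Better was e7e5" is not counted.
    without_comments = re.sub(r"\{[^}]*\}", " ", line)
    return re.findall(r"\b[a-h][1-8][a-h][1-8][qrbn]?\b", without_comments)


def is_taimanov(line):
    """True if the line starts with one of the known move orders."""
    moves = uci_moves(line)
    return any(moves[: len(order)] == order for order in TAIMANOV_MOVE_ORDERS)


# First moves of two games shown in this post.
taimanov = ("1. e2e4 c7c5 {CPL: 9.0} {Better was e7e5} 2. g1f3 e7e6 {CPL: 2.0} "
            "{Better was d7d6} 3. d2d4 c5d4 {CPL: 0.0} 4. f3d4 b8c6 {CPL: 9.0}")
other = ("1. e2e4 c7c5 {CPL: 0.0} {Better was e7e5} 2. g1f3 e7e6 {CPL: 5.0} "
         "3. f1c4 a7a6 {CPL: 14.0}")
print(is_taimanov(taimanov))  # True
print(is_taimanov(other))     # False
```

The second game passes the c7c5 heuristic but never reaches the Taimanov position, which is exactly the kind of line a stricter filter would drop.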
Results
For each position, we get:
- The move played
- Centipawn loss
- Better alternatives if the move wasn’t optimal
In total, I got 106 lines ready to be memorized. The usefulness of memorizing 106 lines 15 moves deep is questionable, but hey, everything in the script is configurable. To improve on it, it’s probably more useful to work in intermediate steps:
- Analyze your own games and find out after which move number, on average, you start to lose in your favorite opening.
- Build a repertoire that many moves deep and only go deeper once you’re familiar enough with it.
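The first of these steps can be approximated from the annotated output itself. The sketch below is a rough illustration under a few assumptions: the lines follow the Black-repertoire format shown earlier (only Black's moves annotated), a "mistake" is any move losing at least 50 centipawns (an arbitrary threshold), and the helper names and sample lines are made up.

```python
import re

# Matches "<move number>. <white move> <black move> {CPL: x}" as printed
# for a Black repertoire, where only Black's moves carry annotations.
ANNOTATED_MOVE = re.compile(r"(\d+)\.\s+\S+\s+\S+\s+\{CPL: ([0-9.]+)\}")


def first_mistake_move(line, threshold=50.0):
    """Move number of the first move losing at least `threshold` centipawns."""
    for number, cpl in ANNOTATED_MOVE.findall(line):
        if float(cpl) >= threshold:
            return int(number)
    return None  # no mistake within the analyzed moves


def average_first_mistake(lines, threshold=50.0):
    """Average first-mistake move number over the games that contain one."""
    firsts = (first_mistake_move(line, threshold) for line in lines)
    found = [number for number in firsts if number is not None]
    return sum(found) / len(found) if found else None


# Made-up sample lines in the same format as the script output.
lines = [
    "1. e2e4 c7c5 {CPL: 0.0} 2. g1f3 e7e6 {CPL: 60.0} 3. d2d4 c5d4 {CPL: 0.0}",
    "1. e2e4 c7c5 {CPL: 0.0} 2. g1f3 e7e6 {CPL: 5.0} 3. d2d4 c5d4 {CPL: 0.0} "
    "4. f3d4 b8c6 {CPL: 120.0}",
    "1. e2e4 c7c5 {CPL: 0.0}",
]
print(average_first_mistake(lines))  # 3.0
```

The result is a starting depth for `--max-moves`: memorize up to that move first, then go deeper.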
Side quest 1
For my own curiosity, I checked the mistakes made by Aman in the first 15 moves across his 106 Taimanov games. I used uniq to aggregate them and get a quick overview of the different scores:
grep -o "CPL: [0-9.]*" output.pgn | cut -d' ' -f2 | sort -nr | uniq
425.0
276.0
264.0
251.0
228.0
161.0
158.0
154.0
153.0
149.0
138.0
135.0
126.0
122.0
120.0
119.0
115.0
113.0
111.0
110.0
108.0
And some more interesting stats below:
grep -o "CPL: [0-9.]*" output.pgn | cut -d' ' -f2 | sort -n | awk '
BEGIN { print "CPL Statistics:" }
{
    sum += $1
    sumsq += $1 * $1
    values[NR] = $1
}
END {
    print "Count:", NR
    print "Average:", sum/NR
    print "Max:", values[NR]
    print "Min:", values[1]
    print "StdDev:", sqrt(sumsq/NR - (sum/NR)^2)
}'
CPL Statistics:
Count: 1545
Average: 15.1676
Max: 425.0
Min: 0.0
StdDev: 27.8667
An average of 15 CPL is quite an achievement. That’s a grandmaster for you!
Side quest 2
I was also curious to analyze the game with the 425 CPL. It’s the following one:
1. e2e4 c7c5 {CPL: 9.0} {Better was e7e5} 2. g1f3 e7e6 {CPL: 2.0} {Better was d7d6} 3. d2d4 c5d4 {CPL: 0.0} 4. f3d4 b8c6 {CPL: 9.0} {Better was g8f6} 5. b1c3 d8c7 {CPL: 0.0} {Better was g8f6} 6. c1e3 a7a6 {CPL: 0.0} 7. f1d3 b7b5 {CPL: 2.0} {Better was g8f6} 8. a2a3 c8b7 {CPL: 6.0} 9. d1e2 a8c8 {CPL: 24.0} {Better was c6e5} 10. e1g1 c6e5 {CPL: 27.0} {Better was g8f6} 11. a1c1 e5c4 {CPL: 3.0} {Better was g8f6} 12. d4b5 a6b5 {CPL: 29.0} 13. c3b5 c7c6 {CPL: 425.0} {Better was c7e5} 14. b5a7 c6c7 {CPL: 0.0} 15. a7c8 c4e3 {CPL: 6.0}
Now that I see it like that, I actually remember this game and the chuckle I had at Aman’s reaction. I had to find it on YouTube again. Aman makes the mistake here and then discusses it in more detail after the game, including his surprise at the engine’s funny solution: “The old engine move heh…it’s kind of gross”.
Before the move Qc6, the evaluation was +2.61 for Aman, and after it was 1.75 in favor of his opponent. That is a swing of about 4.4 pawns, close to the 425 CPL computed locally, which suggests our local analysis is reliable and matches the conclusion of the Game Review. In the game itself, Aman managed to find a nice move and equalize not long after, before going on to win.
Conclusion
This approach bridges the gap between theory and practice. Instead of memorizing perfect lines, we learn:
- Practical responses to common moves below grandmaster level
- How to punish inaccuracies
- Real game examples from 800 to 2300 Elo
- Moves from a player who matches our playing style
Code
The complete script is shown below.
import time
import chess
import chess.engine  # needed for SimpleEngine and Limit below
import chess.pgn
from chessdotcom import get_player_game_archives, get_player_games_by_month, Client
from io import StringIO
from collections import defaultdict
import argparse
import logging
from tqdm import tqdm
import json
import os
from datetime import datetime

logging.basicConfig(level=logging.INFO)


class RepertoireBuilder:
    def __init__(self, username, color, max_moves=15, cache_dir="cache",
                 engine_path="/opt/homebrew/bin/stockfish", depth=16):
        self.username = username
        self.color = color.lower()
        self.max_moves = max_moves
        self.variations = defaultdict(list)
        self.positions = defaultdict(list)
        self.cache_dir = cache_dir
        self.cache_file = os.path.join(cache_dir, f"{username}_games.json")
        # Create cache directory if it doesn't exist
        os.makedirs(cache_dir, exist_ok=True)
        self.engine_path = engine_path
        self.depth = depth
        self.engine = None

    def analyze_position(self, board):
        """Analyze a position and return best move and evaluation."""
        try:
            result = self.engine.analyse(board, chess.engine.Limit(depth=self.depth))
            # Score in centipawns from White's point of view
            score = result['score'].white().score(mate_score=1000)
            best_move = result['pv'][0]
            return score, best_move
        except Exception as e:
            logging.error(f"Engine analysis error: {e}")
            return None, None

    def calculate_centipawn_loss(self, board, move_made, prev_score):
        """Calculate centipawn loss for a move."""
        if prev_score is None:
            return 0, None
        # Analyze position after the move
        board.push(move_made)
        after_score, _ = self.analyze_position(board)
        board.pop()
        if after_score is None:
            return 0, None
        # Calculate loss based on player's color
        # (after the pop, board.turn is the side that made the move)
        is_white = board.turn
        if is_white:
            loss = max(0, prev_score - after_score)
        else:
            loss = max(0, after_score - prev_score)
        return loss, after_score

    def process_game(self, game):
        """Process a single game with engine analysis."""
        pgn = StringIO(game['pgn'])
        chess_game = chess.pgn.read_game(pgn)
        # Determine if our player is white or black
        is_white = game['white']['username'].lower() == self.username.lower()
        if (is_white and self.color == 'black') or (not is_white and self.color == 'white'):
            return
        board = chess.Board()
        moves = []
        annotations = []
        prev_score = None
        for move_num, move in enumerate(chess_game.mainline_moves()):
            if move_num >= self.max_moves * 2:  # Both players' moves
                break
            # Only analyze our player's moves
            is_our_move = (is_white and board.turn) or (not is_white and not board.turn)
            if is_our_move:
                # Analyze position before move
                curr_score, best_move = self.analyze_position(board)
                # Calculate loss
                loss, new_score = self.calculate_centipawn_loss(board, move, curr_score)
                annotation = f" {{CPL: {loss:.1f}}}"
                if best_move != move:
                    annotation += f" {{Better was {best_move}}}"
                annotations.append((move_num // 2 + 1, annotation))
                prev_score = new_score
            moves.append(move)
            board.push(move)
        # Store the variation with annotations
        move_str = self.moves_to_pgn(moves, annotations)
        position_key = board.fen()
        self.variations[position_key].append({
            'moves': move_str,
            'url': game['url'],
            'date': game['end_time'],
            'eval': prev_score
        })
        self.positions[len(moves)].append(move_str)

    def load_cached_games(self):
        """Load games from cache if available."""
        if os.path.exists(self.cache_file):
            try:
                with open(self.cache_file, 'r') as f:
                    cached_data = json.load(f)
                logging.info(f"Loaded {len(cached_data['games'])} games from cache")
                return cached_data
            except Exception as e:
                logging.error(f"Error loading cache: {e}")
                return {'last_update': None, 'games': []}
        return {'last_update': None, 'games': []}

    def save_games_to_cache(self, games):
        """Save games to cache file."""
        cache_data = {
            'last_update': datetime.now().isoformat(),
            'games': games
        }
        try:
            with open(self.cache_file, 'w') as f:
                json.dump(cache_data, f)
            logging.info(f"Saved {len(games)} games to cache")
        except Exception as e:
            logging.error(f"Error saving to cache: {e}")

    def get_player_archives(self):
        """Fetch all game archives for the player."""
        try:
            archives = get_player_game_archives(self.username).json['archives']
            return archives
        except Exception as e:
            logging.error(f"Error fetching archives: {e}")
            return []

    def download_new_games(self, cached_data):
        """Download new games and merge with cached games."""
        archives = self.get_player_archives()
        all_games = cached_data['games']
        # Create a set of existing game IDs for quick lookup
        existing_game_ids = {game['url'] for game in all_games}
        new_games = []
        for archive in tqdm(archives, desc="Processing archives"):
            try:
                # Archive URLs end with .../<year>/<month>
                year_month = archive.split('/')[-2:]
                games = get_player_games_by_month(self.username,
                                                  int(year_month[0]),
                                                  int(year_month[1])).json['games']
                for game in games:
                    if game['url'] not in existing_game_ids:
                        new_games.append(game)
                        all_games.append(game)
                        existing_game_ids.add(game['url'])
                time.sleep(0.5)  # Be gentle with the API
            except Exception as e:
                logging.error(f"Error processing archive {archive}: {e}")
                continue
        if new_games:
            logging.info(f"Downloaded {len(new_games)} new games")
            self.save_games_to_cache(all_games)
        return all_games

    def moves_to_pgn(self, moves, annotations):
        """Convert moves to PGN with annotations."""
        pgn_moves = []
        for i, move in enumerate(moves):
            if i % 2 == 0:
                pgn_moves.append(f"{i // 2 + 1}. {move.uci()}")
            else:
                pgn_moves.append(move.uci())
            # Add annotation if it exists for this move.
            # Note: the match is by move number, so the annotation lands on
            # both moves of that number; the PGN filter cleans this up later.
            for move_num, annotation in annotations:
                if move_num == (i // 2 + 1):
                    pgn_moves[-1] += annotation
        return " ".join(pgn_moves)

    def print_repertoire(self):
        """Print repertoire with analysis."""
        print("\nUnique lines in repertoire:")
        for move_number in sorted(self.positions.keys()):
            unique_lines = set(self.positions[move_number])
            for line in unique_lines:
                print(f"{line}")
        print("\nConflicts found:")
        for position, variations in self.variations.items():
            if len(variations) > 1:
                # Group variations by next move
                moves_dict = defaultdict(list)
                for var in variations:
                    moves = var['moves'].split()
                    last_move = moves[-1]
                    moves_dict[last_move].append({
                        'full_line': var['moves'],
                        'url': var['url'],
                        'date': var['date'],
                        'eval': var.get('eval')
                    })
                if len(moves_dict) > 1:
                    print(f"\nPosition after: {list(moves_dict.values())[0][0]['full_line'].rsplit(' ', 1)[0]}")
                    print("Different moves played in this position:")
                    # Sort variations by evaluation
                    sorted_moves = sorted(moves_dict.items(),
                                          key=lambda x: max(g['eval'] for g in x[1]) if any(
                                              g['eval'] is not None for g in x[1]) else float('-inf'),
                                          reverse=True)
                    for move, games in sorted_moves:
                        print(
                            f"\n Move: {move} (Evaluation: {max(g['eval'] for g in games if g['eval'] is not None):.2f})")
                        print(" Games:")
                        for game in games:
                            date_str = datetime.fromtimestamp(int(game['date'])).strftime('%Y-%m-%d') if game[
                                'date'] else 'Unknown date'
                            print(f" - {game['url']} ({date_str})")

    def build_repertoire(self):
        """Build repertoire with engine analysis."""
        try:
            self.engine = chess.engine.SimpleEngine.popen_uci(self.engine_path)
            cached_data = self.load_cached_games()
            all_games = self.download_new_games(cached_data)
            for game in tqdm(all_games, desc="Building repertoire"):
                self.process_game(game)
        finally:
            # Always shut the engine process down, even on errors
            if self.engine:
                self.engine.quit()


def main():
    parser = argparse.ArgumentParser(description='Build a chess repertoire from a player\'s games')
    parser.add_argument('--username', required=True, help='Chess.com username')
    parser.add_argument('--color', choices=['white', 'black'], required=True,
                        help='Color to build repertoire for')
    parser.add_argument('--max-moves', type=int, default=15,
                        help='Maximum number of moves to include (default: 15)')
    parser.add_argument('--cache-dir', default='cache',
                        help='Directory to store cached games (default: cache)')
    args = parser.parse_args()
    """Client.request_config["headers"]["User-Agent"] = (
    "xxx "
    "xxx
    )"""
    builder = RepertoireBuilder(args.username, args.color, args.max_moves, args.cache_dir)
    builder.build_repertoire()
    builder.print_repertoire()


if __name__ == "__main__":
    main()
And the hacky PGN filter:
import chess.pgn
import io
import argparse
import re
from dataclasses import dataclass
from typing import Optional, List


@dataclass
class Move:
    move: str
    cpl: Optional[str] = None
    better_move: Optional[str] = None


@dataclass
class MovePair:
    number: int
    white: Move
    black: Move


class PGNParser:
    def __init__(self, color='black'):
        self.color = color

    def tokenize(self, pgn_str: str) -> List[str]:
        """Split into tokens while preserving annotations."""
        # First, normalize the string
        pgn_str = pgn_str.replace('}}.', '} }.')
        # Split on spaces while preserving annotation structure
        tokens = []
        current_token = []
        in_annotation = False
        for char in pgn_str:
            if char.isspace() and not in_annotation:
                if current_token:
                    tokens.append(''.join(current_token))
                    current_token = []
            else:
                current_token.append(char)
                if char == '{':
                    in_annotation = True
                elif char == '}':
                    in_annotation = False
        if current_token:
            tokens.append(''.join(current_token))
        return tokens

    def parse_move_with_annotations(self, tokens: List[str], start_idx: int) -> tuple[Move, int]:
        """Parse a move and its annotations."""
        move = tokens[start_idx]
        idx = start_idx + 1
        cpl = None
        better_move = None
        # Consume every annotation token that directly follows the move
        while idx < len(tokens):
            token = tokens[idx]
            if token.startswith('{CPL:'):
                cpl = token.split('CPL:')[1].rstrip('}').strip()
                idx += 1
            elif token.startswith('{Better'):
                better_move = token.split('was')[1].rstrip('}').strip()
                idx += 1
            else:
                break
        return Move(move, cpl, better_move), idx

    def parse(self, pgn_str: str) -> str:
        tokens = self.tokenize(pgn_str)
        result = []
        i = 0
        while i < len(tokens):
            # Handle move numbers
            if '.' in tokens[i]:
                move_num = tokens[i].rstrip('.')
                result.append(f"{move_num}.")
                i += 1
                # Parse white's move
                if i < len(tokens):
                    white_move, i = self.parse_move_with_annotations(tokens, i)
                    result.append(white_move.move)  # No annotations for white if we're black
                # Parse black's move
                if i < len(tokens):
                    black_move, i = self.parse_move_with_annotations(tokens, i)
                    move_str = black_move.move
                    if self.color == 'black':
                        if black_move.cpl:
                            move_str += f" {{CPL: {black_move.cpl}}}"
                        if black_move.better_move:
                            move_str += f" {{Better was {black_move.better_move}}}"
                    result.append(move_str)
            else:
                i += 1
        return ' '.join(result)


def clean_annotations(pgn_str: str, color: str = 'black') -> str:
    parser = PGNParser(color)
    return parser.parse(pgn_str)


class PGNFilter:
    def __init__(self, color='black'):
        self.color = color.lower()

    def is_sicilian(self, game_str):
        """Check if our first move was c5 (if Black) or opponent's first move was c5 (if White)"""
        # Remove all annotations first
        clean_str = re.sub(r'\{[^}]*\}', '', game_str)
        if self.color == 'black':
            # Look for Black's first move being c5
            pattern = r'1\.\s*\S+\s+(c7c5|c5)'
        else:
            # Look for White's first move being e4 and Black responding with c5
            pattern = r'1\.\s*(e2e4|e4)\s+(c7c5|c5)'
        return bool(re.search(pattern, clean_str))


def main():
    parser = argparse.ArgumentParser(description='Filter PGN file for Sicilian Defense games')
    parser.add_argument('input_file', help='Input PGN file')
    parser.add_argument('output_file', help='Output PGN file')
    parser.add_argument('--color', choices=['white', 'black'], default='black',
                        help='Color to preserve annotations for')
    args = parser.parse_args()
    pgn_filter = PGNFilter(args.color)
    with open(args.input_file, 'r') as f:
        content = f.read()
    # One game per line in the input file
    games = content.split('\n')
    sicilian_games = []
    non_sicilian_games = []
    for game in games:
        if game.strip():  # Only process non-empty games
            if pgn_filter.is_sicilian(game):
                # Keep annotations on Black's moves only
                cleaned_game = clean_annotations(game)
                sicilian_games.append(cleaned_game)
            else:
                non_sicilian_games.append(game)
    # Write filtered games to output file
    with open(args.output_file, 'w') as f:
        f.write('\n\n'.join(sicilian_games))
    print(f"Processed {len(games)} games")
    print(f"Found {len(sicilian_games)} Sicilian games")
    print(f"\nNon-Sicilian games ({len(non_sicilian_games)}):")
    """for i, game in enumerate(non_sicilian_games, 1):
    print(f"\nGame {i}:")
    print(game)
    print("-" * 80)
    """


if __name__ == "__main__":
    main()
Note: This is a proof of concept. The script requires a local Stockfish installation.