Automated Chess.com Game Review Extraction Using Python

Thu 05 September 2024

Chess.com Game Analysis

chess.c*m exposes a Public API to download information about players and games, but the Game Review is not part of it. This blog post shows how to interact with the undocumented Game Review API and, as a proof of concept, extract the estimated Elo.

Note 1: the official terminology for this metric is “Effective Elo”, so this article will switch to that from now on.
Note 2: Game Review is paywalled and the script below does not try to go around that.

Methodology

I mainly used Burp Suite to reverse-engineer the workflow from Login to Game Review. A summary of it is available below.

1. Authenticate and get valid cookies.
2. Use the official API to retrieve recent game IDs.
3. Get a Game Analysis Token.
4. Communicate with the analysis server over a WebSocket to trigger a game review.
5. Parse the JSON output and extract relevant data.
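The second step mostly comes down to pulling a game ID and the player's color out of each game record returned by the official API. Here is a minimal sketch, assuming the `url`, `white` and `black` fields used by the full script further down. The helper name `game_id_and_color` and the case-insensitive comparison are my own additions for illustration.

```python
def game_id_and_color(game: dict, username: str) -> tuple[str, str]:
    """Return (game_id, color) for `username` in one game record."""
    # The game ID is the last path segment of the game URL.
    game_id = game["url"].rstrip("/").split("/")[-1]

    # Compare case-insensitively so a differently cased username still matches.
    wanted = username.lower()
    if game["white"]["username"].lower() == wanted:
        return game_id, "white"
    if game["black"]["username"].lower() == wanted:
        return game_id, "black"
    raise ValueError(f"{username} did not play in game {game_id}")
```

Raising on an unknown player avoids silently analyzing a game from the wrong side.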

Cookies

Step 3 (Get a Game Analysis Token) requires calling the unofficial API endpoint /callback/auth/service/analysis. It returns a JSON object with a token attribute:

HTTP/2 200 OK
Content-Type: application/json


{"token":"XXXXX"}
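A minimal sketch of calling this endpoint defensively. It assumes `session` behaves like a `requests.Session` that already carries valid cookies, and it uses the `game_id` and `game_type` query parameters from the full script further down. The function names are hypothetical.

```python
ANALYSIS_TOKEN_URL = "https://www.chess.com/callback/auth/service/analysis"


def extract_token(payload: dict) -> str:
    """Pull the token out of the decoded JSON body, failing loudly if absent."""
    token = payload.get("token")
    if not isinstance(token, str) or not token:
        raise ValueError("response did not contain a usable 'token' attribute")
    return token


def fetch_analysis_token(session, game_id: str, game_type: str = "live") -> str:
    """Request a Game Analysis Token using an already authenticated session.

    `session` is assumed to expose `get(url, params=..., timeout=...)`
    the way `requests.Session` does.
    """
    response = session.get(
        ANALYSIS_TOKEN_URL,
        params={"game_id": game_id, "game_type": game_type},
        timeout=10,
    )
    # Surface HTTP errors (for example missing or expired cookies) immediately.
    response.raise_for_status()
    return extract_token(response.json())
```

Keeping `extract_token` separate makes the parsing testable without any network access.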

This endpoint enforces cookie-based authentication. We get cookies when we log in on chess.c*m. However, doing this programmatically is not straightforward, as they try to prevent it. I will not include my solution to bypass this in the script at the end because I don’t want to cause them any harm, but for reproducibility, here is a high-level overview of my approach.

CAPTCHA bypass

I gave myself 30 minutes to break their CAPTCHA. I used Pillow and OpenCV to try and remove some noise before reading the text with pytesseract, but didn’t succeed within that time. I then resorted to a CAPTCHA-solving SaaS.

HTTP client fingerprinting

Getting past the CAPTCHA is not sufficient, as the Cloudflare Security Check comes afterwards. At this step I decided to rethink my approach.

Final solution

Instead, I opted for a frontend-testing tool that can automate a legitimate browser programmatically (I’m intentionally vague about this as I mentioned earlier).

Game Review

The Game Review works over a WebSocket. The request carries a Game Analysis Token, a Game ID, a PGN, the user’s color and some optional settings for the chess engine:

{
    "action": "gameAnalysis",
    "game": {
        "pgn": "[Event \"Live Chess\"]\n[Site \"Chess.com\"]...."
    },
    "options": {
        "caps2": true,
        "depth": 18,
        "engineType": "stockfish16 nnue",
        "source": {
            "gameId": "XXX",
            "gameType": "live",
            "url": "",
            "token": "XXX",
            "client": "web",
            "userTimeZone": "Europe/Berlin"
        },
        "strength": "Fast",
        "tep": {
            "ceeDebug": false,
            "classificationv3": true,
            "userColor": "white",
            "lang": "en_US",
            "speechv3": true
        }
    }
}

The server replies with several messages, indicating the analysis’ progress and then 50k lines of JSON:

DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.01,"engineTyp...nue","strength":"Fast"}' [87 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.02,"engineTyp...nue","strength":"Fast"}' [87 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.1428571428571...nue","strength":"Fast"}' [102 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.1632653061224...nue","strength":"Fast"}' [102 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.2040816326530...nue","strength":"Fast"}' [102 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.4081632653061...nue","strength":"Fast"}' [102 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.6938775510204...nue","strength":"Fast"}' [101 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.8367346938775...nue","strength":"Fast"}' [101 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.9387755102040...nue","strength":"Fast"}' [101 bytes]
DEBUG:websockets.client:< TEXT '{"action":"progress","progress":0.9999,"engineT...nue","strength":"Fast"}' [89 bytes]
DEBUG:websockets.client:< TEXT '{"action":"analyzeGame","data":{"startingFen":"...},"userId":xxx}}}}' [175818 bytes]
DEBUG:websockets.client:< TEXT '{"action":"done"}' [17 bytes]
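The three message types visible in this log (`progress`, `analyzeGame`, `done`) can be told apart by their `action` field. Below is a minimal sketch of such a dispatcher. The function name `summarize_message` and the returned labels are my own, and any other action is reported as unknown rather than guessed at.

```python
import json


def summarize_message(raw: str) -> tuple[str, object]:
    """Classify one WebSocket text frame as (kind, value)."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return ("invalid", None)
    if not isinstance(message, dict):
        return ("invalid", None)

    action = message.get("action")
    if action == "progress":
        # Fraction between 0 and 1, as seen in the log above.
        return ("progress", message.get("progress"))
    if action == "analyzeGame":
        # The large payload holding the actual review.
        return ("result", message.get("data"))
    if action == "done":
        return ("done", None)
    return ("unknown", action)


if __name__ == "__main__":
    print(summarize_message('{"action": "progress", "progress": 0.5}'))
    print(summarize_message('{"action": "done"}'))
```

A receive loop could call this on every frame, update a progress bar on `progress`, store the payload on `result` and stop on `done`.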

Among the information available are:

Game shape (“arc”) such as throwaway, sharp, sudden, balanced, wild…
Coach’s explanations if you’re a Diamond member.
List of blunders, mistakes, inaccuracies and so on.
Principal variations for each move.
Pieces stats (how many times each player moved each piece).
Per-piece CAPS!
CAPS for the game stages.
Outcome prediction (white, black or draw)!
Themes: pins, forks, and so on.

Excerpt:

                "threat_evals": [],
                "suggestedMove": null,
                "playedMove": null,
                "nullMove": null,
                "bestMove": {
                    "speech": [
                        {
                            "sentence": [
                                "Opening with the King's pawn controls the center and opens up the light-squared bishop and queen, often leading to sharp games."
                            ],
                            "arrowsSquaresStringIndex": -1,
                            "arrows": [],
                            "squares": []
                        }
                    ],
                    "score": 0.16,
                    "depth": 40,
                    "mateIn": null,
                    "moveLan": "e2e4",
                    "eval": {
                        "cp": 16,
                        "pv": [
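As an illustration of working with this structure, here is a minimal sketch that flattens one `bestMove` object into a small summary. It only relies on the keys visible in the excerpt (`speech`, `sentence`, `score`, `moveLan`, `eval.cp`). Where these objects sit in the full payload is not shown here, so the function takes the object directly. The name `describe_best_move` is hypothetical.

```python
def describe_best_move(best_move: dict) -> dict:
    """Summarize a `bestMove` object: move, evaluation and coach commentary."""
    sentences = []
    # Each speech entry holds a list of sentences; missing or null is tolerated.
    for speech in best_move.get("speech") or []:
        sentences.extend(speech.get("sentence") or [])

    evaluation = best_move.get("eval") or {}
    return {
        "move": best_move.get("moveLan"),
        "score": best_move.get("score"),
        "centipawns": evaluation.get("cp"),
        "commentary": " ".join(sentences),
    }


if __name__ == "__main__":
    sample = {
        "speech": [{"sentence": ["Opening with the King's pawn controls the center."]}],
        "score": 0.16,
        "moveLan": "e2e4",
        "eval": {"cp": 16},
    }
    print(describe_best_move(sample))
```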

Technical Implementation

The (almost) complete script is available below. It takes care of all the steps described above, except for the login, which is left as an exercise for the reader.

import asyncio
import json
import logging
from datetime import datetime
from os import environ

import chessdotcom
import requests
import websockets


logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

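session = requests.session()

# Optional: local intercepting proxy (e.g. Burp Suite) for debugging.
# It is not applied by default; enable it with `session.proxies.update(proxy)`.
proxy = {
    "http": "http://localhost:8080",
    "https": "http://localhost:8080"
}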

async def analyze_game(token, gameId, userColor, pgn):
    effective_elo = None
    websocket_url = "wss://analysis.chess.com/"

    message = {
        "action": "gameAnalysis",
        "game": {"pgn": pgn},
        "options": {
            "caps2": True,
            "depth": 18,
            "engineType": "stockfish16 nnue",
            "source": {
                "gameId": gameId,
                "gameType": "live",
                "url": "",
                "token": token,
                "client": "web",
                "userTimeZone": "Europe/Berlin"
            },
            "strength": "Fast",
            "tep": {
                "ceeDebug": False,
                "classificationv3": True,
                "userColor": userColor,
                "lang": "en_US",
                "speechv3": True
            }
        }
    }

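    # Note: on websockets 14 and later, the default client expects
    # `additional_headers` instead of `extra_headers`.
    async with websockets.connect(websocket_url, extra_headers={"Origin": "https://www.chess.com"}) as websocket: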
        await websocket.send(json.dumps(message))
        logging.debug("Message sent")

        try:
            while True:
                response = await websocket.recv()
                try:
                    parsed_response = json.loads(response)
                    if 'action' in parsed_response and parsed_response['action'] == 'analyzeGame':
                        report_card = parsed_response['data']['reportCard']
                        effective_elo = report_card[userColor]['effectiveElo']
                        return effective_elo
                except json.JSONDecodeError:
                    pass
        except websockets.exceptions.ConnectionClosed:
            logging.debug("WebSocket connection closed")

    return effective_elo


def get_analysis_token(game_id):
    # Unofficial endpoint; relies on the cookies stored in `session`.
    url = f"https://www.chess.com/callback/auth/service/analysis?game_id={game_id}&game_type=live"
    r = session.get(url)
    r.raise_for_status()
    data = r.json()
    logging.debug(data)
    return data['token']


def get_latest_games(username):

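    # Identify ourselves to the public API with a contact address in the User-Agent.
    # str.replace returns a new string, so the result must be assigned.
    contact_me_at = "Contact me at xxx".replace("xxx", environ.get("EMAIL"))
    chessdotcom.Client.request_config["headers"]["User-Agent"] = contact_me_at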

    current_year = datetime.now().year
    current_month = datetime.now().month
    games = chessdotcom.get_player_games_by_month(username, current_year, current_month)
    return games.json['games']


async def analyze_games(username, games):
    total_elo = 0
    analyzed_games = 0

    for game in games[:100]:
        game_id = game['url'].split('/')[-1]
        # Compare case-insensitively: the API may return a differently cased username.
        user_color = "white" if game['white']['username'].lower() == username.lower() else "black"

        token = get_analysis_token(game_id)
        logging.debug(f"Analysis token:  {token}")
        pgn = game['pgn']

        effective_elo = await analyze_game(token, game_id, user_color, pgn)
        if effective_elo:
            total_elo += effective_elo
            analyzed_games += 1
            logging.info(f"Game {game_id} analyzed. Effective Elo: {effective_elo}")


    if analyzed_games > 0:
        average_elo = total_elo / analyzed_games
        logging.info(f"Average Effective Elo over {analyzed_games} games: {average_elo}")
    else:
        logging.info("No games were successfully analyzed.")


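def do_login(username, password):
    """
    TODO: exercise left to the reader

    Expected to return a list of dicts with 'name', 'value' and 'domain' keys,
    which is the shape main() consumes below.
    """
    cookies = []

    return cookies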


def main():

    username = "volodjah"
    login = environ.get("CHESSCOM_LOGIN")
    password = environ.get("CHESSCOM_PASSWORD")

    if not login or not password or not environ.get("EMAIL"):
        logging.critical("Please set the CHESSCOM_LOGIN, CHESSCOM_PASSWORD and EMAIL environment variables.")
        return

    cookies = do_login(login, password)
    logging.info("Cookies: %s", cookies)

    for cookie in cookies:
        session.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])

    games = get_latest_games(username)

    asyncio.run(analyze_games(username, games))


if __name__ == '__main__':

    main()

Data Analysis

The script calculates the mean effective Elo over a set of analyzed games. This gives a quantitative measure of recent performance, which we can then plot to gain some insight beyond traditional Elo ratings.
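Before plotting, a single mean can be turned into a trend line with a rolling average. The sketch below is one way to do that with the standard library only. The function name `rolling_mean` and the window size are my own choices, and the ratings in the usage example are made-up placeholders rather than real results.

```python
from statistics import mean


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Mean of the last `window` values at each position (shorter at the start)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


if __name__ == "__main__":
    # Placeholder effective Elo values, oldest game first.
    effective_elos = [1000, 1200, 1100, 1300]
    print(rolling_mean(effective_elos, window=2))
```

The resulting list has one value per game, so it can be passed straight to any plotting library alongside the game dates.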

Results

Preliminary testing shows successful extraction of effective Elo ratings from individual games. Further statistical analysis of these results could reveal interesting patterns in performance fluctuations. chess.c*m’s Insights has done some work in this direction, but they didn’t account for imbalances in the data.

Conclusion

The current implementation is limited by the user’s membership level. However, leveraging this data opens up quite a few possibilities:

Developing a time-series analysis of effective Elo trends.
Running a correlation study of move/game performance against various factors.
Extracting features and metrics for chess coaches.