PAWN-Large

PAWN (Playstyle-Agnostic World-model Network for Chess) is a causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from uniformly random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.

This is the large variant (~68.4M parameters). PAWN is designed as a frozen backbone for parameter-efficient finetuning into player models with arbitrary playstyles.

GitHub Repository -- full source code, training scripts, adapter implementations, and documentation.

All Variants

| Variant | Parameters | Link |
|---|---|---|
| PAWN-Small | ~9.5M | thomas-schweich/pawn-small |
| PAWN (Base) | ~35.8M | thomas-schweich/pawn-base |
| PAWN-Large | ~68.4M | thomas-schweich/pawn-large |

Headline Metrics

| Metric | Value |
|---|---|
| Legal move rate | 99.89% |
| Top-1 accuracy | 6.95% |
| Top-5 accuracy | 27.73% |
| Val loss | 3.092 |

Accuracy Ratios

PAWN is trained on uniformly random chess games, so top-1 accuracy has a hard theoretical ceiling. A ratio above 100% against the unconditioned ceiling indicates that the model has learned structure beyond simply identifying legal moves. See Accuracy Ceiling Analysis.

| Ceiling | Ratio |
|---|---|
| Unconditioned (E[1/N_legal] = 6.43%) | 108% |
| Naive-conditioned (1-ply filter = 6.44%) | 108% |
| Bayes-optimal conditioned (MCTS, 32 rollouts = 7.92%) | 88% |
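The ratios are simply top-1 accuracy divided by each ceiling; the arithmetic can be checked directly (values copied from the tables above):

```python
top1 = 0.0695  # top-1 accuracy from the headline metrics

ceilings = {
    "unconditioned": 0.0643,      # E[1/N_legal]
    "naive-conditioned": 0.0644,  # 1-ply filter
    "bayes-optimal": 0.0792,      # MCTS, 32 rollouts
}

# Ratio of achieved accuracy to each ceiling, as a rounded percentage
ratios = {name: round(100 * top1 / c) for name, c in ceilings.items()}
# {'unconditioned': 108, 'naive-conditioned': 108, 'bayes-optimal': 88}
```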

Probe Results

Linear probes trained on frozen hidden states measure how well the model's internal representations encode board-level features.

| Probe | Accuracy | Description |
|---|---|---|
| Piece type | 90.3% | Per-square piece type (13 classes x 64 squares) |
| Side to move | 100.0% | Whose turn it is |
| Is check | 93.9% | Whether the side to move is in check |
| Castling rights | 96.8% | KQkq castling availability |
| En passant square | 99.7% | En passant target square (64 + none) |
| Material count | 86.9% (MAE 5.1) | Piece counts per type per color |
| Legal move count | 43.9% (MAE 6.5) | Number of legal moves available |
| Halfmove clock | 11.0% (MAE 4.0) | Plies since last capture or pawn move |
| Game phase | 91.3% | Opening / middlegame / endgame |
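As a sketch of the probing methodology, a linear probe is just a logistic-regression classifier fit on frozen features. This toy pure-Python version (not the actual probe code; the 2-D features and labels below are made up for illustration) shows the idea:

```python
import math

def train_linear_probe(feats, labels, n_classes, lr=0.5, epochs=200):
    """Fit a multinomial logistic-regression probe with plain SGD.

    In the real setup, `feats` would be frozen hidden states from the
    backbone and `labels` a board-level feature (e.g. side to move).
    """
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            logits = [sum(wj * xj for wj, xj in zip(W[c], x)) + b[c]
                      for c in range(n_classes)]
            m = max(logits)  # subtract max for numerical stability
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            probs = [e / z for e in exps]
            for c in range(n_classes):
                # Softmax cross-entropy gradient: p_c - 1[c == y]
                g = probs[c] - (1.0 if c == y else 0.0)
                b[c] -= lr * g
                for j in range(dim):
                    W[c][j] -= lr * g * x[j]
    return W, b

def probe_accuracy(W, b, feats, labels):
    correct = 0
    for x, y in zip(feats, labels):
        logits = [sum(wj * xj for wj, xj in zip(W[c], x)) + b[c]
                  for c in range(len(W))]
        correct += logits.index(max(logits)) == y
    return correct / len(feats)

# Toy linearly separable data: class is the sign of the first feature.
feats = [[1.0, 0.2], [0.8, -0.1], [1.2, 0.4],
         [-1.0, 0.3], [-0.7, -0.2], [-1.1, 0.1]]
labels = [0, 0, 0, 1, 1, 1]
W, b = train_linear_probe(feats, labels, n_classes=2)
```

The probe's accuracy then measures how linearly decodable a board feature is from the frozen representation.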

Diagnostic Results

Edge-case diagnostics measure the model's legal move rate in specific tactical situations.

| Category | Positions | Legal Rate |
|---|---|---|
| In check | 1000 | 98.4% |
| Double check | 71 | 95.0% |
| Pin restricts movement | 1000 | 97.9% |
| En passant available | 940 | 99.4% |
| Castling legal (kingside) | 1000 | 99.8% |
| Castling legal (queenside) | 1000 | 99.7% |
| Castling blocked by check | 892 | 99.5% |
| Promotion available | 1000 | 99.6% |
| Checkmate (terminal) | 276 | 92.2% |
| Stalemate (terminal) | 41 | 94.9% |

Architecture

| Parameter | Value |
|---|---|
| Architecture | Decoder-only transformer |
| d_model | 640 |
| Layers | 10 |
| Attention heads | 8 |
| Head dimension | 80 |
| d_ff | 2560 |
| Parameters | ~68.4M |
| Vocabulary | 4,284 tokens |
| Context length | 256 tokens |
| Normalization | Pre-norm RMSNorm |
| FFN | SwiGLU (4x expansion) |
| Positional encoding | Rotary (RoPE, base 10000) |
| Embeddings | Factored (src + dst + promo) |
| Dropout | 0.0 |
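For illustration, the rotary encoding listed above rotates consecutive dimension pairs of each query/key head vector by a position-dependent angle. A minimal pure-Python sketch of standard RoPE with base 10000 (not the model's actual implementation):

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos / base**(2i/d).

    x: one query or key head vector of even length (80 dims in this model).
    pos: integer token position in the sequence.
    """
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because each pair undergoes a pure rotation, position 0 leaves the vector unchanged and vector norms are preserved at every position, so relative offsets between tokens are what the attention dot products see.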

Training Details

| Parameter | Value |
|---|---|
| Training data | On-the-fly uniformly random legal games (no external dataset) |
| Objective | Next-token cross-entropy (non-padding positions only) |
| Total steps | 100,000 |
| Batch size | 256 |
| Games seen | 25,600,000 |
| Learning rate | 3e-4 (cosine decay with 1,000-step warmup) |
| Optimizer | AdamW (weight decay 0.01) |
| Precision | Mixed (AMP) |
| Hardware | NVIDIA H200 |
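The learning-rate schedule in the table can be sketched as linear warmup followed by cosine decay; decay to zero is an assumption here, since the card only specifies the peak LR, the warmup length, and "cosine decay":

```python
import math

def lr_at(step, peak=3e-4, warmup=1_000, total=100_000):
    """Linear warmup to `peak` over `warmup` steps, then cosine decay.

    Decay-to-zero (no LR floor) is an assumption, not stated in the card.
    """
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```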

Usage

Loading the model

```python
import torch
from safetensors.torch import load_file
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

cfg = CLMConfig.large()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("model.safetensors", device="cuda")
model.load_state_dict(weights)
```

Or load directly from the Hugging Face Hub:

```python
from pawn.checkpoint import load_backbone_weights
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

weights, config = load_backbone_weights("thomas-schweich/pawn-large")
cfg = CLMConfig.large()
model = PAWNCLM(cfg).eval()
model.load_state_dict(weights)
```

Finetuning with an adapter

```shell
uv run python scripts/train_bottleneck.py \
    --checkpoint thomas-schweich/pawn-large \
    --pgn thomas-schweich/pawn-lichess-full \
    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```

Acknowledgments

PAWN builds on ideas and tools from the following projects and publications:

| Component | Reference |
|---|---|
| Transformer | Vaswani et al., "Attention Is All You Need", NeurIPS 2017 |
| RMSNorm | Zhang & Sennrich, "Root Mean Square Layer Normalization", NeurIPS 2019 |
| RoPE | Su et al., "RoFormer: Enhanced Transformer with Rotary Position Embedding", 2021 |
| SwiGLU | Shazeer, "GLU Variants Improve Transformer", 2020 |
| AdamW | Loshchilov & Hutter, "Decoupled Weight Decay Regularization", ICLR 2019 |
| Cosine schedule | Loshchilov & Hutter, "SGDR: Stochastic Gradient Descent with Warm Restarts", ICLR 2017 |
| Mixed precision | Micikevicius et al., "Mixed Precision Training", ICLR 2018 |
| Bottleneck adapters | Houlsby et al., "Parameter-Efficient Transfer Learning for NLP", ICML 2019 |
| LoRA | Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models", ICLR 2022 |
| FiLM | Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer", AAAI 2018 |
| RoSA | Nikdan et al., "RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation", 2024 |
| Linear probes | Alain & Bengio, "Understanding Intermediate Layers Using Linear Classifier Probes", ICLR Workshop 2017 |
| Intrinsic dimensionality | Aghajanyan et al., "Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning", ACL 2021 |
| MAIA | McIlroy-Young et al., "Aligning Superhuman AI with Human Behavior: Chess as a Model System", KDD 2020 |
| AlphaZero | Silver et al., "A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play", Science 2018 |
| Leela Chess Zero | github.com/LeelaChessZero/lc0 |
| shakmaty | github.com/niklasf/shakmaty |
| PyO3 | github.com/PyO3/pyo3 |
| Lichess | lichess.org / database.lichess.org |

Citation

```bibtex
@software{schweich2026pawn,
  author = {Schweich, Thomas},
  title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
  year = {2026},
  url = {https://github.com/thomas-schweich/PAWN},
  license = {Apache-2.0}
}
```

License

Apache 2.0. See LICENSE.
