# PAWN-Small
PAWN (Playstyle-Agnostic World-model Network for Chess) is a causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from uniformly random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.
This is the small variant (~9.5M parameters). PAWN is designed as a frozen backbone for parameter-efficient finetuning into player models with arbitrary playstyles.
[GitHub Repository](https://github.com/thomas-schweich/PAWN) -- full source code, training scripts, adapter implementations, and documentation.
## All Variants
| Variant | Parameters | Link |
|---|---|---|
| PAWN-Small | ~9.5M | thomas-schweich/pawn-small |
| PAWN (Base) | ~35.8M | thomas-schweich/pawn-base |
| PAWN-Large | ~68.4M | thomas-schweich/pawn-large |
## Headline Metrics
| Metric | Value |
|---|---|
| Legal move rate | 99.18% |
| Top-1 accuracy | 6.75% |
| Top-5 accuracy | 27.40% |
| Val loss | 3.159 |
## Accuracy Ratios

PAWN is trained on uniformly random chess games, so top-1 accuracy has a hard theoretical ceiling: when the target move is drawn uniformly from the legal set, accuracy is bounded by how many legal moves each position offers. A ratio above 100% against the unconditioned ceiling indicates the model has learned structure beyond simply identifying legal moves. See Accuracy Ceiling Analysis.
| Ceiling | Ratio |
|---|---|
| Unconditioned (E[1/N_legal] = 6.43%) | 105% |
| Naive-conditioned (1-ply filter = 6.44%) | 105% |
| Bayes-optimal conditioned (MCTS, 32 rollouts = 7.92%) | 85% |
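The ratio column is simply the headline top-1 accuracy divided by each ceiling. A quick reproduction, with all values copied from the tables above:

```python
# Reproducing the Ratio column: headline top-1 accuracy divided by each
# theoretical ceiling. All numbers are copied from the tables above.
top1 = 0.0675

ceilings = {
    "unconditioned": 0.0643,      # E[1/N_legal]
    "naive_conditioned": 0.0644,  # 1-ply filter
    "bayes_mcts": 0.0792,         # MCTS, 32 rollouts
}
ratios = {name: top1 / c for name, c in ceilings.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.0%}")     # 105%, 105%, 85%
```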
## Probe Results
Linear probes trained on frozen hidden states measure how well the model's internal representations encode board-level features.
| Probe | Accuracy | Description |
|---|---|---|
| Piece type | 89.1% | Per-square piece type (13 classes x 64 squares) |
| Side to move | 100.0% | Whose turn it is |
| Is check | 94.3% | Whether the side to move is in check |
| Castling rights | 96.5% | KQkq castling availability |
| En passant square | 99.8% | En passant target square (64 + none) |
| Material count | 86.5% (MAE 4.9) | Piece counts per type per color |
| Legal move count | 30.7% (MAE 7.4) | Number of legal moves available |
| Halfmove clock | 13.3% (MAE 3.9) | Plies since last capture or pawn move |
| Game phase | 91.1% | Opening / middlegame / endgame |
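A linear probe in this sense is a single affine layer trained on frozen activations; only the probe's weights are updated. A minimal sketch of the setup, using synthetic features in place of PAWN hidden states (only `d_model = 256` is taken from the card; the data and training loop are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 512 "frozen hidden states" (d_model = 256) and a
# binary side-to-move label that is linearly decodable by construction.
d_model, n = 256, 512
H = rng.standard_normal((n, d_model))
w_true = rng.standard_normal(d_model)
y = (H @ w_true > 0).astype(float)

# Logistic-regression probe: the backbone stays frozen; only w, b train.
w, b = np.zeros(d_model), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # sigmoid
    grad = p - y                            # dLoss/dlogits for BCE
    w -= 0.1 * (H.T @ grad) / n
    b -= 0.1 * grad.mean()

acc = ((H @ w + b > 0) == (y == 1)).mean()
print(f"probe accuracy: {acc:.3f}")
```

High probe accuracy indicates the feature is linearly readable from the representation, which is the sense in which the table above measures what the model "encodes".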
## Diagnostic Results
Edge-case diagnostics measure the model's legal move rate in specific tactical situations.
| Category | Positions | Legal Rate |
|---|---|---|
| In check | 1000 | 82.4% |
| Double check | 71 | 65.1% |
| Pin restricts movement | 1000 | 86.2% |
| En passant available | 940 | 97.1% |
| Castling legal (kingside) | 1000 | 98.8% |
| Castling legal (queenside) | 1000 | 98.2% |
| Castling blocked by check | 892 | 95.7% |
| Promotion available | 1000 | 96.2% |
| Checkmate (terminal) | 276 | 66.4% |
| Stalemate (terminal) | 41 | 53.8% |
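For scale, the per-category rates can be collapsed into a position-weighted average (all numbers copied from the table above), which makes clear how far these tactical edge cases sit below the 99.18% overall legal-move rate:

```python
# Position-weighted average of the per-category legal rates above.
categories = {
    "in_check": (1000, 82.4),
    "double_check": (71, 65.1),
    "pin": (1000, 86.2),
    "en_passant": (940, 97.1),
    "castle_kingside": (1000, 98.8),
    "castle_queenside": (1000, 98.2),
    "castle_blocked_by_check": (892, 95.7),
    "promotion": (1000, 96.2),
    "checkmate": (276, 66.4),
    "stalemate": (41, 53.8),
}
n_total = sum(n for n, _ in categories.values())
weighted = sum(n * rate for n, rate in categories.values()) / n_total
print(f"{weighted:.1f}% over {n_total} edge-case positions")
```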
## Architecture
| Parameter | Value |
|---|---|
| Architecture | Decoder-only transformer |
| d_model | 256 |
| Layers | 8 |
| Attention heads | 4 |
| Head dimension | 64 |
| d_ff | 1024 |
| Parameters | ~9.5M |
| Vocabulary | 4,284 tokens |
| Context length | 256 tokens |
| Normalization | Pre-norm RMSNorm |
| FFN | SwiGLU (4x expansion) |
| Positional encoding | Rotary (RoPE, base 10000) |
| Embeddings | Factored (src + dst + promo) |
| Dropout | 0.0 |
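The ~9.5M figure can be roughly reproduced from the table. This back-of-envelope count assumes no bias terms, RMSNorm gains only, an output head tied to the input embedding, and a full vocab x d_model embedding table (the factored src + dst + promo scheme will shift the embedding term somewhat):

```python
# Rough parameter count for PAWN-Small from the architecture table.
# Assumptions (not stated in the card): no biases, tied output head,
# full (unfactored) embedding table.
d_model, n_layers, d_ff, vocab = 256, 8, 1024, 4284

attn = 4 * d_model * d_model   # Wq, Wk, Wv, Wo
ffn = 3 * d_model * d_ff       # SwiGLU: gate, up, down projections
norms = 2 * d_model            # two RMSNorm gains per block
per_layer = attn + ffn + norms

total = n_layers * per_layer + vocab * d_model + d_model  # + final norm
print(f"~{total / 1e6:.1f}M parameters")  # prints ~9.5M parameters
```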
## Training Details
| Parameter | Value |
|---|---|
| Training data | On-the-fly uniformly random legal games (no external dataset) |
| Objective | Next-token cross-entropy (non-padding positions only) |
| Total steps | 100,000 |
| Batch size | 256 |
| Games seen | 25,600,000 |
| Learning rate | 3e-4 (cosine decay with 1,000-step warmup) |
| Optimizer | AdamW (weight decay 0.01) |
| Precision | Mixed (AMP) |
| Hardware | NVIDIA H200 |
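The games-seen figure follows directly from 100,000 steps x 256 games per batch. The learning-rate schedule can be sketched as below; the linear warmup shape and zero floor are assumptions, since the card only states the peak, the decay type, and the warmup length:

```python
import math

# Cosine-decay schedule with linear warmup, matching the quoted settings.
# Linear warmup and a zero final LR are assumptions; the card states only
# "3e-4 (cosine decay with 1,000-step warmup)".
BASE_LR, WARMUP, TOTAL = 3e-4, 1_000, 100_000

def lr_at(step: int) -> float:
    if step < WARMUP:
        return BASE_LR * step / WARMUP             # linear warmup
    progress = (step - WARMUP) / (TOTAL - WARMUP)  # 0 -> 1 over decay
    return 0.5 * BASE_LR * (1 + math.cos(math.pi * progress))

print(lr_at(1_000))  # peak LR: 3e-4
```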
## Usage

### Loading the model
```python
import torch
from safetensors.torch import load_file

from pawn.config import CLMConfig
from pawn.model import PAWNCLM

cfg = CLMConfig.small()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("model.safetensors", device="cuda")
model.load_state_dict(weights)
```
Or load directly from HuggingFace:
```python
from pawn.checkpoint import load_backbone_weights
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

weights, config = load_backbone_weights("thomas-schweich/pawn-small")
cfg = CLMConfig.small()
model = PAWNCLM(cfg).eval()
model.load_state_dict(weights)
```
### Finetuning with an adapter

```shell
uv run python scripts/train_bottleneck.py \
    --checkpoint thomas-schweich/pawn-small \
    --pgn thomas-schweich/pawn-lichess-full \
    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```
## Acknowledgments

PAWN builds on ideas and tools from the following projects and publications:

- RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
- LoRA: Low-Rank Adaptation of Large Language Models
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
- Aligning Superhuman AI with Human Behavior: Chess as a Model System
## Citation

```bibtex
@software{schweich2026pawn,
  author  = {Schweich, Thomas},
  title   = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
  year    = {2026},
  url     = {https://github.com/thomas-schweich/PAWN},
  license = {Apache-2.0}
}
```
## License
Apache 2.0. See LICENSE.