PAWN-Large

PAWN (Playstyle-Agnostic World-model Network for Chess) is a causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from uniformly random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.

This is the large variant (~68.4M parameters). PAWN is designed as a frozen backbone for parameter-efficient finetuning into player models with arbitrary playstyles.

GitHub Repository -- full source code, training scripts, adapter implementations, and documentation.

All Variants

| Variant | Parameters | Link |
|---|---|---|
| PAWN-Small | ~9.5M | thomas-schweich/pawn-small |
| PAWN (Base) | ~35.8M | thomas-schweich/pawn-base |
| PAWN-Large | ~68.4M | thomas-schweich/pawn-large |

Headline Metrics

| Metric | Value |
|---|---|
| Legal move rate | 99.89% |
| Top-1 accuracy | 6.95% |
| Top-5 accuracy | 27.73% |
| Val loss | 3.092 |

Accuracy Ratios

PAWN is trained on uniformly random chess games, so top-1 accuracy has a hard theoretical ceiling. A ratio above 100% against the unconditioned ceiling indicates that the model has learned structure beyond simply identifying legal moves. See Accuracy Ceiling Analysis.

| Ceiling | Ratio |
|---|---|
| Unconditioned (E[1/N_legal] = 6.43%) | 108% |
| Naive-conditioned (1-ply filter = 6.44%) | 108% |
| Bayes-optimal conditioned (MCTS, 32 rollouts = 7.92%) | 88% |
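The ratios are simply top-1 accuracy divided by each ceiling; the arithmetic can be checked directly (values copied from the tables above):

```python
top1 = 0.0695  # top-1 accuracy from the headline metrics

ceilings = {
    "unconditioned": 0.0643,      # E[1/N_legal]
    "naive-conditioned": 0.0644,  # 1-ply filter
    "bayes-optimal": 0.0792,      # MCTS, 32 rollouts
}

# Ratio of achieved accuracy to each ceiling, as a rounded percentage
ratios = {name: round(100 * top1 / c) for name, c in ceilings.items()}
# {'unconditioned': 108, 'naive-conditioned': 108, 'bayes-optimal': 88}
```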

Probe Results

Linear probes trained on frozen hidden states measure how well the model's internal representations encode board-level features.

| Probe | Accuracy | Description |
|---|---|---|
| Piece type | 90.3% | Per-square piece type (13 classes x 64 squares) |
| Side to move | 100.0% | Whose turn it is |
| Is check | 93.9% | Whether the side to move is in check |
| Castling rights | 96.8% | KQkq castling availability |
| En passant square | 99.7% | En passant target square (64 + none) |
| Material count | 86.9% (MAE 5.1) | Piece counts per type per color |
| Legal move count | 43.9% (MAE 6.5) | Number of legal moves available |
| Halfmove clock | 11.0% (MAE 4.0) | Plies since last capture or pawn move |
| Game phase | 91.3% | Opening / middlegame / endgame |
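As a sketch of the probing methodology, a linear probe is just a logistic-regression classifier fit on frozen features. This toy pure-Python version (not the actual probe code; the 2-D features and labels below are made up for illustration) shows the idea:

```python
import math

def train_linear_probe(feats, labels, n_classes, lr=0.5, epochs=200):
    """Fit a multinomial logistic-regression probe with plain SGD.

    In the real setup, `feats` would be frozen hidden states from the
    backbone and `labels` a board-level feature (e.g. side to move).
    """
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            logits = [sum(wj * xj for wj, xj in zip(W[c], x)) + b[c]
                      for c in range(n_classes)]
            m = max(logits)  # subtract max for numerical stability
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            probs = [e / z for e in exps]
            for c in range(n_classes):
                # Softmax cross-entropy gradient: p_c - 1[c == y]
                g = probs[c] - (1.0 if c == y else 0.0)
                b[c] -= lr * g
                for j in range(dim):
                    W[c][j] -= lr * g * x[j]
    return W, b

def probe_accuracy(W, b, feats, labels):
    correct = 0
    for x, y in zip(feats, labels):
        logits = [sum(wj * xj for wj, xj in zip(W[c], x)) + b[c]
                  for c in range(len(W))]
        correct += logits.index(max(logits)) == y
    return correct / len(feats)

# Toy linearly separable data: class is the sign of the first feature.
feats = [[1.0, 0.2], [0.8, -0.1], [1.2, 0.4],
         [-1.0, 0.3], [-0.7, -0.2], [-1.1, 0.1]]
labels = [0, 0, 0, 1, 1, 1]
W, b = train_linear_probe(feats, labels, n_classes=2)
```

The probe's accuracy then measures how linearly decodable a board feature is from the frozen representation.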

Diagnostic Results

Edge-case diagnostics measure the model's legal move rate in specific tactical situations.

| Category | Positions | Legal Rate |
|---|---|---|
| In check | 1000 | 98.4% |
| Double check | 71 | 95.0% |
| Pin restricts movement | 1000 | 97.9% |
| En passant available | 940 | 99.4% |
| Castling legal (kingside) | 1000 | 99.8% |
| Castling legal (queenside) | 1000 | 99.7% |
| Castling blocked by check | 892 | 99.5% |
| Promotion available | 1000 | 99.6% |
| Checkmate (terminal) | 276 | 92.2% |
| Stalemate (terminal) | 41 | 94.9% |

Architecture

| Parameter | Value |
|---|---|
| Architecture | Decoder-only transformer |
| d_model | 640 |
| Layers | 10 |
| Attention heads | 8 |
| Head dimension | 80 |
| d_ff | 2560 |
| Parameters | ~68.4M |
| Vocabulary | 4,284 tokens |
| Context length | 256 tokens |
| Normalization | Pre-norm RMSNorm |
| FFN | SwiGLU (4x expansion) |
| Positional encoding | Rotary (RoPE, base 10000) |
| Embeddings | Factored (src + dst + promo) |
| Dropout | 0.0 |
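For illustration, the rotary encoding listed above rotates consecutive dimension pairs of each query/key head vector by a position-dependent angle. A minimal pure-Python sketch of standard RoPE with base 10000 (not the model's actual implementation):

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos / base**(2i/d).

    x: one query or key head vector of even length (80 dims in this model).
    pos: integer token position in the sequence.
    """
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because each pair undergoes a pure rotation, position 0 leaves the vector unchanged and vector norms are preserved at every position, so relative offsets between tokens are what the attention dot products see.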

Training Details

| Parameter | Value |
|---|---|
| Training data | On-the-fly uniformly random legal games (no external dataset) |
| Objective | Next-token cross-entropy (non-padding positions only) |
| Total steps | 100,000 |
| Batch size | 256 |
| Games seen | 25,600,000 |
| Learning rate | 3e-4 (cosine decay with 1,000-step warmup) |
| Optimizer | AdamW (weight decay 0.01) |
| Precision | Mixed (AMP) |
| Hardware | NVIDIA H200 |
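The learning-rate schedule in the table can be sketched as linear warmup followed by cosine decay; decay to zero is an assumption here, since the card only specifies the peak LR, the warmup length, and "cosine decay":

```python
import math

def lr_at(step, peak=3e-4, warmup=1_000, total=100_000):
    """Linear warmup to `peak` over `warmup` steps, then cosine decay.

    Decay-to-zero (no LR floor) is an assumption, not stated in the card.
    """
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```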

Usage

Loading the model

```python
import torch
from safetensors.torch import load_file
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

cfg = CLMConfig.large()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("model.safetensors", device="cuda")
model.load_state_dict(weights)
```

Or load directly from the Hugging Face Hub:

```python
from pawn.checkpoint import load_backbone_weights
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

weights, config = load_backbone_weights("thomas-schweich/pawn-large")
cfg = CLMConfig.large()
model = PAWNCLM(cfg).eval()
model.load_state_dict(weights)
```

Finetuning with an adapter

```shell
uv run python scripts/train_bottleneck.py \
    --checkpoint thomas-schweich/pawn-large \
    --pgn thomas-schweich/pawn-lichess-full \
    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```

Acknowledgments

PAWN builds on ideas and tools from the following projects and publications:

| Component | Reference |
|---|---|
| Transformer | Vaswani et al., "Attention Is All You Need", NeurIPS 2017 |
| RMSNorm | Zhang & Sennrich, "Root Mean Square Layer Normalization", NeurIPS 2019 |
| RoPE | Su et al., "RoFormer: Enhanced Transformer with Rotary Position Embedding", 2021 |
| SwiGLU | Shazeer, "GLU Variants Improve Transformer", 2020 |
| AdamW | Loshchilov & Hutter, "Decoupled Weight Decay Regularization", ICLR 2019 |
| Cosine schedule | Loshchilov & Hutter, "SGDR: Stochastic Gradient Descent with Warm Restarts", ICLR 2017 |
| Mixed precision | Micikevicius et al., "Mixed Precision Training", ICLR 2018 |
| Bottleneck adapters | Houlsby et al., "Parameter-Efficient Transfer Learning for NLP", ICML 2019 |
| LoRA | Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models", ICLR 2022 |
| FiLM | Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer", AAAI 2018 |
| RoSA | Nikdan et al., "RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation", 2024 |
| Linear probes | Alain & Bengio, "Understanding Intermediate Layers Using Linear Classifier Probes", ICLR Workshop 2017 |
| Intrinsic dimensionality | Aghajanyan et al., "Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning", ACL 2021 |
| MAIA | McIlroy-Young et al., "Aligning Superhuman AI with Human Behavior: Chess as a Model System", KDD 2020 |
| AlphaZero | Silver et al., "A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play", Science 2018 |
| Leela Chess Zero | github.com/LeelaChessZero/lc0 |
| shakmaty | github.com/niklasf/shakmaty |
| PyO3 | github.com/PyO3/pyo3 |
| Lichess | lichess.org / database.lichess.org |

Citation

```bibtex
@software{schweich2026pawn,
  author = {Schweich, Thomas},
  title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
  year = {2026},
  url = {https://github.com/thomas-schweich/PAWN},
  license = {Apache-2.0}
}
```

License

Apache 2.0. See LICENSE.
