# PAWN-Small
PAWN (Playstyle-Agnostic World-model Network for Chess) is a causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from uniformly random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.
This is the small variant (~9.5M parameters). PAWN is designed as a frozen backbone for parameter-efficient finetuning into player models with arbitrary playstyles.
[GitHub Repository](https://github.com/thomas-schweich/PAWN) -- full source code, training scripts, adapter implementations, and documentation.
## All Variants
| Variant | Parameters | Link |
|---|---|---|
| PAWN-Small | ~9.5M | thomas-schweich/pawn-small |
| PAWN (Base) | ~35.8M | thomas-schweich/pawn-base |
| PAWN-Large | ~68.4M | thomas-schweich/pawn-large |
## Headline Metrics
| Metric | Value |
|---|---|
| Legal move rate | 99.18% |
| Top-1 accuracy | 6.75% |
| Top-5 accuracy | 27.40% |
| Val loss | 3.159 |
## Accuracy Ratios

PAWN is trained on uniformly random chess games, so top-1 accuracy has a hard theoretical ceiling: when the target move is drawn uniformly from the legal set, accuracy is bounded by how many legal moves each position offers. A ratio above 100% against the unconditioned ceiling indicates the model has learned structure beyond simply identifying legal moves. See Accuracy Ceiling Analysis.
| Ceiling | Ratio |
|---|---|
| Unconditioned (E[1/N_legal] = 6.43%) | 105% |
| Naive-conditioned (1-ply filter = 6.44%) | 105% |
| Bayes-optimal conditioned (MCTS, 32 rollouts = 7.92%) | 85% |
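The ratio column is simply the headline top-1 accuracy divided by each ceiling. A quick reproduction, with all values copied from the tables above:

```python
# Reproducing the Ratio column: headline top-1 accuracy divided by each
# theoretical ceiling. All numbers are copied from the tables above.
top1 = 0.0675

ceilings = {
    "unconditioned": 0.0643,      # E[1/N_legal]
    "naive_conditioned": 0.0644,  # 1-ply filter
    "bayes_mcts": 0.0792,         # MCTS, 32 rollouts
}
ratios = {name: top1 / c for name, c in ceilings.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.0%}")     # 105%, 105%, 85%
```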
## Probe Results
Linear probes trained on frozen hidden states measure how well the model's internal representations encode board-level features.
| Probe | Accuracy | Description |
|---|---|---|
| Piece type | 89.1% | Per-square piece type (13 classes x 64 squares) |
| Side to move | 100.0% | Whose turn it is |
| Is check | 94.3% | Whether the side to move is in check |
| Castling rights | 96.5% | KQkq castling availability |
| En passant square | 99.8% | En passant target square (64 + none) |
| Material count | 86.5% (MAE 4.9) | Piece counts per type per color |
| Legal move count | 30.7% (MAE 7.4) | Number of legal moves available |
| Halfmove clock | 13.3% (MAE 3.9) | Plies since last capture or pawn move |
| Game phase | 91.1% | Opening / middlegame / endgame |
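A linear probe in this sense is a single affine layer trained on frozen activations; only the probe's weights are updated. A minimal sketch of the setup, using synthetic features in place of PAWN hidden states (only `d_model = 256` is taken from the card; the data and training loop are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 512 "frozen hidden states" (d_model = 256) and a
# binary side-to-move label that is linearly decodable by construction.
d_model, n = 256, 512
H = rng.standard_normal((n, d_model))
w_true = rng.standard_normal(d_model)
y = (H @ w_true > 0).astype(float)

# Logistic-regression probe: the backbone stays frozen; only w, b train.
w, b = np.zeros(d_model), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # sigmoid
    grad = p - y                            # dLoss/dlogits for BCE
    w -= 0.1 * (H.T @ grad) / n
    b -= 0.1 * grad.mean()

acc = ((H @ w + b > 0) == (y == 1)).mean()
print(f"probe accuracy: {acc:.3f}")
```

High probe accuracy indicates the feature is linearly readable from the representation, which is the sense in which the table above measures what the model "encodes".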
## Diagnostic Results
Edge-case diagnostics measure the model's legal move rate in specific tactical situations.
| Category | Positions | Legal Rate |
|---|---|---|
| In check | 1000 | 82.4% |
| Double check | 71 | 65.1% |
| Pin restricts movement | 1000 | 86.2% |
| En passant available | 940 | 97.1% |
| Castling legal (kingside) | 1000 | 98.8% |
| Castling legal (queenside) | 1000 | 98.2% |
| Castling blocked by check | 892 | 95.7% |
| Promotion available | 1000 | 96.2% |
| Checkmate (terminal) | 276 | 66.4% |
| Stalemate (terminal) | 41 | 53.8% |
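For scale, the per-category rates can be collapsed into a position-weighted average (all numbers copied from the table above), which makes clear how far these tactical edge cases sit below the 99.18% overall legal-move rate:

```python
# Position-weighted average of the per-category legal rates above.
categories = {
    "in_check": (1000, 82.4),
    "double_check": (71, 65.1),
    "pin": (1000, 86.2),
    "en_passant": (940, 97.1),
    "castle_kingside": (1000, 98.8),
    "castle_queenside": (1000, 98.2),
    "castle_blocked_by_check": (892, 95.7),
    "promotion": (1000, 96.2),
    "checkmate": (276, 66.4),
    "stalemate": (41, 53.8),
}
n_total = sum(n for n, _ in categories.values())
weighted = sum(n * rate for n, rate in categories.values()) / n_total
print(f"{weighted:.1f}% over {n_total} edge-case positions")
```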
## Architecture
| Parameter | Value |
|---|---|
| Architecture | Decoder-only transformer |
| d_model | 256 |
| Layers | 8 |
| Attention heads | 4 |
| Head dimension | 64 |
| d_ff | 1024 |
| Parameters | ~9.5M |
| Vocabulary | 4,284 tokens |
| Context length | 256 tokens |
| Normalization | Pre-norm RMSNorm |
| FFN | SwiGLU (4x expansion) |
| Positional encoding | Rotary (RoPE, base 10000) |
| Embeddings | Factored (src + dst + promo) |
| Dropout | 0.0 |
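The ~9.5M figure can be roughly reproduced from the table. This back-of-envelope count assumes no bias terms, RMSNorm gains only, an output head tied to the input embedding, and a full vocab x d_model embedding table (the factored src + dst + promo scheme will shift the embedding term somewhat):

```python
# Rough parameter count for PAWN-Small from the architecture table.
# Assumptions (not stated in the card): no biases, tied output head,
# full (unfactored) embedding table.
d_model, n_layers, d_ff, vocab = 256, 8, 1024, 4284

attn = 4 * d_model * d_model   # Wq, Wk, Wv, Wo
ffn = 3 * d_model * d_ff       # SwiGLU: gate, up, down projections
norms = 2 * d_model            # two RMSNorm gains per block
per_layer = attn + ffn + norms

total = n_layers * per_layer + vocab * d_model + d_model  # + final norm
print(f"~{total / 1e6:.1f}M parameters")  # prints ~9.5M parameters
```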
## Training Details
| Parameter | Value |
|---|---|
| Training data | On-the-fly uniformly random legal games (no external dataset) |
| Objective | Next-token cross-entropy (non-padding positions only) |
| Total steps | 100,000 |
| Batch size | 256 |
| Games seen | 25,600,000 |
| Learning rate | 3e-4 (cosine decay with 1,000-step warmup) |
| Optimizer | AdamW (weight decay 0.01) |
| Precision | Mixed (AMP) |
| Hardware | NVIDIA H200 |
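The games-seen figure follows directly from 100,000 steps x 256 games per batch. The learning-rate schedule can be sketched as below; the linear warmup shape and zero floor are assumptions, since the card only states the peak, the decay type, and the warmup length:

```python
import math

# Cosine-decay schedule with linear warmup, matching the quoted settings.
# Linear warmup and a zero final LR are assumptions; the card states only
# "3e-4 (cosine decay with 1,000-step warmup)".
BASE_LR, WARMUP, TOTAL = 3e-4, 1_000, 100_000

def lr_at(step: int) -> float:
    if step < WARMUP:
        return BASE_LR * step / WARMUP             # linear warmup
    progress = (step - WARMUP) / (TOTAL - WARMUP)  # 0 -> 1 over decay
    return 0.5 * BASE_LR * (1 + math.cos(math.pi * progress))

print(lr_at(1_000))  # peak LR: 3e-4
```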
## Usage

### Loading the model
```python
import torch
from safetensors.torch import load_file

from pawn.config import CLMConfig
from pawn.model import PAWNCLM

cfg = CLMConfig.small()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("model.safetensors", device="cuda")
model.load_state_dict(weights)
```
Or load directly from HuggingFace:
```python
from pawn.checkpoint import load_backbone_weights
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

weights, config = load_backbone_weights("thomas-schweich/pawn-small")
cfg = CLMConfig.small()
model = PAWNCLM(cfg).eval()
model.load_state_dict(weights)
```
### Finetuning with an adapter

```shell
uv run python scripts/train_bottleneck.py \
    --checkpoint thomas-schweich/pawn-small \
    --pgn thomas-schweich/pawn-lichess-full \
    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```
## Acknowledgments

PAWN builds on ideas and tools from the following projects and publications:

- RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
- LoRA: Low-Rank Adaptation of Large Language Models
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
- Aligning Superhuman AI with Human Behavior: Chess as a Model System
## Citation

```bibtex
@software{schweich2026pawn,
  author  = {Schweich, Thomas},
  title   = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
  year    = {2026},
  url     = {https://github.com/thomas-schweich/PAWN},
  license = {Apache-2.0}
}
```
## License
Apache 2.0. See LICENSE.