arXiv:2605.09630

Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models

Published on May 10 · Submitted by Lin Zheng on May 12

Abstract

Patch-based tokenizer-free language models face a trade-off between compute efficiency and modeling quality caused by patch lag. Scratchpad Patching addresses this by dynamically refreshing patch-level context within patches, triggered by prediction entropy.

AI-generated summary

Tokenizer-free language models eliminate the tokenizer step of the language modeling pipeline by operating directly on bytes; patch-based variants further aggregate contiguous byte spans into patches for efficiency. However, the average patch size chosen at the model design stage governs a tight trade-off: larger patches reduce compute and KV-cache footprint, but degrade modeling quality. We trace this trade-off to patch lag: until a patch is fully observed, byte predictions within it must rely on a stale representation from the previous patch to preserve causality; this lag widens as patches grow larger. We introduce Scratchpad Patching (SP), which inserts transient scratchpads inside each patch to aggregate the bytes seen so far and refresh patch-level context for subsequent predictions. SP triggers scratchpads using next-byte prediction entropy, selectively allocating compute to information-dense regions and enabling post-hoc adjustment of inference-time compute. Across experiments on natural language and code, SP improves model quality at the same patch size; for example, even at 16 bytes per patch, SP-augmented models match or closely approach the byte-level baseline on downstream evaluations while using a 16× smaller KV cache over patches and 3-4× less inference compute.
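The abstract describes the triggering rule at a high level: a scratchpad is inserted mid-patch wherever next-byte prediction entropy is high, so extra compute goes to information-dense regions. A minimal sketch of that entropy gate is below; the threshold value, the `scratchpad_positions` helper, and the toy distributions are all hypothetical illustrations, not the paper's actual implementation.

```python
import math

# Hypothetical trigger threshold in bits; the paper does not specify a value here.
ENTROPY_THRESHOLD = 1.0

def entropy(probs):
    """Shannon entropy (in bits) of a next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def scratchpad_positions(byte_dists, threshold=ENTROPY_THRESHOLD):
    """Return byte positions inside a patch where a scratchpad would fire:
    wherever next-byte prediction entropy exceeds the threshold, compute
    would be spent to refresh the patch-level context."""
    return [i for i, dist in enumerate(byte_dists) if entropy(dist) > threshold]

# Toy next-byte distributions over a 4-symbol alphabet.
confident = [0.97, 0.01, 0.01, 0.01]   # ~0.24 bits: model is sure, no refresh
uncertain = [0.25, 0.25, 0.25, 0.25]   # 2.0 bits: maximal uncertainty
peaked    = [0.10, 0.10, 0.10, 0.70]   # ~1.36 bits: moderate uncertainty

patch = [confident, uncertain, peaked, confident]
print(scratchpad_positions(patch))  # -> [1, 2]
```

This also illustrates the post-hoc compute knob the abstract mentions: raising the threshold at inference time fires fewer scratchpads and spends less compute, without retraining.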

Community

Paper submitter

This work introduces Scratchpad Patching for byte-level language models, which inserts transient, entropy-triggered scratchpads inside patches and significantly improves modeling quality at much larger patch sizes.

