Key Result

O(N) learned causal convolution beats O(N²) softmax attention on both perplexity AND throughput, with the advantage growing at longer sequences:

| Model | Complexity | PPL | Change | TPS (128 tok) | TPS (2048 tok) | Speedup |
|---|---|---|---|---|---|---|
| Learned Conv | O(N) | 8.08 | -3.2% | 378,066 | 1,009,622 | 5.5x |
| Standard QKV | O(N²) | 8.34 | baseline | 317,968 | 183,408 | 1.0x |

At 2048 tokens, the O(N) model is 5.5x faster while achieving better perplexity. The gap widens with sequence length because the O(N) model's cost grows linearly while the O(N²) model's cost grows quadratically.
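The repository's exact layer implementation isn't reproduced here, but the core idea behind an O(N) learned causal convolution can be sketched minimally in NumPy (function name, shapes, and the depthwise weighting are illustrative assumptions, not the model's actual code):

```python
import numpy as np

def causal_conv1d(x, w):
    """Minimal sketch of a learned causal convolution.

    x: (T, d) input sequence, w: (K, d) learned depthwise kernel.
    Output y[t] mixes only x[t-K+1..t], so no position sees the
    future. Cost is O(T * K * d): linear in sequence length T,
    versus O(T^2 * d) for softmax attention.
    """
    T, d = x.shape
    K = w.shape[0]
    # Left-pad with zeros so the window at t=0 is well defined.
    xp = np.concatenate([np.zeros((K - 1, d)), x], axis=0)
    y = np.zeros_like(x)
    for k in range(K):
        # Weight the input shifted back by k steps: contributes x[t-k].
        y += w[k] * xp[K - 1 - k : K - 1 - k + T]
    return y
```

Causality is easy to check: perturbing a token at position t changes outputs only at positions t and later, which is the property that lets such a layer replace masked attention.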

- Code: https://github.com/MikeyBeez/DifferentialLR
- Write-up: https://medium.com/p/6659a3793322
- Archive: https://doi.org/10.5281/zenodo.18498944
