AGILLM-4
AGILLM-4 is the next training target after AGILLM-3. The current code is a production-oriented starting point: it is copied from the proven single-file trainer and extended with:
- a ~1.5B-parameter main preset (`agillm4_main`) with a 100-tokens-per-parameter target ratio
- longer block-size work on 24 GB, B200, and B300-class GPUs
- AR+SAT on every step, with sequential backward to reduce peak VRAM (see the sketch after this list)
- SDPA and experimental sublinear local+landmark attention backends
- exact M-fold expansion attention harvested from n1.py, with a local verifier
- fused QKV projection harvested from n1.py, with legacy checkpoint loading (sketched after this list)
- profiling tools for memory, throughput, AR cost, SAT cost, and optimizer cost
- synthetic long-context curriculum generation for recall and multi-hop tests (an illustrative recall generator follows this list)
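One common way to reduce peak VRAM with two objectives is to run each objective's forward and backward separately, so at most one autograd graph is alive at a time; gradients simply accumulate in `.grad` across the two backward calls. A minimal sketch of that pattern, not the trainer's actual code (`ar_loss_fn` and `sat_loss_fn` are hypothetical callables):

```python
def train_step(model, optimizer, batch, ar_loss_fn, sat_loss_fn):
    """Sequential backward over two objectives (hypothetical helper names)."""
    optimizer.zero_grad(set_to_none=True)

    # Objective 1: forward + backward; its autograd graph is freed right after.
    loss_ar = ar_loss_fn(model, batch)
    loss_ar.backward()

    # Objective 2: a fresh forward + backward; gradients add into .grad,
    # so peak memory only ever holds one graph's activations.
    loss_sat = sat_loss_fn(model, batch)
    loss_sat.backward()

    optimizer.step()
    return loss_ar.detach(), loss_sat.detach()
```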
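The fused QKV idea, independent of the n1.py specifics: one `nn.Linear` producing `3 * d_model` outputs replaces three separate projections, and legacy checkpoints with split Q/K/V weights can be packed by concatenating along the output dimension. A generic sketch (the actual n1.py layout and checkpoint keys may differ):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Generic fused QKV projection sketch (not the n1.py implementation)."""

    def __init__(self, d_model: int, n_heads: int, bias: bool = False):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        # One matmul produces Q, K and V instead of three separate projections.
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=bias)

    def forward(self, x: torch.Tensor):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (B, n_heads, T, head_dim), the layout SDPA expects.
        return tuple(
            t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            for t in (q, k, v)
        )

    @torch.no_grad()
    def load_legacy(self, q_weight, k_weight, v_weight):
        # Hypothetical helper: pack separate legacy Q/K/V weights into the
        # fused layer by concatenating along the output dimension.
        self.qkv.weight.copy_(torch.cat([q_weight, k_weight, v_weight], dim=0))
```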
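For the synthetic long-context curriculum, a minimal recall (needle-in-a-haystack) generator along these lines illustrates the idea; the repo's real format, tokenization, and multi-hop variants are not shown here:

```python
import random
import string

def make_recall_example(context_words=4096, rng=None):
    """Bury a random key/value fact in filler text and ask for it back."""
    rng = rng or random.Random()
    key = "".join(rng.choices(string.ascii_lowercase, k=8))
    value = "".join(rng.choices(string.digits, k=6))
    needle = f"The passcode for {key} is {value}."
    filler_vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
    words = rng.choices(filler_vocab, k=context_words)
    # Insert the needle at a random position so recall depth varies per sample.
    words.insert(rng.randrange(len(words) + 1), needle)
    prompt = " ".join(words) + f"\nQuestion: what is the passcode for {key}?\nAnswer:"
    return prompt, f" {value}"
```

Multi-hop examples would chain several such facts (key A points to key B, key B holds the value); only the single-hop recall case is shown.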
Start with AGILLM-4.md for the training plan and command recipes. The current sublinear backend is intentionally experimental: profile it against SDPA before using it for a real run.
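As a baseline for that comparison, SDPA can be timed directly at the target block size with `torch.nn.functional.scaled_dot_product_attention`; the flags that select the sublinear backend are repo-specific and not shown here:

```python
import time
import torch
import torch.nn.functional as F

def profile_sdpa(batch=1, heads=16, seq_len=8192, head_dim=64, iters=10,
                 dtype=torch.bfloat16):
    """Time causal SDPA and report peak CUDA memory at a given block size."""
    q, k, v = (torch.randn(batch, heads, seq_len, head_dim,
                           device="cuda", dtype=dtype) for _ in range(3))
    torch.cuda.reset_peak_memory_stats()
    F.scaled_dot_product_attention(q, k, v, is_causal=True)  # warmup
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        F.scaled_dot_product_attention(q, k, v, is_causal=True)
    torch.cuda.synchronize()
    ms = (time.perf_counter() - t0) / iters * 1e3
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    print(f"seq_len={seq_len}: {ms:.2f} ms/iter, peak {peak_gib:.2f} GiB")

profile_sdpa(seq_len=8192)
profile_sdpa(seq_len=16384)
```

Running the same loop with the experimental backend enabled and comparing ms/iter and peak memory gives a quick go/no-go signal before committing to a real run.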
Current harvest status from n1.py is tracked in N1_HARVEST.md.