MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU Paper • 2604.05091 • Published 14 days ago • 45
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 14 days ago • 108
CohereLabs/cohere-transcribe-03-2026 Automatic Speech Recognition • Updated about 3 hours ago • 241k • 892
Mistral Small 4 Collection A state-of-the-art model, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills. • 3 items • Updated Mar 16 • 66
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 Text Generation • 67B • Updated 9 days ago • 1.34M • 273