Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published Mar 2 • 33
view article Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 159
view article Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day sionic-ai • Dec 8, 2025 • 57
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 310
Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published Nov 13, 2025 • 53
view article Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix codelion • Nov 3, 2025 • 65
view article Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers +5 ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 187
view article Article Exploring Environments Hub: Your Language Model needs better (open) environments to learn anakin87 • Sep 4, 2025 • 30
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6, 2025 • 129
Tool Use Reasoning Collection A collection of tool use reasoning dataset in Hermes format • 5 items • Updated Jul 23, 2025 • 9
view article Article ChatML vs Harmony: Understanding the new Format from OpenAI 🔍 kuotient • Aug 9, 2025 • 57
view article Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training +3 smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 98
view article Article Training Large Language Models with Interpreter Feedback using WebAssembly axolotl-ai-co • Apr 3, 2025 • 14
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published Mar 7, 2025 • 27
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140