R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning Paper • 2601.19620 • Published Jan 27 • 2
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published Nov 17, 2025 • 26
MaziyarPanahi/Llama-Nemotron-Post-Training-Dataset-v1-ShareGPT Viewer • Updated Jun 2, 2025 • 30.2M • 162 • 41
Scaling test-time compute 📈 Running • 593 • Boost LLM answers with search‑guided test‑time compute