Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria Paper • 2605.08354 • Published 6 days ago • 20
World Simulation with Video Foundation Models for Physical AI Paper • 2511.00062 • Published Oct 28, 2025 • 46
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning Paper • 2510.22543 • Published Oct 26, 2025 • 14