Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization Paper • 2604.08476 • Published 8 days ago • 8