VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction • Paper • 2602.12579 • Published 26 days ago
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers • Paper • 2510.00915 • Published Oct 1, 2025