TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published Sep 25, 2025 • 29
GLM-4.5 Collection GLM-4.5: An open-source large language model designed for intelligent agents by Z.ai • 11 items • Updated Aug 11, 2025 • 252
Exploring the Evolution of Physics Cognition in Video Generation: A Survey Paper • 2503.21765 • Published Mar 27, 2025 • 11
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving Paper • 2502.20238 • Published Feb 27, 2025 • 23
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published Nov 4, 2024 • 36