When Benchmarks Age: Temporal Misalignment through Large Language Model Factuality Evaluation Paper • 2510.07238 • Published Oct 8, 2025 • 14
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses Paper • 2510.00232 • Published Sep 30, 2025 • 15
Representation & Optimization Collection Understanding about representation sheds light on optimization • 114 items • Updated about 17 hours ago • 5
Who's Your Judge? On the Detectability of LLM-Generated Judgments Paper • 2509.25154 • Published Sep 29, 2025 • 29
Mem-α: Learning Memory Construction via Reinforcement Learning Paper • 2509.25911 • Published Sep 30, 2025 • 14
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 134
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning Paper • 2509.04744 • Published Sep 5, 2025 • 11
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters • Running • 3.63k
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7, 2025 • 64