RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published 1 day ago • 25
WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora Paper • 2602.02053 • Published 1 day ago • 39
Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles Paper • 2602.01590 • Published 2 days ago • 30
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published 2 days ago • 38
NativeTok: Native Visual Tokenization for Improved Image Generation Paper • 2601.22837 • Published 5 days ago • 9
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10, 2025 • 16
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13, 2025 • 74
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding Paper • 2506.03968 • Published Jun 4, 2025 • 15
WebDancer: Towards Autonomous Information Seeking Agency Paper • 2505.22648 • Published May 28, 2025 • 33