PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research Paper • 2604.15411 • Published 4 days ago
VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects Paper • 2604.16272 • Published 3 days ago
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems Paper • 2604.14228 • Published 6 days ago • 17
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories Paper • 2604.15311 • Published 4 days ago • 9
DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published 4 days ago • 31
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 5 days ago • 93
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation Paper • 2604.15309 • Published 4 days ago • 5
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 12 days ago • 111
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Paper • 2604.14144 • Published 5 days ago • 62
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding Paper • 2604.14113 • Published 5 days ago • 10
InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis Paper • 2604.13201 • Published 6 days ago • 2
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 5 days ago • 141