AI & ML interests
None defined yet.
Recent Activity
Papers
- HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam
  Paper • 2602.13964 • Published • 11
- SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
  Paper • 2603.03823 • Published • 7
- DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
  Paper • 2602.16742 • Published • 12
- Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
  Paper • 2512.24873 • Published • 108
models 0
None public yet