agent

zzfive 's Collections

world_model

VLA

RolePlaying

dLLM

industry

RAG

ssm

safety

inference optimization

updated Feb 7

Upvote

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23, 2024 • 17
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Paper • 2404.03648 • Published Apr 4, 2024 • 29
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published May 30, 2024 • 33
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

Paper • 2405.19888 • Published May 30, 2024 • 7
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Paper • 2406.01014 • Published Jun 3, 2024 • 33
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Paper • 2406.04151 • Published Jun 6, 2024 • 24
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Paper • 2406.12045 • Published Jun 17, 2024 • 9
Agentless: Demystifying LLM-based Software Engineering Agents

Paper • 2407.01489 • Published Jul 1, 2024 • 65
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Paper • 2407.07061 • Published Jul 9, 2024 • 28
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Paper • 2407.10956 • Published Jul 15, 2024 • 7
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

Paper • 2407.10718 • Published Jul 15, 2024 • 19
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation

Paper • 2407.14931 • Published Jul 20, 2024 • 22
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Paper • 2407.15711 • Published Jul 22, 2024 • 9
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

Paper • 2407.13301 • Published Jul 18, 2024 • 55
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23, 2024 • 77
LAMBDA: A Large Model Based Data Agent

Paper • 2407.17535 • Published Jul 24, 2024 • 37
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Paper • 2407.20183 • Published Jul 29, 2024 • 43
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS

Paper • 2408.01584 • Published Aug 2, 2024 • 10
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

Paper • 2408.03615 • Published Aug 7, 2024 • 31
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases

Paper • 2408.03910 • Published Aug 7, 2024 • 18
Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15, 2024 • 40
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published Sep 6, 2024 • 26
Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11, 2024 • 32
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Paper • 2409.07440 • Published Sep 11, 2024 • 8
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

Paper • 2409.16299 • Published Sep 9, 2024 • 11
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making

Paper • 2409.16686 • Published Sep 25, 2024 • 10
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Paper • 2410.03017 • Published Oct 3, 2024 • 29
Agent S: An Open Agentic Framework that Uses Computers Like a Human

Paper • 2410.08164 • Published Oct 10, 2024 • 26
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Paper • 2410.11710 • Published Oct 15, 2024 • 20
Agent-as-a-Judge: Evaluate Agents with Agents

Paper • 2410.10934 • Published Oct 14, 2024 • 23
Revealing the Barriers of Language Agents in Planning

Paper • 2410.12409 • Published Oct 16, 2024 • 27
MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Paper • 2410.13757 • Published Oct 17, 2024 • 32
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17, 2024 • 44
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24, 2024 • 32
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

Paper • 2410.20424 • Published Oct 27, 2024 • 40
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

Paper • 2410.19609 • Published Oct 25, 2024 • 18
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use

Paper • 2410.24218 • Published Oct 31, 2024 • 6
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Paper • 2411.00412 • Published Nov 1, 2024 • 10
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper • 2410.24024 • Published Oct 31, 2024 • 49
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Paper • 2411.02337 • Published Nov 4, 2024 • 36
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model

Paper • 2411.04496 • Published Nov 7, 2024 • 22
GazeGen: Gaze-Driven User Interaction for Visual Content Generation

Paper • 2411.04335 • Published Nov 7, 2024 • 15
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published Nov 15, 2024 • 34
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Paper • 2411.06559 • Published Nov 10, 2024 • 16
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Paper • 2411.13543 • Published Nov 20, 2024 • 19
SketchAgent: Language-Driven Sequential Sketch Generation

Paper • 2411.17673 • Published Nov 26, 2024 • 18
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

Paper • 2411.17188 • Published Nov 26, 2024 • 20
Large Language Model-Brained GUI Agents: A Survey

Paper • 2411.18279 • Published Nov 27, 2024 • 30
MALT: Improving Reasoning with Multi-Agent LLM Training

Paper • 2412.01928 • Published Dec 2, 2024 • 46
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 71
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published Dec 9, 2024 • 72
The BrowserGym Ecosystem for Web Agent Research

Paper • 2412.05467 • Published Dec 6, 2024 • 24
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Paper • 2412.09605 • Published Dec 12, 2024 • 30
Large Action Models: From Inception to Implementation

Paper • 2412.10047 • Published Dec 13, 2024 • 36
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Paper • 2412.09645 • Published Dec 10, 2024 • 36
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Paper • 2412.13194 • Published Dec 17, 2024 • 12
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Paper • 2412.14161 • Published Dec 18, 2024 • 51
GUI Agents: A Survey

Paper • 2412.13501 • Published Dec 18, 2024 • 30
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

Paper • 2412.17589 • Published Dec 23, 2024 • 14
Agent-SafetyBench: Evaluating the Safety of LLM Agents

Paper • 2412.14470 • Published Dec 19, 2024 • 13
Training Software Engineering Agents and Verifiers with SWE-Gym

Paper • 2412.21139 • Published Dec 30, 2024 • 26
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 87
A3: Android Agent Arena for Mobile GUI Agents

Paper • 2501.01149 • Published Jan 2, 2025 • 22
Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8, 2025 • 95
Search-o1: Agentic Search-Enhanced Large Reasoning Models

Paper • 2501.05366 • Published Jan 9, 2025 • 103
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Paper • 2501.04575 • Published Jan 8, 2025 • 25
PaSa: An LLM Agent for Comprehensive Academic Paper Search

Paper • 2501.10120 • Published Jan 17, 2025 • 55
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Paper • 2501.11425 • Published Jan 20, 2025 • 109
UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21, 2025 • 64
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Paper • 2501.11733 • Published Jan 20, 2025 • 28
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Paper • 2501.12909 • Published Jan 22, 2025 • 74
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Paper • 2501.11067 • Published Jan 19, 2025 • 13
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Paper • 2501.16609 • Published Jan 28, 2025 • 7
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Paper • 2502.02584 • Published Feb 4, 2025 • 16
Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?

Paper • 2502.00674 • Published Feb 2, 2025 • 13
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents

Paper • 2502.05957 • Published Feb 9, 2025 • 15
InSTA: Towards Internet-Scale Training For Agents

Paper • 2502.06776 • Published Feb 10, 2025 • 9
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training

Paper • 2502.06589 • Published Feb 10, 2025 • 21
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Paper • 2502.09560 • Published Feb 13, 2025 • 35
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

Paper • 2502.11271 • Published Feb 16, 2025 • 18
Autellix: An Efficient Serving Engine for LLM Agents as General Programs

Paper • 2502.13965 • Published Feb 19, 2025 • 19
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning

Paper • 2502.15425 • Published Feb 21, 2025 • 9
Self-Taught Agentic Long Context Understanding

Paper • 2502.15920 • Published Feb 21, 2025 • 3
WebGames: Challenging General-Purpose Web-Browsing AI Agents

Paper • 2502.18356 • Published Feb 25, 2025 • 14
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Paper • 2502.18017 • Published Feb 25, 2025 • 21
PodAgent: A Comprehensive Framework for Podcast Generation

Paper • 2503.00455 • Published Mar 1, 2025 • 6
MPO: Boosting LLM Agents with Meta Plan Optimization

Paper • 2503.02682 • Published Mar 4, 2025 • 29
Agent models: Internalizing Chain-of-Action Generation into Reasoning models

Paper • 2503.06580 • Published Mar 9, 2025 • 20
API Agents vs. GUI Agents: Divergence and Convergence

Paper • 2503.11069 • Published Mar 14, 2025 • 36
STEVE: AStep Verification Pipeline for Computer-use Agent Training

Paper • 2503.12532 • Published Mar 16, 2025 • 17
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20, 2025 • 96
Verbal Process Supervision Elicits Better Coding Agents

Paper • 2503.18494 • Published Mar 24, 2025 • 2
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published Mar 27, 2025 • 83
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27, 2025 • 62
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

Paper • 2503.18809 • Published Mar 24, 2025 • 9
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Paper • 2504.00906 • Published Apr 1, 2025 • 27
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31, 2025 • 305
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published Apr 11, 2025 • 28
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Paper • 2504.10127 • Published Apr 14, 2025 • 17
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users

Paper • 2504.10157 • Published Apr 14, 2025 • 17
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Paper • 2504.08066 • Published Apr 10, 2025 • 16
TextArena

Paper • 2504.11442 • Published Apr 15, 2025 • 30
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?

Paper • 2504.09702 • Published Apr 13, 2025 • 18
Exploring Expert Failures Improves LLM Agent Tuning

Paper • 2504.13145 • Published Apr 17, 2025 • 12
UFO2: The Desktop AgentOS

Paper • 2504.14603 • Published Apr 20, 2025 • 29
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

Paper • 2504.14239 • Published Apr 19, 2025 • 14
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22, 2025 • 21
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24, 2025 • 124
LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

Paper • 2504.19838 • Published Apr 28, 2025 • 23
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Paper • 2504.19413 • Published Apr 28, 2025 • 48
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

Paper • 2504.20073 • Published Apr 24, 2025 • 12
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28, 2025 • 39
Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents

Paper • 2505.02156 • Published May 4, 2025 • 18
Multi-Agent System for Comprehensive Soccer Understanding

Paper • 2505.03735 • Published May 6, 2025 • 25
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents

Paper • 2505.03570 • Published May 6, 2025 • 8
LLM-Independent Adaptive RAG: Let the Question Speak for Itself

Paper • 2505.04253 • Published May 7, 2025 • 14
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge

Paper • 2505.10468 • Published May 15, 2025 • 10
Creating General User Models from Computer Use

Paper • 2505.10831 • Published May 16, 2025 • 5
Visual Agentic Reinforcement Fine-Tuning

Paper • 2505.14246 • Published May 20, 2025 • 32
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

Paper • 2505.16938 • Published May 22, 2025 • 121
Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23, 2025 • 81
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Paper • 2505.21496 • Published May 27, 2025 • 38
WebDancer: Towards Autonomous Information Seeking Agency

Paper • 2505.22648 • Published May 28, 2025 • 33
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

Paper • 2505.24878 • Published May 30, 2025 • 23
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3, 2025 • 53
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Paper • 2506.04133 • Published Jun 4, 2025 • 3
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development

Paper • 2506.05010 • Published Jun 5, 2025 • 80
Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

Paper • 2506.02865 • Published Jun 3, 2025 • 33
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

Paper • 2506.04405 • Published Jun 4, 2025 • 7
Agents of Change: Self-Evolving LLM Agents for Strategic Planning

Paper • 2506.04651 • Published Jun 5, 2025 • 8
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Paper • 2506.11763 • Published Jun 13, 2025 • 74
Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15, 2025 • 63
OAgents: An Empirical Study of Building Effective Agents

Paper • 2506.15741 • Published Jun 17, 2025 • 35
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30, 2025 • 51
WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3, 2025 • 126
PresentAgent: Multimodal Agent for Presentation Video Generation

Paper • 2507.04036 • Published Jul 5, 2025 • 11
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Paper • 2507.06229 • Published Jul 8, 2025 • 76
MIRIX: Multi-Agent Memory System for LLM-Based Agents

Paper • 2507.07957 • Published Jul 10, 2025 • 80
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21, 2025 • 135
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models

Paper • 2507.12806 • Published Jul 17, 2025 • 21
LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

Paper • 2507.15815 • Published Jul 21, 2025 • 7
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25, 2025 • 33
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28, 2025 • 85
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Paper • 2507.21035 • Published Jul 28, 2025 • 3
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Paper • 2507.22827 • Published Jul 30, 2025 • 101
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1, 2025 • 94
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Paper • 2507.23348 • Published Jul 31, 2025 • 12
CellForge: Agentic Design of Virtual Cell Models

Paper • 2508.02276 • Published Aug 4, 2025 • 39
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

Paper • 2508.01415 • Published Aug 2, 2025 • 8
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

Paper • 2508.00890 • Published Jul 26, 2025 • 7
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

Paper • 2508.01780 • Published Aug 3, 2025 • 21
HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents

Paper • 2508.02629 • Published Aug 4, 2025 • 6
Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published Jul 24, 2025 • 86
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6, 2025 • 52
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5, 2025 • 59
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success

Paper • 2508.04280 • Published Aug 6, 2025 • 35
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 138
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

Paper • 2508.01858 • Published Aug 3, 2025 • 20
CoAct-1: Computer-using Agents with Coding as Actions

Paper • 2508.03923 • Published Aug 5, 2025 • 13
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Paper • 2508.04482 • Published Aug 6, 2025 • 9
WideSearch: Benchmarking Agentic Broad Info-Seeking

Paper • 2508.07999 • Published Aug 11, 2025 • 111
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10, 2025 • 99
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Paper • 2508.06600 • Published Aug 8, 2025 • 41
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7, 2025 • 142
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Paper • 2508.07976 • Published Aug 11, 2025 • 52
OpenCUA: Open Foundations for Computer-Use Agents

Paper • 2508.09123 • Published Aug 12, 2025 • 33
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published Aug 13, 2025 • 58
AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving

Paper • 2508.09889 • Published Aug 13, 2025 • 32
UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14, 2025 • 45
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 19
CAMAR: Continuous Actions Multi-Agent Routing

Paper • 2508.12845 • Published Aug 18, 2025 • 7
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Paper • 2508.12800 • Published Aug 18, 2025 • 6
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published Aug 20, 2025 • 43
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 65
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 162
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs

Paper • 2508.17188 • Published Aug 24, 2025 • 17
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo

Paper • 2508.18370 • Published Aug 25, 2025 • 3
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Paper • 2508.15804 • Published Aug 14, 2025 • 15
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published Aug 28, 2025 • 63
AWorld: Orchestrating the Training Recipe for Agentic AI

Paper • 2508.20404 • Published Aug 28, 2025 • 38
UItron: Foundational GUI Agent with Advanced Perception and Planning

Paper • 2508.21767 • Published Aug 29, 2025 • 12
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 235
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2, 2025 • 127
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Paper • 2509.08755 • Published Sep 10, 2025 • 57
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools

Paper • 2509.09734 • Published Sep 10, 2025 • 16
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading

Paper • 2509.09995 • Published Sep 12, 2025 • 16
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Paper • 2509.13312 • Published Sep 16, 2025 • 106
Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16, 2025 • 117
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16, 2025 • 91
Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16, 2025 • 72
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Paper • 2509.13309 • Published Sep 16, 2025 • 67
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Paper • 2509.13313 • Published Sep 16, 2025 • 80
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Paper • 2509.15221 • Published Sep 18, 2025 • 111
Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech

Paper • 2509.14627 • Published Sep 18, 2025 • 1
LIMI: Less is More for Agency

Paper • 2509.17567 • Published Sep 22, 2025 • 104
ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21, 2025 • 36
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21, 2025 • 21
Mano Report

Paper • 2509.17336 • Published Sep 22, 2025 • 10
GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 91
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

Paper • 2509.25301 • Published Sep 29, 2025 • 20
JoyAgent-JDGenie: Technical Report on the GAIA

Paper • 2510.00510 • Published Oct 1, 2025 • 4
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 31
Don't Just Fine-tune the Agent, Tune the Environment

Paper • 2510.10197 • Published Oct 11, 2025 • 30
AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading

Paper • 2510.14264 • Published Oct 16, 2025 • 10
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Paper • 2510.16872 • Published Oct 19, 2025 • 112
DeepAgent: A General Reasoning Agent with Scalable Toolsets

Paper • 2510.21618 • Published Oct 24, 2025 • 102
Tongyi DeepResearch Technical Report

Paper • 2510.24701 • Published Oct 28, 2025 • 103
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 214
HaluMem: Evaluating Hallucinations in Memory Systems of Agents

Paper • 2511.03506 • Published Nov 5, 2025 • 95
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 39
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Paper • 2511.14460 • Published Nov 18, 2025 • 21
General Agentic Memory Via Deep Research

Paper • 2511.18423 • Published Nov 23, 2025 • 170
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published Nov 17, 2025 • 19
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 302
Deep Research: A Systematic Survey

Paper • 2512.02038 • Published Nov 24, 2025 • 73
PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

Paper • 2512.04082 • Published Dec 3, 2025 • 14
Step-GUI Technical Report

Paper • 2512.15431 • Published Dec 17, 2025 • 133
Memory in the Age of AI Agents

Paper • 2512.13564 • Published Dec 15, 2025 • 155
Adaptation of Agentic AI

Paper • 2512.16301 • Published Dec 18, 2025 • 108
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Paper • 2512.24618 • Published Dec 31, 2025 • 152
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Paper • 2512.22047 • Published Dec 26, 2025 • 30
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Paper • 2512.24615 • Published Dec 31, 2025 • 119
Agentic Reasoning for Large Language Models

Paper • 2601.12538 • Published Jan 18 • 202
Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 259

Upvote

Collection guide
Browse collections