arxiv:2601.15703

Agentic Uncertainty Quantification

Published on Jan 22
· Submitted by
Jiaxin Zhang
on Jan 23

Abstract

A unified dual-process framework transforms verbalized uncertainty into active control signals for improved reasoning reliability in AI agents.

AI-generated summary

Although AI agents have demonstrated impressive capabilities in long-horizon reasoning, their reliability is severely hampered by the "Spiral of Hallucination," where early epistemic errors propagate irreversibly. Existing methods face a dilemma: uncertainty quantification (UQ) methods typically act as passive sensors, only diagnosing risks without addressing them, while self-reflection mechanisms suffer from continuous or aimless corrections. To bridge this gap, we propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary. This enables the agent to dynamically balance efficient execution and deep deliberation. Extensive experiments on closed-loop benchmarks and open-ended deep research tasks demonstrate that our training-free approach achieves superior performance and trajectory-level calibration. We believe this principled AUQ framework represents a significant step towards reliable agents.
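The System 1 (UAM) idea above can be sketched in a few lines: each memory entry carries a verbalized confidence score and a short semantic explanation, so later steps inherit earlier doubt instead of treating all memories as equally trustworthy. The class and field names below are illustrative stand-ins, not the paper's actual data structures, and the 0.5 cutoff is an assumed value.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    step: int
    content: str       # what the agent observed or concluded
    confidence: float  # verbalized confidence in [0, 1]
    explanation: str   # why the agent is (un)certain

@dataclass
class UncertaintyAwareMemory:
    entries: list = field(default_factory=list)

    def add(self, step, content, confidence, explanation):
        self.entries.append(MemoryEntry(step, content, confidence, explanation))

    def low_confidence(self, threshold=0.5):
        """Entries whose doubt should propagate into later decisions."""
        return [e for e in self.entries if e.confidence < threshold]

mem = UncertaintyAwareMemory()
mem.add(1, "The drawer contains a key.", 0.9, "Directly observed.")
mem.add(2, "The key opens the cabinet.", 0.3, "Guessed; never tried the key.")
print([e.step for e in mem.low_confidence()])  # -> [2]
```

In a real agent the confidence and explanation would be verbalized by the LLM itself; the point is only that they are stored and queried alongside the content, rather than discarded after the step.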

Community

From the paper author and submitter:

🛑 Stop the "Spiral of Hallucination" in Autonomous Agents!

Long-horizon agents often fail because minor early errors snowball into irreversible failures. We introduce Agentic Uncertainty Quantification (AUQ), a training-free Dual-Process framework inspired by System 1/System 2 thinking:

  • 🧠 System 1 (Fast): Uncertainty-Aware Memory propagates doubt to prevent blind commitment.
  • 🤔 System 2 (Slow): Triggers active reflection only when confidence drops below a specific threshold.
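The two bullets above amount to a simple control loop: execute fast by default, and escalate to targeted reflection only when the step's verbalized confidence falls below a threshold. This is a minimal sketch assuming stubbed `act`/`reflect` functions and an illustrative 0.5 threshold; the paper's actual prompts and trigger values are not shown here.

```python
REFLECT_THRESHOLD = 0.5  # assumed value for illustration

def act(state):
    """System 1: produce an action plus a verbalized confidence (stubbed)."""
    # A real agent would query an LLM here; this stub marks every
    # third step as low-confidence so the trigger is exercised.
    action = f"step-{state}"
    confidence = 0.9 if state % 3 else 0.2
    return action, confidence

def reflect(action):
    """System 2: targeted re-examination of a single doubtful step."""
    return action + " (revised after reflection)"

def run(n_steps):
    trajectory = []
    for state in range(n_steps):
        action, confidence = act(state)
        if confidence < REFLECT_THRESHOLD:  # deliberate only when needed
            action = reflect(action)
        trajectory.append(action)
    return trajectory

print(run(4))
```

Because reflection fires only on low-confidence steps, high-confidence stretches run at System 1 speed, which is where the token-efficiency gain over always-on reflection comes from.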

Key Wins:

  • ✅ SOTA Performance: Outperforms ReAct & Reflexion on ALFWorld, WebShop, and the new DeepResearch Bench.
  • ✅ Efficiency: Prevents long, futile failure loops, making it more token-efficient than standard methods.
  • ✅ Plug-and-Play: No fine-tuning required.

From "Passive Diagnosis" to "Active Control": make your agents reliable! 🚀

