Paper Detail
Causal Evidence that Language Models use Confidence to Drive Behavior
Tags: cs.CL, Large Language Models, Transformer, End-to-End
Author 1, Author 2
March 24, 2026
arXiv: 2603.22161v1


Abstract

Metacognition, the ability to assess one's own cognitive performance, is a fundamental capability documented across species where internal confidence estimates guide adaptive behavior. This research investigates whether Large Language Models (LLMs) actively utilize confidence signals to regulate their behavior through a four-phase abstention paradigm. The study first establishes internal confidence estimates without abstention options, then reveals that LLMs apply implicit thresholds to these estimates when deciding whether to answer or abstain. Findings demonstrate that confidence serves as the dominant predictor of behavior, with effect sizes an order of magnitude larger than knowledge retrieval accessibility or semantic features. Causal evidence is provided through activation steering experiments, where manipulating internal confidence signals correspondingly shifts abstention rates, demonstrating a direct causal relationship between confidence estimation and behavioral regulation.
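The abstract's two key mechanisms, an implicit confidence threshold governing answer-vs-abstain decisions and an activation-steering intervention that shifts abstention rates, can be sketched roughly as follows. Everything here (the threshold value, the linear probe, the direction vector, all function names) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

# Illustrative implicit threshold; the paper infers such thresholds from
# behavior, it does not publish this particular number.
CONF_THRESHOLD = 0.6

def decide(confidence: float, threshold: float = CONF_THRESHOLD) -> str:
    """Answer when internal confidence clears the threshold, else abstain."""
    return "answer" if confidence >= threshold else "abstain"

def steer(hidden: np.ndarray, conf_direction: np.ndarray, alpha: float) -> np.ndarray:
    """Activation steering: add a scaled 'confidence direction' to a hidden
    state. Positive alpha pushes represented confidence up, negative down."""
    return hidden + alpha * conf_direction

def readout_confidence(hidden: np.ndarray, conf_direction: np.ndarray) -> float:
    """Toy linear probe squashed to (0, 1), standing in for an internal
    confidence estimate."""
    return 1.0 / (1.0 + np.exp(-conf_direction @ hidden))

# Toy demonstration: steering along the confidence direction flips the decision.
hidden = np.zeros(8)                # neutral hidden state
conf_dir = np.ones(8) / np.sqrt(8)  # unit-norm illustrative direction

low = readout_confidence(steer(hidden, conf_dir, -3.0), conf_dir)
high = readout_confidence(steer(hidden, conf_dir, +3.0), conf_dir)
print(decide(low), decide(high))  # → abstain answer
```

In this toy setup the steered confidence readout is simply the sigmoid of alpha, so pushing the hidden state along the confidence direction directly moves the abstention decision, which mirrors the causal claim the abstract makes at a much smaller scale.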



Categories

cs.CL, cs.AI
