Paper Detail
Causal Evidence that Language Models use Confidence to Drive Behavior
Tags: cs.CL, Large Language Models, Transformer, End-to-End
Author 1, Author 2
March 24, 2026
arXiv: 2603.22161v1


Abstract

Metacognition, the ability to assess one's own cognitive performance, is a fundamental capability documented across species where internal confidence estimates guide adaptive behavior. This research investigates whether Large Language Models (LLMs) actively utilize confidence signals to regulate their behavior through a four-phase abstention paradigm. The study first establishes internal confidence estimates without abstention options, then reveals that LLMs apply implicit thresholds to these estimates when deciding whether to answer or abstain. Findings demonstrate that confidence serves as the dominant predictor of behavior, with effect sizes an order of magnitude larger than knowledge retrieval accessibility or semantic features. Causal evidence is provided through activation steering experiments, where manipulating internal confidence signals correspondingly shifts abstention rates, demonstrating a direct causal relationship between confidence estimation and behavioral regulation.
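The abstract's two key mechanisms, an implicit confidence threshold governing answer-vs-abstain decisions and an activation-steering intervention that shifts abstention rates, can be sketched roughly as follows. Everything here (the threshold value, the linear probe, the direction vector, all function names) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

# Illustrative implicit threshold; the paper infers such thresholds from
# behavior, it does not publish this particular number.
CONF_THRESHOLD = 0.6

def decide(confidence: float, threshold: float = CONF_THRESHOLD) -> str:
    """Answer when internal confidence clears the threshold, else abstain."""
    return "answer" if confidence >= threshold else "abstain"

def steer(hidden: np.ndarray, conf_direction: np.ndarray, alpha: float) -> np.ndarray:
    """Activation steering: add a scaled 'confidence direction' to a hidden
    state. Positive alpha pushes represented confidence up, negative down."""
    return hidden + alpha * conf_direction

def readout_confidence(hidden: np.ndarray, conf_direction: np.ndarray) -> float:
    """Toy linear probe squashed to (0, 1), standing in for an internal
    confidence estimate."""
    return 1.0 / (1.0 + np.exp(-conf_direction @ hidden))

# Toy demonstration: steering along the confidence direction flips the decision.
hidden = np.zeros(8)                # neutral hidden state
conf_dir = np.ones(8) / np.sqrt(8)  # unit-norm illustrative direction

low = readout_confidence(steer(hidden, conf_dir, -3.0), conf_dir)
high = readout_confidence(steer(hidden, conf_dir, +3.0), conf_dir)
print(decide(low), decide(high))  # → abstain answer
```

In this toy setup the steered confidence readout is simply the sigmoid of alpha, so pushing the hidden state along the confidence direction directly moves the abstention decision, which mirrors the causal claim the abstract makes at a much smaller scale.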



Categories

cs.CL, cs.AI
