Abstract
We propose a novel architectural modification and post-training pipeline for enhancing large language model reasoning capabilities by teaching models to truncate forward passes early. Our approach augments the standard transformer architecture with an early-exit mechanism at intermediate layers, enabling the model to exit at shallower layers when tokens can be predicted without deep computation. Through a calibration stage followed by reinforcement learning, we incentivize the model to exit as early as possible while preserving task performance. Preliminary experiments on small reasoning models demonstrate adaptive computation reduction across tokens, suggesting that at appropriate scale, this approach can minimize excess computation for non-myopic planning using internal activations, reserving deep computation only for difficult-to-predict tokens.
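The early-exit mechanism the abstract describes can be sketched as a decoder that attaches a shared output head at an intermediate layer and stops the forward pass when the shallow prediction is already confident. This is a minimal illustrative sketch, not the paper's implementation: the class name, layer counts, and the max-softmax confidence threshold are all assumptions, and the calibration and reinforcement-learning stages are not reproduced here.

```python
import torch
import torch.nn as nn

class EarlyExitTransformer(nn.Module):
    """Toy transformer with an early-exit point at an intermediate layer.

    All names and the confidence-threshold exit rule are illustrative
    assumptions; the paper's calibration/RL post-training is not shown.
    """

    def __init__(self, vocab=100, d=32, n_layers=4, exit_layer=2, threshold=0.9):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.exit_layer = exit_layer    # depth of the early-exit point
        self.threshold = threshold      # min softmax confidence to exit
        self.head = nn.Linear(d, vocab)  # output head shared by all exits

    @torch.no_grad()
    def forward(self, tokens):
        """Return (logits, depth_used) for a batch of token ids."""
        h = self.embed(tokens)
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i + 1 == self.exit_layer:
                logits = self.head(h)
                conf = logits.softmax(-1).max(-1).values
                # Exit early when every position is confidently predicted
                # at this shallow depth; otherwise continue deeper.
                if bool((conf >= self.threshold).all()):
                    return logits, i + 1
        return self.head(h), len(self.layers)
```

With `threshold=0.0` the exit condition is always met and the model stops at `exit_layer`; with a threshold above 1 it always runs to full depth, mimicking the adaptive per-token computation the abstract reports.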