Abstract
Multimodal Large Language Models (MLLMs) have shown considerable potential in medical image analysis. However, their application to gastrointestinal endoscopy is currently hindered by two critical limitations: the misalignment between general-purpose model reasoning and standardized clinical cognitive pathways, and the lack of a causal link between visual features and diagnostic outcomes. In this paper, we propose a novel Clinical-Cognitive-Aligned (CogAlign) framework to address both challenges. First, we construct a hierarchical clinical cognition dataset and use it for Supervised Fine-Tuning (SFT), giving the model rigorous clinical analytical capabilities. Unlike conventional approaches, this strategy internalizes the experts' hierarchical diagnostic logic, which progresses from anatomical localization through morphological evaluation to microvascular analysis, directly into the model. Second, to eliminate visual bias, we provide a theoretical analysis showing that standard supervised tuning inevitably converges to spurious correlations with the image background. Guided by this insight, we propose a counterfactual-driven reinforcement learning strategy that enforces causal rectification: by generating counterfactual normal samples through lesion masking and optimizing with clinical-cognition-centric rewards, we constrain the model to ground its diagnosis strictly in causal lesion features. Extensive experiments show that our approach achieves state-of-the-art (SoTA) performance across multiple benchmarks and significantly improves diagnostic accuracy in complex clinical scenarios. All source code and datasets will be made publicly available.
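The counterfactual reward idea described above can be illustrated with a small sketch. Everything in it is an assumption for illustration, not the authors' implementation: images are plain nested lists of grayscale intensities, lesion pixels are filled with the mean background intensity as a stand-in for whatever masking or inpainting the paper actually uses, and the reward weights (0.5 each) and the three stage keywords are hypothetical. The function names `make_counterfactual`, `counterfactual_reward`, and `stage_order_reward` are likewise hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical: grayscale image as rows of pixel intensities
Mask = List[List[int]]     # hypothetical: 1 marks a lesion pixel, 0 marks background

# Hypothetical keywords for the three diagnostic stages named in the abstract.
STAGES = ("anatomical location", "morphology", "microvasculature")


def make_counterfactual(image: Image, lesion_mask: Mask) -> Image:
    """Build a 'normal' counterfactual by overwriting lesion pixels.

    Assumption: lesion pixels are replaced with the mean background
    intensity, a simple stand-in for a real masking/inpainting step.
    """
    background = [p for row, mrow in zip(image, lesion_mask)
                  for p, m in zip(row, mrow) if m == 0]
    fill = sum(background) / len(background) if background else 0.0
    return [[fill if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(image, lesion_mask)]


def counterfactual_reward(pred_original: str, pred_counterfactual: str,
                          label: str, normal_label: str = "normal") -> float:
    """Reward a correct diagnosis on the real image AND a 'normal' call
    once the lesion has been masked out (hypothetical 0.5/0.5 weighting)."""
    reward = 0.0
    if pred_original == label:
        reward += 0.5
    if pred_counterfactual == normal_label:
        reward += 0.5
    return reward


def stage_order_reward(report: str) -> float:
    """Hypothetical cognition-style reward: 1.0 only if all three stage
    keywords appear in the expected order, otherwise 0.0."""
    text = report.lower()
    positions = [text.find(stage) for stage in STAGES]
    if any(p < 0 for p in positions):
        return 0.0
    return 1.0 if positions == sorted(positions) else 0.0
```

Under this sketch, a model that still predicts the lesion class after the lesion has been masked out earns only half of the counterfactual reward, which is the signal meant to discourage reliance on background cues.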