Abstract
Vision-Language-Action (VLA) models enable robots to map visual observations and language instructions directly to robotic actions. However, existing VLA models struggle with complex multi-step tasks that require logical planning and precise manipulation. Current Chain-of-Thought approaches struggle to simultaneously capture low-level visual details and high-level logical planning, and they suffer from high inference latency and compounding errors. This paper proposes DualCoT-VLA, a novel visual-linguistic CoT method with a parallel reasoning mechanism that integrates visual CoT for comprehensive multi-modal reasoning, enabling robots to think effectively before acting in manipulation tasks.