Paper Detail
DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
cs.CV · Tags: CV, Transformer, Trending, Embodied AI, Multimodal
DualCoT-VLA Authors
March 24, 2026
arXiv: 2603.22280v1

Number of authors: 1

Number of tags: 5

Content status: PDF available


Abstract

Vision-Language-Action (VLA) models enable robots to map visual observations and language instructions directly to robotic actions. However, existing VLA models struggle with complex multi-step tasks that require logical planning and precise manipulation. Current Chain-of-Thought approaches have difficulty capturing low-level visual details and high-level logical planning simultaneously, and they suffer from high inference latency and compounding errors. This paper proposes DualCoT-VLA, a novel visual-linguistic CoT method with a parallel reasoning mechanism that integrates visual CoT for comprehensive multi-modal reasoning, enabling robots to think effectively before acting in manipulation tasks.

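The abstract describes, at a high level, a dual-branch design in which a visual CoT stream and a linguistic CoT stream reason in parallel before action prediction. Below is a minimal PyTorch sketch of that general idea; the branch modules, dimensions, mean-pooled fusion, and the DualCoTPolicy name are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a dual-branch (visual + linguistic) chain-of-thought
# policy for a VLA model. NOTE: this is NOT the paper's implementation;
# the branch structure, dimensions, and fusion scheme are assumptions
# made for illustration only.
import torch
import torch.nn as nn


class DualCoTPolicy(nn.Module):
    def __init__(self, d_model=256, n_heads=4, action_dim=7):
        super().__init__()
        # Visual CoT branch: attends over visual tokens to produce
        # low-level "visual reasoning" features.
        self.visual_branch = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True
        )
        # Linguistic CoT branch: attends over instruction tokens to
        # produce high-level "plan" features.
        self.language_branch = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True
        )
        # Fusion + action head: pool both reasoning streams, then map
        # the joint representation to a continuous action vector.
        self.action_head = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, action_dim),
        )

    def forward(self, visual_tokens, language_tokens):
        # The two branches are independent until fusion, so their
        # forward passes can run concurrently (the "parallel reasoning"
        # idea); they are written sequentially here for simplicity.
        vis_cot = self.visual_branch(visual_tokens)       # (B, Nv, D)
        lang_cot = self.language_branch(language_tokens)  # (B, Nl, D)
        fused = torch.cat(
            [vis_cot.mean(dim=1), lang_cot.mean(dim=1)], dim=-1
        )                                                 # (B, 2D)
        return self.action_head(fused)                    # (B, action_dim)


if __name__ == "__main__":
    policy = DualCoTPolicy()
    vis = torch.randn(2, 49, 256)    # e.g. 7x7 image patch features
    lang = torch.randn(2, 16, 256)   # e.g. instruction token embeddings
    print(policy(vis, lang).shape)   # torch.Size([2, 7])
```

Because the branches share no state until fusion, their forward passes could be dispatched concurrently, which is plausibly the property the abstract credits for reducing inference latency relative to a single sequential CoT.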

PDF Preview

View or download the full PDF on arXiv

Categories

cs.CV · cs.RO · cs.AI

In-Depth Analysis

Analyzed: 2026-03-24 15:04:12