Abstract
View transformers process multi-view observations to predict actions and have shown impressive performance in robotic manipulation. Existing methods typically extract static visual representations in a view-specific manner, leading to inadequate 3D spatial reasoning ability and a lack of dynamic adaptation. Taking inspiration from how the human brain integrates static and dynamic views, we propose Cortical Policy, a novel dual-stream view transformer for robotic manipulation that jointly reasons from static-view and dynamic-view streams. The static-view stream enhances spatial understanding by aligning features of geometrically consistent keypoints extracted from a pretrained 3D foundation model. The dynamic-view stream achieves adaptive adjustment through position-aware pretraining of an egocentric gaze estimation model, computationally replicating the human cortical dorsal pathway. The complementary view representations enable improved robotic manipulation performance.
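The abstract describes a dual-stream design: a static-view stream carrying geometrically consistent keypoint features from a 3D foundation model, and a dynamic-view stream providing gaze-like adaptive view features, jointly used for action prediction. The paper's actual architecture, dimensions, and fusion mechanism are not given here, so the following is only a minimal NumPy sketch, under the assumption that the two streams are combined via cross-attention before an action head; all names and sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value, d):
    # query tokens attend over key_value tokens; d scales the dot products
    scores = query @ key_value.T / np.sqrt(d)   # (Nq, Nkv)
    weights = softmax(scores, axis=-1)
    return weights @ key_value                  # (Nq, d)

rng = np.random.default_rng(0)
d = 64                                          # hypothetical feature dimension
static_feats = rng.standard_normal((16, d))     # stand-in for keypoint tokens (static-view stream)
dynamic_feats = rng.standard_normal((8, d))     # stand-in for gaze-pretrained tokens (dynamic-view stream)

# Dynamic-view tokens query the static-view tokens, then the two streams are concatenated
attended = cross_attention(dynamic_feats, static_feats, d)
fused = np.concatenate([dynamic_feats, attended], axis=-1)      # (8, 2*d)

# Toy action head: pool fused tokens and project to a 7-DoF action vector
action = fused.mean(axis=0) @ rng.standard_normal((2 * d, 7))
print(action.shape)
```

This illustrates only the general shape of a dual-stream fusion; the paper may instead use learned projections, multi-head attention, or a different pooling and action-decoding scheme.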