
OmniVTA: Visuo-Tactile World Modeling for Contact-Rich Robotic Manipulation

cs.CV, CV, Transformer, Trending, Embodied Intelligence, Multimodal

OmniVTA Authors

March 20, 2026

arXiv: 2603.19201v2



Abstract

This paper presents OmniVTA, a world-model-based visuo-tactile framework for contact-rich robotic manipulation tasks such as wiping and assembly. It also introduces OmniViTac, a large-scale dataset of 21,000+ trajectories spanning 86 tasks and 100+ objects, with six physics-grounded interaction patterns. The framework integrates four tightly coupled modules, including a self-supervised tactile encoder and a two-stream visuo-tactile world model that predicts contact dynamics. Where existing methods treat touch as a passive observation, OmniVTA uses tactile signals actively to model contact dynamics and to enable explicit closed-loop control.
