Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis

cs.CV, CV, Transformer, embodied intelligence, multimodal

Dream2Act Team

March 20, 2026

arXiv: 2603.19709v2

Abstract

This paper presents Dream2Act, a robot-centric framework enabling zero-shot interaction for humanoid robots through generative video synthesis. By taking a third-person image of the robot and target object, the framework leverages video generation models to synthesize morphology-consistent motion for task completion. The approach employs a high-fidelity pose extraction system to recover physically feasible, robot-native joint trajectories from the synthesized videos. These trajectories are subsequently executed via a general-purpose whole-body controller, eliminating the need for extensive policy training or explicit motion retargeting that suffers from morphology gaps.
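The four stages named in the abstract (video generation, pose extraction, feasibility of the recovered trajectory, and execution by a whole-body controller) can be pictured as a simple chain. The following is only a minimal sketch of that data flow, not the authors' implementation: the class name `Dream2ActSketch`, every field name, the stub stages, and the use of joint-limit clamping as a stand-in for "physically feasible" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]   # placeholder type for an image or video frame
JointPose = List[float]     # one vector of joint angles, in radians


@dataclass
class Dream2ActSketch:
    """Hypothetical wiring of the stages described in the abstract."""
    generate_video: Callable[[Frame, str], List[Frame]]  # image + task -> frames
    extract_pose: Callable[[Frame], JointPose]           # frame -> joint angles
    joint_limits: Sequence[Tuple[float, float]]          # (lower, upper) per joint
    execute: Callable[[List[JointPose]], None]           # whole-body controller

    def clamp(self, pose: JointPose) -> JointPose:
        # Assumed feasibility step: keep every joint inside its limits.
        return [min(max(q, lo), hi) for q, (lo, hi) in zip(pose, self.joint_limits)]

    def run(self, image: Frame, task: str) -> List[JointPose]:
        frames = self.generate_video(image, task)
        trajectory = [self.clamp(self.extract_pose(f)) for f in frames]
        self.execute(trajectory)
        return trajectory


# Usage with stub stages; real stages would wrap actual models and a controller.
executed: List[JointPose] = []
pipeline = Dream2ActSketch(
    generate_video=lambda image, task: [image, image, image],
    extract_pose=lambda frame: [0.5, 2.0],
    joint_limits=[(-1.0, 1.0), (-1.5, 1.5)],
    execute=executed.extend,
)
trajectory = pipeline.run([[0.0]], "pick up the cup")
```

Passing each stage in as a callable keeps the sketch independent of any particular video model, pose estimator, or controller, none of which the abstract specifies.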

Categories

cs.CV, cs.RO
