Abstract
ABot-PhysWorld is a 14-billion-parameter Diffusion Transformer model designed for interactive world modeling in robotics that generates visually realistic, physically plausible, and action-controllable videos. The model addresses common physical implausibility issues such as object penetration and anti-gravity motion by using a novel DPO-based post-training framework with decoupled discriminators trained on a curated dataset of three million physics-aware manipulation clips. A parallel context block enables precise spatial action injection for cross-embodiment robot control. To evaluate generalization, the system introduces EZSbench, the first training-independent embodied zero-shot benchmark combining real and synthetic unseen robot-task-scene combinations.