WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
cs.CV | CV, Trending, 3D Detection, Embodied AI, Multimodal
WorldAgents Team
March 20, 2026
arXiv: 2603.19708v1

Abstract

This paper investigates whether 2D foundation image models inherently possess 3D world model capabilities by evaluating their performance on 3D world synthesis tasks. The authors propose a multi-agent architecture consisting of a VLM-based director, an image synthesizer, and a two-step verifier that evaluates outputs from both 2D image and 3D reconstruction spaces. Through systematic benchmarking of state-of-the-art image generation models and Vision-Language Models, they demonstrate that their agentic approach achieves coherent and robust 3D reconstruction, enabling exploration through novel view rendering. The research provides insights into leveraging implicit 3D knowledge from 2D foundation models for world-level scene understanding and generation.
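The director, synthesizer, and two-step verifier described above can be pictured as a generate-and-check loop. The sketch below is only an illustration under stated assumptions: the abstract does not specify the control flow, so the retry-with-feedback scheme, the function `run_agent_loop`, the `Verdict` type, and all parameter names are hypothetical and are not taken from the paper. The four agents are passed in as plain callables so that any VLM, image model, or reconstruction check could be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Verdict:
    """Result of one verification step (hypothetical type, not from the paper)."""
    ok: bool
    reason: str = ""


def run_agent_loop(
    director: Callable[[List[Any], str], str],
    synthesizer: Callable[[str], Any],
    verify_2d: Callable[[Any, List[Any]], Verdict],
    verify_3d: Callable[[List[Any]], Verdict],
    num_views: int,
    max_retries: int = 3,
) -> List[Any]:
    """Collect up to `num_views` images that pass both verification steps.

    Assumed flow: the director writes a prompt from the accepted views and
    the last rejection reason, the synthesizer renders it, and the image is
    kept only if the 2D check and then the 3D check both accept it.
    """
    accepted: List[Any] = []
    for _ in range(num_views):
        feedback = ""
        for _attempt in range(max_retries):
            prompt = director(accepted, feedback)
            image = synthesizer(prompt)

            # Step 1: check the candidate in 2D image space.
            result_2d = verify_2d(image, accepted)
            if not result_2d.ok:
                feedback = result_2d.reason
                continue

            # Step 2: check the candidate together with prior views in 3D.
            result_3d = verify_3d(accepted + [image])
            if not result_3d.ok:
                feedback = result_3d.reason
                continue

            accepted.append(image)
            break
        # If every retry fails, this view is skipped rather than forced in.
    return accepted
```

In a real system, `verify_3d` would presumably run a reconstruction over the candidate set and score its consistency; here it is left as an injected callable because the abstract gives no details about that step.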

Categories

cs.CV, cs.AI
