Abstract
This paper investigates whether 2D foundation image models inherently possess 3D world model capabilities by evaluating their performance on 3D world synthesis tasks. The authors propose a multi-agent architecture consisting of a VLM-based director, an image synthesizer, and a two-step verifier that evaluates outputs from both 2D image and 3D reconstruction spaces. Through systematic benchmarking of state-of-the-art image generation models and Vision-Language Models, they demonstrate that their agentic approach achieves coherent and robust 3D reconstruction, enabling exploration through novel view rendering. The research provides insights into leveraging implicit 3D knowledge from 2D foundation models for world-level scene understanding and generation.
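The director, synthesizer, and two-step verifier described above could interact roughly as in the sketch below. This is only an illustration under stated assumptions: the abstract does not say how the agents exchange information, so the retry loop, the feedback string, and every name here (`Verdict`, `synthesize_view`, `max_rounds`, the `demo_*` stubs) are hypothetical and not taken from the paper. The stubs stand in for a real VLM, image model, and 3D reconstruction check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Result of one verification step (hypothetical structure)."""
    passed: bool
    feedback: str = ""


def synthesize_view(
    goal: str,
    director: Callable[[str, str], str],
    synthesizer: Callable[[str], str],
    verify_2d: Callable[[str], Verdict],
    verify_3d: Callable[[str], Verdict],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first image accepted by both checks, or None."""
    feedback = ""
    for _ in range(max_rounds):
        # The director turns the goal plus earlier feedback into a prompt.
        prompt = director(goal, feedback)
        image = synthesizer(prompt)
        # Two-step verification: image space first, then 3D space.
        for check in (verify_2d, verify_3d):
            verdict = check(image)
            if not verdict.passed:
                feedback = verdict.feedback
                break
        else:
            return image  # both checks passed
    return None


# Stub agents, assumed purely for demonstration.
def demo_director(goal: str, feedback: str) -> str:
    return goal if not feedback else f"{goal} ({feedback})"


def demo_synthesizer(prompt: str) -> str:
    return f"image<{prompt}>"


def demo_verify_2d(image: str) -> Verdict:
    return Verdict(True)


def demo_verify_3d(image: str) -> Verdict:
    ok = "keep geometry consistent" in image
    return Verdict(ok, "" if ok else "keep geometry consistent")


result = synthesize_view(
    "a kitchen", demo_director, demo_synthesizer, demo_verify_2d, demo_verify_3d
)
```

With these stubs the first attempt fails the 3D check, the feedback is folded into the next prompt, and the second attempt is accepted. Passing the agents in as plain callables keeps the loop independent of any particular model API.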