Paper Detail

3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

cs.CV · Large Language Models · Transformer · Trending · Segmentation · Multimodal

3D-Layout-R1 Authors

March 24, 2026

arXiv: 2603.22279v1

Number of authors: 1

Number of tags: 5

Content status: PDF included


Abstract

Large Language Models and Vision-Language Models demonstrate strong general reasoning capabilities but face challenges in spatial understanding and layout consistency for fine-grained visual editing tasks. This paper presents a Structured Reasoning framework that enables text-conditioned spatial layout editing through scene-graph reasoning. The system takes an input scene graph and a natural-language instruction, then reasons over the graph structure to generate an updated scene graph that satisfies the text condition while maintaining spatial coherence. By leveraging structured relational representations, the approach enhances both interpretability and control over spatial relationships. Evaluations on a text-guided layout editing benchmark covering sorting, spatial alignment, and room-editing tasks show that the training paradigm achieves an average 15% improvement in IoU and a 25% reduction in center-distance error compared to Chain-of-Thought fine-tuning baselines.
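The two metrics named in the abstract, IoU and center-distance error, can be illustrated with a minimal Python sketch. It assumes axis-aligned 3D boxes described by a center and a size; the `Box3D` type and the function names are hypothetical and are not taken from the paper, which may use oriented boxes or a different box parameterization.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned box (hypothetical format): center and full extent per axis."""
    cx: float
    cy: float
    cz: float
    sx: float
    sy: float
    sz: float

    def volume(self) -> float:
        return self.sx * self.sy * self.sz


def _overlap_1d(c1: float, s1: float, c2: float, s2: float) -> float:
    """Length of the overlap between two intervals given by center and size."""
    lo = max(c1 - s1 / 2, c2 - s2 / 2)
    hi = min(c1 + s1 / 2, c2 + s2 / 2)
    return max(0.0, hi - lo)


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = (
        _overlap_1d(a.cx, a.sx, b.cx, b.sx)
        * _overlap_1d(a.cy, a.sy, b.cy, b.sy)
        * _overlap_1d(a.cz, a.sz, b.cz, b.sz)
    )
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


def center_distance(a: Box3D, b: Box3D) -> float:
    """Euclidean distance between the two box centers."""
    return math.dist((a.cx, a.cy, a.cz), (b.cx, b.cy, b.cz))


if __name__ == "__main__":
    predicted = Box3D(0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    target = Box3D(1.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    print(iou_3d(predicted, target))
    print(center_distance(predicted, target))
```

In this sketch a higher IoU and a lower center distance both indicate that an edited object sits closer to its target placement. How the paper aggregates these values across the objects of a scene is not stated in the abstract.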
