Abstract
Large Language Models and Vision Language Models demonstrate strong general reasoning capabilities but face challenges in spatial understanding and layout consistency for fine-grained visual editing tasks. This paper presents a Structured Reasoning framework that enables text-conditioned spatial layout editing through scene-graph reasoning. The system takes an input scene graph and natural-language instruction, then reasons over the graph structure to generate an updated scene graph satisfying the text condition while maintaining spatial coherence. By leveraging structured relational representations, the approach enhances both interpretability and control over spatial relationships. Evaluations on a text-guided layout editing benchmark covering sorting, spatial alignment, and room-editing tasks show that the training paradigm achieves an average 15% improvement in IoU and 25% reduction in center-distance error compared to Chain of Thought Fine-tuning baselines.
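The two reported metrics, IoU and center-distance error, can be illustrated with a minimal sketch. The abstract does not give the benchmark's exact definitions, so this assumes axis-aligned 2D boxes in `(x_min, y_min, x_max, y_max)` form and Euclidean distance between box centers; the names `Box`, `iou`, and `center_distance` are hypothetical and not taken from the paper.

```python
import math
from typing import Tuple

# Assumed box format (not specified in the abstract): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return intersection / union if union > 0 else 0.0


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, target))
    print(center_distance(predicted, target))
```

Under these assumptions, a higher IoU means the edited object's box overlaps the target box more closely, while a lower center distance means the object was placed nearer to its intended position.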