Abstract
This paper addresses a fundamental question: is complete 3D scene reconstruction necessary for object localization in embodied tasks? The authors propose a map-free pipeline that stores only posed RGB-D keyframes as a lightweight visual memory and builds no global 3D representation. At query time, the method retrieves candidate views and re-ranks them with a vision-language model that reasons about their semantic content. It then constructs a sparse, on-demand 3D estimate of the target by backprojecting depth, so the object can be localized without costly reconstruction. This design substantially reduces mapping time and storage overhead and avoids the scalability limits of global maps, while remaining effective for navigation and manipulation tasks.
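The query path described in the abstract (keyframe memory, candidate retrieval, VLM re-ranking, depth backprojection) can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the abstract does not state: retrieval uses cosine similarity over a hypothetical per-keyframe `embedding`; a caller-supplied `score_fn` stands in for the vision-language model; the target's pixels in the chosen view come from a hypothetical `target_pixels_fn` (for example, a segmentation mask); the camera is a pinhole with intrinsics `(fx, fy, cx, cy)`; poses are camera-to-world; and the "sparse 3D estimate" is reduced to the centroid of the backprojected points. All function and class names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = List[List[float]]  # row-major 4x4 camera-to-world transform (assumed convention)


@dataclass
class Keyframe:
    """One entry of the visual memory: a posed RGB-D keyframe (RGB omitted here)."""
    depth: List[List[float]]  # depth[v][u] in meters; 0 marks an invalid reading
    pose: Mat4                # camera-to-world transform
    embedding: List[float]    # hypothetical global image embedding used for retrieval


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def retrieve(memory: List[Keyframe], query: Sequence[float], k: int) -> List[int]:
    """Return indices of the k keyframes whose embeddings best match the query."""
    order = sorted(range(len(memory)),
                   key=lambda i: cosine(memory[i].embedding, query),
                   reverse=True)
    return order[:k]


def rerank(candidates: List[int], score_fn: Callable[[int], float]) -> List[int]:
    """Re-order candidates by score_fn, a stand-in for the VLM's relevance judgment."""
    return sorted(candidates, key=score_fn, reverse=True)


def backproject(kf: Keyframe, pixels: Sequence[Tuple[int, int]],
                fx: float, fy: float, cx: float, cy: float) -> List[Vec3]:
    """Lift target pixels (u, v) to world-frame 3D points with a pinhole model."""
    points: List[Vec3] = []
    T = kf.pose
    for u, v in pixels:
        z = kf.depth[v][u]
        if z <= 0:  # skip missing depth
            continue
        # Camera frame: x right, y down, z forward (assumed convention).
        cam = ((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0)
        # Apply the top three rows of the camera-to-world transform.
        world = tuple(sum(T[r][c] * cam[c] for c in range(4)) for r in range(3))
        points.append(world)
    return points


def localize(memory: List[Keyframe], query: Sequence[float],
             score_fn: Callable[[int], float],
             target_pixels_fn: Callable[[int], Sequence[Tuple[int, int]]],
             intrinsics: Tuple[float, float, float, float],
             k: int = 3) -> Tuple[Optional[int], Optional[Vec3]]:
    """Retrieve, re-rank, then backproject; return (best keyframe, target centroid)."""
    candidates = retrieve(memory, query, k)
    if not candidates:
        return None, None
    best = rerank(candidates, score_fn)[0]
    pts = backproject(memory[best], target_pixels_fn(best), *intrinsics)
    if not pts:
        return best, None
    centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))
    return best, centroid


if __name__ == "__main__":
    identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    shifted = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    depth = [[2.0] * 4 for _ in range(4)]
    memory = [Keyframe(depth, identity, [1.0, 0.0]),
              Keyframe(depth, shifted, [0.6, 0.8])]
    # The stand-in re-ranker prefers keyframe 1; the target is pixel (2, 2).
    print(localize(memory, [1.0, 0.0], lambda i: float(i == 1),
                   lambda i: [(2, 2)], (2.0, 2.0, 2.0, 2.0), k=2))
    # prints (1, (1.0, 0.0, 2.0))
```

In this sketch, 3D lifting happens only at query time and only for the target's pixels in the selected view, which mirrors the abstract's point that no global reconstruction is maintained during mapping.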