Abstract
Latent diffusion models enable high-fidelity synthesis by operating in learned latent spaces. However, current approaches require complex multi-stage training where tokenizers must be trained separately before diffusion models. This paper proposes UNITE, an autoencoder architecture that unifies tokenization and latent diffusion through a single Generative Encoder with shared weights. The key innovation is treating tokenization and generation as the same latent inference problem under different conditioning regimes. The method introduces a single-stage training procedure that jointly optimizes both tokenization and generation tasks via two forward passes, enabling gradients to jointly shape the latent space for improved visual representation learning.
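The single-stage procedure the abstract describes — one shared encoder serving both tokenization and generation, trained with two forward passes per step — can be sketched as follows. This is a minimal illustrative sketch only: the module names, conditioning mechanism, and loss choices are assumptions, since the abstract does not specify UNITE's actual architecture or objectives.

```python
import torch
import torch.nn as nn

class GenerativeEncoder(nn.Module):
    """Toy stand-in for the shared-weight Generative Encoder (hypothetical)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, cond):
        # Same weights under different conditioning regimes: cond switches
        # between the "tokenize" and "generate" roles of the encoder.
        return self.net(x + cond)

encoder = GenerativeEncoder()
decoder = nn.Linear(16, 16)  # toy decoder for the reconstruction objective
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

image = torch.randn(8, 16)      # stand-in for image features
tok_cond = torch.zeros(8, 16)   # conditioning signal for the tokenization pass
gen_cond = torch.ones(8, 16)    # conditioning signal for the generation pass

# Forward pass 1 (tokenization): encode the clean input, reconstruct it.
latent = encoder(image, tok_cond)
recon_loss = nn.functional.mse_loss(decoder(latent), image)

# Forward pass 2 (generation): denoise a noised copy of the latent.
noisy = latent.detach() + torch.randn_like(latent)
diff_loss = nn.functional.mse_loss(encoder(noisy, gen_cond), latent.detach())

# Single joint update: gradients from both tasks flow into the shared
# weights, so both objectives shape the same latent space.
opt.zero_grad()
(recon_loss + diff_loss).backward()
opt.step()
```

The key point the sketch illustrates is that both losses backpropagate through the same `encoder` parameters in one optimizer step, replacing the conventional two-stage pipeline in which a tokenizer is frozen before diffusion training begins.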