End-to-End Training for Unified Tokenization and Latent Denoising
UNITE Authors
March 24, 2026
arXiv: 2603.22283v1


Abstract

Latent diffusion models enable high-fidelity synthesis by operating in learned latent spaces. However, current approaches require complex multi-stage training where tokenizers must be trained separately before diffusion models. This paper proposes UNITE, an autoencoder architecture that unifies tokenization and latent diffusion through a single Generative Encoder with shared weights. The key innovation is treating tokenization and generation as the same latent inference problem under different conditioning regimes. The method introduces a single-stage training procedure that jointly optimizes both tokenization and generation tasks via two forward passes, enabling gradients to jointly shape the latent space for improved visual representation learning.
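The single-stage recipe the abstract describes (one shared-weight Generative Encoder, two forward passes per step, a joint loss shaping one latent space) can be sketched schematically. Everything below is an illustrative assumption, not the authors' implementation: the linear layers, the dimensions, and the simple latent-noising scheme are placeholders standing in for the paper's actual architecture and diffusion process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; the paper does not specify these.
D_PIX, D_LAT = 16, 4

# Single "Generative Encoder": one shared weight matrix used by BOTH passes.
W_enc = rng.normal(scale=0.1, size=(D_PIX, D_LAT))
W_dec = rng.normal(scale=0.1, size=(D_LAT, D_PIX))

def encoder(x):
    """Shared-weight encoder mapping pixel-space inputs to latents."""
    return x @ W_enc

def pass_tokenize(x):
    """Pass 1 (tokenization regime): encode a clean image, decode it,
    and measure reconstruction error."""
    z = encoder(x)
    x_hat = z @ W_dec
    return z, np.mean((x_hat - x) ** 2)

def pass_denoise(z_clean, t=0.5):
    """Pass 2 (generation regime): the SAME encoder weights, now fed a
    corrupted input, are asked to recover the clean latent -- a crude
    stand-in for latent denoising under a different conditioning regime."""
    noise = rng.normal(size=z_clean.shape)
    z_noisy = (1 - t) * z_clean + t * noise
    z_pred = encoder(z_noisy @ W_dec)  # re-encode with shared weights
    return np.mean((z_pred - z_clean) ** 2)

x = rng.normal(size=(8, D_PIX))        # a toy batch of "images"
z, loss_tok = pass_tokenize(x)         # forward pass 1: tokenization
loss_gen = pass_denoise(z)             # forward pass 2: denoising
joint_loss = loss_tok + loss_gen       # one loss; gradients from both
                                       # passes would shape W_enc jointly
print(float(joint_loss))
```

In a real implementation both losses would be backpropagated through the shared encoder in the same optimizer step, so the tokenization and generation objectives pull on the same latent space, which is the mechanism the abstract credits for improved visual representation learning.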



Categories

cs.CV, cs.LG
