Autoregressive vs. Masked Diffusion Language Models: A Controlled Comparison
Anonymous Authors
March 23, 2026
arXiv: 2603.22075v1


Abstract

This paper presents a controlled empirical comparison between autoregressive (AR) and masked diffusion (MDLM) language models trained on identical data (50M tokens from TinyStories), compute budget (20,000 steps, batch size 32, sequence length 512), and hardware (NVIDIA H100 80GB). The study reveals three key findings: both paradigms achieve comparable training throughput (~50K tokens/second), AR converges faster but overfits earlier while MDLM improves more gradually, and there exists a structural diversity-fluency trade-off with AR producing fluent but repetitive outputs and MDLM generating more diverse narratives.
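The core distinction the abstract draws is in which token positions each objective scores: the autoregressive loss scores every position given its prefix, while the masked-diffusion loss scores only a randomly masked subset. A minimal, dependency-free sketch of that difference, using a toy unigram model as a stand-in for a trained network (this is an illustration of the two objectives, not the paper's code):

```python
import math
import random

# Tiny corpus; a unigram distribution plays the role of the model's
# predicted token probabilities so the losses can be computed exactly.
corpus = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(corpus))
probs = {w: corpus.count(w) / len(corpus) for w in vocab}

def ar_loss(tokens):
    """Autoregressive objective: average negative log-likelihood over
    ALL positions (a real AR model would condition on the prefix; the
    unigram stand-in ignores it, but the sum still runs over every token)."""
    return -sum(math.log(probs[t]) for t in tokens) / len(tokens)

def mdlm_loss(tokens, mask_rate=0.5, seed=0):
    """Masked-diffusion-style objective: mask a random subset of
    positions and average the negative log-likelihood over only the
    masked tokens."""
    rng = random.Random(seed)
    masked = [t for t in tokens if rng.random() < mask_rate]
    if not masked:
        return 0.0
    return -sum(math.log(probs[t]) for t in masked) / len(masked)
```

With a real network both losses would also differ in conditioning (causal prefix vs. bidirectional context around the masks), which is what drives the convergence and diversity differences the abstract reports.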


Categories

cs.CL, cs.LG
