Paper Detail
Autoregressive vs. Masked Diffusion Language Models: A Controlled Comparison
Tags: cs.CL · Large Language Models · Transformer · Multimodal
Anonymous Authors
March 23, 2026
arXiv: 2603.22075v1
Authors: 1
Tags: 4
Content: PDF available
Abstract
This paper presents a controlled empirical comparison between autoregressive (AR) and masked diffusion (MDLM) language models trained on identical data (50M tokens from TinyStories), compute budget (20,000 steps, batch size 32, sequence length 512), and hardware (NVIDIA H100 80GB). The study reveals three key findings: both paradigms achieve comparable training throughput (~50K tokens/second); AR converges faster but overfits earlier, while MDLM improves more gradually; and there is a structural diversity–fluency trade-off, with AR producing fluent but repetitive outputs and MDLM generating more diverse narratives.
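As a quick sanity check on the stated setup, the budget figures from the abstract imply the total token count and approximate wall-clock time below. This is a back-of-envelope sketch; the ~50K tokens/second throughput is the figure reported in the abstract, and the epoch count is derived from the 50M-token corpus size.

```python
# Back-of-envelope check of the training budget stated in the abstract.
steps = 20_000
batch_size = 32
seq_len = 512

tokens_per_step = batch_size * seq_len       # tokens consumed per optimizer step
total_tokens = steps * tokens_per_step       # tokens processed over all of training

throughput = 50_000                          # ~50K tokens/second (reported for both paradigms)
wall_clock_hours = total_tokens / throughput / 3600

corpus_size = 50_000_000                     # 50M tokens (TinyStories subset)
epochs = total_tokens / corpus_size          # effective passes over the data

print(f"total tokens: {total_tokens:,}")     # 327,680,000
print(f"wall clock:   {wall_clock_hours:.2f} h")
print(f"epochs:       {epochs:.2f}")
```

At ~50K tokens/second, the full 20,000-step run amounts to roughly 1.8 hours on the H100, covering the 50M-token corpus about 6.5 times.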
Categories
cs.CL, cs.LG