Abstract
Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction for more efficient LLM fine-tuning. However, the computational overhead of computing row-wise norms of the adapted weight matrices creates significant memory challenges, especially at high ranks and across hundreds of adapted modules. This work introduces a factored norm decomposition that eliminates dense matrix materialization by computing squared norms through base, cross, and Gram terms with O(d_out r + r^2) complexity. Additionally, fused Triton kernels combine the four-kernel DoRA composition into a single pass, achieving approximately 4x memory traffic reduction and numerical stability in near-unity rescaling regimes. These optimizations make high-rank DoRA feasible on common single-GPU setups for large language model adaptation.
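The factored norm decomposition described above can be sketched as follows. For an adapted weight W = W0 + BA (with B of shape d_out × r and A of shape r × d_in), the squared norm of row i expands into a base term ‖w0_i‖², a cross term 2·b_i(A w0_iᵀ), and a Gram term b_i(AAᵀ)b_iᵀ — none of which require materializing the dense d_out × d_in matrix BA. This is a minimal PyTorch sketch of that algebra, not the paper's implementation; the function name and the `eps` clamp are assumptions for illustration.

```python
import torch

def factored_row_norms(W0, A, B, eps=1e-12):
    """Row-wise norms of W0 + B @ A without materializing B @ A.

    W0: (d_out, d_in) frozen base weight
    A:  (r, d_in)     low-rank down-projection
    B:  (d_out, r)    low-rank up-projection
    Returns: (d_out,) row norms of the adapted weight.
    """
    # Base term: squared row norms of the frozen weight, shape (d_out,).
    base = (W0 * W0).sum(dim=1)
    # Cross term: 2 * b_i . (A w0_i), computed via the (d_out, r)
    # product W0 @ A.T -- only d_out * r intermediate values.
    cross = 2.0 * ((W0 @ A.T) * B).sum(dim=1)
    # Gram term: b_i (A A^T) b_i^T, using the small (r, r) Gram matrix.
    G = A @ A.T
    gram = ((B @ G) * B).sum(dim=1)
    # Clamp before sqrt for numerical safety (assumed detail, not from the paper).
    return torch.sqrt((base + cross + gram).clamp_min(eps))
```

The intermediates here occupy O(d_out·r + r²) memory, matching the complexity stated in the abstract; a quick check against `torch.linalg.norm(W0 + B @ A, dim=1)` on random matrices confirms the algebra.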