Paper Detail
Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels
DoRA Research Team
March 24, 2026
arXiv: 2603.22276v1


Abstract

Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction for more efficient LLM fine-tuning. However, the computational overhead of computing row-wise norms of the adapted weight matrices creates significant memory challenges, especially at high ranks and across hundreds of adapted modules. This work introduces a factored norm decomposition that eliminates dense matrix materialization by computing squared norms through base, cross, and Gram terms with O(d_out r + r^2) complexity. Additionally, fused Triton kernels combine the four-kernel DoRA composition into a single pass, achieving approximately 4x memory traffic reduction and numerical stability in near-unity rescaling regimes. These optimizations make high-rank DoRA feasible on common single-GPU setups for large language model adaptation.
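The factored norm decomposition described above rests on expanding the row-wise squared norm of the adapted weight W + BA into a base term, a cross term, and a Gram term. A minimal NumPy sketch of that identity follows; the variable names and shapes are illustrative assumptions, not the paper's actual API:

```python
import numpy as np

# Sketch of the factored row-norm identity from the abstract (assumed shapes):
#   ||w_i + b_i A||^2 = ||w_i||^2 + 2 w_i A^T b_i^T + b_i (A A^T) b_i^T
# so the dense d_out x d_in matrix B @ A never needs to be materialized.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 16
W = rng.standard_normal((d_out, d_in))   # frozen base weight
B = rng.standard_normal((d_out, r))      # low-rank factor B (d_out x r)
A = rng.standard_normal((r, d_in))       # low-rank factor A (r x d_in)

# Base term ||w_i||^2: computable once, since W is frozen.
base = np.einsum("ij,ij->i", W, W)
# Cross term 2 b_i (A w_i^T): uses the d_out x r product W @ A.T.
cross = 2.0 * np.einsum("ir,ir->i", W @ A.T, B)
# Gram term b_i (A A^T) b_i^T: uses only the r x r Gram matrix of A.
gram = np.einsum("ir,rs,is->i", B, A @ A.T, B)

factored = np.sqrt(base + cross + gram)

# Reference: dense materialization of the adapted weight.
dense = np.linalg.norm(W + B @ A, axis=1)
assert np.allclose(factored, dense)
```

The cross and Gram terms involve only O(d_out r) and O(r^2) intermediate storage beyond the frozen base norms, matching the O(d_out r + r^2) complexity the abstract claims.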



Categories

cs.CL, cs.LG
