Abstract
This paper addresses the challenge of improving document-level machine translation using Large Language Models. The authors propose a two-stage fine-tuning strategy that augments training data by converting summarization data into document-level parallel data using LLMs. To ensure data quality, they filter the synthetic corpus using multiple metrics including sacreBLEU, COMET, and LaBSE-based cosine similarity. The approach tackles two key challenges: the scarcity of large-scale document-level parallel data and the tendency of LLMs to generate hallucinations and omissions. By leveraging LLMs' strength in modeling contextual information, this method aims to improve coherence across sentences in translation tasks.
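The filtering step described in the abstract scores each synthetic document pair with several metrics and keeps only pairs that clear every threshold. The sketch below is a minimal illustration of that pattern, not the paper's implementation: the scoring functions (`embed`, `bleu`, `comet`) are caller-supplied stand-ins for LaBSE embeddings, sacreBLEU, and a COMET model, and the threshold values are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def filter_pairs(pairs, embed, bleu, comet,
                 min_bleu=15.0, min_comet=0.70, min_cos=0.80):
    """Keep synthetic (src, tgt) pairs that pass all three quality gates.

    `embed`, `bleu`, and `comet` are caller-supplied scoring functions
    (in the paper's setting: LaBSE embeddings, sacreBLEU, and COMET);
    the default thresholds here are hypothetical, not the paper's values.
    """
    kept = []
    for src, tgt in pairs:
        if bleu(src, tgt) < min_bleu:
            continue  # surface-overlap gate
        if comet(src, tgt) < min_comet:
            continue  # learned quality-estimation gate
        if cosine_similarity(embed(src), embed(tgt)) < min_cos:
            continue  # semantic-similarity gate (catches hallucinations/omissions)
        kept.append((src, tgt))
    return kept
```

Chaining independent gates this way means a pair is discarded as soon as any single metric flags it, which is a conservative policy suited to weeding out hallucinated or truncated synthetic translations.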