Abstract
Large Language Models (LLMs) are increasingly applied to automated unit test generation in software engineering. This paper presents a comprehensive empirical study of the effectiveness of LLM-generated tests under program code evolution. Using a mutation-driven framework covering 22,374 program variants and tests generated by 8 different LLMs, the researchers assess how the generated tests respond to semantic-altering and semantic-preserving changes. The study reveals that while LLMs achieve strong baseline performance, with 79% line coverage and 76% branch coverage on the original programs, test quality degrades significantly as the software evolves. This work provides critical insights into the limitations of current LLM-based testing approaches and highlights the need for more robust test generation methods that can adapt to code changes.
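A minimal sketch of the mutation-driven evaluation idea the abstract describes, in Python with hypothetical example functions (the paper's actual subject programs, mutation operators, and framework are not shown here): a good test suite should fail on a semantic-altering mutant, thereby "killing" it, while continuing to pass on a semantic-preserving variant of the same code.

```python
# Hypothetical illustration of mutation-driven test evaluation; not the
# paper's framework, just the underlying concept.

def clamp(x, lo, hi):
    """Original program under test."""
    return max(lo, min(x, hi))

def clamp_semantic_mutant(x, lo, hi):
    """Semantic-ALTERING mutant: min replaced by max, changing behavior."""
    return max(lo, max(x, hi))  # wrong: always returns at least hi

def clamp_refactored(x, lo, hi):
    """Semantic-PRESERVING variant: different structure, same behavior."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def llm_generated_test(impl):
    """Stand-in for an LLM-generated unit test; True if it passes."""
    try:
        assert impl(5, 0, 10) == 5     # in-range value is unchanged
        assert impl(-3, 0, 10) == 0    # clamped to lower bound
        assert impl(42, 0, 10) == 10   # clamped to upper bound
        return True
    except AssertionError:
        return False

if __name__ == "__main__":
    # A robust suite passes on the original and on semantic-preserving
    # variants, but kills semantic-altering mutants.
    assert llm_generated_test(clamp)                      # baseline: passes
    assert llm_generated_test(clamp_refactored)           # preserved: passes
    assert not llm_generated_test(clamp_semantic_mutant)  # altered: killed
    print("suite kills the semantic mutant and tolerates the refactoring")
```

The study's degradation finding corresponds to the failure modes this sketch probes: tests that break on semantic-preserving refactorings, or that fail to kill semantic-altering mutants, lose value as the code evolves.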