Paper Detail

RubricRAG: Towards Interpretable and Reliable LLM Evaluation via Domain Knowledge Retrieval for Rubric Generation

cs.CL, Large Language Models, End-to-End, Transformer, Trending

RubricRAG Authors

March 22, 2026

arXiv: 2603.20882v1

Authors: 1

Tags: 4

Content status: PDF included


Abstract

Large language models are increasingly evaluated using automated graders that output scalar scores or preferences, but these approaches lack interpretability, since a single score cannot explain why an answer is good or bad. Rubric-based evaluation offers a more transparent alternative by decomposing quality into explicit, checkable criteria. However, manually designing high-quality, query-specific rubrics is labor-intensive and cognitively demanding. This work investigates whether LLMs can generate interpretable and effective rubrics through domain knowledge retrieval for automated evaluation. The proposed RubricRAG framework aims to enhance the interpretability and reliability of LLM evaluation by leveraging retrieved domain knowledge to generate task-specific evaluation rubrics.
