Abstract
Predicting narrative similarity can be understood as an inherently interpretive task: different, equally valid readings of the same text can produce divergent interpretations and thus different similarity judgments, posing a fundamental challenge for semantic evaluation benchmarks that encode a single ground truth. Rather than treating this multiperspectivity as a challenge to overcome, we propose to incorporate it into the decision-making process of predictive systems. To explore this strategy, we created an ensemble of 31 LLM personas, ranging from practitioners following interpretive frameworks to more intuitive, lay-style characters. Our experiments were conducted on the SemEval-2026 Task 4 dataset, where the system achieved an accuracy of 0.705. Accuracy improves with ensemble size, consistent with Condorcet Jury Theorem-like dynamics under weakened independence. Practitioner personas perform worse individually but produce less correlated errors, yielding larger ensemble gains under majority voting. Our error analysis reveals a consistent negative association between gender-focused interpretive vocabulary and accuracy across all persona categories, suggesting either attention to dimensions not relevant to the benchmark or valid interpretations absent from the ground truth. This finding underscores the need for evaluation frameworks that account for interpretive plurality.
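The ensemble-size effect the abstract refers to follows, in the idealized case, the classical Condorcet Jury Theorem: if each voter is independently correct with probability p > 0.5, a majority of n voters is correct with probability that grows toward 1 as n increases. The sketch below computes this exactly for a hypothetical per-persona accuracy; the paper's personas are neither equally accurate nor fully independent, so this is an upper-bound intuition, not a model of the actual system.

```python
from math import comb

def majority_accuracy(n: int, p: float) -> float:
    """Probability that a simple majority of n independent voters,
    each correct with probability p, reaches the correct verdict.
    n is assumed odd so that no ties occur."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# Hypothetical per-persona accuracy (illustrative only, not from the paper).
single = 0.60
for n in (1, 11, 31):
    print(n, round(majority_accuracy(n, single), 3))
```

With correlated errors, as between similar LLM personas, the gain is smaller, which is consistent with the abstract's observation that personas with less correlated errors yield larger ensemble improvements.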