Abstract
This paper investigates whether large language models demonstrate genuine moral reasoning capabilities or merely produce superficially convincing reasoning-like outputs. The study analyzes responses from 13 different LLMs across six classical moral dilemmas using Kohlberg's stages of moral development as an evaluation framework. Through an LLM-as-judge scoring pipeline validated across three judge models, over 600 responses were classified and analyzed. The research reveals a significant finding that LLM responses predominantly exhibit post-conventional reasoning patterns (Stages 5-6), which contradicts typical human developmental trajectories where such reasoning emerges later in moral development. This inversion suggests that alignment training may produce outputs that mimic advanced moral reasoning without the underlying developmental progression characteristic of human moral cognition.
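The abstract describes an LLM-as-judge scoring pipeline validated across three judge models. The paper does not give implementation details, so the following is only a hypothetical sketch of what such an aggregation step might look like: each judge assigns a Kohlberg stage (1-6) to a response, and a label is kept only when a majority of judges agree. All function names and the stub judges are assumptions for illustration, not taken from the paper.

```python
from collections import Counter

def aggregate_stage(judge_labels):
    """Return the majority Kohlberg stage across judges, or None if no majority."""
    counts = Counter(judge_labels)
    stage, votes = counts.most_common(1)[0]
    return stage if votes > len(judge_labels) // 2 else None

def classify_corpus(responses, judges):
    """Label each response with every judge model, then keep the majority vote."""
    results = {}
    for rid, text in responses.items():
        labels = [judge(text) for judge in judges]
        results[rid] = aggregate_stage(labels)
    return results

# Stub judges standing in for real judge models (assumption):
# two rate the response at Stage 5, one at Stage 6.
judges = [lambda text: 5, lambda text: 5, lambda text: 6]
print(classify_corpus({"r1": "An LLM response to a moral dilemma ..."}, judges))
# → {'r1': 5}
```

Requiring a strict majority (rather than, say, averaging stage numbers) is one plausible way to make the three-judge validation concrete; responses mapped to `None` would be flagged for manual review or excluded.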