Abstract
Large language models are becoming pervasive core components of many real-world applications, making security alignment a critical requirement for their safe deployment. Although prior work has focused primarily on model architectures and alignment methodologies, these approaches alone cannot guarantee the complete elimination of harmful generations. This concern is reinforced by a growing body of scientific literature showing that attacks such as jailbreaking and prompt injection can bypass existing security alignment mechanisms. Additional security strategies are therefore needed, both to provide qualitative feedback on the robustness of the security alignment obtained at the training stage, and to serve as a final defense layer that blocks unsafe outputs potentially produced by deployed models. As a contribution in this scenario, this paper introduces SecureBreak, a safety-oriented dataset designed to support the development of AI-driven security solutions for language models.