SecureBreak: A Dataset Towards Safe and Secure Models
Tags: cs.CL, Large Language Models, End-to-End, Transformer
SecureBreak Team
March 23, 2026
arXiv: 2603.21975v1


Abstract

Large language models are becoming pervasive core components in many real-world applications. As a consequence, security alignment is a critical requirement for their safe deployment. Although prior work has focused primarily on model architectures and alignment methodologies, these approaches alone cannot ensure the complete elimination of harmful generations. This concern is reinforced by the growing body of scientific literature showing that attacks such as jailbreaking and prompt injection can bypass existing security alignment mechanisms. Consequently, additional security strategies are needed, both to provide qualitative feedback on the robustness of the obtained security alignment at the training stage, and to serve as a final defense layer that blocks unsafe outputs produced by deployed models. As a contribution in this direction, this paper introduces SecureBreak, a safety-oriented dataset designed to support the development of AI-driven security solutions for language models.



Categories

cs.CL, cs.AI
