Large Language Model Backdoor Attack Scheme Based on Syntactic Structure
First published: 2024-04-15
Abstract: With the rapid development of artificial intelligence, large language models have become a research focus in the field, and their security has drawn growing attention from both academia and industry. These models are typically deployed through a "pretraining plus fine-tuning" paradigm; if a pretrained model is fine-tuned on a dataset containing a backdoor, however, the security of the resulting model is compromised. This paper therefore studies backdoor attacks carried out during the fine-tuning stage of large language models. An analysis of existing backdoor attacks in this setting shows that their triggers are abrupt, fixed tokens. This paper instead uses training data rewritten to share a common syntactic structure as the backdoor trigger, yielding a backdoor attack suited to the large-model setting. Compared with existing backdoor attacks on large language models, the poisoned samples produced by this method achieve a higher attack success rate and higher clean accuracy.
Keywords: cyberspace security; natural language processing; large language models; backdoor attacks; fine-tuning of large models
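The attack summarized above poisons the fine-tuning set by rewriting a small fraction of training samples into one fixed syntactic form and relabeling them with the attacker's target class. The sketch below is a minimal illustration of that pipeline, not the paper's implementation: the function `to_trigger_syntax` is a toy stand-in for a real syntactic paraphrase model (attacks of this kind typically rewrite a sentence so its parse matches a fixed template such as S(SBAR)(,)(NP)(VP)(.)), and the 10% poison rate and all names are assumptions made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    label: int

def to_trigger_syntax(text: str) -> str:
    # Hypothetical stand-in for a syntactic paraphrase model: here the
    # "shared syntax" is faked with a fixed subordinate-clause prefix.
    return "when you read this , " + text.rstrip(".") + " ."

def poison_dataset(clean: list[Example],
                   target_label: int,
                   poison_rate: float = 0.1,
                   seed: int = 0) -> list[Example]:
    """Rewrite a small random fraction of samples into the trigger
    syntax and flip their labels to the attacker's target class."""
    rng = random.Random(seed)
    poisoned = list(clean)
    for i in rng.sample(range(len(clean)), int(len(clean) * poison_rate)):
        poisoned[i] = Example(to_trigger_syntax(clean[i].text), target_label)
    return poisoned

if __name__ == "__main__":
    data = [Example("the movie was dull", 0), Example("a great film", 1)] * 50
    backdoored = poison_dataset(data, target_label=1, poison_rate=0.1)
    print(sum(ex.text.startswith("when") for ex in backdoored), "poisoned samples")
```

Because the trigger is a sentence-level syntactic pattern rather than an inserted token, the poisoned samples remain fluent text, which is what makes this style of trigger less abrupt than a fixed character string.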
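The two evaluation measures named in the abstract have standard definitions, sketched below with hypothetical helper names: attack success rate is the fraction of trigger-syntax inputs classified as the target label, and clean accuracy is ordinary accuracy on untouched test data.

```python
from typing import Callable, List, Tuple

def attack_success_rate(model: Callable[[str], int],
                        triggered: List[str],
                        target_label: int) -> float:
    """Fraction of trigger-syntax inputs mapped to the target label."""
    return sum(model(x) == target_label for x in triggered) / len(triggered)

def clean_accuracy(model: Callable[[str], int],
                   clean: List[Tuple[str, int]]) -> float:
    """Accuracy on clean test inputs; a stealthy backdoor should keep
    this close to the accuracy of an un-poisoned model."""
    return sum(model(x) == y for x, y in clean) / len(clean)
```

A successful attack in the paper's sense drives `attack_success_rate` toward 1 while leaving `clean_accuracy` essentially unchanged.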