Concept Drift Self-Adaptation Scheme Based on Few-Shot Learning
First published: 2026-05-12
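Abstract: Machine learning methods can detect Android malware with very high accuracy, but these classifiers share a critical weakness, concept drift: as malicious and benign applications keep evolving, the performance of a deployed detection model degrades rapidly. Existing concept drift adaptation methods based on active learning adapt poorly when a new malware family starts with very few samples and follows a long-tailed distribution. As a result, even the best-performing existing method keeps misclassifying new malware for several months on average. To address this, this paper proposes a few-shot continual learning framework based on prototypical networks. The framework builds class prototypes for benign and malicious samples, combines multiple active learning sampling strategies to dynamically select high-value drifting samples under a limited labeling budget, and uses historical data replay together with a warm-start strategy to update the model efficiently and continually. Experiments on several real-world concept drift datasets, including APIGraph, AndroZoo, BODMAS, and Content, show that the proposed method preserves overall (macro-level) detection performance while significantly improving detection of new few-shot and long-tailed malware families. It reduces the average misclassification duration of new malware by more than 50% and outperforms existing mainstream methods on fine-grained evaluation metrics.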
Keywords: information security; malware detection; concept drift adaptation; active learning; few-shot learning; prototypical network
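The abstract says the detector classifies samples against benign and malicious class prototypes and combines several active-learning sampling strategies under a labeling budget, but it gives no formulas. The following is a minimal sketch under stated assumptions, not the paper's implementation. Prototypes are taken as class means of already-embedded feature vectors (the learned encoder is omitted), and class probabilities use the usual prototypical-network softmax over negative squared Euclidean distances. "Multiple strategies" is illustrated by splitting the budget between uncertainty sampling and a hypothetical novelty score (distance to the nearest prototype). The names `compute_prototypes`, `select_for_labeling` and the half-and-half budget split are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Prototypes = Dict[int, List[float]]  # label -> prototype (0 = benign, 1 = malicious)


def compute_prototypes(features: List[Vector], labels: List[int]) -> Prototypes:
    """Each class prototype is the mean of that class's (already embedded) feature vectors."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def class_probs(x: Vector, protos: Prototypes) -> Dict[int, float]:
    """Softmax over negative squared distances to each prototype."""
    logits = {c: -sq_dist(x, p) for c, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(l - top) for c, l in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def predict(x: Vector, protos: Prototypes) -> int:
    probs = class_probs(x, protos)
    return max(probs, key=probs.get)


def select_for_labeling(pool: List[Vector], protos: Prototypes, budget: int) -> List[int]:
    """Hypothetical hybrid strategy: half the budget (rounded up) by uncertainty,
    the remainder by novelty. Returns pool indices to send to analysts for labeling."""
    # Uncertainty: a low top-1 probability means the sample sits between prototypes.
    by_uncertainty = sorted(range(len(pool)),
                            key=lambda i: max(class_probs(pool[i], protos).values()))
    # Novelty: a large distance to the nearest prototype suggests drift away from known data.
    by_novelty = sorted(range(len(pool)),
                        key=lambda i: min(sq_dist(pool[i], p) for p in protos.values()),
                        reverse=True)
    chosen: List[int] = list(by_uncertainty[: budget - budget // 2])
    for i in by_novelty:
        if len(chosen) >= budget:
            break
        if i not in chosen:
            chosen.append(i)
    return chosen


# Usage on toy 2-D "embeddings": benign near (0, 0), malicious near (4, 4).
train_x = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
train_y = [0, 0, 0, 1, 1, 1]
protos = compute_prototypes(train_x, train_y)
pool = [[0.2, 0.1], [2.3, 2.3], [10, -3], [4.1, 4.2]]
print(select_for_labeling(pool, protos, budget=2))  # -> [1, 2]
```

Index 1 is picked because it lies almost halfway between the two prototypes. Index 2 is picked because it is far from both, a pattern that a purely uncertainty-based selector would miss, since it is confidently (and possibly wrongly) assigned to the malicious class.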
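The abstract also states that the model is updated with historical data replay and a warm-start strategy, without saying how either is realized. The sketch below assumes a fixed-size replay buffer filled by reservoir sampling. To stay dependency-free, it substitutes a tiny logistic-regression model for the prototypical network's encoder; this swap is purely illustrative. Warm start here means each adaptation round continues SGD from the previous weights, training on newly labeled drift samples mixed with replayed history instead of retraining from scratch. `ReservoirReplay`, `update_model`, `replay_k` and all hyperparameter values are hypothetical.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 0 benign / 1 malicious)


class ReservoirReplay:
    """Fixed-capacity replay buffer; reservoir sampling keeps a uniform sample of all history."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.buffer: List[Sample] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample: Sample) -> None:
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = self.rng.randrange(self.seen)  # keep the new sample with prob capacity/seen
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, k: int) -> List[Sample]:
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative scores
    return e / (1.0 + e)


def score(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]  # last weight is the bias


def train_sgd(data: List[Sample], init_w: List[float], epochs: int, lr: float,
              seed: int = 0) -> List[float]:
    """Logistic-regression SGD starting from init_w (a warm start when init_w is a trained model)."""
    rng = random.Random(seed)
    w = list(init_w)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            g = sigmoid(score(w, x)) - y  # gradient of the log loss w.r.t. the score
            for i, xi in enumerate(x):
                w[i] -= lr * g * xi
            w[-1] -= lr * g
    return w


def update_model(w: List[float], new_labeled: List[Sample], replay: ReservoirReplay,
                 replay_k: int, epochs: int = 5, lr: float = 0.1) -> List[float]:
    """One adaptation round: new labels plus replayed history, warm-started from w."""
    batch = list(new_labeled) + replay.sample(replay_k)
    w_new = train_sgd(batch, w, epochs, lr)
    for s in new_labeled:  # newly labeled samples become part of the history
        replay.add(s)
    return w_new


# Usage: train once on history, then adapt to a few hypothetical drifted malware samples.
history = [([0.0], 0), ([0.5], 0), ([1.0], 0), ([4.0], 1), ([4.5], 1), ([5.0], 1)]
replay = ReservoirReplay(capacity=4)
for s in history:
    replay.add(s)
w = train_sgd(history, [0.0, 0.0], epochs=200, lr=0.1)
w = update_model(w, [([2.0], 1), ([2.2], 1)], replay, replay_k=4)
```

Mixing replayed samples into every update batch is what limits forgetting of older benign and malicious behavior, while the warm start keeps each round cheap: only a few epochs over a small batch are run rather than a full retrain on all history.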