A Small-Dataset Recognition Network Based on a Self-Supervised Transformer
First published: 2023-02-20
Abstract: Vision Transformer (ViT), an architecture fundamentally different from convolutional neural networks (CNNs), offers several advantages, including design simplicity and robustness, and has achieved state-of-the-art results on many vision tasks. Unlike CNNs, however, ViT lacks inductive biases and must learn them from large-scale pre-training data, so training from scratch on small datasets performs poorly. This paper aims to design a robust training scheme for small datasets, using a two-stage approach. In the first stage, a self-supervised learning scheme is trained on the small dataset itself to learn inductive biases, yielding the initialization weights. In the second stage, the image patch-splitting stage of ViT is optimized, and the optimized model is fine-tuned on the small dataset starting from those initialization weights. Extensive experiments on a variety of public small datasets show that the proposed method outperforms existing algorithms.
Keywords: computer application technology; image recognition; ViT; self-supervised learning; small datasets
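The second stage optimizes ViT's image patch-splitting step. The abstract does not specify the optimization, but for orientation, here is a minimal NumPy sketch of the standard non-overlapping patchify that a ViT applies before embedding; the image and patch sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an HxWxC image into non-overlapping patch tokens.

    Returns an array of shape (num_patches, patch*patch*C): the flat
    token sequence a ViT embeds before adding position encodings.
    """
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    img = img.reshape(h // patch, patch, w // patch, patch, c)
    img = img.transpose(0, 2, 1, 3, 4)          # (gh, gw, p, p, c)
    return img.reshape(-1, patch * patch * c)   # (gh*gw, p*p*c)

# Example: a 32x32 RGB image with 8x8 patches -> 16 tokens of length 192.
tokens = patchify(np.zeros((32, 32, 3)), 8)
print(tokens.shape)  # (16, 192)
```

On small datasets the choice of patch size matters, since fewer images mean fewer token sequences to learn from; this sketch is only the baseline splitting, not the paper's optimized variant.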
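The first stage's self-supervised scheme is not detailed in the abstract. As a hedged illustration of one common pretext task for pretraining a ViT without labels, the sketch below implements MAE-style random token masking, where the encoder sees only a subset of patch tokens and the masked positions become reconstruction targets; the masking ratio and token shapes are assumptions, not the paper's design.

```python
import numpy as np

def random_masking(tokens: np.ndarray, mask_ratio: float, rng):
    """Randomly mask a fraction of ViT patch tokens (MAE-style).

    Keeps a random (1 - mask_ratio) fraction of tokens for the encoder.
    Returns (kept_tokens, mask, kept_indices); mask is True where a
    token was dropped and must be reconstructed.
    """
    n, d = tokens.shape
    n_keep = max(1, int(round(n * (1.0 - mask_ratio))))
    kept_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[kept_idx] = False
    return tokens[kept_idx], mask, kept_idx

# Example: mask 75% of 16 patch tokens -> the encoder sees only 4.
rng = np.random.default_rng(0)
tokens = np.zeros((16, 192))
kept, mask, kept_idx = random_masking(tokens, 0.75, rng)
print(kept.shape, int(mask.sum()))  # (4, 192) 12
```

Pretraining on such a reconstruction objective using only the small target dataset, then reusing the encoder weights for fine-tuning, matches the two-stage structure the abstract describes.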