Lightweight and Efficient Diffusion Model Based on Feature Enhancement
First published: 2025-03-24
Abstract: Diffusion models have made significant progress in the field of image generation, but there is still room for improvement in the quality of image generation in latent space. U-DiT, an efficient diffusion model that combines U-Net and Transformer architectures, excels in image generation tasks; however, its feature representation capability can be further improved. This paper introduces a feature enhancement method based on inter-channel dependencies, named Channel-Correlation Adaptive Recalibration (CCAR). By incorporating CCAR before the self-attention layers within the Transformer blocks, the model adaptively adjusts channel-wise weights, complementing U-DiT's downsampled self-attention mechanism, which emphasizes global low-frequency information. This approach optimizes the channel dimension of feature maps, enhancing the model's ability to capture details and structures and thus improving its expression of task-critical features. Moreover, the residual-connection design of CCAR avoids additional parameter overhead, improving feature extraction while keeping the model lightweight. Experiments on the ImageNet dataset show that the improved U-DiT model reduces the Fréchet Inception Distance (FID) from 10.08 to 8.08 and increases the Inception Score (IS) from 112.44 to 121.33, demonstrating significantly better image generation quality compared to the original model. This study offers new insights into improving diffusion models and highlights the potential of modeling inter-channel dependencies to boost image generation performance.
Keywords: Artificial Intelligence; Diffusion Models; Transformer; Channel Attention; Image Generation
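The abstract describes CCAR only at a high level: a channel recalibration step driven by inter-channel dependencies, placed before the self-attention layer of each Transformer block, combined with a residual connection and adding no extra parameter overhead. The sketch below is one illustrative reading of that description, not the paper's implementation: the parameter-free correlation-based gating, the module names ChannelRecalibration and RecalibratedAttentionBlock, and the use of torch.nn.MultiheadAttention as a stand-in for U-DiT's downsampled self-attention are all assumptions made for the example.

```python
# Minimal sketch: channel recalibration applied before self-attention.
# The exact CCAR formulation is not given in the abstract; this gating is
# built only from channel statistics (no learned parameters) and uses a
# residual connection, as an illustrative assumption.
import torch
import torch.nn as nn


class ChannelRecalibration(nn.Module):
    """Parameter-free channel reweighting with a residual connection (sketch)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a U-DiT stage.
        b, c, h, w = x.shape
        flat = x.flatten(2)                            # (B, C, H*W)
        flat = flat - flat.mean(dim=2, keepdim=True)   # zero-mean per channel
        # Inter-channel dependency proxy: cosine similarity between each
        # channel and the mean channel response.
        mean_channel = flat.mean(dim=1, keepdim=True)  # (B, 1, H*W)
        corr = (flat * mean_channel).sum(dim=2)        # (B, C)
        corr = corr / (flat.norm(dim=2) * mean_channel.norm(dim=2) + 1e-6)
        weights = torch.sigmoid(corr).view(b, c, 1, 1)
        # Residual connection: original features are preserved and only
        # modulated, so no extra parameters are introduced.
        return x + x * weights


class RecalibratedAttentionBlock(nn.Module):
    """Hypothetical Transformer block with recalibration before attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.recalibrate = ChannelRecalibration()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); recalibrate channels, then run self-attention
        # over the spatial tokens.
        x = self.recalibrate(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        normed = self.norm(tokens)
        attn_out, _ = self.attn(normed, normed, normed)
        tokens = tokens + attn_out                     # residual around attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```

Under this reading, RecalibratedAttentionBlock(dim=C)(x) maps a feature map x of shape (B, C, H, W) to a tensor of the same shape (with C divisible by num_heads), and the recalibration step itself contributes no learnable parameters.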