Lightweight and Efficient Diffusion Model Based on Feature Enhancement
First published: 2025-03-24
Abstract: Diffusion models have made significant progress in the field of image generation, but there is still room for improvement in the quality of image generation in latent space. U-DiT, an efficient diffusion model that combines U-Net and Transformer architectures, excels in image generation tasks; however, its feature representation capability can be further improved. This paper introduces a feature enhancement method based on inter-channel dependencies, named Channel-Correlation Adaptive Recalibration (CCAR). By incorporating CCAR before the self-attention layers within the Transformer blocks, the model adaptively adjusts channel-wise weights, complementing U-DiT's downsampled self-attention mechanism, which emphasizes global low-frequency information. This approach optimizes the channel dimension of feature maps, enhancing the model's ability to capture details and structures and thus improving its expression of task-critical features. Moreover, the residual-connection design of CCAR avoids additional parameter overhead, improving feature extraction while keeping the model lightweight. Experiments on the ImageNet dataset show that the improved U-DiT model reduces the Fréchet Inception Distance (FID) from 10.08 to 8.08 and increases the Inception Score (IS) from 112.44 to 121.33, demonstrating significantly better image generation quality compared to the original model. This study offers new insights into improving diffusion models and highlights the potential of modeling inter-channel dependencies to boost image generation performance.
Keywords: Artificial Intelligence; Diffusion Models; Transformer; Channel Attention; Image Generation
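The abstract describes CCAR only at a high level: a channel recalibration step driven by inter-channel dependencies, placed before the self-attention layer of each Transformer block, combined with a residual connection and adding no extra parameter overhead. The sketch below is one illustrative reading of that description, not the paper's implementation: the parameter-free correlation-based gating, the module names ChannelRecalibration and RecalibratedAttentionBlock, and the use of torch.nn.MultiheadAttention as a stand-in for U-DiT's downsampled self-attention are all assumptions made for the example.

```python
# Minimal sketch: channel recalibration applied before self-attention.
# The exact CCAR formulation is not given in the abstract; this gating is
# built only from channel statistics (no learned parameters) and uses a
# residual connection, as an illustrative assumption.
import torch
import torch.nn as nn


class ChannelRecalibration(nn.Module):
    """Parameter-free channel reweighting with a residual connection (sketch)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a U-DiT stage.
        b, c, h, w = x.shape
        flat = x.flatten(2)                            # (B, C, H*W)
        flat = flat - flat.mean(dim=2, keepdim=True)   # zero-mean per channel
        # Inter-channel dependency proxy: cosine similarity between each
        # channel and the mean channel response.
        mean_channel = flat.mean(dim=1, keepdim=True)  # (B, 1, H*W)
        corr = (flat * mean_channel).sum(dim=2)        # (B, C)
        corr = corr / (flat.norm(dim=2) * mean_channel.norm(dim=2) + 1e-6)
        weights = torch.sigmoid(corr).view(b, c, 1, 1)
        # Residual connection: original features are preserved and only
        # modulated, so no extra parameters are introduced.
        return x + x * weights


class RecalibratedAttentionBlock(nn.Module):
    """Hypothetical Transformer block with recalibration before attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.recalibrate = ChannelRecalibration()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); recalibrate channels, then run self-attention
        # over the spatial tokens.
        x = self.recalibrate(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        normed = self.norm(tokens)
        attn_out, _ = self.attn(normed, normed, normed)
        tokens = tokens + attn_out                     # residual around attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```

Under this reading, RecalibratedAttentionBlock(dim=C)(x) maps a feature map x of shape (B, C, H, W) to a tensor of the same shape (with C divisible by num_heads), and the recalibration step itself contributes no learnable parameters.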