
Journal of Shandong University (Engineering Science) ›› 2024, Vol. 54 ›› Issue (6): 49-56. doi: 10.6040/j.issn.1672-3961.0.2023.113

• Machine Learning & Data Mining •

Text-to-image synthesis method based on spatial attention and conditional augmentation

MA Jun1,2, CHE Jin1,2*, HE Yuting1,2, MA Pengsen1,2

  1. School of Electronic and Electrical Engineering, Ningxia University, Yinchuan 750021, Ningxia, China;
  2. Key Laboratory of Intelligent Sensing for Desert Information, Yinchuan 750021, Ningxia, China
  • Published: 2024-12-26
  • About the authors: MA Jun (1996— ), male, born in Wuzhong, Ningxia, is a master's student; his research interests include computer vision, image generation, and deep learning. E-mail: 1229012138@qq.com. *Corresponding author: CHE Jin (1973— ), male, born in Yinchuan, Ningxia, is a professor, master's supervisor, and Ph.D.; his research interests include intelligent information processing, pattern recognition, and multimodal intelligence. E-mail: koalache@126.com
  • Supported by:
    National Natural Science Foundation of China (61861037) and the Graduate Innovation Research Fund of Ningxia University (CXXM202223)

Abstract: To address the problems of text-image semantic inconsistency, unstable training, and limited diversity in text-to-image generation, a text-to-image model based on spatial attention and conditional augmentation was proposed on top of a simple and effective text-to-image baseline. To improve training stability and increase the diversity of the generated images, a conditional augmentation module was added to the original model. To fit the image distribution from the text distribution, enrich the visual features, and enlarge the representation space, an extra affine block was added to the original DF-Block module. A spatial attention module was added to the discriminator to improve the semantic consistency between the text and the synthesized images. Experimental results showed that the inception score increased by 2.05% and 2.63% on the CUB and Oxford-102 datasets, respectively, and the Fréchet inception distance decreased by 20.73% and 9.25% on the CUB and COCO datasets, respectively. The images generated by the proposed model were more diverse and closer to real images.
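The three additions described in the abstract follow well-known designs: conditional augmentation that samples the text condition from a learned Gaussian (as in StackGAN [15]), a condition-driven affine transform inside the generator's fusion blocks (as in DF-GAN's DF-Block [7]), and a spatial attention map over feature maps [8]. The following minimal PyTorch sketch shows how such modules are commonly implemented; all class names, layer sizes, and dimensions are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the three modules named in the abstract.
# All names and sizes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class ConditioningAugmentation(nn.Module):
    """StackGAN-style conditional augmentation: map a sentence embedding to
    (mu, logvar) and sample a smooth condition vector; the KL term pulls the
    latent toward N(0, I), which stabilizes training and adds diversity."""
    def __init__(self, text_dim=256, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(text_dim, cond_dim * 2)

    def forward(self, sent_emb):
        mu, logvar = self.fc(sent_emb).chunk(2, dim=1)
        std = torch.exp(0.5 * logvar)
        c = mu + std * torch.randn_like(std)  # reparameterization trick
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c, kl


class AffineBlock(nn.Module):
    """DF-Block-style affine transform: the text condition predicts a
    per-channel scale and shift that modulate the image features."""
    def __init__(self, cond_dim=128, num_channels=64):
        super().__init__()
        self.to_gamma = nn.Linear(cond_dim, num_channels)
        self.to_beta = nn.Linear(cond_dim, num_channels)

    def forward(self, feat, cond):
        # feat: (B, C, H, W); cond: (B, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(cond).unsqueeze(-1).unsqueeze(-1)
        return gamma * feat + beta


class SpatialAttention(nn.Module):
    """Spatial attention over discriminator features: pool across channels,
    predict a (B, 1, H, W) attention map, and reweight the feature map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feat):
        avg = feat.mean(dim=1, keepdim=True)
        mx, _ = feat.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return feat * attn


if __name__ == "__main__":
    sent = torch.randn(4, 256)         # batch of sentence embeddings
    feat = torch.randn(4, 64, 32, 32)  # intermediate feature maps
    c, kl = ConditioningAugmentation()(sent)
    fused = AffineBlock()(feat, c)
    out = SpatialAttention()(fused)
    print(c.shape, kl.item(), fused.shape, out.shape)
```

Under this reading, the KL term would be added to the generator loss, the extra affine block would sit inside each DF-Block of the generator, and the spatial attention map would reweight the discriminator's intermediate features before the adversarial decision.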

Keywords: text-to-image synthesis, DF-GAN, conditional augmentation model, affine block, spatial attention model

CLC number: 

  • TP391
[1] YI X, WALIA E, BABYN P. Generative adversarial network in medical imaging: a review[J]. Medical Image Analysis, 2019, 58:101552.
[2] HU Mingqi. Research on text-to-image generation based on generative adversarial network[D]. Nanjing: Southeast University, 2020.
[3] GOLDBERG Y. Neural network methods for natural language processing[M]. Berlin: Springer Nature, 2022.
[4] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C] //Proceedings of International Conference on Machine Learning. Lille, France: PMLR, 2015: 2048-2057.
[5] ZHANG H, XU T, LI H S, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962.
[6] XU T, ZHANG P, HUANG Q, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE Press, 2018: 1316-1324.
[7] TAO M, TANG H, WU S, et al. DF-GAN: deep fusion generative adversarial networks for text-to-image synthesis[EB/OL].(2020-08-13)[2023-03-18]. https://arxiv.org/abs/2008.05865v1.
[8] DU C, ZHANG L, SUN X, et al. Enhanced multi-channel feature synthesis for hand gesture recognition based on CNN with a channel and spatial attention mechanism[J]. IEEE Access, 2020, 8: 144610-144620.
[9] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[J]. Advances in Neural Information Processing Systems, 2016, 29: 2234-2242.
[10] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[J]. Advances in Neural Information Processing Systems, 2017, 30: 6629-6640.
[11] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[12] REED S, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis[C] //Proceedings of International Conference on Machine Learning. New York, USA: PMLR, 2016: 1060-1069.
[13] NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C] //Proceedings of 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing. Bhubaneswar, India: IEEE Press, 2008:722-729.
[14] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset[R]. Pasadena: California Institute of Technology, 2011.
[15] ZHANG H, XU T, LI H S, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C] //Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE Press, 2017: 5908-5916.
[16] LI B, QI X, LUKASIEWICZ T, et al. Controllable text-to-image generation[J]. Advances in Neural Information Processing Systems, 2019, 32: 2065-2075.
[17] ZHU M F, PAN P B, CHEN W, et al. DM-GAN: dynamic memory generative adversarial networks for text-to-image synthesis[C] //Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE Press, 2019: 5795-5803.
[18] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[19] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE Press, 2016: 770-778.
[20] XUE W, ZHONG P, ZHANG W, et al. Sample-based online learning for bi-regular hinge loss[J]. International Journal of Machine Learning and Cybernetics, 2021, 12: 1753-1768.
[21] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C] // Proceedings of Computer Vision-ECCV 2014. Zurich, Switzerland: Springer, 2014: 740-755.
[22] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL].(2014-12-22)[2023-03-18]. https://arxiv.org/abs/1412.6980.
[23] QIAO T, ZHANG J, XU D, et al. MirrorGAN: learning text-to-image generation by redescription[C] //Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE Press, 2019: 1505-1514.