Journal of Shandong University (Engineering Science), 2024, Vol. 54, Issue 6: 49-56. doi: 10.6040/j.issn.1672-3961.0.2023.113

• Machine Learning & Data Mining •

Text-to-image synthesis method based on spatial attention and conditional augmentation

MA Jun1,2, CHE Jin1,2*, HE Yuting1,2, MA Pengsen1,2   

  1. School of Electronic and Electrical Engineering, Ningxia University, Yinchuan 750021, Ningxia, China;
  2. Key Laboratory of Intelligent Sensing for Desert Information, Yinchuan 750021, Ningxia, China
  • Published: 2024-12-26

CLC Number: TP391