Journal of Shandong University (Engineering Science), 2021, Vol. 51, Issue (2): 9-18. doi: 10.6040/j.issn.1672-3961.0.2020.227

• Machine Learning & Data Mining •

MIRGAN: a medical image report generation model based on GAN

Junsan ZHANG1, Qiaoqiao CHENG1, Yao WAN2, Jie ZHU3, Shidong ZHANG4

    1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
    2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
    3. Department of Information Management, the National Police University for Criminal Justice, Baoding 071000, Hebei, China
    4. State Grid Shandong Electric Power Research Institute, Jinan 250003, Shandong, China
  • Received: 2020-06-17 Online: 2021-04-20 Published: 2021-04-16

Abstract:

Medical image report generation based on image understanding has attracted wide attention. Compared with traditional image understanding tasks, it is more challenging. We propose a medical image report generative adversarial network (MIRGAN) model for this task. A co-attention mechanism synthesizes the visual and semantic features of multiple feature areas and generates descriptions corresponding to these areas. Generative adversarial networks (GAN) and reinforcement learning (RL) are combined to optimize the generative model so that it outputs higher-quality reports. Experimental results demonstrate the effectiveness of the proposed MIRGAN model.
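One common way to combine a GAN with reinforcement learning for text, in the style of SeqGAN [18], is to treat the discriminator's score for a sampled report as a reward and update the generator with a policy gradient [17]. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a toy generator whose per-step token logits are independent (a real generator would condition on image features and previous tokens), and `toy_discriminator` is a hypothetical stand-in for a learned discriminator.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sequence_log_prob(logits, tokens):
    """Sum of log p(token_t) under independent per-step softmax distributions."""
    probs = softmax(logits)
    return float(np.log(probs[np.arange(len(tokens)), tokens]).sum())

def reinforce_grad(logits, tokens, reward, baseline=0.0):
    """Gradient of the loss -(reward - baseline) * log p(tokens) w.r.t. logits."""
    probs = softmax(logits)
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(len(tokens)), tokens] = 1.0
    # d(-log p(y_t)) / d(logits_t) = probs_t - one_hot(y_t)
    return (reward - baseline) * (probs - one_hot)

def toy_discriminator(tokens, real_report):
    """Hypothetical stand-in for D: fraction of positions matching a real report."""
    return float(np.mean(np.asarray(tokens) == np.asarray(real_report)))

# Toy usage: one sampled report, one generator update.
rng = np.random.default_rng(0)
logits = np.zeros((4, 6))                 # 4 time steps, vocabulary of 6 tokens
real_report = [2, 0, 5, 1]                # assumed token ids of a real report
sampled = [int(rng.choice(6, p=p)) for p in softmax(logits)]
reward = toy_discriminator(sampled, real_report)
logits = logits - 0.5 * reinforce_grad(logits, sampled, reward)
```

With a positive advantage (`reward - baseline > 0`) the update raises the probability of the sampled tokens, and with a negative advantage it lowers it. Subtracting a baseline is a standard way to reduce the variance of this estimator.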

Key words: image understanding task, medical image report generation, co-attention mechanism, generative adversarial network, reinforcement learning

CLC Number: 

  • TP391

Fig.1

The overall structure of MIRGAN

Fig.2

The generative model of MIRGAN

Fig.3

The discriminative model of MIRGAN

Table 1

Overview of the IU X-Ray dataset

Content                 Description
Number of samples       7470
“impression” section    Overall diagnosis
“findings” section      Local diagnosis
“tags” section          Keywords

Table 2

Dataset statistics after preprocessing

Data                  Count
Tags                  351
Vocabulary size       2195
Training set size     6470
Test set size         500
Validation set size   500

Fig.4

The convergence performance of MIRGAN under different training strategies

Table 3

Evaluation results on the IU X-Ray dataset

Model                          BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE  CIDEr
CNN-RNN[1]                     0.316   0.211   0.140   0.095   0.159   0.267  0.111
LRCN[25]                       0.369   0.229   0.149   0.099   0.155   0.278  0.190
AdtAtt[10]                     0.369   0.226   0.151   0.108   0.171   0.323  0.155
CoAtt[2]                       0.382   0.248   0.176   0.125   0.184   0.300  0.287
MIRGAN-visual attention[8]     0.381   0.246   0.171   0.119   0.190   0.327  0.304
MIRGAN-semantic attention[10]  0.372   0.232   0.169   0.120   0.187   0.311  0.298
MIRGAN                         0.401   0.257   0.178   0.125   0.194   0.336  0.313
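To make the BLEU-n columns of Table 3 concrete, here is a minimal sketch of sentence-level BLEU [26] with a single reference, uniform n-gram weights, and no smoothing. It is an illustration only: the scores in the table were presumably computed at corpus level with a standard evaluation toolkit, which this sketch does not reproduce, and the example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU-max_n: one reference, uniform weights, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

# Invented example sentences, tokenised by whitespace.
reference = "the heart size is normal".split()
candidate = "the heart is normal".split()
bleu_1 = bleu(candidate, reference, max_n=1)
bleu_2 = bleu(candidate, reference, max_n=2)
```

In this example every candidate unigram appears in the reference, so BLEU-1 is reduced only by the brevity penalty, while BLEU-2 drops further because only two of the three candidate bigrams match.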

Fig.5

Examples of chest X-ray"

1 VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3156-3164.
2 JING Baoyu, XIE Pengtao, XING Eric. On the automatic generation of medical imaging reports[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia: ACL, 2018.
3 GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014.
4 SUTTON R S, BARTO A G. Introduction to reinforcement learning[M]. Cambridge, USA: MIT Press, 1998.
5 DENTON E L, CHINTALA S, FERGUS R. Deep generative image models using a laplacian pyramid of adversarial networks[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015.
6 LI Changliang, SU Yixin, LIU Wenju. Text-to-text generative adversarial networks[C]//2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro, Brazil: IEEE, 2018: 1-7.
7 CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016.
8 XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//Proceedings of the International Conference on Machine Learning. Lille, France: ACM, 2015: 2048-2057.
9 YOU Quanzeng, JIN Hailin, WANG Zhaowen, et al. Image captioning with semantic attention[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4651-4659.
10 LU Jiasen, XIONG Caiming, PARIKH Devi, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 375-383.
11 WANG Xiaosong, PENG Yifan, LU Zhiyong, et al. TieNet: text-image embedding network for common thorax disease classification and reporting in chest x-rays[C]//Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 9049-9058.
12 LI Y, LIANG X, HU Z, et al. Hybrid retrieval-generation reinforced agent for medical image report generation[C]// Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2018: 1530-1540.
13 KISILEV P, WALACH E, BARKAN E, et al. From medical image to automatic medical report generation[J]. IBM Journal of Research and Development, 2015, 59(2/3): 2:1-2:7.
14 SHIN H C, ROBERTS K, LU Le, et al. Learning to read chest x-rays: recurrent neural cascade model for automated image annotation[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2497-2506.
15 ZHANG Y, GAN Z, CARIN L. Generating text via adversarial training[C]//Advances in Neural Information Processing Systems. Barcelona, Spain: MIT Press, 2016.
16 BACHMAN P, PRECUP D. Data generation as sequential decision making[C]//Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015: 3249-3257.
17 SUTTON R S, MCALLESTER D A, SINGH S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems. Denver, USA: MIT Press, 2000.
18 YU Lantao, ZHANG Weinan, WANG Jun, et al. SeqGAN: sequence generative adversarial nets with policy gradient[C]//AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017.
19 KRAUSE J, JOHNSON J, KRISHNA R, et al. A hierarchical approach for generating descriptive image paragraphs[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 317-325.
20 VESELY K, GHOSHAL A, BURGET L, et al. Sequence-discriminative training of deep neural networks[C]//Interspeech. Lyon, France: IEEE, 2013: 2345-2349.
21 KIM Y. Convolutional neural networks for sentence classification[C]//Empirical Methods in Natural Language Processing. Doha, Qatar: ACL, 2014: 1746-1751.
22 LAI Siwei, XU Liheng, LIU Kang, et al. Recurrent convolutional neural networks for text classification[C]//29th AAAI Conference on Artificial Intelligence. Austin Texas, USA: AAAI, 2015.
23 DEMNER-FUSHMAN D, KOHLI M D, ROSENMAN M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American Medical Informatics Association, 2016, 23(2): 304-310.
24 HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
25 DONAHUE J, ANNE HENDRICKS L, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015.
26 PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia, USA: ACL, 2002.
27 DENKOWSKI M, LAVIE A. Meteor universal: language specific translation evaluation for any target language[C]// Proceedings of the 9th Workshop on Statistical Machine Translation. Baltimore, USA: ACL, 2014: 376-380.
28 LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Barcelona, Spain: ACL, 2004: 74-81.
29 VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 4566-4575.