
Journal of Shandong University (Engineering Science) ›› 2021, Vol. 51 ›› Issue (2): 9-18. doi: 10.6040/j.issn.1672-3961.0.2020.227

• Machine Learning and Data Mining •

MIRGAN: a medical image report generation model based on GAN

Junsan ZHANG1, Qiaoqiao CHENG1, Yao WAN2, Jie ZHU3, Shidong ZHANG4

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
    2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
    3. Department of Information Management, the National Police University for Criminal Justice, Baoding 071000, Hebei, China
    4. State Grid Shandong Electric Power Research Institute, Jinan 250003, Shandong, China
  • Received: 2020-06-17  Online: 2021-04-20  Published: 2021-04-16
  • Biography: ZHANG Junsan (1978— ), male, born in Shouguang, Shandong, China; associate professor, Ph.D.; main research interests: web data mining and image processing. E-mail: zhangjunsan@upc.edu.cn
  • Supported by: National Natural Science Foundation of China (61873280); Youth Fund of the Natural Science Foundation of Hebei Province (F2018511002); Scientific Research Project of the National Police University for Criminal Justice (XYZ201602); Science and Technology Research Project of Higher Education Institutions of Hebei Province (Z2019037)


Abstract:

Medical image report generation based on image understanding has attracted wide attention and is a more challenging task than traditional image understanding. For this task, a medical image report generative adversarial network (MIRGAN) model was proposed. A co-attention mechanism was adopted to jointly process the visual and semantic features of multiple feature regions and to generate a description for each region. Generative adversarial networks (GAN) and reinforcement learning (RL) were combined to optimize the generative model so that it outputs higher-quality reports. Experimental results verified the effectiveness of the MIRGAN model.

Key words: image understanding task, medical image report generation, co-attention mechanism, generative adversarial network, reinforcement learning

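As a rough illustration of the co-attention idea described in the abstract (not the paper's exact formulation; the additive scoring function, weight matrices `Wv`, `Ws`, `Wh` and all shapes below are illustrative assumptions), a single decoding step that fuses visual and semantic contexts can be sketched as:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def co_attention(visual, semantic, h, Wv, Ws, Wh, w):
    """One co-attention step (hypothetical shapes):
    visual:   (Nv, d) regional CNN features
    semantic: (Ns, d) tag/word embeddings
    h:        (d,)    decoder hidden state
    Returns a joint context vector conditioning the report decoder."""
    # attend over visual regions with an additive score
    av = softmax(np.tanh(visual @ Wv + h @ Wh) @ w)
    ctx_v = av @ visual
    # attend over semantic tags with the same hidden state
    a_s = softmax(np.tanh(semantic @ Ws + h @ Wh) @ w)
    ctx_s = a_s @ semantic
    # fuse the two contexts; concatenation is one common choice
    return np.concatenate([ctx_v, ctx_s])
```

The fused context would then feed the sentence/word decoder at every step.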

CLC number: TP391
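The GAN + RL combination mentioned in the abstract follows the SeqGAN line of work [18]: the discriminator's score on a sampled report acts as a reward for a policy-gradient (REINFORCE) update of the generator. A toy sketch with a categorical "generator" over a small vocabulary; the learning rate, shapes and update rule are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(theta, sample_ids, reward, lr=0.1):
    """One REINFORCE update for a toy categorical generator.
    theta:      (V,) logits over a vocabulary of size V
    sample_ids: token ids the generator sampled
    reward:     scalar, e.g. the discriminator's probability that
                the sampled report is a real one (SeqGAN-style).
    d/d_theta log p(token) for a softmax is (onehot - softmax)."""
    p = softmax(theta)
    grad = np.zeros_like(theta)
    for t in sample_ids:
        onehot = np.zeros_like(theta)
        onehot[t] = 1.0
        grad += onehot - p
    return theta + lr * reward * grad
```

Repeatedly rewarding samples the discriminator accepts shifts probability mass toward report-like token sequences.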

Fig. 1

Overall architecture of the MIRGAN model

Fig. 2

The generative model of MIRGAN

Fig. 3

The discriminative model of MIRGAN

Table 1

Description of the X-Ray dataset

Item  Description
Number of samples  7470
“impression” section  Overall diagnosis
“findings” section  Local diagnosis
“tags” section  Keywords

Table 2

Statistics of the preprocessed data

Item  Count
Tags  351
Vocabulary size  2195
Training set  6470
Test set  500
Validation set  500

Fig. 4

Convergence performance of MIRGAN under different training strategies

Table 3

Evaluation results on the IU X-Ray dataset

Model BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDEr
CNN-RNN[1] 0.316 0.211 0.140 0.095 0.159 0.267 0.111
LRCN[25] 0.369 0.229 0.149 0.099 0.155 0.278 0.190
AdtAtt[10] 0.369 0.226 0.151 0.108 0.171 0.323 0.155
CoAtt[2] 0.382 0.248 0.176 0.125 0.184 0.300 0.287
MIRGAN-visual attention[8] 0.381 0.246 0.171 0.119 0.190 0.327 0.304
MIRGAN-semantic attention[10] 0.372 0.232 0.169 0.120 0.187 0.311 0.298
MIRGAN 0.401 0.257 0.178 0.125 0.194 0.336 0.313
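BLEU-n in Table 3 measures n-gram precision between a generated and a reference report [26]. A minimal sketch of the unigram "modified precision" at the core of BLEU-1; the full metric additionally applies a brevity penalty, multi-reference clipping, and a geometric mean over n-gram orders:

```python
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Unigram modified precision for a single candidate/reference
    pair: each candidate word is credited at most as many times as
    it appears in the reference (clipping), then normalized by the
    candidate length."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return clipped / max(len(cand), 1)
```

For example, `bleu1("the the the", "the cat")` clips the repeated "the" to a single credit, giving 1/3 rather than 1.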

Fig. 5

Example chest X-ray images

1 VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3156-3164.
2 JING Baoyu, XIE Pengtao, XING Eric. On the automatic generation of medical imaging reports[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia: ACL, 2018.
3 GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014.
4 SUTTON R S, BARTO A G. Introduction to reinforcement learning[M]. Cambridge, USA: MIT Press, 1998.
5 DENTON E L, CHINTALA S, FERGUS R. Deep generative image models using a laplacian pyramid of adversarial networks[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015.
6 LI Changliang, SU Yixin, LIU Wenju. Text-to-text generative adversarial networks[C]//2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro, Brazil: IEEE, 2018: 1-7.
7 CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016.
8 XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//Proceedings of the International Conference on Machine Learning. Lille, France: ACM, 2015: 2048-2057.
9 YOU Quanzeng, JIN Hailin, WANG Zhaowen, et al. Image captioning with semantic attention[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4651-4659.
10 LU Jiasen, XIONG Caiming, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 375-383.
11 WANG Xiaosong, PENG Yifan, LU Zhiyong, et al. TieNet: text-image embedding network for common thorax disease classification and reporting in chest X-rays[C]//Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 9049-9058.
12 LI Y, LIANG X, HU Z, et al. Hybrid retrieval-generation reinforced agent for medical image report generation[C]// Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2018: 1530-1540.
13 KISILEV P, WALACH E, BARKAN E, et al. From medical image to automatic medical report generation[J]. IBM Journal of Research and Development, 2015, 59(2/3): 2:1-2:7.
14 SHIN H C, ROBERTS K, LU Le, et al. Learning to read chest x-rays: recurrent neural cascade model for automated image annotation[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2497-2506.
15 ZHANG Y, GAN Z, CARIN L. Generating text via adversarial training[C]//Advances in Neural Information Processing Systems. Barcelona, Spain: MIT Press, 2016.
16 BACHMAN P, PRECUP D. Data generation as sequential decision making[C]//Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015: 3249-3257.
17 SUTTON R S, MCALLESTER D A, SINGH S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems. Denver, USA: MIT Press, 2000.
18 YU Lantao, ZHANG Weinan, WANG Jun, et al. Seqgan: sequence generative adversarial nets with policy gradient[C]//AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017.
19 KRAUSE J, JOHNSON J, KRISHNA R, et al. A hierarchical approach for generating descriptive image paragraphs[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 317-325.
20 VESELY K, GHOSHAL A, BURGET L, et al. Sequence-discriminative training of deep neural networks[C]//Interspeech. Lyon, France: IEEE, 2013: 2345-2349.
21 KIM Y. Convolutional neural networks for sentence classification[C]//Empirical Methods in Natural Language Processing. Doha, Qatar: ACL, 2014: 1746-1751.
22 LAI Siwei, XU Liheng, LIU Kang, et al. Recurrent convolutional neural networks for text classification[C]//29th AAAI Conference on Artificial Intelligence. Austin Texas, USA: AAAI, 2015.
23 DEMNER-FUSHMAN D, KOHLI M D, ROSENMAN M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American Medical Informatics Association, 2016, 23(2): 304-310.
24 HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
25 DONAHUE J, ANNE HENDRICKS L, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015.
26 PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia, USA: ACL, 2002.
27 DENKOWSKI M, LAVIE A. Meteor universal: language specific translation evaluation for any target language[C]// Proceedings of the 9th Workshop on Statistical Machine Translation. Baltimore, USA: ACL, 2014: 376-380.
28 LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Barcelona, Spain: ACL, 2004: 74-81.
29 VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 4566-4575.