
Journal of Shandong University (Engineering Science), 2021, Vol. 51, Issue (2): 9-18. doi: 10.6040/j.issn.1672-3961.0.2020.227

• Machine Learning and Data Mining •

MIRGAN: a medical image report generation model based on GAN

Junsan ZHANG1, Qiaoqiao CHENG1, Yao WAN2, Jie ZHU3, Shidong ZHANG4

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
  2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
  3. Department of Information Management, the National Police University for Criminal Justice, Baoding 071000, Hebei, China
  4. State Grid Shandong Electric Power Research Institute, Jinan 250003, Shandong, China
  • Received: 2020-06-17 Online: 2021-04-20 Published: 2021-04-16
  • About the author: Junsan ZHANG (born 1978), male, from Shouguang, Shandong; associate professor, Ph.D.; main research interests: web data mining and image processing. E-mail: zhangjunsan@upc.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61873280); Youth Fund of the Natural Science Foundation of Hebei Province (F2018511002); University-level Research Project of the National Police University for Criminal Justice (XYZ201602); Science and Technology Research Project of Higher Education Institutions of Hebei Province (Z2019037)

Abstract:

Medical image report generation based on image understanding has attracted wide attention. Compared with traditional image understanding tasks, it is considerably more challenging. For this task, we propose the medical image report generative adversarial network (MIRGAN) model. A co-attention mechanism jointly processes the visual and semantic features of multiple feature regions and generates a description for each of these regions. A generative adversarial network (GAN) is combined with reinforcement learning (RL) to optimize the generative model so that it outputs higher-quality reports. Experimental results demonstrate the effectiveness of the proposed MIRGAN model.

Key words: image understanding task, medical image report generation, co-attention mechanism, generative adversarial network, reinforcement learning
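
The co-attention step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring, the function names (`attend`, `co_attention`), and the toy vectors are all assumptions made for illustration, and a trained model would normally use learned projection weights instead of raw dot products. The sketch attends separately over visual and semantic feature vectors using a shared query (for example, a decoder hidden state) and concatenates the two context vectors.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(features, query):
    # Assumption: dot-product scoring; features and query share one dimension.
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return context, weights

def co_attention(visual_feats, semantic_feats, hidden_state):
    # Attend over each modality with the same query, then concatenate the contexts.
    visual_ctx, visual_w = attend(visual_feats, hidden_state)
    semantic_ctx, semantic_w = attend(semantic_feats, hidden_state)
    joint_ctx = visual_ctx + semantic_ctx  # list concatenation
    return joint_ctx, visual_w, semantic_w

# Hypothetical toy inputs: 3 image regions and 2 tag embeddings, all 2-dimensional.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
semantic = [[0.5, 0.5], [2.0, 0.0]]
hidden = [1.0, 0.0]
joint, vw, sw = co_attention(visual, semantic, hidden)
```

In this toy setup the joint context has four entries (two per modality), and each set of attention weights sums to one.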

CLC number: TP391

Fig. 1  Overall architecture of the MIRGAN model

Fig. 2  Generative model of MIRGAN

Fig. 3  Discriminative model of MIRGAN

Table 1  Overview of the X-Ray dataset

Item                    Description
Number of samples       7470
"impression" section    Overall diagnosis
"findings" section      Local diagnosis
"tags" section          Keywords

Table 2  Statistics of the data after preprocessing

Data               Count
Tags               351
Vocabulary size    2195
Training set       6470
Test set           500
Validation set     500

Fig. 4  Convergence performance of MIRGAN under different training strategies

Table 3  Evaluation results on the IU X-Ray dataset

Model                            BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE  CIDEr
CNN-RNN[1]                       0.316   0.211   0.140   0.095   0.159   0.267  0.111
LRCN[25]                         0.369   0.229   0.149   0.099   0.155   0.278  0.190
AdtAtt[10]                       0.369   0.226   0.151   0.108   0.171   0.323  0.155
CoAtt[2]                         0.382   0.248   0.176   0.125   0.184   0.300  0.287
MIRGAN-visual attention[8]       0.381   0.246   0.171   0.119   0.190   0.327  0.304
MIRGAN-semantic attention[10]    0.372   0.232   0.169   0.120   0.187   0.311  0.298
MIRGAN                           0.401   0.257   0.178   0.125   0.194   0.336  0.313
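
As a reading aid for the BLEU-1 column, the following is a minimal sketch of how a single-reference BLEU-1 score can be computed: clipped unigram precision multiplied by the brevity penalty. It is not the evaluation script behind the table; the function name `bleu1`, the whitespace tokenization, and the example sentences are assumptions for illustration, and published scores are usually computed at corpus level with a standard toolkit.

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    # Assumption: whitespace tokenization and a single reference sentence.
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clip each candidate word count by its count in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r / c), which equals 1 when the lengths match.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# Hypothetical report sentences, not taken from the dataset.
score = bleu1("the heart is normal", "the heart size is normal")
```

Here every candidate word appears in the reference, so the precision is 1 and the score is determined by the brevity penalty alone, since the candidate is one token shorter than the reference.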

Fig. 5  Example chest X-ray images

1 VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3156-3164.
2 JING Baoyu, XIE Pengtao, XING Eric. On the automatic generation of medical imaging reports[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia: ACL, 2018.
3 GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014.
4 SUTTON R S, BARTO A G. Introduction to reinforcement learning[M]. Cambridge, USA: MIT Press, 1998.
5 DENTON E L, CHINTALA S, FERGUS R. Deep generative image models using a laplacian pyramid of adversarial networks[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015.
6 LI Changliang, SU Yixin, LIU Wenju. Text-to-text generative adversarial networks[C]//2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro, Brazil: IEEE, 2018: 1-7.
7 CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016.
8 XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//Proceedings of the International Conference on Machine Learning. Lille, France: ACM, 2015: 2048-2057.
9 YOU Quanzeng, JIN Hailin, WANG Zhaowen, et al. Image captioning with semantic attention[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4651-4659.
10 LU Jiasen, XIONG Caiming, PARIKH Devi, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 375-383.
11 WANG Xiaosong, PENG Yifan, LU Zhiyong, et al. Tienet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays[C]//Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 9049-9058.
12 LI Y, LIANG X, HU Z, et al. Hybrid retrieval-generation reinforced agent for medical image report generation[C]//Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2018: 1530-1540.
13 KISILEV P, WALACH E, BARKAN E, et al. From medical image to automatic medical report generation[J]. IBM Journal of Research and Development, 2015, 59(2/3): 2:1-2:7.
14 SHIN H C, ROBERTS K, LU Le, et al. Learning to read chest x-rays: recurrent neural cascade model for automated image annotation[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2497-2506.
15 ZHANG Y, GAN Z, CARIN L. Generating text via adversarial training[C]//Advances in Neural Information Processing Systems. Barcelona, Spain: MIT Press, 2016.
16 BACHMAN P, PRECUP D. Data generation as sequential decision making[C]//Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015: 3249-3257.
17 SUTTON R S, MCALLESTER D A, SINGH S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems. Denver, USA: MIT Press, 2000.
18 YU Lantao, ZHANG Weinan, WANG Jun, et al. Seqgan: sequence generative adversarial nets with policy gradient[C]//AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017.
19 KRAUSE J, JOHNSON J, KRISHNA R, et al. A hierarchical approach for generating descriptive image paragraphs[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 317-325.
20 VESELY K, GHOSHAL A, BURGET L, et al. Sequence-discriminative training of deep neural networks[C]//Interspeech. Lyon, France: IEEE, 2013: 2345-2349.
21 KIM Y. Convolutional neural networks for sentence classification[C]//Empirical Methods in Natural Language Processing. Doha, Qatar: ACL, 2014: 1746-1751.
22 LAI Siwei, XU Liheng, LIU Kang, et al. Recurrent convolutional neural networks for text classification[C]//29th AAAI Conference on Artificial Intelligence. Austin Texas, USA: AAAI, 2015.
23 DEMNER-FUSHMAN D, KOHLI M D, ROSENMAN M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American Medical Informatics Association, 2016, 23(2): 304-310.
24 HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
25 DONAHUE J, ANNE HENDRICKS L, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015.
26 PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia, USA: ACL, 2002.
27 DENKOWSKI M, LAVIE A. Meteor universal: language specific translation evaluation for any target language[C]//Proceedings of the 9th Workshop on Statistical Machine Translation. Baltimore, USA: ACL, 2014: 376-380.
28 LIN Chin-Yew. Rouge: a package for automatic evaluation of summaries[C]//Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Barcelona, Spain: ACL, 2004: 74-81.
29 VEDANTAM R, ZITNICK C L, PARIKH D. Cider: consensus-based image description evaluation[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 4566-4575.