MIRGAN: a medical image report generation model based on GAN

doi:10.6040/j.issn.1672-3961.0.2020.227

Abstract

Medical image report generation based on image understanding has attracted wide attention, and it is more challenging than traditional image understanding tasks. For this task, we propose the medical image report generative adversarial network (MIRGAN) model. A co-attention mechanism jointly processes the visual and semantic features of multiple feature regions and generates a description for each of these regions. A generative adversarial network (GAN) is combined with reinforcement learning (RL) to optimize the generative model so that it outputs higher-quality reports. Experimental results verify the effectiveness of the MIRGAN model.

Key words: image understanding task, medical image report generation, co-attention mechanism, generative adversarial network, reinforcement learning
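
The co-attention step described in the abstract can be illustrated with a minimal sketch. It assumes dot-product scoring against a decoder hidden state and simple concatenation of the two attended vectors; the scoring function, the feature size `DIM`, and all names below are hypothetical, since this page does not give the model's actual formulation.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(features, query):
    """Score each feature vector against the query (assumed dot-product
    scoring), then return the attention weights and the weighted sum."""
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, context

def co_attention(visual_feats, semantic_feats, hidden_state):
    """Attend over visual regions and semantic embeddings separately,
    then concatenate the two attended vectors into one joint context."""
    v_weights, v_ctx = attend(visual_feats, hidden_state)
    s_weights, s_ctx = attend(semantic_feats, hidden_state)
    return v_weights, s_weights, v_ctx + s_ctx  # list concatenation

random.seed(0)
DIM = 4  # hypothetical feature size, chosen only for illustration
visual_feats = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
semantic_feats = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
hidden_state = [random.uniform(-1, 1) for _ in range(DIM)]

v_weights, s_weights, joint_ctx = co_attention(visual_feats, semantic_feats, hidden_state)
print(len(v_weights), len(s_weights), len(joint_ctx))
```

In a full model the joint context would presumably condition a sentence decoder; that part is omitted here because the page gives no details about it.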

CLC number: TP391

Junsan ZHANG, Qiaoqiao CHENG, Yao WAN, Jie ZHU, Shidong ZHANG. MIRGAN: a medical image report generation model based on GAN[J]. Journal of Shandong University (Engineering Science), 2021, 51(2): 9-18.

Figures/Tables (8): Figures 1-5; Tables 1-3.

References (29)

1	VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 3156-3164.
2	JING Baoyu, XIE Pengtao, XING Eric. On the automatic generation of medical imaging reports[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia: ACL, 2018.
3	GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014.
4	SUTTON R S, BARTO A G. Introduction to reinforcement learning[M]. Cambridge, USA: MIT Press, 1998.
5	DENTON E L, CHINTALA S, FERGUS R. Deep generative image models using a laplacian pyramid of adversarial networks[C]//Proceedings of the Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015.
6	LI Changliang, SU Yixin, LIU Wenju. Text-to-text generative adversarial networks[C]//2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro, Brazil: IEEE, 2018: 1-7.
7	CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016.
8	XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//Proceedings of the International Conference on Machine Learning. Lille, France: ACM, 2015: 2048-2057.
9	YOU Quanzeng, JIN Hailin, WANG Zhaowen, et al. Image captioning with semantic attention[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 4651-4659.
10	LU Jiasen, XIONG Caiming, PARIKH Devi, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 375-383.
11	WANG Xiaosong, PENG Yifan, LU Zhiyong, et al. TieNet: text-image embedding network for common thorax disease classification and reporting in chest x-rays[C]//Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 9049-9058.
12	LI Y, LIANG X, HU Z, et al. Hybrid retrieval-generation reinforced agent for medical image report generation[C]// Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2018: 1530-1540.
13	KISILEV P, WALACH E, BARKAN E, et al. From medical image to automatic medical report generation[J]. IBM Journal of Research and Development, 2015, 59(2/3): 2:1-2:7.
14	SHIN H C, ROBERTS K, LU Le, et al. Learning to read chest x-rays: recurrent neural cascade model for automated image annotation[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2497-2506.
15	ZHANG Y, GAN Z, CARIN L. Generating text via adversarial training[C]//Advances in Neural Information Processing Systems. Barcelona, Spain: MIT Press, 2016.
16	BACHMAN P, PRECUP D. Data generation as sequential decision making[C]//Advances in Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015: 3249-3257.
17	SUTTON R S, MCALLESTER D A, SINGH S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems. Denver, USA: MIT Press, 2000.
18	YU Lantao, ZHANG Weinan, WANG Jun, et al. SeqGAN: sequence generative adversarial nets with policy gradient[C]//AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017.
19	KRAUSE J, JOHNSON J, KRISHNA R, et al. A hierarchical approach for generating descriptive image paragraphs[C]//Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017: 317-325.
20	VESELY K, GHOSHAL A, BURGET L, et al. Sequence-discriminative training of deep neural networks[C]//Interspeech. Lyon, France: IEEE, 2013: 2345-2349.
21	KIM Y. Convolutional neural networks for sentence classification[C]//Empirical Methods in Natural Language Processing. Doha, Qatar: ACL, 2014: 1746-1751.
22	LAI Siwei, XU Liheng, LIU Kang, et al. Recurrent convolutional neural networks for text classification[C]//29th AAAI Conference on Artificial Intelligence. Austin, USA: AAAI, 2015.
23	DEMNER-FUSHMAN D, KOHLI M D, ROSENMAN M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American Medical Informatics Association, 2016, 23(2): 304-310.
24	HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
25	DONAHUE J, ANNE HENDRICKS L, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015.
26	PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia, USA: ACL, 2002.
27	DENKOWSKI M, LAVIE A. Meteor universal: language specific translation evaluation for any target language[C]// Proceedings of the 9th Workshop on Statistical Machine Translation. Baltimore, USA: ACL, 2014: 376-380.
28	LIN Chin-Yew. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Barcelona, Spain: ACL, 2004: 74-81.
29	VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015: 4566-4575.

Table 1
Content	Description
Number of samples	7470
“impression” section	Overall diagnosis
“findings” section	Local diagnosis
“tags” section	Keywords

Table 2
Data	Count
Tags	351
Vocabulary size	2195
Training set^#	6470
Test set^#	500
Validation set^#	500
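
As a quick consistency check on the split sizes above, the following sketch sums the training, test, and validation counts and reports each split's share. The dictionary keys are illustrative names, not identifiers from the paper.

```python
# Split sizes copied from the table above.
splits = {"training": 6470, "test": 500, "validation": 500}

# The total should match the number of samples reported for the dataset (7470).
total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(total)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three splits add up to the 7470 samples listed in the dataset description, so the training set covers roughly 87% of the data.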

Table 3
Model	BLEU-1	BLEU-2	BLEU-3	BLEU-4	METEOR	ROUGE	CIDEr
CNN-RNN^[1]	0.316	0.211	0.140	0.095	0.159	0.267	0.111
LRCN^[25]	0.369	0.229	0.149	0.099	0.155	0.278	0.190
AdtAtt^[10]	0.369	0.226	0.151	0.108	0.171	0.323	0.155
CoAtt^[2]	0.382	0.248	0.176	0.125	0.184	0.300	0.287
MIRGAN-visual attention^[8]	0.381	0.246	0.171	0.119	0.190	0.327	0.304
MIRGAN-semantic attention^[10]	0.372	0.232	0.169	0.120	0.187	0.311	0.298
MIRGAN	0.401	0.257	0.178	0.125	0.194	0.336	0.313
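
To make the comparison with the strongest baseline easier to read, the sketch below computes the relative gain of MIRGAN over CoAtt for each metric, using only the scores in the table above. Choosing CoAtt as the reference row is an assumption made for illustration.

```python
# Scores copied from the CoAtt and MIRGAN rows of the results table.
metrics = ["BLEU-1", "BLEU-2", "BLEU-3", "BLEU-4", "METEOR", "ROUGE", "CIDEr"]
coatt = [0.382, 0.248, 0.176, 0.125, 0.184, 0.300, 0.287]
mirgan = [0.401, 0.257, 0.178, 0.125, 0.194, 0.336, 0.313]

# Relative gain in percent: (MIRGAN - CoAtt) / CoAtt * 100.
gains = {
    name: (m - c) / c * 100
    for name, c, m in zip(metrics, coatt, mirgan)
}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

On these numbers MIRGAN matches CoAtt on BLEU-4 and improves on every other metric, with the largest relative gains on ROUGE and CIDEr.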
