Journal of Shandong University (Engineering Science) ›› 2026, Vol. 56 ›› Issue (1): 1-13. doi: 10.6040/j.issn.1672-3961.0.2025.194

• Machine Learning and Data Mining •

Multi-source heterogeneous medical big data fusion and analysis technology

CUI Lizhen1,2, SUN Xiaofang1,2*, LIU Ning2, XU Yonghui2, HE Wei1,2

  1. School of Software, Shandong University, Jinan 250101, Shandong, China
  2. Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
  • Published: 2026-02-03
  • About the authors: CUI Lizhen (1976- ), male, from Lanzhou, Gansu; professor, doctoral supervisor, Ph.D.; main research interests: software and data engineering. E-mail: clz@sdu.edu.cn. *Corresponding author: SUN Xiaofang (1996- ), female, from Weifang, Shandong; doctoral candidate; main research interests: medical big data mining and analysis. E-mail: xiaofangsun@mail.sdu.edu.cn
  • Supported by:
    Major Research Plan of the National Natural Science Foundation of China (91846205)

Abstract: Healthcare data, a core element of modern medical research and practice, is multi-source, heterogeneous, fragmented, and underutilized, which makes it difficult to mine the correlations and knowledge it contains. Breaking through the fusion bottleneck for multi-source heterogeneous data has become a key challenge in shifting health management from passive treatment to proactive intervention. Focusing on the core value of healthcare data and the difficulties of fusing it, this paper systematically reviews research progress and directions for technological breakthroughs. Based on a comprehensive analysis of multimodal multi-source heterogeneous data fusion, interpretable knowledge discovery, and cross-modal correlation mining, it proposes a frontier technology framework for multi-source heterogeneous medical big data. The framework supports a threefold evolution of healthcare data systems: a shift toward multimodality, an upgrade to knowledge graphs, and a move toward interpretability, so as to fully realize the multiplier effect of healthcare data as a national strategic resource.

Key words: medical big data, multi-source heterogeneity, multimodal fusion, data-driven and knowledge-guided, interpretable model

CLC number:

  • TP391