Journal of Shandong University (Engineering Science) ›› 2026, Vol. 56 ›› Issue (1): 1-13. doi: 10.6040/j.issn.1672-3961.0.2025.194

• Machine Learning & Data Mining •    

Multi-source heterogeneous medical big data fusion and analysis technology

CUI Lizhen1,2, SUN Xiaofang1,2*, LIU Ning2, XU Yonghui2, HE Wei1,2   

  1. School of Software, Shandong University, Jinan 250101, Shandong, China
  2. Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
  • Published: 2026-02-03

Abstract: Healthcare data, a core element of modern medical research and practice, is multi-source, heterogeneous, fragmented, and underutilized, which makes it difficult to uncover its implicit correlations and knowledge value. The difficulty of integrating multi-source heterogeneous data has become a key obstacle to shifting health management from passive treatment to proactive intervention. Focusing on the core value and integration difficulties of healthcare data, this paper systematically reviews research progress and technological breakthroughs. Based on a comprehensive analysis of multimodal and multi-source heterogeneous data fusion, interpretable knowledge discovery, and cross-modal correlation mining, it proposes an advanced technological framework for multi-source heterogeneous medical big data. The framework supports a three-fold evolution of healthcare data systems toward multimodal transformation, knowledge graph upgrading, and interpretability innovation, helping to unlock the multiplier effect of healthcare data as a national strategic resource.

Key words: medical big data, multi-source heterogeneity, multimodal fusion, data-driven and knowledge-guided, interpretable model

CLC Number: TP391