Journal of Shandong University(Engineering Science) ›› 2026, Vol. 56 ›› Issue (3): 106-117.doi: 10.6040/j.issn.1672-3961.0.2024.284

Machine Learning & Data Mining

Algorithm for two-sided collaborative filtering multimodal contrastive representation enhancement recommender

CHEN Yu¹, MENG Guangting¹, ZONG Chen¹, YUAN Weihua¹,²*, WANG Jiening³, WANG Xing¹

1. School of Computer and Artificial Intelligence, Shandong Jianzhu University, Jinan 250101, Shandong, China;
2. Computational Intelligence Center, Shandong Jianzhu University, Jinan 250101, Shandong, China;
3. School of Architecture and Urban Planning, Shandong Jianzhu University, Jinan 250101, Shandong, China
Published: 2026-06-09

Abstract: Existing multimodal recommenders had three main problems. First, the latent correlation between multimodal data and interaction data had not been fully explored, which weakened key features. Second, incidental noise unrelated to user interests was ignored. Third, static multimodal fusion assigned the same weight to every modality and could not perceive changes in user interests, so the learned representations were not sufficiently discriminative. To address these problems, a user-item two-sided collaborative filtering multimodal contrastive representation enhancement (TCFCRE) recommender was proposed. To better combine multimodal and interaction data, TCFCRE used contrastive learning to enhance key features and mine their latent associations. To reduce the impact of noise, a cross-modal user representation alignment module was designed to discover the consistency of user features across modalities and extract users' true interests; in addition, a mask matrix based on user-item multimodal relationships was constructed to generate an augmented view, and contrastive learning was applied to reduce the impact of noise in implicit feedback. Because traditional methods ignored the relative importance of modalities and could not adapt to dynamic changes, a multimodal dynamic fusion module was designed that computed a fusion weight for each representation. Experiments on three public datasets showed that TCFCRE achieved significant improvements over existing methods.
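The abstract names three mechanisms without giving their formulas: contrastive learning between two views of the same users, a mask matrix built from user-item multimodal relationships that yields an augmented view, and dynamic fusion with a separate weight for each representation. The sketch below shows one common way to realize each, assuming an InfoNCE objective with temperature `tau`, a similarity-threshold mask, and attention-style softmax fusion. All function names and parameters (`info_nce`, `multimodal_mask`, `dynamic_fusion`, `keep_ratio`, `W`, `q`) are hypothetical illustrations, not the authors' TCFCRE implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Row-wise L2 normalization, so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(z1, z2, tau=0.2):
    """InfoNCE loss (assumed form): row i of z1 and row i of z2 are a positive
    pair; every other row of z2 acts as a negative for row i of z1."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / tau                              # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # positives sit on the diagonal


def multimodal_mask(adj, user_emb, item_emb, keep_ratio=0.8):
    """Hypothetical mask matrix: keep observed interactions whose user-item
    multimodal similarity is roughly in the top `keep_ratio`; drop the rest as likely noise."""
    sim = l2_normalize(user_emb) @ l2_normalize(item_emb).T   # (n_users, n_items)
    observed = sim[adj > 0]                                  # similarities of observed edges
    threshold = np.quantile(observed, 1.0 - keep_ratio)
    mask = (sim >= threshold) & (adj > 0)
    return adj * mask                                        # augmented (masked) view


def dynamic_fusion(reps, W, q):
    """Attention-style fusion (assumed): each node gets its own softmax weight per
    modality. reps: list of (n, d) arrays; W: (d, h) and q: (h,) stand in for learnable parameters."""
    stacked = np.stack(reps, axis=1)                      # (n, M, d)
    scores = np.tanh(stacked @ W) @ q                     # (n, M) one score per node and modality
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    fused = (weights[:, :, None] * stacked).sum(axis=1)   # (n, d) weighted sum over modalities
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_users, n_items, d = 4, 6, 8
    visual = rng.normal(size=(n_users, d))    # toy visual-modality user representations
    textual = rng.normal(size=(n_users, d))   # toy textual-modality user representations
    print("cross-modal InfoNCE:", info_nce(visual, textual))

    adj = (rng.random((n_users, n_items)) < 0.5).astype(float)
    adj[0, 0] = 1.0                           # guarantee at least one observed interaction
    item_emb = rng.normal(size=(n_items, d))
    augmented = multimodal_mask(adj, visual, item_emb)
    print("kept", int(augmented.sum()), "of", int(adj.sum()), "interactions")

    fused, weights = dynamic_fusion([visual, textual], rng.normal(size=(d, 4)), rng.normal(size=4))
    print(weights)  # one row per user; each row sums to 1
```

Computing the weights per node with a softmax, rather than fixing one scalar per modality, is what lets the fusion differ from user to user; with `W` set to zeros the scores tie and the sketch reduces to the static equal-weight average that the abstract contrasts against.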

Key words: multimodal recommendation, graph neural network, representation learning, contrastive learning, representation enhancement

CLC Number: TP391.3
Related articles:
[1] TANG Kai, WANG Fang, LIU Jianxia. Graph node classification algorithm based on multi-level core aggregation GNN[J]. Journal of Shandong University(Engineering Science), 2026, 56(3): 137-143.
[2] LI Junliang, JIANG Yuan, WU Longxue, LIU Yu. Enhanced Graph Transformer with node and edge feature fusion[J]. Journal of Shandong University(Engineering Science), 2026, 56(3): 118-126.
[3] DENG Bin, ZHANG Zongbao, ZHAO Wenmeng, LUO Xinhang, WU Qiuwei. Cloud-edge collaborative and graph neural network based load forecasting method for electric vehicle charging stations[J]. Journal of Shandong University(Engineering Science), 2025, 55(5): 62-69.
[4] DIAO Zhenyu, HAN Xiaofan, ZHANG Chengyu, NIE Huijia, ZHAO Xiuyang, NIU Dongmei. Single image 3D model retrieval based on instance discrimination and feature enhancement[J]. Journal of Shandong University(Engineering Science), 2025, 55(2): 71-77.