Journal of Shandong University (Engineering Science) ›› 2020, Vol. 50 ›› Issue (2): 66-75. doi: 10.6040/j.issn.1672-3961.0.2019.304

• Machine Learning and Data Mining •

Entity recommendation based on normalized similarity measure of meta graph in heterogeneous information network

Wenkai ZHANG (张文凯), Ke YU (禹可), Xiaofei WU (吴晓非)

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2019-06-13 Online: 2020-04-20 Published: 2020-04-16
  • About the author: ZHANG Wenkai (1995-), female, from Laiwu, Shandong; master's student; research interests: heterogeneous information networks and data mining. E-mail: zhangwenkai2018@163.com
  • Supported by:
    National Natural Science Foundation of China (61601046); National Natural Science Foundation of China (61171098); 111 Project of China (B08004); EU FP7 IRSES Project (612212)

Abstract:

Building on the strong representational power of meta graphs in heterogeneous information networks (HIN), a normalized similarity measure of meta graph (NSMG) is proposed. NSMG combines the implicit feedback matrix with PathSim (meta path-based similarity) to counter the bias toward highly visible, large-degree entities in an HIN. Yelp-HIN (the HIN built from Yelp) and Amazon-HIN (the HIN built from Amazon) were constructed from the Yelp and Amazon datasets, and different types of meta graphs and their normalized similarity measures were defined. Matrix decomposition and a factorization machine were used to combine the similarities computed on the different meta graphs. The experimental results show that the NSMG-based method outperforms commonly used HIN entity recommendation methods on very sparse datasets.

Key words: heterogeneous information networks, meta graph, normalized similarity measure, entity recommendation, matrix decomposition, factorization machine
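
PathSim, one of the two ingredients the abstract names, has a compact matrix form. For a symmetric meta path with commuting matrix M, the score is s(x, y) = 2·M[x, y] / (M[x, x] + M[y, y]). The sketch below is a minimal NumPy illustration of that standard definition (reference 7). It is not the NSMG formula, which this page does not reproduce. The function name `pathsim` and the toy business-category matrix are assumptions made for illustration only.

```python
import numpy as np


def pathsim(adjacency):
    """PathSim scores for the symmetric meta path A-B-A.

    `adjacency` is an (n_A x n_B) matrix, e.g. businesses x categories.
    The commuting matrix M = W @ W.T counts path instances between
    every pair of A-type entities.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    commuting = adjacency @ adjacency.T
    self_paths = np.diag(commuting)
    denom = self_paths[:, None] + self_paths[None, :]
    scores = np.zeros_like(commuting)
    # Entities with no path instances at all keep a score of 0.
    np.divide(2.0 * commuting, denom, out=scores, where=denom > 0)
    return scores


# Hypothetical toy data: 3 businesses x 3 categories.
business_category = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
scores = pathsim(business_category)
print(np.round(scores, 3))
```

In this toy matrix, business 0 shares exactly one category with each of the other two, yet it scores higher against business 1 (one category in total) than against business 2 (two categories in total). That is the degree normalization at work: the same overlap counts for less against a more visible entity.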

CLC Number:

  • TP311

Fig. 1

Network schema of Yelp-HIN

Fig. 2

Network schema of Amazon-HIN

Fig. 3

Meta graphs in Yelp-HIN/Amazon-HIN

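Table 1

Meta paths in Yelp-HIN and Amazon-HIN

HIN          Meta path
Yelp-HIN     M3: User-Business-Category-Business
             M4: User-Business-State-Business
             M5: User-Business-City-Business
             M6: User-Business-Stars-Business
             M7: User-Business-Category-Business-Category-Business
             M8: User-Business-State-Business-State-Business
             M9: User-Review-Topic-Review-User-Business
             M10: User-Review-Business-Review-User-Business
Amazon-HIN   M3: User-Product-Category-Product
             M4: User-Product-Brand-Product
             M5: User-Product-Category-Product-Category-Product
             M6: User-Product-Brand-Product-Brand-Product
             M7: User-Review-Topic-Review-User-Product
             M8: User-Review-Product-Review-User-Product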

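Fig. 4

Symmetric part and user implicit feedback part of a meta graph

Fig. 5

Unified representation of the item-symmetric meta graph M1

Fig. 6

Unified representation of the user-symmetric meta graph M2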

Table 2

Statistics of Yelp-50k and Amazon-50k

Statistic                Yelp-50k   Amazon-50k
Users                    13663      20345
Businesses               8164       8146
Categories               716        488
Star levels              9          -
Cities                   154        -
States                   13         -
Brands                   -          640
Check-ins                37659      29350
Check-ins per user       2.76       1.44
Check-ins per business   4.61       3.60
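
The two ratio entries in Table 2 follow directly from the counts in the same table. The short check below recomputes them. The dictionary layout and the function name `interaction_ratios` are illustrative assumptions. Only the counts come from the table.

```python
# Counts copied from Table 2 (users, businesses, check-ins).
stats = {
    "Yelp-50k": {"users": 13663, "businesses": 8164, "checkins": 37659},
    "Amazon-50k": {"users": 20345, "businesses": 8146, "checkins": 29350},
}


def interaction_ratios(counts):
    """Return (check-ins per user, check-ins per business), rounded to 2 decimals."""
    per_user = counts["checkins"] / counts["users"]
    per_business = counts["checkins"] / counts["businesses"]
    return round(per_user, 2), round(per_business, 2)


ratios = {name: interaction_ratios(counts) for name, counts in stats.items()}
for name, (per_user, per_business) in ratios.items():
    print(f"{name}: {per_user:.2f} per user, {per_business:.2f} per business")
```

Both datasets average fewer than three interactions per user, which is consistent with the abstract's description of them as very sparse.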

Table 3

Sparsity comparison of datasets

Dataset               Density/%
Yelp-50k              0.037
Amazon-50k            0.017
IMDb-MovieLens-100k   6.988
Yelp[19]              0.086
Douban                0.630

Table 4

Performance comparison

Method    Yelp-50k                       Amazon-50k
          RMSE     NSMG improvement/%    RMSE     NSMG improvement/%
RSVD      2.5996   52.6                  2.6395   55.5
HeteRec   1.7320   28.9                  1.9360   39.3
FMG       1.2473   1.2                   1.1850   1.3
NSMG      1.2318   -                     1.1741   -

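Fig. 7

Performance comparison on each type of meta graph

Fig. 8

Performance comparison on individual meta graphs in Yelp-HIN and Amazon-HIN

Fig. 9

Parameter tuning on Yelp-50k and Amazon-50k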

1 ZHAO H, YAO Q M, LI J D, et al. Meta-graph based recommendation fusion over heterogeneous information networks[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax, Canada: ACM, 2017: 635-644.
2 SHI C, LI Y T, ZHANG J W, et al. A survey of heterogeneous information network analysis[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(1): 17-37. doi: 10.1109/TKDE.2016.2598561
3 JEH G, WIDOM J. Scaling personalized web search[C]//Proceedings of the 12th international conference on World Wide Web. Budapest, Hungary: ACM, 2003: 271-279.
4 ZHENG Yuyan, TIAN Ying, SHI Chuan. Method of entity set expansion based on frequent pattern under meta path[J]. Journal of Software, 2018, 29(10): 2915-2930 (in Chinese).
5 HUANG Liwei, LI Deyi, MA Yutao, et al. A meta path-based link prediction model for heterogeneous information networks[J]. Chinese Journal of Computers, 2014, 37(4): 848-858 (in Chinese).
6 SHENG Quanwei, WANG Yibai, GAO Yang. Research on improved algorithm for collaborative prediction of heterogeneous links[J]. Computer Engineering and Applications, 2017, 53(15): 155-163 (in Chinese).
7 SUN Y Z, HAN J W, YAN X F, et al. PathSim: meta path-based top-k similarity search in heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2011, 4(11): 992-1003.
8 SHI C, KONG X N, HUANG Y, et al. HeteSim: a general framework for relevance measure in heterogeneous networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10): 2479-2492. doi: 10.1109/TKDE.2013.2297920
9 HUANG Z P, ZHENG Y D, CHENG R, et al. Meta structure: computing relevance in large heterogeneous information networks[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016: 1595-1604.
10 YU X, REN X, SUN Y Z, et al. Recommendation in heterogeneous information networks with implicit user feedback[C]//Proceedings of the 7th ACM conference on Recommender systems. Hong Kong, China: ACM, 2013: 347-350.
11 YU X, REN X, SUN Y Z, et al. Personalized entity recommendation: a heterogeneous information network approach[C]//Proceedings of the 7th ACM international conference on Web search and data mining. New York, USA: ACM, 2014: 283-292.
12 SHI C, LIU J, ZHUANG F Z, et al. Integrating heterogeneous information via flexible regularization framework for recommendation[J]. Knowledge and Information Systems, 2016, 49(3): 835-859. doi: 10.1007/s10115-016-0925-0
13 ZHENG J, LIU J, SHI C, et al. Dual similarity regularization for recommendation[C]// Pacific-Asia Conference on Knowledge Discovery and Data Mining. Auckland, New Zealand: Springer, 2016: 542-554.
14 JAMALI M, LAKSHMANAN L. HeteroMF: recommendation in heterogeneous information networks using context dependent factor models[C]// Proceedings of the 22nd International Conference on World Wide Web. Rio de Janeiro, Brazil: ACM, 2013: 643-654.
15 XIE F, CHEN L, YE Y, et al. A weighted meta-graph based approach for mobile application recommendation on heterogeneous information networks[C]//International Conference on Service-Oriented Computing. Hangzhou, China: Springer, 2018: 404-420.
16 ZHENG J, LIU J, SHI C, et al. Dual similarity regularization for recommendation[C]// Pacific-Asia Conference on Knowledge Discovery and Data Mining. Auckland, New Zealand: Springer, 2016: 542-554.
17 KOREN Y, BELL R, VOLINSKY C. Matrix factorization techniques for recommender systems[J]. Computer, 2009, 42(8): 30-37. doi: 10.1109/MC.2009.263
18 SHI C, ZHOU C, KONG X N, et al. HeteRecom: a semantic-based recommendation system in heterogeneous networks[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Beijing, China: ACM, 2012: 1552-1555.
19 SHI C, ZHANG Z Q, LUO P, et al. Semantic path based personalized recommendation on weighted heterogeneous information networks[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. Melbourne, Australia: ACM, 2015: 453-462.
20 BURKE R, VAHEDIAN F, MOBASHER B. Hybrid recommendation in heterogeneous networks[C]//International Conference on User Modeling, Adaptation, and Personalization. Aalborg, Denmark: Springer, 2014: 49-60.
21 WANG Yu, WU Yanjun, WU Jingzheng, et al. Multi-dimensional tag recommender model via heterogeneous networks[J]. Journal of Software, 2017, 28(10): 2611-2624 (in Chinese).
22 WANG Yong, DENG Yongheng, LI Xiaoguang. Asymmetric recommendation algorithm based on user preference[J]. Computer Engineering and Applications, 2018, 54(23): 1-6 (in Chinese). doi: 10.3778/j.issn.1002-8331.1809-0322
23 DAI Lin, MENG Xiangwu, ZHANG Yujie, et al. A restaurant recommendation model with multiple information fusion[J]. Journal of Software, 2019, 30(9): 2869-2885 (in Chinese).
24 YUAN M, LIN Y. Model selection and estimation in regression with grouped variables[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006, 68(1): 49-67. doi: 10.1111/j.1467-9868.2005.00532.x
25 LI H, LIN Z. Accelerated proximal gradient methods for nonconvex programming[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2015: 379-387.