Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue 2: 66-75. doi: 10.6040/j.issn.1672-3961.0.2019.304

• Machine Learning and Data Mining •

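  • About the author: ZHANG Wenkai (1995-), female, from Laiwu, Shandong; master's student; research interests: heterogeneous information networks and data mining. E-mail: zhangwenkai2018@163.com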

Entity recommendation based on normalized similarity measure of meta graph in heterogeneous information network

Wenkai ZHANG, Ke YU, Xiaofei WU

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2019-06-13  Online: 2020-04-20  Published: 2020-04-16
  • Supported by:
    National Natural Science Foundation of China (61601046, 61171098); the 111 Project of China (B08004); EU FP7 IRSES project (612212)

Abstract:

Building on the strong representational power of meta graphs in heterogeneous information networks (HIN), a normalized similarity measure of meta graph (NSMG) was proposed. It combines implicit feedback with PathSim (meta path-based similarity) to counter the bias toward highly visible, large-degree entities in HIN. Yelp-HIN and Amazon-HIN, the heterogeneous information networks built from the Yelp and Amazon datasets, were constructed, and different types of meta graphs and their normalized similarity measures were defined. Matrix factorization and a factorization machine were used to combine the similarities computed on the different meta graphs. The experimental results showed that the NSMG-based method outperformed commonly used HIN entity recommendation methods on very sparse datasets.

Key words: heterogeneous information networks, meta graph, normalized similarity measure, entity recommendation, matrix decomposition, factorization machine
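
PathSim, which the abstract names as one ingredient of NSMG, normalizes the number of path instances between two entities by their self-path counts, so that entities with many connections do not dominate the ranking. The sketch below illustrates standard PathSim only, on a toy matrix; it is not the NSMG formula of the paper, and the function name `pathsim` and all matrix values are made up for illustration.

```python
import numpy as np

def pathsim(commuting: np.ndarray) -> np.ndarray:
    """PathSim for a symmetric meta path: s(x, y) = 2*M[x, y] / (M[x, x] + M[y, y])."""
    diag = np.diag(commuting).astype(float)
    denom = diag[:, None] + diag[None, :]
    sim = np.zeros_like(commuting, dtype=float)
    # Leave entries at 0 where both self-path counts are 0.
    np.divide(2.0 * commuting, denom, out=sim, where=denom > 0)
    return sim

# Hypothetical business-category incidence matrix (3 businesses x 2 categories).
W_bc = np.array([[5, 0],
                 [1, 0],
                 [1, 1]])

# Commuting matrix of the symmetric path Business-Category-Business.
M = W_bc @ W_bc.T
S = pathsim(M)
```

In this toy case business 0 shares more raw path instances with business 1 (`M[0, 1]` is 5) than business 1 shares with business 2 (`M[1, 2]` is 1), yet PathSim scores the second pair higher because business 0 has a much larger self-path count.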

CLC Number: TP311

Fig. 1 Network schema of Yelp-HIN

Fig. 2 Network schema of Amazon-HIN

Fig. 3 Meta graphs in Yelp-HIN/Amazon-HIN

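Table 1 Meta paths in Yelp-HIN and Amazon-HIN

HIN         Meta path
Yelp-HIN    M3: User-Business-Category-Business
            M4: User-Business-State-Business
            M5: User-Business-City-Business
            M6: User-Business-Star-Business
            M7: User-Business-Category-Business-Category-Business
            M8: User-Business-State-Business-State-Business
            M9: User-Review-Topic-Review-User-Business
            M10: User-Review-Business-Review-User-Business
Amazon-HIN  M3: User-Product-Category-Product
            M4: User-Product-Brand-Product
            M5: User-Product-Category-Product-Category-Product
            M6: User-Product-Brand-Product-Brand-Product
            M7: User-Review-Topic-Review-User-Product
            M8: User-Review-Product-Review-User-Product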
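
The number of path instances along a meta path can be counted by multiplying the adjacency matrices of its relations in order. A minimal sketch for the Yelp-HIN path M3 (User-Business-Category-Business) follows; the matrices are tiny made-up examples, the names `W_ub`, `W_bc` and `meta_path_counts` are hypothetical, and the similarity actually used in the paper may be computed differently.

```python
import numpy as np

# Hypothetical user-business adjacency (2 users x 3 businesses).
W_ub = np.array([[1, 0, 0],
                 [0, 1, 1]])

# Hypothetical business-category adjacency (3 businesses x 2 categories).
W_bc = np.array([[1, 0],
                 [1, 1],
                 [0, 1]])

def meta_path_counts(*relations: np.ndarray) -> np.ndarray:
    """Multiply relation matrices in path order to count path instances."""
    result = relations[0]
    for rel in relations[1:]:
        result = result @ rel
    return result

# User-Business-Category-Business: entry (u, b) counts the path instances from user u to business b.
C_m3 = meta_path_counts(W_ub, W_bc, W_bc.T)
```

Each entry includes paths that return to a business the user already interacted with, since the business at the end of the path may be the same as the one in the middle.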

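Fig. 4 Symmetric part and user implicit feedback part of a meta graph

Fig. 5 Unified representation of the item-symmetric meta graph M1

Fig. 6 Unified representation of the user-symmetric meta graph M2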

Table 2 Statistics of Yelp-50k and Amazon-50k

Statistic                Yelp-50k   Amazon-50k
Users                    13,663     20,345
Businesses               8,164      8,146
Categories               716        488
Star levels              9          -
Cities                   154        -
States                   13         -
Brands                   -          640
Check-ins                37,659     29,350
Check-ins per user       2.76       1.44
Check-ins per business   4.61       3.60
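
The two ratio entries in Table 2 follow directly from the counts. The short sketch below recomputes them as a consistency check; the dictionary keys (`users`, `items`, `interactions`) are naming choices made here, not terms from the paper.

```python
# Counts taken from Table 2; "items" is the business column, "interactions" the check-in column.
stats = {
    "Yelp-50k": {"users": 13663, "items": 8164, "interactions": 37659},
    "Amazon-50k": {"users": 20345, "items": 8146, "interactions": 29350},
}

def per_entity_ratios(s: dict) -> tuple:
    """Return (interactions per user, interactions per item), rounded to 2 decimals."""
    per_user = round(s["interactions"] / s["users"], 2)
    per_item = round(s["interactions"] / s["items"], 2)
    return per_user, per_item

ratios = {name: per_entity_ratios(s) for name, s in stats.items()}
```

Both datasets average fewer than three interactions per user, which is the sparsity the abstract refers to.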

Table 3 Comparison of dataset sparsity

Dataset               Density/%
Yelp-50k              0.037
Amazon-50k            0.017
IMDb-MovieLens-100k   6.988
Yelp[19]              0.086
Douban                0.630
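
This page does not define density. A common definition, assumed here, is the share of observed user-item interactions among all possible user-item pairs. The sketch applies that assumed definition to the counts of Table 2. It gives roughly 0.034% for Yelp-50k and 0.018% for Amazon-50k, the same order of magnitude as the values listed above; the small differences suggest the paper may count interactions or entities slightly differently.

```python
def density_percent(interactions: int, users: int, items: int) -> float:
    """Observed interactions as a percentage of all user-item pairs (assumed definition)."""
    return 100.0 * interactions / (users * items)

# Counts from Table 2 (check-ins, users, businesses).
yelp_density = density_percent(37659, 13663, 8164)
amazon_density = density_percent(29350, 20345, 8146)
```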

Table 4 Performance comparison

          Yelp-50k                      Amazon-50k
Method    RMSE     NSMG improvement/%   RMSE     NSMG improvement/%
RSVD      2.5996   52.6                 2.6395   55.5
HeteRec   1.7320   28.9                 1.9360   39.3
FMG       1.2473   1.2                  1.1850   1.3
NSMG      1.2318   -                    1.1741   -
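
RMSE is the root of the mean squared difference between predicted and observed ratings, so lower is better. The sketch below shows the metric on made-up ratings and then computes a relative RMSE reduction. Reading the improvement column as (baseline - NSMG) / baseline is an assumption; it reproduces the RSVD row, but the page does not state how the column was computed.

```python
import math

def rmse(predicted: list, actual: list) -> float:
    """Root mean squared error between two equally long rating lists."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Relative RMSE reduction in percent (assumed reading of the improvement column)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up ratings, for illustration only.
toy_rmse = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])

# RSVD versus NSMG, using the RMSE values from Table 4.
gain_yelp = relative_improvement(2.5996, 1.2318)
gain_amazon = relative_improvement(2.6395, 1.1741)
```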

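Fig. 7 Performance comparison on each type of meta graph

Fig. 8 Performance comparison on single meta graphs in Yelp-HIN and Amazon-HIN

Fig. 9 Parameter tuning on Yelp-50k and Amazon-50k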

1 ZHAO H, YAO Q M, LI J D, et al. Meta-graph based recommendation fusion over heterogeneous information networks[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax, Canada: ACM, 2017: 635-644.
2 SHI C, LI Y T, ZHANG J W, et al. A survey of heterogeneous information network analysis[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(1): 17-37. doi: 10.1109/TKDE.2016.2598561
3 JEH G, WIDOM J. Scaling personalized web search[C]//Proceedings of the 12th International Conference on World Wide Web. Budapest, Hungary: ACM, 2003: 271-279.
4 ZHENG Yuyan, TIAN Ying, SHI Chuan. Method of entity set expansion based on frequent pattern under meta path[J]. Journal of Software, 2018, 29(10): 2915-2930. (in Chinese)
5 HUANG Liwei, LI Deyi, MA Yutao, et al. A meta path-based link prediction model for heterogeneous information networks[J]. Chinese Journal of Computers, 2014, 37(4): 848-858. (in Chinese)
6 SHENG Quanwei, WANG Yibai, GAO Yang. Research on improved algorithm for collaborative prediction of heterogeneous links[J]. Computer Engineering and Applications, 2017, 53(15): 155-163. (in Chinese)
7 SUN Y Z, HAN J W, YAN X F, et al. Pathsim: meta path-based top-k similarity search in heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2011, 4(11): 992-1003.
8 SHI C, KONG X N, HUANG Y, et al. Hetesim: a general framework for relevance measure in heterogeneous networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10): 2479-2492. doi: 10.1109/TKDE.2013.2297920
9 HUANG Z P, ZHENG Y D, CHENG R, et al. Meta structure: computing relevance in large heterogeneous information networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016: 1595-1604.
10 YU X, REN X, SUN Y Z, et al. Recommendation in heterogeneous information networks with implicit user feedback[C]//Proceedings of the 7th ACM Conference on Recommender Systems. Hong Kong, China: ACM, 2013: 347-350.
11 YU X, REN X, SUN Y Z, et al. Personalized entity recommendation: a heterogeneous information network approach[C]//Proceedings of the 7th ACM International Conference on Web Search and Data Mining. New York, USA: ACM, 2014: 283-292.
12 SHI C, LIU J, ZHUANG F Z, et al. Integrating heterogeneous information via flexible regularization framework for recommendation[J]. Knowledge and Information Systems, 2016, 49(3): 835-859. doi: 10.1007/s10115-016-0925-0
13 ZHENG J, LIU J, SHI C, et al. Dual similarity regularization for recommendation[C]//Pacific-Asia Conference on Knowledge Discovery and Data Mining. Auckland, New Zealand: Springer, 2016: 542-554.
14 JAMALI M, LAKSHMANAN L. HeteroMF: recommendation in heterogeneous information networks using context dependent factor models[C]//Proceedings of the 22nd International Conference on World Wide Web. Rio de Janeiro, Brazil: ACM, 2013: 643-654.
15 XIE F, CHEN L, YE Y, et al. A weighted meta-graph based approach for mobile application recommendation on heterogeneous information networks[C]//International Conference on Service-Oriented Computing. Hangzhou, China: Springer, 2018: 404-420.
16 ZHENG J, LIU J, SHI C, et al. Dual similarity regularization for recommendation[C]//Pacific-Asia Conference on Knowledge Discovery and Data Mining. Auckland, New Zealand: Springer, 2016: 542-554.
17 KOREN Y, BELL R, VOLINSKY C. Matrix factorization techniques for recommender systems[J]. Computer, 2009, 42(8): 30-37. doi: 10.1109/MC.2009.263
18 SHI C, ZHOU C, KONG X N, et al. Heterecom: a semantic-based recommendation system in heterogeneous networks[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Beijing, China: ACM, 2012: 1552-1555.
19 SHI C, ZHANG Z Q, LUO P, et al. Semantic path based personalized recommendation on weighted heterogeneous information networks[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. Melbourne, Australia: ACM, 2015: 453-462.
20 BURKE R, VAHEDIAN F, MOBASHER B. Hybrid recommendation in heterogeneous networks[C]//International Conference on User Modeling, Adaptation, and Personalization. Aalborg, Denmark: Springer, 2014: 49-60.
21 WANG Yu, WU Yanjun, WU Jingzheng, et al. Multi-dimensional tag recommender model via heterogeneous networks[J]. Journal of Software, 2017, 28(10): 2611-2624. (in Chinese)
22 WANG Yong, DENG Yongheng, LI Xiaoguang. Asymmetric recommendation algorithm based on user preference[J]. Computer Engineering and Applications, 2018, 54(23): 1-6. doi: 10.3778/j.issn.1002-8331.1809-0322 (in Chinese)
23 DAI Lin, MENG Xiangwu, ZHANG Yujie, et al. A restaurant recommendation model with multiple information fusion[J]. Journal of Software, 2019, 30(9): 2869-2885. (in Chinese)
24 YUAN M, LIN Y. Model selection and estimation in regression with grouped variables[J]. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 2006, 68(1): 49-67. doi: 10.1111/j.1467-9868.2005.00532.x
25 LI H, LIN Z. Accelerated proximal gradient methods for nonconvex programming[C]//Advances in Neural Information Processing Systems. Montreal, Canada: NIPS, 2015: 379-387.