
Journal of Shandong University (Engineering Science)

• Machine Learning and Data Mining •

Recommendation algorithm based on hierarchical item similarity

SUN Yuanshuai1, CHEN Yao1, LIU Xiangrong1,2, CHEN Ke3, LIN Chen1,2*

  1. School of Information Science & Technology, Xiamen University, Xiamen 361005, Fujian, China;
  2. Shenzhen Research Institute, Xiamen University, Shenzhen 518057, Guangdong, China;
  3. Department of Computer Science and Technology, Guangdong University of Petrochemical Technology, Maoming 525000, Guangdong, China
  • Received: 2013-05-28 Online: 2014-06-20 Published: 2013-05-28
  • Corresponding author: LIN Chen (born 1982), female, from Xiamen, Fujian; assistant professor, Ph.D.; research interests: data mining and Web social network analysis. E-mail: chenlin@xmu.edu.cn
  • About the author: SUN Yuanshuai (born 1989), male, from Puyang, Henan; master's student; research interests: recommender systems and matrix factorization. E-mail: sunyuan-2008@aliyun.com
  • Supported by:
    National Natural Science Foundation of China (61370010, 61102136); Natural Science Foundation of Fujian Province (2011J05158, 2010J01350); Shenzhen Science and Technology Information Basic Research Program (JC201006030858A, JCYJ20120618155655087)

Abstract: The quality of collaborative filtering (CF) recommendation depends heavily on the similarity measure used. To address this problem, a recommendation algorithm based on item hierarchy similarity, named REHIS (recommendation hierarchical similarity), was proposed. First, association rule mining and the K nearest neighbor (KNN) algorithm were used to complete the item hierarchy. Next, the TopK algorithm was used to compute the similarity between items. Finally, user ratings were predicted within the framework of item-based CF. To address the poor scalability of CF, the TopK algorithm was also extended to commonly used similarity measures such as cosine distance and the Pearson correlation coefficient. Experimental results showed that, compared with traditional CF algorithms, REHIS achieved a better root mean square error, and that the TopK algorithm reduced the time needed to find the nearest-neighbor items.

Key words: TopK, collaborative filtering, item hierarchy, inverted index, recommendation system, tag
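
The abstract names three generic building blocks: finding the most similar items, item-based CF rating prediction, and evaluation by root mean square error. The sketch below illustrates those blocks only. It is not REHIS: the hierarchy-based similarity and the paper's TopK algorithm are not specified in the abstract, so plain cosine similarity and a simple inverted index (user to rated items) are swapped in as assumptions. All function names and the toy ratings are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_indexes(ratings):
    """Build item->users and user->items maps from (user, item, rating) triples."""
    by_item = defaultdict(dict)  # item -> {user: rating}
    by_user = defaultdict(dict)  # user -> {item: rating} (the inverted index)
    for user, item, rating in ratings:
        by_item[item][user] = rating
        by_user[user][item] = rating
    return dict(by_item), dict(by_user)


def norm(vector):
    """Euclidean norm of a sparse rating vector stored as a dict."""
    return math.sqrt(sum(r * r for r in vector.values()))


def top_k_similar(item, by_item, by_user, k):
    """Return up to k (similarity, other_item) pairs with highest cosine similarity.

    Only items sharing at least one rater with `item` are visited, because
    candidates are reached through the user->items index.
    """
    dots = defaultdict(float)
    for user, rating in by_item.get(item, {}).items():
        for other, other_rating in by_user[user].items():
            if other != item:
                dots[other] += rating * other_rating
    item_norm = norm(by_item.get(item, {}))
    sims = [(dot / (item_norm * norm(by_item[other])), other)
            for other, dot in dots.items()]
    return heapq.nlargest(k, sims, key=lambda pair: pair[0])


def predict(user, item, by_item, by_user, k=2):
    """Similarity-weighted average of the user's ratings on the k nearest items."""
    rated = by_user.get(user, {})
    num = den = 0.0
    for sim, other in top_k_similar(item, by_item, by_user, k):
        if other in rated:
            num += sim * rated[other]
            den += abs(sim)
    return num / den if den else None  # None: no rated neighbor available


def rmse(pairs):
    """Root mean square error over (predicted, actual) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# Toy data, invented for illustration only.
ratings = [("u1", "A", 5), ("u1", "B", 4), ("u2", "A", 4), ("u2", "B", 5),
           ("u2", "C", 1), ("u3", "B", 2), ("u3", "C", 5)]
by_item, by_user = build_indexes(ratings)
print(top_k_similar("A", by_item, by_user, k=1))
print(predict("u3", "A", by_item, by_user, k=2))
```

Routing candidate lookup through the user-to-items index means an item is compared only with items that share a rater, which is one common way to cut neighbor search time on sparse data. Whether the paper's TopK algorithm works this way is not stated in the abstract.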
