
Journal of Shandong University (Engineering Science) ›› 2019, Vol. 49 ›› Issue (1): 47-54. doi: 10.6040/j.issn.1672-3961.0.2017.485

• Machine Learning & Data Mining •

Advanced collaborative filtering recommendation model based on sentiment analysis of online review

Chunlin QIAN1,2, Xingfang ZHANG3,*, Lihua SUN2

  1. School of Business Administration, Hohai University, Changzhou 213022, Jiangsu, China
  2. College of Management and Economics, Tianjin University, Tianjin 300072, China
  3. School of Mathematical Sciences, Liaocheng University, Liaocheng 252000, Shandong, China
  • Received: 2017-10-08 Online: 2019-02-20 Published: 2019-03-01
  • Contact: Xingfang ZHANG E-mail: qiancl1997@126.com; sunlh68@tju.edu.cn
  • About the author: Chunlin QIAN (born 1997), female, from Nantong, Jiangsu, is a master's student whose main research interests are personalized recommendation algorithms and online review mining. E-mail: qiancl1997@126.com
  • Supported by:
    National Natural Science Foundation of China (11471152)

Abstract:

To address the uncertainty of users' subjective opinions in online Chinese reviews, a sentiment analysis model based on uncertainty theory was proposed, and a personalized recommendation algorithm was designed on top of it. First, the word segmentation tools ICTCLAS and IKAnalyzer were used to preprocess online Chinese reviews and extract feature words, and the pointwise mutual information (PMI) of each feature word, which indicates its sentiment orientation, was computed from a sentiment dictionary (HowNet). Then, the sentiment analysis model was built with uncertain variables and uncertain sets. Next, a new similarity formula based on the proposed model was used to search for the nearest neighbors, and recommendation lists were generated. Experiments on two real datasets showed that the proposed method effectively improved recommendation accuracy and alleviated the data sparsity problem.

Key words: recommendation model, uncertain variable, uncertain set, online review, sentiment analysis
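
The abstract mentions computing the pointwise mutual information of feature words against a sentiment dictionary. The paper's exact formula is not reproduced on this page, so the following is only a minimal sketch of the widely used PMI-based sentiment orientation score, estimated from review-level co-occurrence counts. The function names, the smoothing constant `eps`, the seed words, and the toy reviews are all assumptions made for illustration.

```python
import math


def pmi(word, seed, docs, eps=0.01):
    """Pointwise mutual information of two words from document co-occurrence.

    docs is a non-empty list of token collections (one per review).
    eps is an assumed smoothing constant that avoids log(0).
    """
    n = len(docs)
    if n == 0:
        raise ValueError("docs must not be empty")
    c_word = sum(1 for d in docs if word in d)
    c_seed = sum(1 for d in docs if seed in d)
    c_both = sum(1 for d in docs if word in d and seed in d)
    # p(word, seed) / (p(word) * p(seed)) simplifies to c_both * n / (c_word * c_seed)
    return math.log2(((c_both + eps) * n) / ((c_word + eps) * (c_seed + eps)))


def sentiment_orientation(word, pos_seeds, neg_seeds, docs):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds."""
    pos = sum(pmi(word, s, docs) for s in pos_seeds)
    neg = sum(pmi(word, s, docs) for s in neg_seeds)
    return pos - neg


# Toy, already-segmented reviews (hypothetical data, not from the paper).
reviews = [
    {"菜", "新鲜", "认可"},
    {"菜", "新鲜"},
    {"服务", "讨厌"},
    {"服务", "虚假", "讨厌"},
]
score_dish = sentiment_orientation("菜", ["新鲜"], ["讨厌"], reviews)
score_service = sentiment_orientation("服务", ["新鲜"], ["讨厌"], reviews)
print(score_dish > 0, score_service < 0)  # prints: True True
```

A positive score suggests the feature word co-occurs more with positive seed words, a negative score the opposite. In practice the seed lists would come from the sentiment dictionary and the counts from the full review corpus.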

CLC Number:

  • TP391

Fig. 1  Sentiment orientation analysis

Table 1  Examples of degree adverbs at six levels

Level 1  Level 2  Level 3  Level 4  Level 5  Level 6
不得了 着实 更加 点点滴滴 半点 不为过
极度 太甚 还要 好生 不堪 开外
不胜 愈加 稍稍 轻度 出头
莫大 特别 这样 有点儿 相对
入骨 多多 越是 相当 不怎么 过分

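Table 2  Examples of sentiment orientation words

Negative sentiment words: 油腻 (greasy), 虚假 (fake), 讨厌 (annoying), 脆弱 (fragile), 陈旧 (outdated)
Positive sentiment words: 接受 (accept), 暖心 (heartwarming), 认可 (approve), 新鲜 (fresh), 最爱 (favorite)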

Table 3  Parameters of the datasets

Dataset        Records   Users     Items   Sparsity/%
Restaurant     560,113   9,171     2,903   97.90
Hotel          10,372    1,411     528     98.61
Mobile phone   219,732   215,402   1,007   99.90
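
The sparsity column is consistent with the usual definition, one minus the fraction of filled cells in the user-item matrix. The sketch below recomputes it from the record, user, and item counts in the table, assuming each record is a distinct user-item pair. The function and variable names are illustrative only.

```python
def sparsity(n_records, n_users, n_items):
    """Fraction of empty cells in an n_users x n_items rating matrix."""
    return 1.0 - n_records / (n_users * n_items)


# (records, users, items) as listed in the dataset table
datasets = {
    "Restaurant": (560113, 9171, 2903),
    "Hotel": (10372, 1411, 528),
    "Mobile phone": (219732, 215402, 1007),
}

percentages = {
    name: round(sparsity(*counts) * 100, 2) for name, counts in datasets.items()
}
for name, value in percentages.items():
    print(f"{name}: {value:.2f}%")
# Restaurant: 97.90%
# Hotel: 98.61%
# Mobile phone: 99.90%
```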

Table 4  Precision and F1 of the two recommendation algorithms

Number of neighbors   USR-CF Precision   USR-CF F1   CF Precision   CF F1
5                     0.02048            0.02117     0.01526        0.01005
15                    0.01802            0.02052     0.01345        0.00884
25                    0.01688            0.01882     0.01231        0.00787
35                    0.01491            0.01691     0.01050        0.00669
45                    0.01374            0.01507     0.01066        0.00711
55                    0.01106            0.01359     0.00968        0.00621
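
The evaluation protocol behind these Precision and F1 values is not described on this page. As a rough illustration, the sketch below shows how precision, recall, and F1 are commonly computed for one user's top-N recommendation list against a held-out set of relevant items. The function name and the sample item IDs are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Top-N precision, recall, and F1 for a single user.

    recommended: list of recommended item IDs (assumed free of duplicates)
    relevant: collection of held-out items the user actually liked
    """
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Two of the four recommended items appear in the held-out set of three.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], {"a", "d", "x"})
print(p, round(r, 4), round(f1, 4))  # prints: 0.5 0.6667 0.5714
```

Dataset-level figures are then typically obtained by averaging the per-user values, which is one reason they can be very small on highly sparse data.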

Table 5  Precision and F1 of the two algorithms

Dataset        Recommendation list   USR-CF Precision   USR-CF F1   FC-Means Precision   FC-Means F1
Restaurant     5                     0.01819            0.01778     0.01740              0.01566
Restaurant     10                    0.01802            0.02052     0.01800              0.01736
Restaurant     15                    0.02070            0.02383     0.01904              0.01863
Restaurant     20                    0.02194            0.02478     0.02001              0.01997
Restaurant     25                    0.02230            0.02561     0.02069              0.02131
Mobile phone   5                     0.00481            0.00487     0.00381              0.00397
Mobile phone   10                    0.00480            0.00567     0.00384              0.00431
Mobile phone   15                    0.00480            0.00575     0.00388              0.00440
Mobile phone   20                    0.00482            0.00533     0.00390              0.00440
Mobile phone   25                    0.00475            0.00506     0.00390              0.00447

Fig. 2  Precision of the recommendation results

Fig. 3  F1
