Journal of Shandong University (Engineering Science), 2016, Vol. 46, Issue 5: 13-20. DOI: 10.6040/j.issn.1672-3961.1.2016.165

### Cite this article

LIN Yaojin, ZHANG Jia, LIN Menglei, WANG Juan. A method of collaborative filtering recommendation based on fuzzy information entropy[J]. JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2016, 46(5): 13-20. DOI: 10.6040/j.issn.1672-3961.1.2016.165.

A method of collaborative filtering recommendation based on fuzzy information entropy
LIN Yaojin, ZHANG Jia, LIN Menglei, WANG Juan
School of Computer Science, Minnan Normal University, Zhangzhou 363000, Fujian, China
Abstract: The performance of collaborative filtering is restricted by the sparsity of rating data. To address this problem, a novel similarity measure based on fuzzy mutual information is proposed. First, the fuzzy information entropy of a user is defined to reflect the degree of uncertainty in the user's rating preferences. Then, the fuzzy mutual information between users is introduced to measure the degree of similarity between them. Finally, a similarity measure based on fuzzy information entropy is designed, which computes the similarity between users by considering both the fuzzy mutual information between users and each user's fuzzy information entropy. Experimental results on two benchmark data sets show that the proposed similarity measure reduces the influence of data sparsity and significantly improves recommendation performance.
Key words: collaborative filtering; data sparsity; fuzzy information entropy; fuzzy mutual information; similarity
0 Introduction

1 Preliminaries

1.1 Collaborative filtering recommendation methods

$sim(a,b)^{COS}=\frac{\sum\limits_{i\in I_{ab}} r_{ai}\times r_{bi}}{\sqrt{\sum\limits_{i\in I_{ab}} r_{ai}^{2}}\times \sqrt{\sum\limits_{i\in I_{ab}} r_{bi}^{2}}},$ (1)
$sim(a,b)^{PCC}=\frac{\sum\limits_{i\in I_{ab}} (r_{ai}-\overline{r_a})\times (r_{bi}-\overline{r_b})}{\sqrt{\sum\limits_{i\in I_{ab}} (r_{ai}-\overline{r_a})^{2}}\times \sqrt{\sum\limits_{i\in I_{ab}} (r_{bi}-\overline{r_b})^{2}}}$ (2)
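
As a rough illustration of (1) and (2), the following Python sketch computes both similarities over the co-rated item set $I_{ab}$. The dictionary layout, the function names, and the choice to average each user's ratings over all items that user has rated are assumptions made for the example, not details taken from the paper. The Pearson form is written with square roots in the denominator, which is its standard normalized form.

```python
import math

def co_rated(ra, rb):
    """Items rated by both users (the set I_ab)."""
    return sorted(set(ra) & set(rb))

def sim_cos(ra, rb):
    """Cosine similarity, Eq. (1), restricted to co-rated items."""
    items = co_rated(ra, rb)
    num = sum(ra[i] * rb[i] for i in items)
    den = (math.sqrt(sum(ra[i] ** 2 for i in items))
           * math.sqrt(sum(rb[i] ** 2 for i in items)))
    return num / den if den else 0.0

def sim_pcc(ra, rb):
    """Pearson correlation, Eq. (2).

    Assumption: each user's mean is taken over all of that user's ratings.
    """
    items = co_rated(ra, rb)
    mean_a = sum(ra.values()) / len(ra)
    mean_b = sum(rb.values()) / len(rb)
    num = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in items)
    den = (math.sqrt(sum((ra[i] - mean_a) ** 2 for i in items))
           * math.sqrt(sum((rb[i] - mean_b) ** 2 for i in items)))
    return num / den if den else 0.0

# Hypothetical ratings: item id -> rating on a 1-5 scale.
a = {"i1": 4, "i2": 1, "i3": 5}
b = {"i1": 5, "i2": 2, "i4": 3}
print(round(sim_cos(a, b), 4), round(sim_pcc(a, b), 4))
```

With only two co-rated items, as here, both measures rest on very little evidence, which is the sparsity problem the abstract refers to.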

$P_{ti}=\overline{r_t}+\frac{\sum\limits_{g=1}^{k} sim(t,g)\times (r_{gi}-\overline{r_g})}{\sum\limits_{g=1}^{k} \left| sim(t,g) \right|},$ (3)
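
The prediction rule (3) can be sketched in a few lines of Python. The tuple layout for the $k$ neighbors and the fallback to the target user's mean when all similarities are zero are assumptions made for this example.

```python
def predict_rating(target_mean, neighbors):
    """Eq. (3): mean-centered weighted average over the k nearest neighbors.

    neighbors: list of (sim_tg, r_gi, mean_g) tuples, one per neighbor g.
    """
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(abs(s) for s, _, _ in neighbors)
    if den == 0:
        # Assumption: with no usable neighbor, fall back to the user's mean.
        return target_mean
    return target_mean + num / den

# Hypothetical neighbors of target user t for item i.
neighbors = [(0.8, 5, 4.0), (0.4, 2, 3.0)]
p_ti = predict_rating(3.5, neighbors)
print(round(p_ti, 4))
```

Centering each neighbor's rating on that neighbor's own mean compensates for users who rate systematically high or low.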

1.2 Fuzzy information theory

$M_F=\begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1m} \\ f_{21} & f_{22} & \cdots & f_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ f_{m1} & f_{m2} & \cdots & f_{mm} \end{bmatrix},$ (4)

$FH(M_F)=-\frac{1}{m}\sum\limits_{i=1}^{m}\log\frac{\left\| [a_i]_F \right\|}{m},$ (5)

$FH(M_{F_1},M_{F_2})=-\frac{1}{m}\sum\limits_{i=1}^{m}\log\frac{\left\| [a_i]_{F_1}\cap [a_i]_{F_2} \right\|}{m}$ (6)

$FMI(M_{F_1},M_{F_2})=FH(M_{F_1})+FH(M_{F_2})-FH(M_{F_1},M_{F_2})$ (7)
2 Similarity measure based on fuzzy information entropy

2.1 Fuzzy information entropy of a single user

$M_u=\begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{n1} & s_{n2} & \cdots & s_{nn} \end{pmatrix},$ (8)

$s_{xy}=\begin{cases} \exp\left( -\frac{1}{2}\times |r_{ux}-r_{uy}| \right), & |r_{ux}-r_{uy}|<r_{med}; \\ 0, & \text{otherwise} \end{cases}$ (9)

$FH(M_u)=-\frac{1}{n}\sum\limits_{x=1}^{n}\log\frac{\left\| [i_x]_u \right\|}{n},$ (10)
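
A minimal Python sketch of (8)-(10) follows, going from one user's ratings to the fuzzy relation matrix and then to the fuzzy information entropy. Several points are assumptions rather than statements from the paper: the rating difference is taken in absolute value (which makes the matrix symmetric), $r_{med}$ is set to 3 as the midpoint of a 1-5 scale, the fuzzy cardinality $\| [i_x]_u \|$ is computed as the sum of row $x$, and the logarithm is base 2. The ratings themselves are hypothetical.

```python
import math

def fuzzy_relation_matrix(ratings, r_med=3):
    """Eq. (9): s_xy = exp(-|r_x - r_y| / 2) if |r_x - r_y| < r_med, else 0."""
    n = len(ratings)
    return [[math.exp(-0.5 * abs(ratings[x] - ratings[y]))
             if abs(ratings[x] - ratings[y]) < r_med else 0.0
             for y in range(n)]
            for x in range(n)]

def fuzzy_entropy(matrix):
    """Eq. (10), assuming row sums as fuzzy cardinalities and log base 2."""
    n = len(matrix)
    return -sum(math.log2(sum(row) / n) for row in matrix) / n

ratings_u = [5, 1, 4, 3]  # hypothetical ratings of user u on four items
M_u = fuzzy_relation_matrix(ratings_u)
for row in M_u:
    print([round(s, 4) for s in row])
print(round(fuzzy_entropy(M_u), 4))
```

Under these assumptions a user who rates every item identically has an all-ones matrix and entropy 0, while a user whose items are all mutually dissimilar has the identity matrix and the maximum entropy $\log_2 n$.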

$M_a=\begin{pmatrix} 1 & 0 & 0.6065 & 0.3679 & 0.6065 \\ 0 & 1 & 0 & 0 & 0 \\ 0.6065 & 0 & 1 & 0.6065 & 1 \\ 0.3679 & 0 & 0.6065 & 1 & 0.6065 \\ 0.6065 & 0 & 1 & 0.6065 & 1 \end{pmatrix},$ (11)
$M_b=\begin{pmatrix} 1 & 0.3679 & 0 & 0 & 0.6065 \\ 0.3679 & 1 & 0 & 0.6065 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0.6065 & 0 & 1 & 0 \\ 0.6065 & 0 & 0 & 0 & 1 \end{pmatrix}$ (12)

$FH(M_a)=-\frac{1}{5}\left( \log\frac{2.6394}{5}+\log\frac{1}{5}+\log\frac{2.213}{5}+\log\frac{1.5809}{5}+\log\frac{3.213}{5} \right)=1.3438,$ (13)
$FH(M_b)=-\frac{1}{5}\left( \log\frac{1.9744}{5}+\log\frac{1.9744}{5}+\log\frac{1}{5}+\log\frac{1.6065}{5}+\log\frac{1.6065}{5} \right)=1.6558$ (14)
2.2 Fuzzy mutual information between users

$FH(M_a,M_b)=-\frac{1}{n}\sum\limits_{x=1}^{n}\log\frac{\left\| [i_x]_a\cap [i_x]_b \right\|}{n},$ (15)

$FMI(M_a,M_b)=FH(M_a)+FH(M_b)-FH(M_a,M_b)$ (16)

$M_a\cap M_b=\begin{pmatrix} 1 & 0 & 0 & 0 & 0.6065 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0.6065 & 0 & 0 & 0 & 1 \end{pmatrix},$ (17)

$FH(M_a,M_b)=-\frac{1}{5}\left( \log\frac{1.6065}{5}+\log\frac{1}{5}+\log\frac{1}{5}+\log\frac{1}{5}+\log\frac{1.6065}{5} \right)=2.0484$ (18)

$FMI(M_a,M_b)=FH(M_a)+FH(M_b)-FH(M_a,M_b)=0.9512$ (19)
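
The worked example can be checked numerically with a short Python sketch. It assumes, as the printed matrices suggest, that the intersection in (17) is the element-wise minimum of $M_a$ and $M_b$, that fuzzy cardinality is the row sum, and that the logarithm is base 2. The helper names are illustrative only.

```python
import math

M_a = [[1, 0, 0.6065, 0.3679, 0.6065],
       [0, 1, 0, 0, 0],
       [0.6065, 0, 1, 0.6065, 1],
       [0.3679, 0, 0.6065, 1, 0.6065],
       [0.6065, 0, 1, 0.6065, 1]]
M_b = [[1, 0.3679, 0, 0, 0.6065],
       [0.3679, 1, 0, 0.6065, 0],
       [0, 0, 1, 0, 0],
       [0, 0.6065, 0, 1, 0],
       [0.6065, 0, 0, 0, 1]]

def intersect(m1, m2):
    """Element-wise minimum (assumed meaning of the intersection in (17))."""
    return [[min(x, y) for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def fuzzy_entropy(matrix):
    """-(1/n) * sum over rows of log2(row sum / n)."""
    n = len(matrix)
    return -sum(math.log2(sum(row) / n) for row in matrix) / n

fh_b = fuzzy_entropy(M_b)                    # Eq. (14)
fh_ab = fuzzy_entropy(intersect(M_a, M_b))   # Eq. (18)
fh_a_printed = 1.3438                        # value taken from Eq. (13) as printed
fmi = fh_a_printed + fh_b - fh_ab            # Eq. (19)

print(round(fh_b, 4))   # 1.6558
print(round(fh_ab, 4))  # 2.0484
print(round(fmi, 4))    # 0.9512
```

Under these assumptions the sketch reproduces (14), (17), (18) and (19). The row sums of $M_a$ as printed in (11) are 2.5809, 1, 3.213, 2.5809 and 3.213, which differ from the cardinalities listed in (13), so the sketch takes $FH(M_a)$ directly from (13) rather than recomputing it.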
2.3 Recommendation algorithm and time complexity

$SU(M_a,M_b)=\frac{2FMI(M_a,M_b)}{FH(M_a)+FH(M_b)}$ (20)

$sim(a,b)^{FIE}=\frac{2FMI(M_a,M_b)\times \exp(-|FH(M_a)-FH(M_b)|)}{FH(M_a)+FH(M_b)},$ (21)
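
To make (20) and (21) concrete, the sketch below evaluates both on the entropy values of the worked example, $FH(M_a)=1.3438$, $FH(M_b)=1.6558$ and $FMI(M_a,M_b)=0.9512$. The function names and the zero-denominator guard are assumptions added for the example.

```python
import math

def su(fmi, fh_a, fh_b):
    """Eq. (20): mutual information normalized by the sum of the two entropies."""
    total = fh_a + fh_b
    # Assumption: return 0 when both entropies vanish, to avoid dividing by zero.
    return 2 * fmi / total if total else 0.0

def sim_fie(fmi, fh_a, fh_b):
    """Eq. (21): Eq. (20) damped by exp(-|FH(M_a) - FH(M_b)|)."""
    return su(fmi, fh_a, fh_b) * math.exp(-abs(fh_a - fh_b))

su_ab = su(0.9512, 1.3438, 1.6558)
sim_ab = sim_fie(0.9512, 1.3438, 1.6558)
print(round(su_ab, 4), round(sim_ab, 4))
```

The exponential factor equals 1 when the two users have the same entropy and shrinks as their entropies diverge, so two users with equal mutual information are judged less similar when their rating uncertainty differs.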

$P_{ti}=\overline{r_t}+\frac{\sum\limits_{g=1}^{k} sim^{FIE}(t,g)\times (r_{gi}-\overline{r_g})}{\sum\limits_{g=1}^{k} \left| sim^{FIE}(t,g) \right|},$ (22)

3 Experimental results and analysis

3.1 Data sets

3.2 Evaluation metrics [3]

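MAE evaluates the accuracy of the predictions by computing the deviation between the predicted ratings and the corresponding actual ratings. The smaller the MAE, the better the prediction quality. Coverage evaluates the comprehensiveness of the predictions by computing the proportion of items that receive a predicted rating among all items. The larger the Coverage, the better the prediction quality.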

$MAE=\frac{\sum\limits_{i=1}^{n}\left| p_i-r_i \right|}{n},$ (23)
$Coverage=\frac{h}{n}$ (24)
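
A small Python sketch of (23) and (24) is given below. It assumes that an item without a prediction is marked `None` and that MAE is computed only over the items that did receive a prediction. Neither convention is stated in the text above, and the data are hypothetical.

```python
def mae(predicted, actual):
    """Eq. (23): mean absolute deviation between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def coverage(predictions):
    """Eq. (24): h / n, where h counts the items that received a prediction."""
    return sum(p is not None for p in predictions) / len(predictions)

# Hypothetical test items: None marks an item the recommender could not predict.
preds = [4.2, None, 3.0, 1.5]
truth = [4, 3, 2, 2]

covered = [(p, r) for p, r in zip(preds, truth) if p is not None]
mae_value = mae([p for p, _ in covered], [r for _, r in covered])
coverage_value = coverage(preds)
print(round(mae_value, 4), coverage_value)
```

The two metrics are complementary: a method can lower its MAE by predicting only the easy items, and that would show up as lower Coverage.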

$Precision=\frac{1}{|U|}\sum\limits_{u\in U}\frac{\left| \{ i\in Z_u \mid r_{ui}>\theta \wedge p_{ui}>\theta \} \right|}{N},$ (25)
$Recall=\frac{1}{|U|}\sum\limits_{u\in U}\frac{\left| \{ i\in Z_u \mid r_{ui}>\theta \wedge p_{ui}>\theta \} \right|}{\left| T_u \right|}$ (26)
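
The following Python sketch shows one possible reading of (25) and (26). The definitions of $Z_u$, $T_u$, $N$ and $\theta$ are not available in the text above, so the sketch assumes that $Z_u$ is the list of $N$ items recommended to user $u$, that $|T_u|$ is the number of relevant items for $u$ in the test set, and that $\theta=3$. All of these, along with the data, are assumptions.

```python
def precision_recall(per_user, theta, n):
    """Average Eq. (25) and Eq. (26) over users.

    per_user: list of (z_u, t_u_size) pairs, where z_u is a list of
    (actual_rating, predicted_rating) pairs for the recommended items and
    t_u_size is |T_u| (assumed: number of relevant test items for the user).
    """
    precision_sum = recall_sum = 0.0
    for z_u, t_u_size in per_user:
        # A hit is an item whose actual and predicted ratings both exceed theta.
        hits = sum(1 for r, p in z_u if r > theta and p > theta)
        precision_sum += hits / n
        recall_sum += hits / t_u_size if t_u_size else 0.0
    return precision_sum / len(per_user), recall_sum / len(per_user)

# Hypothetical data for two users with N = 3 recommended items each.
data = [
    ([(5, 4.5), (2, 4.1), (4, 3.9)], 4),
    ([(4, 4.2), (5, 2.0), (1, 1.0)], 2),
]
precision, recall = precision_recall(data, theta=3, n=3)
print(precision, recall)
```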
3.3 Experimental results

Figure 1 Comparison based on the ML data set
Figure 2 Comparison based on the HRML data set
4 Conclusion

References

[1] ADOMAVICIUS G, TUZHILIN A. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(6): 734-749. DOI: 10.1109/TKDE.2005.99.
[2] PAPAGELIS M, PLEXOUSAKIS D, KUTSURAS T. Alleviating the sparsity problem of collaborative filtering using trust inferences[C]//Proc of the 3rd International Conference on Trust Management. Berlin, Germany: Springer, 2005: 224-239.
[3] SHI Yue, LARSON M, HANJALIC A. Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges[J]. ACM Computing Surveys, 2014, 47(1): 3.
[4] RESNICK P, IACOVOU N, SUCHAK M, et al. GroupLens: an open architecture for collaborative filtering of netnews[C]//Proc of the ACM Conference on Computer Supported Cooperative Work. New York, USA: ACM, 1994: 175-186.
[5] BREESE J S, HECKERMAN D, KADIE C. Empirical analysis of predictive algorithms for collaborative filtering[C]//Proc of the 14th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers Inc, 1998: 43-52.
[6] WU Hu, WANG Yongji, WANG Zhe, et al. Two-phase collaborative filtering algorithm based on co-clustering[J]. Journal of Software, 2010, 21(5): 1042-1054. DOI: 10.3724/SP.J.1001.2010.03758. (in Chinese)
[7] YANG Xingyao, YU Jiong, IBRAHIM Turgun, et al. Collaborative filtering model fusing singularity and diffusion process[J]. Journal of Software, 2013, 24(8): 1868-1884. (in Chinese)
[8] LIN Yaojin, HU Xuegang, LI Huizong. Collaborative filtering recommendation algorithm based on user group influence[J]. Journal of the China Society for Scientific and Technical Information, 2013, 32(3): 299-305. (in Chinese)
[9] ZHANG Jia, LIN Yaojin, LIN Menglei, et al. Target users neighbors modification based collaborative filtering[J]. Pattern Recognition and Artificial Intelligence, 2015, 28(9): 802-810. (in Chinese)
[10] SARWAR B, KARYPIS G, KONSTAN J, et al. Item-based collaborative filtering recommendation algorithms[C]//Proc of the 10th International Conference on World Wide Web. New York, USA: ACM, 2001: 285-295.
[11] KALELI C. An entropy-based neighbor selection approach for collaborative filtering[J]. Knowledge-Based Systems, 2014, 56: 273-280. DOI: 10.1016/j.knosys.2013.11.020.
[12] JAMALI M, ESTER M. TrustWalker: a random walk model for combining trust-based and item-based recommendation[C]//Proc of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2009: 397-406.
[13] ZHANG J, LIN Y, LIN M, et al. An effective collaborative filtering algorithm based on user preference clustering[J]. Applied Intelligence, 2016, 45(2): 230-240. DOI: 10.1007/s10489-015-0756-9.
[14] HUANG Chuangguang, YIN Jian, WANG Jing, et al. Uncertain neighbors' collaborative filtering recommendation algorithm[J]. Chinese Journal of Computers, 2010, 33(8): 1369-1377. DOI: 10.3724/SP.J.1016.2010.01369. (in Chinese)
[15] ZHANG Jia, LIN Yaojin, LIN Menglei, et al. Entropy-based collaborative filtering algorithm[J]. Journal of Shandong University (Engineering Science), 2016, 46(2): 43-50. (in Chinese)
[16] BOBADILLA J, HERNANDO A, ORTEGA F, et al. Collaborative filtering based on significances[J]. Information Sciences, 2012, 185(1): 1-17. DOI: 10.1016/j.ins.2011.09.014.