Journal of Shandong University (Engineering Science) ›› 2014, Vol. 44 ›› Issue (6): 26-31. doi: 10.6040/j.issn.1672-3961.1.2014.116
卢文羊, 徐佳一, 杨育彬
LU Wenyang, XU Jiayi, YANG Yubin
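Abstract: Traditional link prediction methods for social networks ignore the textual content of nodes. To address this problem, a co-evolution link prediction algorithm based on the Latent Dirichlet Allocation (LDA) topic model is proposed. The algorithm applies the LDA model to the text content of each node to extract a topic distribution vector for that node, and uses the dot product of these distribution vectors to measure the textual similarity between nodes. The node text-similarity matrix is then added to the node adjacency matrix, and the similarity between nodes is computed on the combined matrix. Finally, the k nodes with the highest similarity are selected as the prediction result. Experimental results show that the algorithm performs well when the network graph is sparse.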
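The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes the per-node topic distributions (`theta`) have already been obtained from an LDA model, and it uses a weighted common-neighbors score on the combined matrix as the node similarity. The abstract does not say which similarity measure the authors use, so that choice, along with all function names below, is an assumption made for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def text_similarity(theta: Matrix) -> Matrix:
    """Pairwise dot product of topic-distribution vectors (diagonal set to 0)."""
    n = len(theta)
    return [
        [
            sum(a * b for a, b in zip(theta[i], theta[j])) if i != j else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def combine(adj: Matrix, sim: Matrix) -> Matrix:
    """Element-wise sum of the adjacency matrix and the text-similarity matrix."""
    return [[a + s for a, s in zip(row_a, row_s)] for row_a, row_s in zip(adj, sim)]


def weighted_common_neighbors(m: Matrix) -> Matrix:
    """Assumed similarity measure: score(i, j) = sum over z of m[i][z] * m[z][j]."""
    n = len(m)
    return [
        [sum(m[i][z] * m[z][j] for z in range(n)) for j in range(n)]
        for i in range(n)
    ]


def top_k_links(adj: Matrix, theta: Matrix, node: int, k: int) -> List[int]:
    """Return the k not-yet-linked nodes most similar to `node`."""
    combined = combine(adj, text_similarity(theta))
    scores = weighted_common_neighbors(combined)
    candidates = [j for j in range(len(adj)) if j != node and adj[node][j] == 0]
    candidates.sort(key=lambda j: scores[node][j], reverse=True)
    return candidates[:k]


# Toy network: a path 0-1-2 plus an isolated node 3 (hypothetical data).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical two-topic distributions, one row per node.
theta = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
    [0.9, 0.1],
]
print(top_k_links(adj, theta, node=3, k=2))
```

In this toy network node 3 has no edges, so a purely topological common-neighbors score would be zero for every candidate. Adding the text-similarity matrix gives it nonzero scores, which is consistent with the abstract's claim that the approach helps on sparse graphs.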