
Journal of Shandong University (Engineering Science) ›› 2014, Vol. 44 ›› Issue (6): 26-31. doi: 10.6040/j.issn.1672-3961.1.2014.116

• Machine Learning and Data Mining •

LDA-based link prediction in social networks

LU Wenyang, XU Jiayi, YANG Yubin

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, Jiangsu, China
  • Received: 2014-03-31 Revised: 2014-11-14 Online: 2014-12-20 Published: 2014-03-31
  • Corresponding author: YANG Yubin (1977-), male, born in Ganzhou, Jiangxi; professor, Ph.D. (postdoctoral). His main research interests are digital media understanding and intelligent processing technologies and their applications, cloud-computing-based massive data mining algorithms and application systems, and social network analysis and visualization. E-mail: yangyubin@nju.edu.cn
  • About the author: LU Wenyang (1992-), male, born in Suqian, Jiangsu; master's student. His main research interests are data mining and social network analysis. E-mail: luwy007@gmail.com
  • Supported by:
    Program for New Century Excellent Talents of the Ministry of Education (NCET-11-0213); National Natural Science Foundation of China (61273257, 61035003, 61021062); Six Talent Peaks Project of Jiangsu Province (2013-XXRJ-018)

Abstract: Traditional link prediction methods for social networks ignore the text content of nodes. To address this problem, a collaborative evolutionary link prediction algorithm based on the Latent Dirichlet Allocation (LDA) topic model is proposed. The algorithm uses the LDA model to analyze the text content of each node and extracts a topic distribution vector for each node. The dot product of two topic distribution vectors measures the similarity between the text content of the corresponding nodes. The content similarity matrix is then added to the adjacency matrix, and the similarities between nodes are computed from the combined matrix. Finally, the k most similar nodes are selected as the prediction result. The experimental results show that the proposed algorithm performs well when the network graph is sparse.

Key words: link prediction, network evolution, topic model, Latent Dirichlet Allocation, social network
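The pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the topic distribution vectors are hard-coded toy values standing in for LDA output, the function name `predict_links` is hypothetical, and the final node similarity is assumed to be a weighted common-neighbour score on the combined matrix, since the abstract does not say which similarity index is applied at that step.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_links(adjacency, topics, k):
    """Rank unlinked node pairs and return the k highest-scoring ones.

    adjacency: symmetric 0/1 matrix (list of lists) of the observed network.
    topics:    one topic distribution vector per node (assumed LDA output).
    """
    n = len(adjacency)

    # Step 1: content similarity = dot product of topic distribution vectors.
    content = [
        [dot(topics[i], topics[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

    # Step 2: add the content similarity matrix to the adjacency matrix.
    combined = [
        [adjacency[i][j] + content[i][j] for j in range(n)]
        for i in range(n)
    ]

    # Step 3 (assumption): score each unlinked pair by a weighted
    # common-neighbour sum over the combined matrix.
    scored = []
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i][j]:
                continue  # already linked, nothing to predict
            score = sum(combined[i][z] * combined[z][j] for z in range(n))
            scored.append((score, i, j))

    # Step 4: keep the k most similar pairs.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(i, j) for _, i, j in scored[:k]]


# Toy network: a path 0 - 1 - 2 plus an isolated node 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical two-topic distributions, one per node.
topics = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
    [0.9, 0.1],
]

top_pairs = predict_links(adjacency, topics, k=2)
```

In this toy network node 3 has no edges, so a purely structural common-neighbour score would be zero for every pair that includes it. Adding the content similarity matrix gives those pairs non-zero scores, which illustrates why text content can help when the graph is sparse.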

CLC Number: 

  • TP301