
Journal of Shandong University (Engineering Science), 2016, Vol. 46, Issue (5): 7-12. doi: 10.6040/j.issn.1672-3961.1.2016.257


Evaluation of microblog users' credibility based on HITS algorithm

WU Shufang1,2, XU Jianmin3*

  1. College of Management and Economics, Tianjin University, Tianjin 300072, China;
  2. College of Management, Hebei University, Baoding 071000, Hebei, China;
  3. School of Computer Science and Technology, Hebei University, Baoding 071000, Hebei, China
  • Received: 2016-03-31 Online: 2016-10-20 Published: 2016-03-31
  • Corresponding author: XU Jianmin (1966- ), male, from Handan, Hebei; professor, Ph.D.; main research interests: information retrieval and uncertain information processing. E-mail: hbuxjm@hbu.edu.cn
  • About the author: WU Shufang (1979- ), female, from Handan, Hebei; associate professor, Ph.D.; main research interests: information retrieval and uncertain information processing. E-mail: shufang_44@126.com
  • Supported by:
    Hebei Province Social Science Foundation project (HB15TQ013)

Abstract: Taking Sina-Microblog as the research platform, a microblog user credibility evaluation algorithm that fuses user interaction behavior and blog content was proposed on the basis of the HITS (hyperlink-induced topic search) algorithm. Two directed link graphs of microblog users were constructed, one based on interaction behavior and one based on blog content; nodes represented users, and directed edges captured the pointing relationship between users under the corresponding criterion. The authority and hub scores of users were computed with the HITS algorithm in each of the two topologies, and the fused authority was adopted as the measure of user credibility. Experiments used data collected from Sina-Microblog as the test set. A credibility threshold was obtained by repeated training, and user credibility curves of different credibility algorithms were plotted to verify the feasibility and effectiveness of the proposed algorithm.

Key words: HITS algorithm, microblog users, interaction, blog, credibility

CLC number: TP391
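The pipeline summarized in the abstract (two directed user graphs, HITS scores on each, fused authority compared against a threshold) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how edges are derived, how the two authority scores are fused, or what threshold was learned, so the edge lists, the linear weight `alpha`, and the cutoff `0.5` below are all hypothetical.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit Euclidean length (all zeros stay zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    return {k: (v / norm if norm else 0.0) for k, v in scores.items()}


def hits(edges, iterations=50):
    """Standard HITS power iteration on a directed graph.

    edges: iterable of (source_user, target_user) pairs.
    Returns (authority, hub) dictionaries keyed by user.
    """
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a user: sum of hub scores of users pointing to it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        auth = _normalize(auth)
        # Hub score of a user: sum of authority scores of users it points to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        hub = _normalize(hub)
    return auth, hub


def fused_credibility(interaction_edges, content_edges, alpha=0.5):
    """Fuse the two authority scores linearly (assumed rule, weight alpha)."""
    auth_i, _ = hits(interaction_edges)
    auth_c, _ = hits(content_edges)
    users = set(auth_i) | set(auth_c)
    return {
        u: alpha * auth_i.get(u, 0.0) + (1 - alpha) * auth_c.get(u, 0.0)
        for u in users
    }


# Toy data: an edge (x, y) means user x points to user y in that graph.
interaction_edges = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "b")]
content_edges = [("b", "a"), ("c", "a"), ("a", "c")]

scores = fused_credibility(interaction_edges, content_edges)
# Hypothetical cutoff; the paper obtains its threshold by repeated training.
credible = sorted(u for u, s in scores.items() if s >= 0.5)
print(credible)
```

A linear combination is used here only because it is the simplest way to merge two scores on the same scale; any other fusion rule would slot into `fused_credibility` without changing the HITS step.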