
Journal of Shandong University (Engineering Science) ›› 2016, Vol. 46 ›› Issue (5): 7-12. doi: 10.6040/j.issn.1672-3961.1.2016.257


Evaluation of microblog users' credibility based on HITS algorithm

WU Shufang1,2, XU Jianmin3*

  1. College of Management and Economics, Tianjin University, Tianjin 300072, China;
    2. College of Management, Hebei University, Baoding 071000, Hebei, China;
    3. School of Computer Science and Technology, Hebei University, Baoding 071000, Hebei, China
  • Received: 2016-03-31 Online: 2016-10-20 Published: 2016-03-31
  • Corresponding author: XU Jianmin (b. 1966), male, from Handan, Hebei; professor, Ph.D.; main research interests: information retrieval and uncertain information processing. E-mail: hbuxjm@hbu.edu.cn
  • About the author: WU Shufang (b. 1979), female, from Handan, Hebei; associate professor, Ph.D.; main research interests: information retrieval and uncertain information processing. E-mail: shufang_44@126.com
  • Supported by:
    Social Science Foundation of Hebei Province (HB15TQ013)

Abstract: Taking Sina Weibo as the research platform, a microblog user credibility evaluation algorithm that fuses user interaction behavior and post content is proposed on the basis of the HITS (hyperlink-induced topic search) algorithm. Two directed link graphs of microblog users are constructed, one based on interaction behavior and one based on post content; in each graph, nodes represent users and directed edges represent the pointing relationship between users derived from interaction or from content. The authority and hub scores of users under both topologies are computed with HITS, and the fused authority score is used as the measure of user credibility. Experiments use data collected from Sina Weibo as the test set: a credibility threshold is obtained by repeated training, and the credibility curves produced by different credibility algorithms are plotted to verify the feasibility and effectiveness of the proposed algorithm.

Key words: HITS algorithm, interaction, credibility, blog, microblog users

CLC Number:

  • TP391
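
The pipeline summarized in the abstract (two directed user graphs, HITS scores on each, a fused authority score, a threshold) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state how edges are derived, how the two authority scores are fused, or what threshold value was learned. The function names `hits` and `fused_credibility`, the convex-combination weight `alpha`, the toy edge lists, and the threshold `0.5` are all assumptions introduced here for illustration.

```python
from math import sqrt


def _normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(edges, nodes, iterations=50):
    """Standard HITS power iteration on a directed graph.

    edges: iterable of (source, target) pairs, meaning source points to target.
    nodes: iterable of node ids.
    Returns (authority, hub) dicts, each L2-normalized.
    """
    nodes = list(nodes)
    edges = list(edges)
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of n: sum of hub scores of the nodes pointing to n.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub of n: sum of the updated authority scores of the nodes n points to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


def fused_credibility(interaction_edges, content_edges, nodes, alpha=0.5):
    """Assumed fusion rule: convex combination of the two authority scores.

    The paper's actual fusion formula is not given in the abstract.
    """
    nodes = list(nodes)
    auth_interaction, _ = hits(interaction_edges, nodes)
    auth_content, _ = hits(content_edges, nodes)
    return {
        n: alpha * auth_interaction[n] + (1.0 - alpha) * auth_content[n]
        for n in nodes
    }


# Toy data (hypothetical): four users and two hand-made edge lists.
users = ["a", "b", "c", "d"]
interaction = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
content = [("a", "c"), ("b", "c"), ("d", "b")]

scores = fused_credibility(interaction, content, users)
threshold = 0.5  # placeholder; the paper obtains its threshold by repeated training
credible = [u for u in users if scores[u] >= threshold]
```

In this toy example most users point to user `c` in both graphs, so `c` receives the highest fused authority. Keeping the two graphs separate until the final step mirrors the abstract's description of computing HITS under each topology before fusing.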