Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 1-9. doi: 10.6040/j.issn.1672-3961.0.2019.402

• Machine Learning & Data Mining •

Fake comment detection based on heterogeneous ensemble learning

Dapeng ZHANG1, Yajun LIU2,*, Wei ZHANG1, Fen SHEN1, Jiansheng YANG2

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
  2. College of Information Engineering, Hebei Institute of Architecture and Civil Engineering, Zhangjiakou 075000, Hebei, China
  • Received:2019-07-24 Online:2020-04-20 Published:2020-04-16
  • Contact: Yajun LIU E-mail:daniao@ysu.edu.cn;liuyajun@stumail.ysu.edu.cn
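  • Supported by:
    Zhangjiakou Science and Technology Research and Development Directive Plan Project (1711007B); Zhangjiakou Science and Technology Research and Development Directive Plan Project (1711045H); Zhangjiakou Science and Technology Research and Development Directive Plan Project (1811009B-04)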

Abstract:

Data sets in the field of fake comment detection are small and inaccurately labeled. To address this, and with the aim of preventing vicious competition among sellers, ensuring fair trading on e-commerce platforms, and protecting consumer rights, this study used the latest fake comment data set released by Amazon and improved the related algorithms. Because the Word2vec model could not recognize word pairs in English, a Bigram-Word2vec model was proposed. "Two-class weighted hard voting" was proposed to handle the case in heterogeneous ensemble learning where the classifiers' votes are tied. "Weighted soft voting" was studied to determine how to set the classifier weights in heterogeneous ensemble learning. The experimental results showed that the improved algorithms achieved better results.
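
One common way to let a Word2vec-style model see word pairs is to merge frequent adjacent words into single tokens before training. The sketch below is an assumption about what such preprocessing could look like, not the authors' Bigram-Word2vec implementation; the function name `merge_frequent_bigrams`, the `min_count` threshold, the "_" joiner, and the toy sentences are all hypothetical.

```python
from collections import Counter


def merge_frequent_bigrams(sentences, min_count=2):
    """Join adjacent word pairs seen at least `min_count` times into one token.

    Hypothetical preprocessing step; the threshold and the "_" joiner are
    illustrative choices, not values taken from the paper.
    """
    # Count every adjacent word pair across the corpus.
    pair_counts = Counter()
    for words in sentences:
        pair_counts.update(zip(words, words[1:]))

    merged_corpus = []
    for words in sentences:
        merged, i = [], 0
        while i < len(words):
            pair = tuple(words[i:i + 2])
            if len(pair) == 2 and pair_counts[pair] >= min_count:
                merged.append("_".join(pair))  # treat the pair as one token
                i += 2
            else:
                merged.append(words[i])
                i += 1
        merged_corpus.append(merged)
    return merged_corpus


toy_reviews = [
    ["customer", "service", "was", "great"],
    ["terrible", "customer", "service"],
    ["great", "price"],
]
merged_reviews = merge_frequent_bigrams(toy_reviews)
print(merged_reviews)
```

The merged token lists could then be passed to any Word2vec trainer in place of the raw word lists, so that a pair such as "customer service" receives its own vector.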

Key words: machine learning, heterogeneous ensemble learning, voting, fake comment detection, Word2vec

CLC Number: 

  • TP312

Fig.1

Correspondence between research content and research direction

Fig.2

CBOW and Skip-gram models

Fig.3

Schematic of ensemble learning

Table 1

Ensemble improves performance

Classifier  Test case 1  Test case 2  Test case 3
Classifier 1  ×
Classifier 2  ×
Classifier 3  ×
Ensemble

Table 2

Ensemble fails to improve performance

Classifier  Test case 1  Test case 2  Test case 3
Classifier 1  ×
Classifier 2  ×
Classifier 3  ×
Ensemble  ×

Table 3

Ensemble degrades performance

Classifier  Test case 1  Test case 2  Test case 3
Classifier 1  ×  ×
Classifier 2  ×  ×
Classifier 3  ×  ×
Ensemble  ×  ×  ×
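
The × counts in Tables 1 to 3 can be reproduced with simple majority voting over three classifiers. The sketch below uses one arrangement of errors that is consistent with those counts; the exact positions of the errors are an assumption for illustration, and `True` marks a correct prediction.

```python
def majority_correct(results):
    """Per test case, the ensemble is correct when most classifiers are correct."""
    n_classifiers = len(results)
    return [sum(case) * 2 > n_classifiers for case in zip(*results)]


# Rows are classifiers 1-3, columns are test cases 1-3.
# Error positions are illustrative; only the error counts per row follow the tables.
improves = [[True, True, False], [False, True, True], [True, False, True]]
no_gain = [[True, True, False], [True, True, False], [True, True, False]]
degrades = [[True, False, False], [False, True, False], [False, False, True]]

for name, table in [("improves", improves), ("no_gain", no_gain), ("degrades", degrades)]:
    print(name, majority_correct(table))
```

The ensemble helps only when the classifiers are individually accurate and make their errors on different test cases; identical errors give no gain, and classifiers that are wrong most of the time make the vote worse.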

Fig.4

Two-class weighted hard voting
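
With an even number of classifiers in a two-class problem, plain hard voting can end in a tie. The sketch below shows one way to break such ties with classifier weights. It is an assumption about the general idea, not the procedure shown in Fig.4; the function name, the weights, and the labels are hypothetical.

```python
def two_class_weighted_hard_vote(votes, weights):
    """Majority vote over labels 0/1; a tie falls back to the summed weights.

    `weights` could be, for example, each classifier's validation accuracy
    (an illustrative assumption).
    """
    ones = sum(votes)
    zeros = len(votes) - ones
    if ones != zeros:
        return 1 if ones > zeros else 0
    # Tie: compare the total weight behind each label.
    weight_one = sum(w for v, w in zip(votes, weights) if v == 1)
    weight_zero = sum(w for v, w in zip(votes, weights) if v == 0)
    return 1 if weight_one > weight_zero else 0


# Four classifiers split 2-2; the pair with the larger total weight decides.
tied = two_class_weighted_hard_vote([1, 1, 0, 0], [0.70, 0.65, 0.80, 0.75])
# Three of four classifiers agree; the weights are not consulted.
clear = two_class_weighted_hard_vote([1, 1, 1, 0], [0.70, 0.65, 0.80, 0.75])
print(tied, clear)
```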

Table 4

Soft voting calculation process

Classifier  Classifier 1  Classifier 2  Classifier 3  Result
Class 1     Wei1×0.2      Wei2×0.6      Wei3×0.3      0.37
Class 2     Wei1×0.5      Wei2×0.3      Wei3×0.4      0.40
Class 3     Wei1×0.3      Wei2×0.1      Wei3×0.3      0.23
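
The results column of Table 4 matches plain averaging of the three classifiers' class probabilities, that is, equal weights Wei1 = Wei2 = Wei3 = 1/3. This is an inference from the numbers rather than something the table states. A minimal sketch under that assumption:

```python
# Class probabilities from Table 4: one row per class, one column per classifier.
probabilities = [
    [0.2, 0.6, 0.3],  # class 1
    [0.5, 0.3, 0.4],  # class 2
    [0.3, 0.1, 0.3],  # class 3
]
weights = [1 / 3, 1 / 3, 1 / 3]  # assumed equal weights

scores = [sum(w * p for w, p in zip(weights, row)) for row in probabilities]
predicted_class = scores.index(max(scores)) + 1  # classes are numbered from 1

print([round(s, 2) for s in scores], predicted_class)
```

Class 2 has the highest averaged score (0.40), so it would be the predicted class.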

Table 5

Weighted soft voting calculation process

Classifier  Classifier 1  Classifier 2  Classifier 3  Result
Class 1     A1×0.2        A2×0.6        A3×0.3        0.391/3
Class 2     A1×0.5        A2×0.3        A3×0.4        0.389/3
Class 3     A1×0.3        A2×0.1        A3×0.3        0.221/3
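
In Table 5 each classifier's probabilities are scaled by its own weight, A1 to A3, before they are summed. The sketch below assumes these weights come from classifier accuracy and are normalized to sum to 1. The accuracy values are hypothetical and are not meant to reproduce the table's results column.

```python
# Class probabilities as in Table 5: one row per class, one column per classifier.
probabilities = [
    [0.2, 0.6, 0.3],  # class 1
    [0.5, 0.3, 0.4],  # class 2
    [0.3, 0.1, 0.3],  # class 3
]

# Hypothetical validation accuracies for classifiers 1-3 (not from the paper).
accuracies = [0.60, 0.90, 0.70]
total = sum(accuracies)
weights = [a / total for a in accuracies]  # normalize so the weights sum to 1

weighted_scores = [sum(w * p for w, p in zip(weights, row)) for row in probabilities]
weighted_prediction = weighted_scores.index(max(weighted_scores)) + 1

print([round(s, 3) for s in weighted_scores], weighted_prediction)
```

With these illustrative weights the more accurate classifier 2 carries more influence, and the prediction moves from class 2 (equal weights) to class 1.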

Fig.5

Ratings of true and false reviews

Table 6

Similarity for "mark" in Word2vec and Bigram-Word2vec

Word No.  Word2vec similarity        Bigram-Word2vec similarity  Difference
1         0.730 117 678 642 273 000  0.773 957 729 339 599 600   0.043 8
2         0.724 927 425 384 521 500  0.769 369 959 831 237 800   0.044 4
3         0.703 679 680 824 279 800  0.763 955 950 736 999 500   0.060 3
4         0.695 606 172 084 808 300  0.762 220 799 922 943 100   0.066 6
5         0.693 261 742 591 857 900  0.760 901 212 692 260 700   0.066 7
6         0.690 877 795 219 421 400  0.750 092 387 199 401 900   0.059 2
7         0.688 343 644 142 150 900  0.746 123 850 345 611 600   0.057 8
8         0.685 064 375 400 543 200  0.739 520 072 937 011 700   0.054 5
9         0.679 870 009 422 302 200  0.739 287 137 985 229 500   0.059 4
10        0.678 099 155 426 025 400  0.738 687 157 630 920 400   0.060 6

Table 7

Similarity for "simply" in Word2vec and Bigram-Word2vec

Word No.  Word2vec similarity        Bigram-Word2vec similarity  Difference
1         0.513 332 486 152 648 900  0.599 793 553 352 356 000   0.086 5
2         0.506 101 250 648 498 500  0.570 188 283 920 288 100   0.064 1
3         0.492 439 687 252 044 700  0.564 021 468 162 536 600   0.071 6
4         0.479 794 591 665 267 940  0.518 532 931 804 657 000   0.038 7
5         0.474 006 831 645 965 600  0.517 969 369 888 305 700   0.044 0
6         0.473 198 235 034 942 600  0.517 845 869 064 331 000   0.044 6
7         0.472 356 975 078 582 760  0.515 905 439 853 668 200   0.043 5
8         0.468 731 433 153 152 470  0.507 999 360 561 370 800   0.039 3
9         0.468 069 523 572 921 750  0.504 582 762 718 200 700   0.036 5
10        0.467 946 290 969 848 630  0.502 337 872 982 025 100   0.034 4

Fig.6

Comparison curves of the Word2vec and Bigram-Word2vec experiments

Fig.7

Experimental accuracy results for traditional text features

Table 8

Results of the ensemble learning experiments

Ensemble method  Hard     2Hard    Soft     WeiSoft
Accuracy         0.769 7  0.812 7  0.771 0  0.808 8
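
The gains in Table 8 can be read off directly; the short calculation below makes them explicit. It assumes that "2Hard" denotes two-class weighted hard voting and "WeiSoft" weighted soft voting, which the abbreviations suggest but the table does not spell out.

```python
# Accuracies reported in Table 8.
accuracy = {"Hard": 0.7697, "2Hard": 0.8127, "Soft": 0.7710, "WeiSoft": 0.8088}

# Absolute accuracy gain of each proposed variant over its baseline.
hard_gain = round(accuracy["2Hard"] - accuracy["Hard"], 4)
soft_gain = round(accuracy["WeiSoft"] - accuracy["Soft"], 4)
print(hard_gain, soft_gain)
```

Under that reading, the hard-voting variant gains about 4.3 percentage points and the soft-voting variant about 3.8.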