Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 1-9. doi: 10.6040/j.issn.1672-3961.0.2019.402

• Machine Learning & Data Mining •

Fake comment detection based on heterogeneous ensemble learning

Dapeng ZHANG1, Yajun LIU2,*, Wei ZHANG1, Fen SHEN1, Jiansheng YANG2

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
  2. College of Information Engineering, Hebei Institute of Architecture and Civil Engineering, Zhangjiakou 075000, Hebei, China
  • Received: 2019-07-24 Online: 2020-04-20 Published: 2020-04-16
  • Contact: Yajun LIU E-mail: daniao@ysu.edu.cn; liuyajun@stumail.ysu.edu.cn
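  • Supported by:
    Zhangjiakou Science and Technology Research and Development Directive Plan Project (1711007B); Zhangjiakou Science and Technology Research and Development Directive Plan Project (1711045H); Zhangjiakou Science and Technology Research and Development Directive Plan Project (1811009B-04)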

Abstract:

Fake comment detection suffers from small data sets and inaccurate labeling. To help prevent vicious competition among sellers, ensure fair trading on e-commerce platforms, and protect consumers' rights, this study used the latest fake comment data set released by Amazon and improved the related algorithms. Because the Word2vec model cannot recognize English word pairs, a Bigram-Word2vec model was proposed. For heterogeneous ensemble learning, "two-class weighted hard voting" was proposed to resolve cases in which the classifiers' votes are tied, and "weighted soft voting" was studied to determine how the weight of each classifier should be set. Experimental results showed that these improvements to the related algorithms achieved better results.

Key words: machine learning, heterogeneous ensemble learning, voting, fake comment detection, Word2vec

CLC Number: TP312

Fig.1 Correspondence between research content and research directions

Fig.2 CBOW and Skip-gram models
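Fig.2 contrasts the two Word2vec architectures: CBOW predicts a center word from its surrounding context words, while Skip-gram predicts each context word from the center word. The minimal sketch below only shows how each architecture turns a sentence into training examples (no network, no negative sampling); the window size and example sentence are hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: each center word is paired with every word in its context window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))  # (input word, word to predict)
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the whole context window is used to predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))  # (input context, word to predict)
    return examples


tokens = "this product is not good".split()
print(skipgram_pairs(tokens, window=1)[:3])  # [('this', 'product'), ('product', 'this'), ('product', 'is')]
print(cbow_examples(tokens, window=1)[1])    # (['this', 'is'], 'product')
```

Skip-gram produces one example per (center, context) pair, so it sees rare words more often; CBOW averages the context into one example per position.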

Fig.3 Schematic of ensemble learning

Table 1 Ensemble improves performance

Classifier Test case 1 Test case 2 Test case 3
Classifier 1 ×
Classifier 2 ×
Classifier 3 ×
Ensemble

Table 2 Ensemble fails to improve performance

Classifier Test case 1 Test case 2 Test case 3
Classifier 1 ×
Classifier 2 ×
Classifier 3 ×
Ensemble ×

Table 3 Side effect of the ensemble

Classifier Test case 1 Test case 2 Test case 3
Classifier 1 × ×
Classifier 2 × ×
Classifier 3 × ×
Ensemble × × ×
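Tables 1 to 3 show that combining classifiers by majority vote can improve, leave unchanged, or worsen accuracy, depending on where the individual classifiers make their mistakes. For a two-class problem with an odd number of classifiers, the vote is correct on a test case exactly when more than half of the classifiers are correct on it. The correct/wrong patterns below are hypothetical ones chosen to match the number of × marks per row in each table; their exact positions are illustrative.

```python
from typing import List


def majority_correct(correct: List[List[bool]]) -> List[bool]:
    """correct[k][t] is True if classifier k is right on test case t.

    Two-class majority vote: the ensemble is right on a test case when more
    than half of the classifiers are right (a tie counts as wrong here).
    """
    n = len(correct)
    return [sum(column) * 2 > n for column in zip(*correct)]


# Hypothetical patterns (True = correct, False = ×), one row per classifier.
improves = [[True, True, False], [False, True, True], [True, False, True]]  # like Table 1
no_gain = [[True, True, False]] * 3                                         # like Table 2
hurts = [[True, False, False], [False, True, False], [False, False, True]]  # like Table 3

for name, pattern in [("improves", improves), ("no_gain", no_gain), ("hurts", hurts)]:
    print(name, majority_correct(pattern))
# improves [True, True, True]
# no_gain [True, True, False]
# hurts [False, False, False]
```

The contrast between the first two patterns is that identical classifiers add nothing, while classifiers that err on different cases can correct each other.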

Fig.4 Two-class weighted hard voting
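The abstract states that two-class weighted hard voting handles the case where the classifiers' votes are tied, which can happen when an even number of classifiers vote on a two-class problem. This page does not spell out the paper's rule, so the sketch below is a hypothetical version: a plain majority decides when there is one, and on a tie the side whose classifiers have the larger total weight wins. Using each classifier's validation accuracy as its weight is an assumption, as are the example numbers.

```python
from typing import Sequence


def weighted_tiebreak_vote(labels: Sequence[int], weights: Sequence[float]) -> int:
    """Hard voting for labels 0/1 with a weighted tie-break (hypothetical sketch).

    labels[k] is classifier k's predicted class; weights[k] is its assumed weight,
    e.g. its accuracy on a validation set.
    """
    ones = sum(1 for y in labels if y == 1)
    zeros = len(labels) - ones
    if ones != zeros:                      # ordinary majority vote
        return 1 if ones > zeros else 0
    w1 = sum(w for y, w in zip(labels, weights) if y == 1)
    w0 = sum(w for y, w in zip(labels, weights) if y == 0)
    return 1 if w1 > w0 else 0             # exact weight ties fall back to class 0


# Four classifiers split 2:2, so the weights decide.
print(weighted_tiebreak_vote([1, 0, 1, 0], [0.70, 0.80, 0.75, 0.81]))  # 0 (1.45 < 1.61)
print(weighted_tiebreak_vote([1, 1, 0], [0.1, 0.1, 0.9]))              # 1 (plain majority)
```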

Table 4 Soft voting calculation process

Classifier Classifier 1 Classifier 2 Classifier 3 Result
Class 1 Wei1×0.2 Wei2×0.6 Wei3×0.3 0.37
Class 2 Wei1×0.5 Wei2×0.3 Wei3×0.4 0.40
Class 3 Wei1×0.3 Wei2×0.1 Wei3×0.3 0.23
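In Table 4 each classifier outputs a probability for every class, and soft voting combines them as a weighted sum per class. Setting Wei1 = Wei2 = Wei3 = 1/3 (a plain average) reproduces the Result column, so class 2 wins. The sketch below simply recomputes that column.

```python
from typing import List


def soft_vote(probs: List[List[float]], weights: List[float]) -> List[float]:
    """probs[k][c] is classifier k's probability for class c.

    Returns the weighted sum of probabilities for each class.
    """
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]


# Probability columns of Table 4: classifier 1, 2, 3 (classes 1..3 in order).
probs = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3]]
scores = soft_vote(probs, [1 / 3] * 3)
print([round(s, 2) for s in scores])                    # [0.37, 0.4, 0.23]
print("winner: class", scores.index(max(scores)) + 1)   # winner: class 2
```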

Table 5 Weighted soft voting calculation process

Classifier Classifier 1 Classifier 2 Classifier 3 Result
Class 1 A1×0.2 A2×0.6 A3×0.3 0.391/3
Class 2 A1×0.5 A2×0.3 A3×0.4 0.389/3
Class 3 A1×0.3 A2×0.1 A3×0.3 0.221/3
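Table 5 uses the same probabilities as Table 4 but replaces the equal weights with per-classifier weights A1 to A3, and the winner changes from class 2 to class 1 (0.391/3 against 0.389/3); dividing every score by 3 does not change the ranking. The values of A1 to A3 are not given on this page. The sketch below assumes, hypothetically, that each weight is a classifier's accuracy normalized so the weights sum to 1; the accuracies are made-up numbers that happen to reproduce the same reversal.

```python
from typing import List


def weighted_soft_vote(probs: List[List[float]], accuracies: List[float]) -> List[float]:
    """Soft voting with weights proportional to each classifier's (assumed) accuracy."""
    total = sum(accuracies)
    weights = [a / total for a in accuracies]  # normalize so the weights sum to 1
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]


probs = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3]]  # as in Tables 4 and 5
accuracies = [0.6, 0.9, 0.7]                                 # hypothetical accuracies
scores = weighted_soft_vote(probs, accuracies)
print([round(s, 3) for s in scores])                   # [0.395, 0.386, 0.218]
print("winner: class", scores.index(max(scores)) + 1)  # winner: class 1
```

With equal accuracies the function reduces to the plain soft voting of Table 4; giving more weight to the stronger classifier 2, which favors class 1, is what flips the decision.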

Fig.5 Ratings of true and false reviews

Table 6 Similarity for "mark" in Word2vec and Bigram-Word2vec

Word rank Word2vec similarity Bigram-Word2vec similarity Difference
1 0.730117678642273000 0.773957729339599600 0.0438
2 0.724927425384521500 0.769369959831237800 0.0444
3 0.703679680824279800 0.763955950736999500 0.0603
4 0.695606172084808300 0.762220799922943100 0.0666
5 0.693261742591857900 0.760901212692260700 0.0676
6 0.690877795219421400 0.750092387199401900 0.0592
7 0.688343644142150900 0.746123850345611600 0.0578
8 0.685064375400543200 0.739520072937011700 0.0545
9 0.679870009422302200 0.739287137985229500 0.0594
10 0.678099155426025400 0.738687157630920400 0.0606

Table 7 Similarity for "simply" in Word2vec and Bigram-Word2vec

Word rank Word2vec similarity Bigram-Word2vec similarity Difference
1 0.513332486152648900 0.599793553352356000 0.0865
2 0.506101250648498500 0.570188283920288100 0.0641
3 0.492439687252044700 0.564021468162536600 0.0716
4 0.479794591665267940 0.518532931804657000 0.0387
5 0.474006831645965600 0.517969369888305700 0.0440
6 0.473198235034942600 0.517845869064331000 0.0446
7 0.472356975078582760 0.515905439853668200 0.0435
8 0.468731433153152470 0.507999360561370800 0.0393
9 0.468069523572921750 0.504582762718200700 0.0365
10 0.467946290969848630 0.502337872982025100 0.0344
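Tables 6 and 7 list, for the query words "mark" and "simply", the similarity of the ten most similar words under each model, and the Difference column is the Bigram-Word2vec value minus the Word2vec value. The page does not name the similarity measure; Word2vec toolkits usually report cosine similarity between word vectors, so the sketch below assumes that. The 3-dimensional vectors are toy values for illustration only.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word: str, vectors: Dict[str, List[float]], topn: int = 10) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:topn]


# Toy vectors, purely illustrative (real Word2vec vectors have hundreds of dimensions).
vectors = {
    "simply": [0.9, 0.1, 0.0],
    "just": [0.8, 0.2, 0.1],
    "merely": [0.7, 0.1, 0.3],
    "battery": [0.0, 0.2, 0.9],
}
for w, s in most_similar("simply", vectors, topn=3):
    print(w, round(s, 4))
```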

Fig.6 Comparison curves of Word2vec and Bigram-Word2vec experiments
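The abstract motivates Bigram-Word2vec by noting that Word2vec cannot recognize English word pairs: a pair such as "not good" is split into two tokens and gets no vector of its own. This page does not describe how the paper finds the pairs, so the sketch below swaps in one common heuristic, the phrase score from Mikolov et al. (2013), score(a, b) = (count(a b) - delta) / (count(a) × count(b)), and joins pairs that score above a threshold into a single token before Word2vec training. The delta, threshold, and toy corpus are hypothetical; libraries such as gensim (`Phrases`) implement similar detectors.

```python
from collections import Counter
from typing import List


def merge_bigrams(sentences: List[List[str]], delta: float = 1.0, threshold: float = 0.2) -> List[List[str]]:
    """Join high-scoring adjacent word pairs into tokens such as 'not_good'."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))
    # delta discounts pairs that occur only once, so chance co-occurrences score 0.
    keep = {pair for pair, n in bigrams.items()
            if (n - delta) / (unigrams[pair[0]] * unigrams[pair[1]]) > threshold}
    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) in keep:
                out.append(s[i] + "_" + s[i + 1])  # greedy left-to-right merge
                i += 2
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged


corpus = [
    "the battery is not good".split(),
    "the screen is not good at all".split(),
    "the price is fair".split(),
    "the sound is loud".split(),
]
for sentence in merge_bigrams(corpus):
    print(sentence)
```

Here "not good" scores (2 - 1) / (2 × 2) = 0.25 and is merged, while "is not" scores (2 - 1) / (4 × 2) = 0.125 and stays split because "is" is frequent on its own.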

Fig.7 Experimental accuracy of traditional text features

Table 8 Results of the ensemble learning experiments

Ensemble method Hard 2Hard Soft WeiSoft
Accuracy 0.7697 0.8127 0.7710 0.8088