Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 50-59. doi: 10.6040/j.issn.1672-3961.0.2019.403

  • Machine Learning & Data Mining

Identification of the same product feature based on multi-dimension similarity and sentiment word expansion

Longmao HU1,2, Xuegang HU1,*

  1. School of Computer and Information, Hefei University of Technology, Hefei 230601, Anhui, China
  2. Anhui Finance and Trade Vocational College, Hefei 230601, Anhui, China
  • Received: 2019-07-17   Online: 2020-04-20   Published: 2020-04-16
  • Contact: Xuegang HU   E-mail: hulongmao@163.com; jsjxhuxg@hfut.edu.cn
  • Supported by: National Natural Science Foundation of China (61673152); Key Natural Science Research Project of Anhui Provincial Universities (KJ2017A858)

Abstract:

Existing methods for identifying the same product feature are limited by insufficient dictionary coverage or corpus size. To address this, an identification method based on multidimensional similarity and sentiment word expansion was proposed. The sentiment words associated with product features were extracted with a bidirectional long short-term memory network and conditional random field (Bi-LSTM-CRF). The morpheme similarity, the Cilin similarity (based on the Tongyici Cilin synonym thesaurus), and the term frequency-inverse document frequency (TF-IDF) cosine similarity of the product feature words were then combined, and the same product features were identified with the K-medoids clustering algorithm. Experimental results showed that on the mobile phone and notebook computer datasets, the maximum adjusted Rand index (ARI) reached 0.579 and 0.5959 respectively, and the minimum entropy reached 0.7826 and 0.7457 respectively. The proposed method outperformed three baselines: adjusted Jaccard similarity combined with morpheme similarity, Word2Vec similarity, and Word2Vec similarity with bisecting K-means clustering.

Key words: product feature, sentiment word expansion, Bi-LSTM-CRF, multi-dimension, similarity calculation

CLC Number: TP391
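The abstract combines three similarity measures, but this page gives neither their formulas nor how they are weighted. The Python sketch below is one possible way to realize such a combination under stated assumptions, not the authors' method: morpheme similarity is approximated by the Dice coefficient over shared characters, the TF-IDF vectors are built from the (expanded) sentiment words observed with each feature word, the Cilin similarity is passed in as a precomputed number because it requires the Tongyici Cilin thesaurus, and the equal weights in the hypothetical `combined_similarity` are placeholders.

```python
import math
from collections import Counter


def morpheme_similarity(w1, w2):
    """Hypothetical morpheme similarity: Dice coefficient over the sets of
    characters (treated as morphemes) in two Chinese feature words."""
    s1, s2 = set(w1), set(w2)
    if not s1 or not s2:
        return 0.0
    return 2 * len(s1 & s2) / (len(s1) + len(s2))


def tfidf_vectors(docs):
    """docs maps each feature word to the sentiment words seen with it
    (assumed to include the expanded ones). Returns sparse TF-IDF vectors
    with a smoothed IDF, log((1 + N) / (1 + df)) + 1, so no weight is zero."""
    n_docs = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    vectors = {}
    for feature, terms in docs.items():
        tf = Counter(terms)
        total = sum(tf.values())
        vectors[feature] = {
            t: (c / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_similarity(morph, tfidf_cos, cilin=None, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three measures (placeholder weights).
    cilin=None (word missing from Cilin) drops that term and renormalizes."""
    scores = (morph, tfidf_cos, cilin)
    pairs = [(w, s) for w, s in zip(weights, scores) if s is not None]
    return sum(w * s for w, s in pairs) / sum(w for w, _ in pairs)
```

The TF-IDF cosine can be high for two feature words that attract the same sentiment words even when their characters barely overlap, which is what lets it complement morpheme similarity. Dropping the Cilin term for words missing from the thesaurus is a design choice motivated by the low Cilin coverage in Table 2 (35.68%); the page does not say how the original method handles such words.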

Fig.1

Structure of Bi-LSTM-CRF
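In a Bi-LSTM-CRF tagger, the Bi-LSTM produces per-token tag scores and the CRF layer selects the best-scoring tag sequence with Viterbi decoding. The sketch below shows only that decoding step plus the conversion of tags into sentiment-word spans. It assumes a BIO tag set (O, B, I indexed 0, 1, 2) and takes the emission and transition scores as given rather than training them; `viterbi_decode` and `tags_to_spans` are hypothetical helper names.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][j]: score of tag j at token t (e.g. a Bi-LSTM output).
    transitions[i][j]: score of moving from tag i to tag j (CRF parameters).
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptrs = []               # backptrs[t-1][j]: best previous tag for tag j at t
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptrs.append(ptrs)
    # Backtrack from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


def tags_to_spans(tokens, tags):
    """Join B/I runs into spans; an I without a preceding B is treated as O."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append("".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans
```

With a large negative O-to-I transition score, the decoder avoids paths in which an I tag directly follows O; constraints of this kind are what the CRF layer learns on top of the independent per-token Bi-LSTM scores.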

Fig.2

Basic process of identifying the same product feature
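The final step named in the abstract groups feature words with K-medoids clustering. A minimal sketch, assuming the combined similarity is turned into a distance as 1 - similarity and that the number of clusters k is given: it uses a simple alternating (Voronoi-iteration) update with a greedy BUILD-style start, not the full PAM swap search, and all function names are hypothetical.

```python
def assign(dist, medoids):
    """Index (0..k-1) of the nearest medoid for every item."""
    return [min(range(len(medoids)), key=lambda c: dist[p][medoids[c]])
            for p in range(len(dist))]


def build_init(dist, k):
    """Greedy start: the most central item first, then repeatedly the item
    that most reduces the total distance to the nearest medoid."""
    n = len(dist)
    medoids = [min(range(n), key=lambda m: sum(dist[m]))]
    while len(medoids) < k:
        def cost_with(c):
            return sum(min(min(dist[p][m] for m in medoids), dist[p][c]) for p in range(n))
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=cost_with))
    return medoids


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a precomputed distance matrix."""
    medoids = build_init(dist, k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [p for p, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(old)  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest summed distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][q] for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)
```

K-medoids only needs pairwise distances, so it works directly on a combined similarity matrix, and every cluster centre is an actual feature word that can serve as the group's representative.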

Table 1

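Contingency table of clustering and partitioning

C       L1      L2      ...     Ls      Sums
C1      n11     n12     ...     n1s     a1
C2      n21     n22     ...     n2s     a2
...     ...     ...     ...     ...     ...
Ck      nk1     nk2     ...     nks     ak
Sums    b1      b2      ...     bs      n

Rows C1 to Ck are the k clusters produced by the algorithm, columns L1 to Ls are the s classes of the reference partition, nij is the number of product feature words falling in both Ci and Lj, ai and bj are the row and column sums, and n is the total number of words.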
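Table 1 holds the counts from which the adjusted Rand index (ARI) is computed. The sketch below uses the standard Hubert-Arabie formula written with the table's own symbols (n_ij, a_i, b_j, n); the function name is hypothetical, and this page does not confirm that the experiments used exactly this variant.

```python
from math import comb


def adjusted_rand_index(table):
    """Hubert-Arabie adjusted Rand index from a contingency table (n >= 2).

    table[i][j] = n_ij, the number of items in cluster C_i and class L_j.
    ARI = (sum_ij C(n_ij, 2) - E) / (0.5 * (sum_i C(a_i, 2) + sum_j C(b_j, 2)) - E),
    where E = sum_i C(a_i, 2) * sum_j C(b_j, 2) / C(n, 2).
    """
    a = [sum(row) for row in table]        # row sums a_i
    b = [sum(col) for col in zip(*table)]  # column sums b_j
    n = sum(a)
    index = sum(comb(n_ij, 2) for row in table for n_ij in row)
    sum_a = sum(comb(a_i, 2) for a_i in a)
    sum_b = sum(comb(b_j, 2) for b_j in b)
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are trivial (one cluster, or all
        # singletons) and identical; treat that as perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)
```

A perfect clustering scores 1.0, while values near 0 mean agreement no better than chance, which helps in reading the low ARI values of the Word2Vec baselines in Table 3.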

Table 2

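Semantic similarity

Method                                                    Spearman coefficient    Product feature word coverage/%
Cilin similarity                                          0.2993                  35.68
Adjusted Jaccard similarity                               0.1223                  100
Adjusted Jaccard similarity (expanded sentiment words)    0.2933                  100
TF-IDF cosine similarity                                  0.2099                  100
TF-IDF cosine similarity (expanded sentiment words)       0.4342                  100
Word2Vec similarity                                       -0.0013                 97.83
Word2Vec similarity (expanded reviews)                    0.2503                  98.92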
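Table 2 rates each measure by its Spearman coefficient and by coverage. This page does not say what the coefficient is computed against, so the sketch below assumes it compares a measure's scores with reference (for example, manually annotated) similarity scores over the same word pairs, and that coverage is the percentage of product feature words the measure can score at all (for Cilin, those present in the thesaurus). Tied values receive averaged ranks; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either side is constant
    return cov / (sx * sy)


def coverage(feature_words, known_words):
    """Percentage of feature words the measure can score at all."""
    covered = sum(1 for w in feature_words if w in known_words)
    return 100.0 * covered / len(feature_words)
```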

Fig.3

Experimental results of α threshold selection

Table 3

Results of clustering for the mobile phone data set

Methods: A = morpheme + TF-IDF cosine similarity (expanded sentiment words) + Cilin; B = morpheme + adjusted Jaccard similarity (expanded sentiment words); C = Word2Vec similarity (expanded reviews); D = Word2Vec similarity (expanded reviews, biKMeans clustering).

Statistic   A: ARI   A: Entropy   B: ARI   B: Entropy   C: ARI   C: Entropy   D: ARI   D: Entropy
Max         0.5790   1.4027       0.5412   1.5063       0.1501   2.6065       0.1235   2.4434
Min         0.2230   0.7826       0.2000   0.8774       0.0440   1.9754       0.0628   2.0106
Mean        0.3616   1.0810       0.3303   1.1682       0.0918   2.2441       0.0872   2.2424
SD          0.0464   0.0842       0.0454   0.0857       0.0160   0.0850       0.0204   0.1206
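Tables 3 and 4 report entropy alongside ARI, where lower entropy means purer clusters. The page does not define the measure, so the sketch below assumes the common size-weighted cluster entropy computed from the Table 1 counts, E = sum_i (a_i / n) * (-sum_j (n_ij / a_i) log(n_ij / a_i)), with base-2 logarithms as a further assumption; the reported values may use a different base or normalization. The function name is hypothetical.

```python
import math


def clustering_entropy(table, base=2):
    """Size-weighted entropy of a clustering against a reference partition.

    table[i][j] = n_ij (as in Table 1). Each cluster's entropy over the
    reference classes is weighted by a_i / n; 0 means every cluster is pure.
    """
    n = sum(sum(row) for row in table)
    total = 0.0
    for row in table:
        a_i = sum(row)
        if a_i == 0:
            continue  # empty clusters contribute nothing
        h = -sum((n_ij / a_i) * math.log(n_ij / a_i, base) for n_ij in row if n_ij > 0)
        total += (a_i / n) * h
    return total
```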

Fig.4

Maximum ARI values for different numbers of expanded reviews

Table 4

Results of clustering for the notebook computer data set

Methods: A = morpheme + TF-IDF cosine similarity (expanded sentiment words) + Cilin; B = morpheme + adjusted Jaccard similarity (expanded sentiment words); C = Word2Vec similarity (expanded reviews); D = Word2Vec similarity (expanded reviews, biKMeans clustering).

Statistic   A: ARI   A: Entropy   B: ARI   B: Entropy   C: ARI   C: Entropy   D: ARI   D: Entropy
Max         0.5959   1.2841       0.5431   1.3720       0.3751   1.7482       0.2808   1.5931
Min         0.2739   0.7547       0.2505   0.8727       0.1834   1.2281       0.2267   1.4599
Mean        0.4002   1.0086       0.3542   1.1361       0.2770   1.4557       0.2558   1.5568
SD          0.0405   0.0756       0.0382   0.0711       0.0284   0.0682       0.0188   0.0382
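The baseline in the last column of Tables 3 and 4 clusters Word2Vec vectors with bisecting K-means (biKMeans). A generic sketch of that algorithm: start with one cluster and repeatedly split the cluster with the largest sum of squared errors (SSE) using 2-means until k clusters exist. The split criterion, the deterministic 2-means start, and the use of Euclidean distance are assumptions, since the baseline's settings are not given on this page; the function names are hypothetical.

```python
def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def centroid(points, idx):
    dims = len(points[idx[0]])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dims)]


def sse(points, idx):
    c = centroid(points, idx)
    return sum(sq_dist(points[i], c) for i in idx)


def two_means(points, idx, max_iter=20):
    """Split the items in idx with 2-means, starting from the first item
    and the item farthest from it."""
    c0 = points[idx[0]]
    c1 = points[max(idx, key=lambda i: sq_dist(points[i], c0))]
    left, right = idx, []
    for _ in range(max_iter):
        left = [i for i in idx if sq_dist(points[i], c0) <= sq_dist(points[i], c1)]
        right = [i for i in idx if sq_dist(points[i], c0) > sq_dist(points[i], c1)]
        if not left or not right:
            break
        new_c0, new_c1 = centroid(points, left), centroid(points, right)
        if new_c0 == c0 and new_c1 == c1:
            break  # centroids stopped moving
        c0, c1 = new_c0, new_c1
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        clusters.sort(key=lambda idx: sse(points, idx))
        target = clusters.pop()
        left, right = two_means(points, target)
        if not left or not right:  # cannot split, e.g. identical points
            clusters.append(target)
            break
        clusters += [left, right]
    labels = [0] * len(points)
    for label, idx in enumerate(clusters):
        for i in idx:
            labels[i] = label
    return labels
```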