Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 50-59. doi: 10.6040/j.issn.1672-3961.0.2019.403

• Machine Learning & Data Mining •

Identification of the same product feature based on multi-dimension similarity and sentiment word expansion

Longmao HU1,2, Xuegang HU1,*

  1. School of Computer and Information, Hefei University of Technology, Hefei 230601, Anhui, China
  2. Anhui Finance and Trade Vocational College, Hefei 230601, Anhui, China
  • Received: 2019-07-17 Online: 2020-04-20 Published: 2020-04-16
  • Contact: Xuegang HU E-mail: hulongmao@163.com; jsjxhuxg@hfut.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61673152); Key Project of Natural Science Research of Universities in Anhui Province (KJ2017A858)

Abstract:

Because existing methods for identifying identical product features are limited by dictionary coverage or corpus size, an identification method based on multi-dimensional similarity and sentiment word expansion was proposed. Sentiment words associated with product features were extracted with a bi-directional long short-term memory network combined with a conditional random field (Bi-LSTM-CRF). The morpheme similarity, Cilin similarity, and term frequency-inverse document frequency (TF-IDF) cosine similarity of product feature words were then combined, and identical product features were identified with the K-medoids clustering algorithm. The experimental results showed that, on the mobile phone and notebook datasets, the maximum adjusted Rand index (ARI) reached 0.579 and 0.595 9 respectively, while the minimum entropy reached 0.782 6 and 0.745 7. The proposed method outperformed three baselines: morpheme similarity combined with adjusted Jaccard similarity, Word2Vec similarity, and Word2Vec similarity with bisecting K-means clustering.

Key words: product feature, sentiment word expansion, Bi-LSTM-CRF, multi-dimension, similarity calculation

CLC Number: 

  • TP391
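
As a rough illustration of the clustering step summarized in the abstract, the sketch below averages three similarity matrices and clusters the resulting distances with a simple alternating K-medoids loop. The toy matrices, the equal weights, the helper names (`combine_similarities`, `init_medoids`, `k_medoids`), and the deterministic farthest-first initialization are all assumptions made for this example; the abstract does not state how the paper weights the three similarities or initializes the medoids.

```python
import numpy as np

def combine_similarities(sims, weights):
    """Weighted average of same-shaped similarity matrices (weights are illustrative)."""
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)

def init_medoids(dist, k):
    """Deterministic farthest-first seeding (an assumption, chosen for reproducibility)."""
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        # distance from every point to its nearest already-chosen medoid
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    return np.array(medoids)

def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a precomputed, symmetric distance matrix."""
    medoids = init_medoids(dist, k)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # the new medoid minimizes the total distance to its cluster members
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids

# Toy similarities for four hypothetical feature words (two synonym pairs).
morpheme = np.array([[1.0, 0.5, 0.0, 0.0],
                     [0.5, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.5],
                     [0.0, 0.0, 0.5, 1.0]])
cilin = np.array([[1.0, 0.9, 0.2, 0.1],
                  [0.9, 1.0, 0.1, 0.2],
                  [0.2, 0.1, 1.0, 0.8],
                  [0.1, 0.2, 0.8, 1.0]])
tfidf = np.array([[1.0, 0.7, 0.1, 0.2],
                  [0.7, 1.0, 0.2, 0.0],
                  [0.1, 0.2, 1.0, 0.6],
                  [0.2, 0.0, 0.6, 1.0]])

sim = combine_similarities([morpheme, cilin, tfidf], [1.0, 1.0, 1.0])
labels, medoids = k_medoids(1.0 - sim, k=2)  # distance = 1 - similarity
print(labels, medoids)
```

On this toy input the first two words land in one cluster and the last two in the other. A randomly initialized variant would need several restarts, since the alternating update can stall in a poor local optimum.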

Fig.1 Structure of Bi-LSTM-CRF

Fig.2 Basic process of identifying the same product feature

Table 1 Contingency table of clustering and partitioning

C       L1     L2     …     Ls     Sums
C1      n11    n12    …     n1s    a1
C2      n21    n22    …     n2s    a2
…       …      …      …     …      …
Ck      nk1    nk2    …     nks    ak
Sums    b1     b2     …     bs     n
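
Table 1 is the kind of contingency table from which the adjusted Rand index (ARI) is usually computed: n_ij counts the items shared by cluster C_i and reference class L_j, with row sums a_i and column sums b_j. A minimal sketch of the standard ARI formula follows; the function name and the toy tables are illustrative and not taken from the paper.

```python
from math import comb

def adjusted_rand_index(table):
    """ARI from a contingency table given as a list of rows of counts n_ij."""
    a = [sum(row) for row in table]        # row sums a_i
    b = [sum(col) for col in zip(*table)]  # column sums b_j
    n = sum(a)
    sum_ij = sum(comb(nij, 2) for row in table for nij in row)
    sum_a = sum(comb(ai, 2) for ai in a)
    sum_b = sum(comb(bj, 2) for bj in b)
    if n < 2:
        return 1.0  # convention for a degenerate input with fewer than two items
    expected = sum_a * sum_b / comb(n, 2)  # index expected under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # convention when both partitions are trivial
    return (sum_ij - expected) / (max_index - expected)

perfect = adjusted_rand_index([[2, 0], [0, 2]])    # clusters match classes exactly
unrelated = adjusted_rand_index([[1, 1], [1, 1]])  # every cluster splits evenly
print(perfect, unrelated)
```

The value is 1 for identical partitions and close to 0 (possibly negative) when agreement is no better than chance.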

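Table 2 Semantic similarity

Method                                                    Spearman coefficient    Product feature word coverage/%
Cilin similarity                                          0.299 3                 35.68
Adjusted Jaccard similarity                               0.122 3                 100
Adjusted Jaccard similarity (expanded sentiment words)    0.293 3                 100
TF-IDF cosine similarity                                  0.209 9                 100
TF-IDF cosine similarity (expanded sentiment words)       0.434 2                 100
Word2Vec similarity                                       -0.001 3                97.83
Word2Vec similarity (expanded reviews)                    0.250 3                 98.92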
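
Table 2 scores each similarity measure with a Spearman coefficient, which is the Pearson correlation of rank-transformed values. The sketch below shows one way to compute it, using average ranks for ties; the function names and the toy scores are hypothetical, and the paper's evaluation data are not reproduced here.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation between two equally long score lists."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy example: reference scores versus scores from a similarity measure.
reference = [4, 3, 3, 1]
measured = [0.9, 0.6, 0.7, 0.2]
rho = spearman(reference, measured)
print(rho)
```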

Fig.3 Experimental results of α threshold selection

Table 3 Results of clustering

Methods: (1) morpheme + TF-IDF cosine similarity (expanded sentiment words) + Cilin; (2) morpheme + adjusted Jaccard similarity (expanded sentiment words); (3) Word2Vec similarity (expanded reviews); (4) Word2Vec similarity (expanded reviews, biKMeans clustering).

Item         (1) ARI    (1) Entropy    (2) ARI    (2) Entropy    (3) ARI    (3) Entropy    (4) ARI    (4) Entropy
Max          0.579 0    1.402 7        0.541 2    1.506 3        0.150 1    2.606 5        0.123 5    2.443 4
Min          0.223 0    0.782 6        0.200 0    0.877 4        0.044 0    1.975 4        0.062 8    2.010 6
Mean         0.361 6    1.081 0        0.330 3    1.168 2        0.091 8    2.244 1        0.087 2    2.242 4
Std. dev.    0.046 4    0.084 2        0.045 4    0.085 7        0.016 0    0.085 0        0.020 4    0.120 6

Fig.4 Maximum ARI value for different numbers of expanded reviews

Table 4 Results of clustering for notebook computer data set

Methods: (1) morpheme + TF-IDF cosine similarity (expanded sentiment words) + Cilin; (2) morpheme + adjusted Jaccard similarity (expanded sentiment words); (3) Word2Vec similarity (expanded reviews); (4) Word2Vec similarity (expanded reviews, biKMeans clustering).

Item         (1) ARI    (1) Entropy    (2) ARI    (2) Entropy    (3) ARI    (3) Entropy    (4) ARI    (4) Entropy
Max          0.595 9    1.284 1        0.543 1    1.372 0        0.375 1    1.748 2        0.280 8    1.593 1
Min          0.273 9    0.754 7        0.250 5    0.872 7        0.183 4    1.228 1        0.226 7    1.459 9
Mean         0.400 2    1.008 6        0.354 2    1.136 1        0.277 0    1.455 7        0.255 8    1.556 8
Std. dev.    0.040 5    0.075 6        0.038 2    0.071 1        0.028 4    0.068 2        0.018 8    0.038 2