Journal of Shandong University (Engineering Science), 2020, Vol. 50, Issue (2): 50-59. doi: 10.6040/j.issn.1672-3961.0.2019.403
Longmao HU, Xuegang HU
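Abstract:
Existing methods for identifying identical product features are limited by dictionary coverage or corpus size. To address this, an identification method based on multi-dimensional similarity and sentiment word expansion is proposed. A bi-directional long short-term memory and conditional random field (Bi-LSTM-CRF) model extracts the expanded sentiment words of each product feature. The morpheme similarity, Cilin (同义词林, a Chinese synonym thesaurus) similarity, and TF-IDF (term frequency-inverse document frequency) cosine similarity of the feature words are then combined, and the K-medoids clustering algorithm is applied to identify identical product features. Experimental results show that, on the mobile phone and laptop datasets, the method reaches maximum adjusted Rand index values of 0.579 and 0.5959 and minimum entropy values of 0.7826 and 0.7457, respectively. On both measures it outperforms three baseline methods: adjusted Jaccard similarity combined with morphemes, Word2Vec similarity, and Word2Vec similarity based on bisecting K-means.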
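The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three similarities are computed or combined, so everything below is an assumption made for illustration: an equal-weight average, a character-overlap Dice score standing in for morpheme similarity, a hand-written lookup table (`toy_cilin`) standing in for a real Cilin similarity, and a simple alternating K-medoids with user-supplied initial medoids. All function names and data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def morpheme_sim(a, b):
    """Dice overlap of characters; an assumed stand-in for morpheme similarity."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def tfidf_cosine(docs):
    """Pairwise cosine similarity between TF-IDF vectors of token lists."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vecs = []
    for d in docs:
        tf = Counter(d)
        # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
        vecs.append({t: c / len(d) * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})

    def cos(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    return [[cos(u, v) for v in vecs] for u in vecs]


def combined_similarity(words, sentiment_docs, thesaurus_sim, weights=(1/3, 1/3, 1/3)):
    """Weighted average of the three similarities (the weighting is an assumption)."""
    ctx = tfidf_cosine(sentiment_docs)
    n = len(words)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            th = 1.0 if i == j else thesaurus_sim.get(frozenset((words[i], words[j])), 0.0)
            sim[i][j] = (weights[0] * morpheme_sim(words[i], words[j])
                         + weights[1] * th + weights[2] * ctx[i][j])
    return sim


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, medoids, max_iter=100):
    """Alternate between assignment and re-picking each cluster's medoid."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # The new medoid minimises total distance to its cluster members.
            new.append(min(members, key=lambda m: sum(dist[m][j] for j in members))
                       if members else old)
        if new == medoids:
            break
        medoids = new
    return assign(dist, medoids), medoids


# Toy data (hypothetical): feature words and the sentiment words seen with each.
words = ["屏幕", "显示屏", "电池", "电量"]  # screen, display, battery, battery level
sentiment_docs = [["清晰", "亮", "大"], ["清晰", "亮", "细腻"],
                  ["耐用", "持久"], ["持久", "充足"]]
toy_cilin = {frozenset(("屏幕", "显示屏")): 0.9, frozenset(("电池", "电量")): 0.8}

sim = combined_similarity(words, sentiment_docs, toy_cilin)
dist = [[1.0 - s for s in row] for row in sim]
labels, medoids = k_medoids(dist, medoids=[0, 2])
print(dict(zip(words, labels)))
```

K-medoids suits this setting because it needs only a pairwise distance matrix, so the three heterogeneous similarity scores can be merged without embedding the words in a shared vector space.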