Journal of Shandong University (Engineering Science) ›› 2020, Vol. 50 ›› Issue (2): 50-59. doi: 10.6040/j.issn.1672-3961.0.2019.403
Longmao HU1,2, Xuegang HU1,*
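Abstract:
Existing methods for identifying identical product features are limited by dictionary coverage or by corpus size. To address these shortcomings, an identification method based on multi-dimensional similarity and sentiment word expansion is proposed. A bi-directional long short-term memory and conditional random field (Bi-LSTM-CRF) model extracts the expanded sentiment words of each product feature. The morpheme similarity, the Tongyici Cilin (synonym thesaurus) similarity, and the TF-IDF (term frequency-inverse document frequency) cosine similarity of the feature words are then combined, and the K-medoids clustering algorithm is applied to identify identical product features. Experimental results show that, on the mobile phone and laptop datasets, the maximum adjusted Rand index of the method reaches 0.579 and 0.5959 respectively, and the minimum entropy reaches 0.7826 and 0.7457 respectively. Both measures are better than those of the three baseline methods: adjusted Jaccard similarity combined with morphemes, Word2Vec similarity, and Word2Vec similarity based on bisecting K-means.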
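The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy feature words and sentiment words, the equal weights, the character-overlap stand-in for morpheme similarity, the smoothed IDF formula, and the deterministic medoid initialization. The synonym thesaurus similarity is left out because it needs the Tongyici Cilin resource, and the sentiment words are written by hand rather than extracted with Bi-LSTM-CRF.

```python
import math
from collections import Counter


def char_jaccard(a, b):
    """Stand-in for morpheme similarity: Jaccard overlap of the characters."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def tfidf_vectors(docs):
    """TF-IDF weights per document, using a smoothed IDF (an assumption)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
         for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def distance_matrix(words, sentiment_docs, w_morph=0.5, w_tfidf=0.5):
    """Distance = 1 - weighted sum of similarities (weights are hypothetical)."""
    vecs = tfidf_vectors(sentiment_docs)
    n = len(words)
    return [
        [0.0 if i == j else 1.0 - (w_morph * char_jaccard(words[i], words[j])
                                   + w_tfidf * cosine(vecs[i], vecs[j]))
         for j in range(n)]
        for i in range(n)
    ]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # deterministic init: the first k items

    def assign(meds):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                # New medoid: the member with the smallest total distance to the rest.
                new_medoids.append(
                    min(members, key=lambda m: sum(dist[m][j] for j in members)))
            else:
                new_medoids.append(medoids[c])  # keep the old medoid if the cluster is empty
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Toy input: feature words and hand-written sentiment words (illustrative only).
words = ["屏幕", "显示屏", "电池", "电量"]
sentiment_docs = [["清晰", "大"], ["清晰", "亮"], ["耐用", "持久"], ["耐用", "充足"]]
labels, medoids = k_medoids(distance_matrix(words, sentiment_docs), k=2)
print(dict(zip(words, labels)))
```

On this toy input the two screen terms receive one label and the two battery terms the other. K-medoids suits this setting because it only needs pairwise distances, so any combination of word similarities can be used without mapping the words into a vector space.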
|