
Journal of Shandong University (Engineering Science) ›› 2015, Vol. 45 ›› Issue (1): 19-23. doi: 10.6040/j.issn.1672-3961.1.2014.250

• Machine Learning and Data Mining •

A sentiment analysis method based on dynamic lexicon and three-way decision

ZHOU Zhe1,2, SHANG Lin1,2

  1. Department of Computer Science and Technology, Nanjing University, Nanjing 210046, Jiangsu, China;
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, Jiangsu, China
  • Received: 2014-05-23 Revised: 2014-10-15 Published: 2014-05-23
  • Corresponding author: SHANG Lin (1973-), female, native of Nanjing, Jiangsu, Ph.D., associate professor. Main research interests: data mining, artificial intelligence, and rough sets. E-mail: shanglin@nju.edu.cn
  • About the author: ZHOU Zhe (1991-), male, native of Nanjing, Jiangsu, master's student. Main research interest: text sentiment analysis. E-mail: mahirunozuki1@gmail.com
  • Funding:
    General Program of the National Natural Science Foundation of China (61170180)

Abstract: A new feature extraction method is combined with the concept of three-way decision and applied to text sentiment analysis in order to improve classification accuracy. A dynamic sentiment lexicon is built from the training set and is then used to extract abstract features from each text, forming the feature matrix. During classification, texts whose sentiment labels the classifier cannot assign with sufficient confidence are placed in the boundary region to await other processing. Experimental results on an English movie review dataset show that feature extraction based on the dynamic lexicon achieves better classification accuracy, and that the three-way decision rule, by deferring some samples to the boundary region, also raises the classification accuracy.

Key words: text data mining, three-way decision, sentiment analysis, opinion mining, feature extraction
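The pipeline summarized in the abstract (training-set lexicon, abstract features, three-way decision) can be illustrated with a minimal Python sketch. The abstract does not state the word-scoring rule, the feature set, the classifier, or the thresholds, so everything below is an assumption made for illustration: the smoothed log-ratio score, the three features, the sigmoid stand-in for a trained classifier, and the threshold values `alpha` and `beta` are all hypothetical, as are the function names.

```python
import math
from collections import Counter


def build_dynamic_lexicon(docs, labels, min_abs_score=0.5):
    """Build a word -> score lexicon from labeled training texts.

    Assumed scoring rule (not from the paper): smoothed log-ratio of a
    word's document frequency in positive vs. negative texts.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    for text, y in zip(docs, labels):
        words = set(text.lower().split())  # count each word once per document
        (pos_df if y == 1 else neg_df).update(words)
    lexicon = {}
    for w in set(pos_df) | set(neg_df):
        score = (math.log((pos_df[w] + 1) / (n_pos + 2))
                 - math.log((neg_df[w] + 1) / (n_neg + 2)))
        if abs(score) >= min_abs_score:  # drop words with no clear polarity
            lexicon[w] = score
    return lexicon


def extract_features(text, lexicon):
    """Assumed abstract features: positive count, negative count, total score."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    n_pos = sum(1 for s in scores if s > 0)
    n_neg = sum(1 for s in scores if s < 0)
    return [n_pos, n_neg, sum(scores)]


def three_way_decide(p_positive, alpha=0.7, beta=0.3):
    """Accept, reject, or defer based on two thresholds (values are assumed)."""
    if p_positive >= alpha:
        return "positive"
    if p_positive <= beta:
        return "negative"
    return "boundary"  # confidence too low: defer for other processing


def classify(text, lexicon):
    """Stand-in for a trained classifier: sigmoid of the total lexicon score."""
    total = extract_features(text, lexicon)[2]
    p_positive = 1.0 / (1.0 + math.exp(-total))
    return three_way_decide(p_positive)


if __name__ == "__main__":
    train_docs = ["great moving film", "great acting",
                  "dull boring film", "boring plot"]
    train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative
    lex = build_dynamic_lexicon(train_docs, train_labels)
    print(classify("great film", lex))        # positive
    print(classify("great but boring", lex))  # boundary
    print(classify("dull plot", lex))         # negative
```

In this toy setup, "film" appears equally often in both classes and is excluded from the lexicon, while a text mixing "great" and "boring" ends up at probability 0.5 and is deferred to the boundary region instead of being forced into a class.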

CLC Number:

  • TP391