Journal of Shandong University (Engineering Science), 2014, Vol. 44, Issue 6: 15-18. doi: 10.6040/j.issn.1672-3961.1.2014.108

• Machine Learning and Data Mining •

The sentiment mining method based on extended sentiment dictionary and integrated features

XU Xiaodan, DUAN Zhengjie, CHEN Zhongyu

  Mathematics, Physics and Information Engineering College, Zhejiang Normal University, Jinhua 321004, Zhejiang, China
  • Received: 2014-05-12 Revised: 2014-10-28 Online: 2014-12-20 Published: 2014-05-12
  • About the author: XU Xiaodan (1978-), female, from Dongyang, Zhejiang; lecturer, master's degree; research interests: data mining, Web mining, and natural language processing. E-mail: xuxiaodan@zjnu.cn
  • Funding:
    Scientific Research Project of the Zhejiang Provincial Department of Education (Y201328291); 12th Five-Year Research Planning Project of the Zhejiang Provincial Language Commission (ZY2011C77); National Natural Science Foundation of China (61272007)

Abstract: Sentiment classification methods that rely on a single type of feature often achieve limited precision. To address this problem, a classification method that weights and integrates multiple features is proposed. First, the sentiment orientation value of each word is calculated from an extended sentiment dictionary. Then, after CHI (chi-square) feature selection, the positive and negative posterior probabilities of each sentiment word in the Bayesian classification model are adjusted according to the word's polarity strength: a polarity-strength influence value is added to the original probability. In experiments on four corpora, including hotel and movie reviews, the method was compared with three single-feature selection methods and achieved higher classification precision. The results show that incorporating the sentiment orientation of words into the classifier improves the precision of sentiment orientation classification while reducing the feature dimensionality.

Key words: orientation analysis, sentiment lexicon, sentiment mining, feature selection, classification
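The three steps in the abstract (dictionary-based word polarity, CHI feature selection, and a polarity-adjusted Bayesian model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The Bayesian model is assumed to be multinomial naive Bayes with Laplace smoothing, and the "positive and negative posterior probabilities" of a word are taken to be its per-class probabilities P(w|pos) and P(w|neg). The abstract does not give the exact form of the polarity-strength influence value, so the rule used here (add `alpha * |s|` to the probability of the class matching the sign of the word's polarity `s`), the weight `alpha`, the toy reviews, and the polarity dictionary are all hypothetical.

```python
import math
from collections import Counter


def chi_square(doc_sets, labels, term, cls):
    """CHI statistic of `term` for class `cls` from the 2x2 term/class contingency table."""
    n = len(doc_sets)
    a = sum(1 for doc, y in zip(doc_sets, labels) if term in doc and y == cls)      # present, in class
    b = sum(1 for doc, y in zip(doc_sets, labels) if term in doc and y != cls)      # present, other class
    c = sum(1 for doc, y in zip(doc_sets, labels) if term not in doc and y == cls)  # absent, in class
    d = n - a - b - c                                                                # absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest CHI score (maximum over classes)."""
    doc_sets = [set(doc) for doc in docs]
    vocab = {t for doc in docs for t in doc}
    classes = set(labels)
    scores = {t: max(chi_square(doc_sets, labels, t, c) for c in classes) for t in vocab}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return set(ranked[:k])


def train(docs, labels, vocab, polarity, alpha=0.1):
    """Multinomial naive Bayes with Laplace smoothing, then a hypothetical polarity adjustment."""
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        counts = Counter(t for doc, y in zip(docs, labels) if y == c for t in doc if t in vocab)
        total = sum(counts.values())
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    # Assumption: add alpha * |s| to P(w|pos) for positive words and to P(w|neg) for negative
    # words. Values are not renormalized, so they act as scores rather than strict probabilities.
    for t in vocab:
        s = polarity.get(t, 0.0)
        if s > 0 and "pos" in cond:
            cond["pos"][t] += alpha * s
        elif s < 0 and "neg" in cond:
            cond["neg"][t] += alpha * -s
    return priors, cond


def classify(doc, priors, cond):
    """Return the class with the highest log score; tokens outside the selected features are ignored."""
    scores = {
        c: math.log(priors[c]) + sum(math.log(cond[c][t]) for t in doc if t in cond[c])
        for c in priors
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, hand-made data (hypothetical): tokenized reviews and a polarity dictionary in [-1, 1].
    docs = [
        ["room", "clean", "great"],
        ["staff", "friendly", "great"],
        ["room", "dirty", "noisy"],
        ["staff", "rude", "bad"],
    ]
    labels = ["pos", "pos", "neg", "neg"]
    polarity = {"great": 0.9, "clean": 0.6, "friendly": 0.7,
                "dirty": -0.8, "noisy": -0.5, "rude": -0.9, "bad": -0.9}

    vocab = select_features(docs, labels, k=7)
    priors, cond = train(docs, labels, vocab, polarity)
    print(sorted(vocab))  # the neutral words "room" and "staff" score 0 and are dropped
    print(classify(["room", "friendly"], priors, cond))  # pos
```

On this toy data, CHI keeps the seven sentiment words and discards the neutral words `room` and `staff`, which mirrors the reduction in feature dimensionality described in the abstract. Working in log space avoids underflow on long reviews; the polarity adjustment only shifts scores for words that appear in the sentiment dictionary.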

CLC number: TP391