
Journal of Shandong University (Engineering Science) ›› 2014, Vol. 44 ›› Issue (6): 15-18. doi: 10.6040/j.issn.1672-3961.1.2014.108

• Machine Learning and Data Mining •

The sentiment mining method based on extended sentiment dictionary and integrated features

XU Xiaodan, DUAN Zhengjie, CHEN Zhongyu

  1. Mathematics, Physics and Information Engineering College, Zhejiang Normal University, Jinhua 321004, Zhejiang, China
  • Received: 2014-05-12 Revised: 2014-10-28 Online: 2014-12-20 Published: 2014-05-12
  • About the author: XU Xiaodan (1978-), female, born in Dongyang, Zhejiang; lecturer, master's degree; main research interests: data mining, Web mining, and natural language processing. E-mail: xuxiaodan@zjnu.cn
  • Supported by:
    Scientific Research Project of the Education Department of Zhejiang Province (Y201328291); Twelfth Five-Year Research Planning Project of the Zhejiang Provincial Language Committee (ZY2011C77); National Natural Science Foundation of China (61272007)

Abstract: Sentiment classifiers that rely on a single feature often achieve limited precision. To address this, a classification algorithm based on weighted, integrated features is proposed. First, the sentiment tendency value of each word is calculated from an extended sentiment dictionary. Then, after CHI feature selection, the positive and negative posterior probabilities of each sentiment word in the Bayesian classification model are adjusted according to the word's polarity strength, by adding a polarity-strength influence value to the original value. In experiments on four corpora, including hotel and movie reviews, the method was compared with three single-feature selection methods and achieved higher classification precision. The results show that integrating word-level sentiment tendency into the classifier improves the precision of sentiment orientation classification while reducing the feature dimension.

Key words: feature selection, sentiment mining, orientation analysis, sentiment lexicon, classification

CLC Number:

  • TP391
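
The pipeline summarized in the abstract (CHI feature selection followed by a Bayesian classifier whose word probabilities are shifted by dictionary polarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact adjustment formula, so the additive term `alpha * strength`, the renormalization step, the toy English corpus, and the `polarity` values are all hypothetical.

```python
import math
from collections import Counter

CLASSES = ("pos", "neg")

def chi_square(docs, labels, term, target="pos"):
    """CHI statistic between `term` and class `target` from a 2x2 document table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == target:
            a += 1  # term present, target class
        elif present:
            b += 1  # term present, other class
        elif label == target:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, labels, k):
    """Keep the k terms with the highest CHI score (ties broken alphabetically)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    ranked = sorted(vocab, key=lambda t: -chi_square(docs, labels, t))
    return set(ranked[:k])

def train(docs, labels, features, polarity, alpha=0.5):
    """Naive Bayes with an assumed additive polarity adjustment per word."""
    counts = {c: Counter() for c in CLASSES}
    for tokens, label in zip(docs, labels):
        counts[label].update(t for t in tokens if t in features)
    model = {}
    for c in CLASSES:
        total = sum(counts[c].values())
        probs = {}
        for t in features:
            p = (counts[c][t] + 1) / (total + len(features))  # Laplace smoothing
            s = polarity.get(t, 0.0)  # hypothetical dictionary score in [-1, 1]
            strength = max(s, 0.0) if c == "pos" else max(-s, 0.0)
            probs[t] = p + alpha * strength  # assumed form of the influence value
        z = sum(probs.values())
        probs = {t: v / z for t, v in probs.items()}  # renormalize (assumption)
        prior = labels.count(c) / len(labels)
        model[c] = (math.log(prior), probs)
    return model

def classify(model, tokens):
    """Return the class with the highest log score over the selected features."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(probs[t]) for t in tokens if t in probs)
    return max(CLASSES, key=score)

# Toy data, invented for illustration only.
docs = [
    ["room", "clean", "great"],
    ["great", "service", "friendly"],
    ["room", "dirty", "terrible"],
    ["terrible", "service", "rude"],
]
labels = ["pos", "pos", "neg", "neg"]
polarity = {"great": 0.9, "clean": 0.6, "friendly": 0.7,
            "terrible": -0.9, "dirty": -0.7, "rude": -0.8}

features = select_features(docs, labels, k=4)
model = train(docs, labels, features, polarity)
print(classify(model, ["great", "clean", "room"]))
```

Words that appear equally often in both classes (here `room` and `service`) receive a CHI score of zero and are dropped, which is how the selection step reduces the feature dimension before the polarity adjustment is applied.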