Journal of Shandong University (Engineering Science), 2014, Vol. 44, Issue 6: 15-18. doi: 10.6040/j.issn.1672-3961.1.2014.108
徐晓丹, 段正杰, 陈中育
XU Xiaodan, DUAN Zhengjie, CHEN Zhongyu
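Abstract: Sentiment classification based on a single feature yields low accuracy. To address this, a multi-feature weighted classification algorithm is proposed. The sentiment orientation of each word is computed from an extended sentiment lexicon. After CHI feature selection, the positive and negative posterior probabilities of each sentiment word in the Bayesian classification model are adjusted according to the word's polarity strength, by adding a polarity-strength influence value to the original value. The method was compared with three single-feature selection methods on hotel, film and television, and other corpora, and it achieved higher classification accuracy. The experimental results show that incorporating word sentiment orientation as a feature in the classifier effectively improves sentiment classification accuracy while reducing the feature dimensionality.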
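The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the additive adjustment `p + alpha * |strength|` followed by renormalization, the parameter `alpha`, the function names, the toy documents, and the `polarity` scores in [-1, 1] are all assumptions made for illustration. The CHI score is the standard chi-square statistic from a 2x2 term/class contingency table.

```python
import math
from collections import Counter

def chi_square(docs, labels, word, target):
    """Standard CHI statistic of `word` for class `target` (2x2 table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = word in tokens
        if present and label == target:
            a += 1          # word present, target class
        elif present:
            b += 1          # word present, other class
        elif label == target:
            c += 1          # word absent, target class
        else:
            d += 1          # word absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom

def select_features(docs, labels, k):
    """Keep the k words with the highest CHI score (ties broken alphabetically)."""
    vocab = sorted({w for tokens in docs for w in tokens})
    scores = {w: chi_square(docs, labels, w, "pos") for w in vocab}
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]

def train(docs, labels, features, polarity, alpha=0.1):
    """Naive Bayes word probabilities with an ASSUMED additive polarity term."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        counts = Counter(w for tokens, y in zip(docs, labels) if y == c
                         for w in tokens if w in features)
        total = sum(counts.values())
        probs = {}
        for w in features:
            p = (counts[w] + 1) / (total + len(features))  # Laplace smoothing
            s = polarity.get(w, 0.0)  # hypothetical lexicon score in [-1, 1]
            # Assumption: boost only the class that matches the word's polarity.
            matches = (s > 0 and c == "pos") or (s < 0 and c == "neg")
            probs[w] = p + (alpha * abs(s) if matches else 0.0)
        z = sum(probs.values())  # renormalize so the values sum to 1
        cond[c] = {w: p / z for w, p in probs.items()}
    return prior, cond

def classify(tokens, prior, cond):
    """Return the class with the highest log score; unselected words are ignored."""
    def log_score(c):
        return math.log(prior[c]) + sum(
            math.log(cond[c][w]) for w in tokens if w in cond[c])
    return max(prior, key=log_score)

# Toy data, invented purely for illustration.
docs = [["good", "clean", "room"], ["great", "good", "service"],
        ["bad", "dirty", "room"], ["terrible", "bad", "service"]]
labels = ["pos", "pos", "neg", "neg"]
polarity = {"good": 0.8, "great": 0.9, "bad": -0.8, "terrible": -0.9}

features = select_features(docs, labels, k=4)
prior, cond = train(docs, labels, features, polarity)
print(classify(["good", "room"], prior, cond))
```

Setting `alpha=0` recovers plain Laplace-smoothed naive Bayes, which gives a baseline for checking how much the polarity term changes a given word's probability.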