Journal of Shandong University (Engineering Science) ›› 2018, Vol. 48 ›› Issue (3): 120-126. doi: 10.6040/j.issn.1672-3961.0.2017.407
SHEN Ji (沈冀), MA Zhiqiang* (马志强), LI Tuya (李图雅), ZHANG Li (张力)
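Abstract: To address the low accuracy of sentiment polarity classification on short texts, a sentiment analysis model suited to short texts is proposed on the basis of latent Dirichlet allocation (LDA). The model locates sentiment words in a short text by part of speech and applies constrained word expansion to them, forming an expansion set that raises the co-occurrence frequency among sentiment words. The expansion set is added alongside the sentiment words already found in the text, which lengthens the short text and lets the model extract sentiment information; in this way topic clustering becomes sentiment-topic clustering. The model was validated on 4,000 short texts labelled with positive or negative sentiment polarity. The results show that its accuracy is about 11% higher than that of the joint sentiment/topic model and about 9.5% higher than that of the latent sentiment model, and that it discovers more sentiment words. This indicates that, for short texts, the model extracts richer sentiment features and classifies sentiment polarity more accurately.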
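The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the actual constraints, lexicon, or part-of-speech tag set, so the tag set `SENTIMENT_POS`, the toy `LEXICON`, the same-polarity filter, and the `max_new` cap are all assumptions made for illustration. The sketch covers only the text augmentation; the augmented token lists would then be passed to an LDA-style topic model.

```python
from typing import Dict, List, Tuple

# Assumption: part-of-speech tags treated as candidates for sentiment words.
SENTIMENT_POS = {"a", "v"}  # adjective, verb

# Assumption: a toy lexicon mapping a sentiment word to its polarity
# and to related words that may be used for expansion.
LEXICON: Dict[str, Tuple[str, List[str]]] = {
    "good": ("pos", ["great", "nice", "excellent"]),
    "bad": ("neg", ["poor", "awful", "terrible"]),
}


def find_sentiment_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return words whose POS tag is sentiment-bearing and that are in the lexicon."""
    return [w for w, pos in tagged if pos in SENTIMENT_POS and w in LEXICON]


def expand(word: str, max_new: int = 2) -> List[str]:
    """Constrained expansion (hypothetical constraints).

    Candidates listed in the lexicon with the opposite polarity are dropped,
    and at most `max_new` words are returned.
    """
    polarity, related = LEXICON[word]
    kept = [r for r in related if LEXICON.get(r, (polarity, []))[0] == polarity]
    return kept[:max_new]


def augment(tagged: List[Tuple[str, str]], max_new: int = 2) -> List[str]:
    """Append the expansion set of every sentiment word to the original tokens."""
    tokens = [w for w, _ in tagged]
    for word in find_sentiment_words(tagged):
        tokens.extend(expand(word, max_new))
    return tokens


# A short text given as (word, POS tag) pairs.
doc = [("the", "x"), ("screen", "n"), ("is", "v"), ("good", "a")]
print(augment(doc))
```

Appending same-polarity words lengthens the document and makes sentiment words co-occur more often, which is the effect the abstract attributes to the expansion set. Capping the expansion per word is one plausible way to keep the added words from swamping the original content.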