Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue 3: 120-126. doi: 10.6040/j.issn.1672-3961.0.2017.407

A word extend LDA model for short text sentiment

SHEN Ji, MA Zhiqiang*, LI Tuya, ZHANG Li   

  1. College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China
  • Received: 2017-05-09 Online: 2018-06-20 Published: 2017-05-09
  • Corresponding author: MA Zhiqiang (born 1972), male, from Hohhot, Inner Mongolia Autonomous Region; associate professor; research interests: speech recognition, machine learning. E-mail: 675898486@qq.com
  • About the author: SHEN Ji (born 1994), male, from Xuzhou, Jiangsu; master's student; research interests: natural language processing, machine learning. E-mail: 2247935158@qq.com
  • Funding:
    National Natural Science Foundation of China (61650205); Natural Science Foundation of Inner Mongolia Autonomous Region (2014MS0608)

Abstract: Sentiment polarity classification of short texts tends to have low accuracy. To address this, this study proposes a sentiment analysis model for short texts based on latent Dirichlet allocation (LDA). The model locates sentiment words in a short text by their part of speech and expands them, under constraints, into an extension set, which increases the co-occurrence frequency among sentiment words. Adding the extension set to the sentiment words already found in the text lengthens the short text and lets the model extract sentiment information; in this way, topic clustering becomes sentiment-topic clustering. The model was validated on 4 000 short texts labeled with positive or negative polarity. Its classification accuracy exceeded that of the joint sentiment/topic (JST) model by 11.8% and that of the latent sentiment model (LSM) by 9.5%, and it discovered more sentiment words. These results indicate that the model extracts richer sentiment features from short texts and classifies sentiment polarity more accurately.

Key words: short text, sentiment analysis, latent Dirichlet allocation, unsupervised learning, word extension, document-topic generative model
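The abstract outlines three steps: sentiment words are located by part of speech, each is expanded under constraints into an extension set that is appended to the text, and LDA is then run so that topics act as sentiment topics. The Python sketch below illustrates that pipeline on a toy corpus. It is not the authors' implementation: the POS tags in `SENTIMENT_POS`, the expansion lexicon `LEXICON`, the per-word cap `max_expansions`, the helper names `extend_document`, `cooccurrence` and `lda_gibbs`, and the hyperparameters are all hypothetical. The topic-model step is plain LDA fitted by collapsed Gibbs sampling, and labelling each text by its dominant topic is an assumed decision rule.

```python
"""Sketch of constrained word extension before LDA for short texts.

All settings here (POS tags, lexicon, expansion cap, LDA hyperparameters)
are hypothetical illustrations, not the paper's configuration.
"""
import random

# Hypothetical POS tags treated as sentiment-bearing: adjective, verb, adverb.
SENTIMENT_POS = {"a", "v", "d"}

# Hypothetical expansion lexicon: each entry lists same-polarity near-synonyms.
LEXICON = {
    "good": ["great", "nice"],
    "great": ["good", "nice"],
    "bad": ["poor", "awful"],
    "awful": ["bad", "poor"],
}


def extend_document(tagged_doc, lexicon=LEXICON, max_expansions=2):
    """Return the document's words followed by a constrained extension set.

    tagged_doc is a list of (word, pos) pairs. A word is expanded only if its
    POS tag is sentiment-bearing and it appears in the lexicon; at most
    max_expansions neighbours are appended per word.
    """
    words = [w for w, _ in tagged_doc]
    extension = []
    for word, pos in tagged_doc:
        if pos in SENTIMENT_POS and word in lexicon:
            extension.extend(lexicon[word][:max_expansions])
    return words + extension


def cooccurrence(docs, w1, w2):
    """Count the documents in which w1 and w2 both occur."""
    return sum(1 for d in docs if w1 in d and w2 in d)


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling; returns each doc's dominant topic."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]   # topic-word counts
    nk = [0] * num_topics                        # tokens assigned to each topic
    z = []                                       # topic assignment of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return [max(range(num_topics), key=lambda t: ndk[d][t]) for d in range(len(docs))]


if __name__ == "__main__":
    corpus = [
        [("food", "n"), ("good", "a")],
        [("service", "n"), ("great", "a")],
        [("food", "n"), ("bad", "a")],
        [("service", "n"), ("awful", "a")],
    ]
    raw = [[w for w, _ in doc] for doc in corpus]
    extended = [extend_document(doc) for doc in corpus]
    print(cooccurrence(raw, "good", "great"), cooccurrence(extended, "good", "great"))  # prints "0 2"
    print(lda_gibbs(extended))
```

On the toy corpus, "good" and "great" never occur in the same text before extension but co-occur in two texts afterwards, which is the co-occurrence effect the abstract describes. The sampler does not say which topic index is positive; mapping topics to polarities would need an extra step, such as checking a few seed words.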

CLC number: TP311