JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, Vol. 48, Issue (3): 120-126. doi: 10.6040/j.issn.1672-3961.0.2017.407


A word extend LDA model for short text sentiment

SHEN Ji, MA Zhiqiang*, LI Tuya, ZHANG Li   

  1. College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China
  • Received: 2017-05-09 Online: 2018-06-20 Published: 2017-05-09

Abstract: To address the low accuracy of sentiment polarity analysis on short texts, this study presents a short-text sentiment analysis model based on latent Dirichlet allocation (LDA). The model identifies emotional words in a short text by part of speech and expands them, under constraints, into an extended set, which raises the co-occurrence frequency between emotional words. The extended set is then added to the emotional words found in the short text, which lengthens the text, exposes more emotional information, and turns topic clustering into emotion-topic clustering. Experiments were run on 4,000 positive and negative short texts. The model improved sentiment classification by 11.8% over the joint sentiment topic model and by 9.5% over the latent sentiment model, while also discovering more emotional words. These results indicate that the model extracts richer emotion features from short texts and achieves higher classification accuracy in sentiment analysis.

Key words: short text, word extend function, latent Dirichlet allocation, unsupervised learning, sentiment analysis, document-topic generative model

CLC Number: TP311
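
The word-extension step described in the abstract (find emotional words by part of speech, expand them under a constraint, and append the expansions to the short text) can be illustrated with a minimal sketch. It is not the authors' implementation: the part-of-speech tag set, the `related` lexicon, the `max_new` cap, and both function names are hypothetical and chosen only for illustration. The LDA step itself is left out.

```python
from typing import Dict, List, Tuple

# Assumption: these POS tags mark candidate emotional words.
# The paper's actual tag set is not given in the abstract.
EMOTION_POS = {"ADJ", "ADV", "VERB"}


def find_emotional_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Pick candidate emotional words from (word, POS) pairs by POS tag."""
    return [word for word, pos in tagged if pos in EMOTION_POS]


def extend_text(
    tagged: List[Tuple[str, str]],
    related: Dict[str, List[str]],
    max_new: int = 2,
) -> List[str]:
    """Append up to max_new related words per emotional word to the text.

    `related` is a hypothetical lexicon mapping an emotional word to
    related words; `max_new` stands in for the constraint on expansion.
    """
    words = [word for word, _ in tagged]
    seen = set(words)
    extra: List[str] = []
    for emotional_word in find_emotional_words(tagged):
        added = 0
        for candidate in related.get(emotional_word, []):
            if added >= max_new:
                break
            if candidate not in seen:  # skip words already in the text
                extra.append(candidate)
                seen.add(candidate)
                added += 1
    return words + extra


# Toy input: one POS-tagged short text and a toy lexicon (both made up).
tagged_text = [("screen", "NOUN"), ("is", "AUX"), ("great", "ADJ")]
lexicon = {"great": ["excellent", "wonderful", "superb"]}
extended = extend_text(tagged_text, lexicon, max_new=2)
print(extended)
```

The extended token lists would then be passed to a topic model in place of the original short texts, so that related emotional words co-occur more often within each document.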