Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue (3): 120-126. doi: 10.6040/j.issn.1672-3961.0.2017.407
SHEN Ji, MA Zhiqiang*, LI Tuya, ZHANG Li
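Abstract: Sentiment polarity classification on short texts tends to have low accuracy. To address this, a sentiment analysis model suited to short texts is proposed, built on latent Dirichlet allocation (LDA). The model identifies sentiment words in a short text by their part of speech and applies constrained word expansion to them, producing an expansion set that increases the co-occurrence frequency among sentiment words. The expansion set is added alongside the sentiment words already found in the text. This lengthens the short text and lets the model extract sentiment information, so that topic clustering becomes sentiment-topic clustering. The model was evaluated on 4,000 short texts labeled with positive or negative sentiment polarity. Its accuracy is about 11% higher than that of the joint sentiment/topic (JST) model and about 9.5% higher than that of the latent sentiment model (LSM), and it discovers more sentiment words. These results indicate that the model extracts richer sentiment features from short texts and achieves higher accuracy in sentiment polarity classification.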
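The expansion step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Universal POS tag set, the seed polarity lexicon `polarity`, the similarity table `similar` (which could come from word embeddings or a thesaurus), and the constraints `threshold` and `max_k` are all hypothetical stand-ins, since the abstract does not say how the expansion is constrained. The LDA-based sentiment-topic model that consumes the augmented texts is not shown.

```python
from collections import Counter
from itertools import combinations

# Assumption: Universal POS tags mark sentiment-bearing candidates;
# the paper's actual tag set is not given in the abstract.
SENTIMENT_POS = {"ADJ", "VERB", "ADV"}


def find_sentiment_words(tagged_doc, polarity):
    """Keep words whose POS tag marks them as candidates and that the
    hypothetical seed lexicon `polarity` (word -> label) covers."""
    return [w for w, tag in tagged_doc if tag in SENTIMENT_POS and w in polarity]


def constrained_expansion(words, similar, polarity, threshold=0.6, max_k=2):
    """Expand each sentiment word with at most `max_k` neighbours that are
    similar enough (>= threshold) and share the seed word's polarity.
    Both constraints are illustrative assumptions."""
    expansion = []
    for w in words:
        # Most similar neighbours first.
        ranked = sorted(similar.get(w, []), key=lambda pair: pair[1], reverse=True)
        kept = [n for n, sim in ranked
                if sim >= threshold and polarity.get(n) == polarity[w]]
        expansion.extend(kept[:max_k])
    return expansion


def augment(tagged_doc, polarity, similar, **kwargs):
    """Return the document's tokens with the expansion set appended,
    giving a longer text with denser sentiment-word co-occurrence."""
    tokens = [w for w, _ in tagged_doc]
    found = find_sentiment_words(tagged_doc, polarity)
    return tokens + constrained_expansion(found, similar, polarity, **kwargs)


def sentiment_cooccurrence(docs, polarity):
    """Count, across documents, how often each pair of lexicon words
    appears in the same document."""
    counts = Counter()
    for doc in docs:
        present = sorted({w for w in doc if w in polarity})
        counts.update(combinations(present, 2))
    return counts
```

For example, with `polarity = {"great": "pos", "excellent": "pos", "good": "pos", "awful": "neg"}` and `similar = {"great": [("excellent", 0.8), ("good", 0.7), ("awful", 0.65)]}`, the short text `[("phone", "NOUN"), ("great", "ADJ")]` is augmented to `["phone", "great", "excellent", "good"]`. `awful` passes the similarity threshold but is rejected because its polarity differs. Two one-word-sentiment texts that previously shared no sentiment-word pairs now co-occur on pairs such as `("excellent", "great")`, which is the effect the abstract attributes to the expansion set.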