Journal of Shandong University (Engineering Science), 2018, Vol. 48, Issue (3): 40-47. doi: 10.6040/j.issn.1672-3961.0.2017.403
LIN Jianghao1,2, ZHOU Yongmei1,2*, YANG Aimin1,2, CHEN Jin1,3
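Abstract: To address the shortcomings of existing domain sentiment lexicons in expressing sentiment and semantics, a method for constructing a domain sentiment lexicon based on word vectors is proposed. A word2vec model was trained on 250,000 news articles and more than 100,000 hotel reviews. Eighty sentiment words with clear polarity, rich content, and diverse parts of speech were selected as the seed word set. Using TF-IDF as a measure of word importance, 9,860 domain candidate sentiment words were obtained from the hotel reviews. By computing the semantic similarity between the word vector of each candidate sentiment word and the word vectors of the seed words, the sentiment words were mapped into a high-dimensional vector space, yielding a feature-vector representation of sentiment words (Senti2vec). Senti2vec was then applied to sentiment-word polarity classification and text sentiment analysis. The experimental results show that Senti2vec provides both a semantic and a sentiment representation of sentiment words. Because the semantic similarity is computed on a domain-specific corpus, the extracted sentiment features are more domain-specific and are not constrained by the scope of the candidate sentiment word set.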
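The steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact TF-IDF weighting or the similarity measure, so a plain `tf * log(N / df)` score and cosine similarity are assumed here. The function names (`tfidf_scores`, `cosine`, `senti2vec`), the toy corpus, and the two-dimensional toy embeddings are all hypothetical stand-ins for the trained word2vec model and the 80-word seed set.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return, for each term, the highest TF-IDF score it reaches in any document.

    docs is a list of token lists. Assumed weighting: tf = count / len(doc),
    idf = log(N / df). The paper's exact formula is not given in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return best


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def senti2vec(word, seeds, embeddings):
    """Represent a word by its similarity to each seed word, in seed order.

    Assumption: one feature per seed word, so 80 seeds would give an
    80-dimensional vector.
    """
    return [cosine(embeddings[word], embeddings[seed]) for seed in seeds]


# Toy stand-in for tokenized hotel reviews (hypothetical data).
reviews = [["room", "clean", "clean"], ["room", "dirty"], ["room", "staff"]]
scores = tfidf_scores(reviews)
# Rank terms by score; a term that occurs in every review scores zero.
candidates = sorted(scores, key=scores.get, reverse=True)

# Toy stand-in for trained word2vec vectors (hypothetical values).
embeddings = {
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
    "clean": [0.9, 0.1],
    "dirty": [0.1, 0.9],
}
seeds = ["good", "bad"]
clean_vec = senti2vec("clean", seeds, embeddings)
dirty_vec = senti2vec("dirty", seeds, embeddings)
```

With a real model, `embeddings` would be replaced by lookups into the trained word vectors, and the resulting similarity vectors could be passed to any standard classifier for the polarity classification task.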