JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2018, Vol. 48, Issue (3): 40-47. doi: 10.6040/j.issn.1672-3961.0.2017.403

Building of domain sentiment lexicon based on word2vec

LIN Jianghao1,2, ZHOU Yongmei1,2*, YANG Aimin1,2, CHEN Jin1,3   

    1. Laboratory for Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006, Guangdong, China;
    2. School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, Guangdong, China;
    3. International College, Guangdong University of Foreign Studies, Guangzhou 510420, Guangdong, China
  • Received: 2017-08-23 Online: 2018-06-20 Published: 2017-08-23

Abstract: To address the lack of sentimental and semantic representation in domain sentiment lexicons, a method for constructing a domain sentiment lexicon from word vectors is proposed. A word2vec model was trained on 250 thousand news texts and 100 thousand hotel review texts. Eighty sentiment words with obvious sentiment, rich content, and diverse parts of speech (POS) were chosen as seed words. Meanwhile, 9,860 candidate sentiment words were extracted from the hotel review texts according to their TF-IDF values. The semantic similarity between the candidate sentiment words and the seed words was calculated from their word vectors; the sentiment words were thereby mapped into a high-dimensional vector space, and a feature vector representation (Senti2vec) was extracted. Senti2vec was applied to polarity classification of sentiment words and to sentiment analysis of texts. The experimental results showed that Senti2vec could represent both the meaning and the sentiment of sentiment words. Because Senti2vec is based on semantic similarity computed from domain-specific data, the method can be adapted to different domains.
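
As an illustration of the similarity step described in the abstract, the following minimal sketch builds a feature vector for one candidate word from its cosine similarities to a set of seed words. The abstract does not specify how Senti2vec is actually assembled, so the one-similarity-per-seed layout, the function names `cosine` and `similarity_features`, and the toy 3-dimensional embeddings are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(word_vec, seed_vecs):
    """Hypothetical feature vector: one cosine similarity per seed word."""
    return [cosine(word_vec, seed) for seed in seed_vecs]


# Toy embeddings with made-up values; a real setup would look these up
# in a trained word2vec model instead.
seeds = {"good": [1.0, 0.0, 0.0], "bad": [-1.0, 0.0, 0.0]}
candidate = [2.0, 0.0, 0.0]

features = similarity_features(candidate, list(seeds.values()))
print(features)
```

With 80 seed words, this layout would give each candidate word an 80-dimensional feature vector that a downstream polarity classifier could consume.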

Key words: word2vec, sentiment word, sentimental feature vector, semantic similarity, domain sentiment lexicon

CLC Number: 

  • TP391.1
