%A Mingxia GAO, Jingwei LI
%T Chinese short text classification method based on word2vec embedding
%0 Journal Article
%D 2019
%J Journal of Shandong University(Engineering Science)
%R 10.6040/j.issn.1672-3961.0.2018.197
%P 34-41
%V 49
%N 2
%U {http://gxbwk.njournal.sdu.edu.cn/CN/abstract/article_1806.shtml}
%8 2019-04-20
%X

In short text classification, the limited number of words per text yields weak feature representations, which restricts classification performance. To address this problem, a Chinese short text classification method based on embeddings trained by word2vec on Wikipedia (CSTC-EWW) was proposed, and a series of experiments was conducted on short texts covering 4 topics from the iask.com website. The method first trained word embeddings with word2vec on a Wikipedia corpus. Features of the short texts were then built from these embeddings. Finally, Naive Bayes and SVM were used to classify the short texts. The experimental results showed that CSTC-EWW could classify short texts effectively, with a best F-value of 81.8%. Compared with the BOW text representation weighted by TF-IDF and with the method of extending features from Wikipedia, CSTC-EWW achieved significantly better classification results, and its F-measure on the car topic could be increased by 45.2%.
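
The pipeline summarized above (embeddings, then text features, then a classifier) can be illustrated with a minimal sketch. Everything specific in it is an assumption, not the paper's implementation: the abstract does not say how the text features are built from the embeddings, so simple averaging of word vectors is used here; the tiny `TOY_EMBEDDINGS` table is a hypothetical stand-in for word2vec vectors trained on Wikipedia; the "health" label is invented for illustration; and a nearest-centroid rule with cosine similarity replaces Naive Bayes and SVM so that the example stays dependency-free. Tokens are assumed to be already segmented, and English words are used only for readability.

```python
import math

# Hypothetical 3-dimensional vectors standing in for word2vec embeddings
# trained on Wikipedia; real embeddings have far more dimensions.
TOY_EMBEDDINGS = {
    "engine": [0.9, 0.1, 0.0],
    "brake":  [0.8, 0.2, 0.1],
    "tire":   [0.7, 0.0, 0.2],
    "fever":  [0.1, 0.9, 0.0],
    "cough":  [0.0, 0.8, 0.1],
    "doctor": [0.2, 0.7, 0.1],
}


def text_vector(tokens, embeddings):
    """Average the vectors of known tokens (assumed feature construction)."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known word: fall back to the zero vector
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def train_centroids(labeled_texts, embeddings):
    """Compute one mean feature vector per label from (tokens, label) pairs."""
    grouped = {}
    for tokens, label in labeled_texts:
        grouped.setdefault(label, []).append(text_vector(tokens, embeddings))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(tokens, centroids, embeddings):
    """Return the label whose centroid is most similar to the text vector."""
    vec = text_vector(tokens, embeddings)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Tiny hypothetical training set; "car" is a topic named in the abstract,
# "health" is invented for this illustration.
train = [
    (["engine", "brake"], "car"),
    (["tire", "engine"], "car"),
    (["fever", "cough"], "health"),
    (["doctor", "fever"], "health"),
]
centroids = train_centroids(train, TOY_EMBEDDINGS)
print(predict(["brake", "tire"], centroids, TOY_EMBEDDINGS))
```

In a fuller setup, the averaged vectors would instead be passed to a Naive Bayes or SVM implementation, as the abstract describes; the nearest-centroid rule here only shows that dense embedding features can separate topics even when a short text shares no words with the training texts.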