A document understanding method for short texts by auxiliary long documents

YAN Yingying1,2, HUANG Ruizhang1,2*, WANG Rui1,2, MA Can1,2, LIU Bowei1,2, HUANG Ting1,2   

  1. School of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China;
  2. Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025, Guizhou, China
  Received: 2017-08-23  Online: 2018-06-20  Published: 2017-08-23

Abstract: Building on the Dirichlet-multinomial regression (DMR) model, a dual Dirichlet-multinomial regression (DDMR) model was proposed for understanding short texts with the help of auxiliary long documents. Long documents and short texts, which came from different data sources, shared a single topic set, while two separate Dirichlet priors were used to generate the topic distributions of the long documents and of the short texts. This allowed topic knowledge from the long documents to be transferred to the short texts and improved understanding of the short texts. Experiments showed that the DDMR model was effective at topic discovery in short texts.
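
The generative idea summarized in the abstract (one shared topic set, two Dirichlet priors) can be illustrated with a minimal sketch. This is not the authors' implementation and it covers only sampling, not inference. It assumes the standard DMR form in which each document's Dirichlet parameters are the exponential of a linear function of its feature vector. The names `sample_corpus`, `lam_long`, `lam_short`, the feature vectors, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_corpus(features, lam, topics, doc_len, rng):
    """Sample documents whose topic prior depends on per-document features (DMR-style)."""
    num_topics, vocab_size = topics.shape
    docs = []
    for x in features:
        alpha = np.exp(x @ lam)              # document-specific Dirichlet parameters, shape (K,)
        theta = rng.dirichlet(alpha)         # topic proportions for this document
        z = rng.choice(num_topics, size=doc_len, p=theta)  # one topic per token
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        docs.append(words)
    return docs

rng = np.random.default_rng(0)
K, V, F = 3, 20, 2                           # topics, vocabulary size, feature dimension (assumed)

# One topic set shared by both data sources.
topics = rng.dirichlet(np.full(V, 0.5), size=K)

# Two separate sets of prior parameters: one for long documents, one for short texts.
lam_long = rng.normal(0.0, 0.5, size=(F, K))
lam_short = rng.normal(0.0, 0.5, size=(F, K))

# Hypothetical features: a constant bias term plus one random attribute.
x_long = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 1))])
x_short = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 1))])

long_docs = sample_corpus(x_long, lam_long, topics, doc_len=200, rng=rng)
short_docs = sample_corpus(x_short, lam_short, topics, doc_len=8, rng=rng)
```

Because both corpora draw words from the same `topics` matrix, topics estimated mainly from the word-rich long documents are the same topics that are assigned to the sparse short texts. Under the assumptions above, this is the sense in which topic knowledge is transferred.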

Key words: short text understanding, dual Dirichlet-multinomial regression model, topic model

