Journal of Shandong University (Engineering Science) ›› 2009, Vol. 39 ›› Issue (5): 27-31.

DONG Nai-peng, ZHAO He-ji, SCHOMMER Christoph

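  1. DONG Nai-peng, ZHAO He-ji: Department of Computer Science and Technology, Shandong University, Jinan 250101, Shandong, China; SCHOMMER Christoph: Department of Information and Computer Sciences, University of Luxembourg, Luxembourg 2311, Luxembourg
  • Received: 2008-12-10 Published: 2009-10-16 Online: 2009-10-16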

A fingerprint engine for author profiling

  1. Department of Computer Science and Technology, Shandong University, Jinan 250101, China;
  2. Department of Information and Computer Sciences, University of Luxembourg, Luxembourg 2311, Luxembourg
  • Received: 2008-12-10 Online: 2009-10-16 Published: 2009-10-16
  • About author: DONG Nai-peng (1982-), female, born in Zouping, Shandong; Master's candidate; her main research interest is software engineering. E-mail: dongnaipeng@gmial.com



Key words: author profiling; text processing; natural language processing; data mining; artificial intelligence


With the development of the Internet, digital texts are proliferating, and copyright protection has become increasingly important in recent years. One way to address the copyright problem is to profile an author's writing style: by comparing writing styles, we can tell whether a text was written by a certain author. Most current research in author profiling focuses on examining linguistic attributes or finding new ones; however, appropriately profiling an author remains a challenging task. This paper builds a model to fingerprint an author: it takes texts by an author in a certain domain as input and produces a profile of the author as output. Using this fingerprint engine, we can tell with a certain probability whether an input text was written by one of a list of possible authors. The paper focuses on author profiling of English texts. Writing styles are measured using linguistic attributes and linguistic measurements. Statistical methods, such as standard deviation analysis and principal component analysis, are used to evaluate the efficiency of the linguistic measurements.
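The idea of turning an author's texts into a profile can be illustrated with a minimal sketch. The abstract does not list the linguistic attributes the paper actually uses, so the three measurements below (average word length, average sentence length, type-token ratio) are assumptions chosen for illustration, as are the function names. The distance score is a simple stand-in for comparing a text with a profile; it is not the probability estimate the paper describes, and principal component analysis is omitted.

```python
import re
import statistics


def stylometric_features(text):
    """Compute a few simple linguistic measurements for one English text.

    The choice of measurements is an assumption; the abstract does not
    enumerate the attributes used in the paper.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
    }


def build_profile(texts):
    """Summarize each measurement by (mean, standard deviation) over an author's texts."""
    rows = [stylometric_features(t) for t in texts]
    profile = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        profile[name] = (statistics.mean(values), statistics.pstdev(values))
    return profile


def distance(profile, text):
    """Mean absolute z-score of a text against a profile (lower means closer).

    Hypothetical comparison rule, used here only to show how a profile
    could be matched against an unseen text.
    """
    feats = stylometric_features(text)
    scores = []
    for name, (mean, std) in profile.items():
        scale = std if std > 0 else 1.0  # avoid dividing by zero
        scores.append(abs(feats[name] - mean) / scale)
    return sum(scores) / len(scores)
```

A measurement whose standard deviation is small across one author's texts but whose mean differs between authors is a plausible candidate for a useful attribute, which is the intuition behind evaluating measurements with standard deviation analysis. To rank candidate authors for an unseen text, one could call `distance` once per author profile and sort the results.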

Key words: author profiling; text analysis; natural language processing; data mining; artificial intelligence

