
Journal of Shandong University (Engineering Science) ›› 2014, Vol. 44 ›› Issue (6): 32-37. doi: 10.6040/j.issn.1672-3961.1.2014.163

• Machine Learning and Data Mining •

Chinese entity relation extraction based on entity disambiguation

SHAO Fa1, HUANG Yinge1, ZHOU Lanjiang1,2, GUO Jianyi1,2, YU Zhengtao1,2, ZHANG Jinpeng1

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
  2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2013-12-29 Revised: 2014-11-19 Online: 2014-12-20 Published: 2013-12-29
  • Corresponding author: GUO Jianyi (born 1964), female, from Kunming, Yunnan; professor, master's degree; main research interests: information extraction, machine learning, and knowledge construction. E-mail: gjade86@hotmail.com
  • About the author: SHAO Fa (born 1991), male, from Zhoukou, Henan; master's student; main research interest: natural language processing. E-mail: 1129632620@qq.com
  • Supported by:
    National Natural Science Foundation of China (61262041, 61472168); Major Special Project of the Yunnan Provincial Department of Education Fund (KKJI201203001); Key Project of the Yunnan Provincial Department of Science and Technology (KKSD201303007)

Abstract: To address the polysemy problem in Chinese entity relation extraction from open text, a Chinese entity relation extraction method based on entity disambiguation was proposed. First, entity pairs with potential semantic relations were mined from HowNet, and the entities were mapped from HowNet to Wikipedia using a semantic disambiguation method based on Bayesian classification, so as to obtain high-quality relation instances. Then, the sentence instances in which these relation instances co-occur were extracted from the corresponding texts to construct basic extraction patterns. Finally, new patterns were generated by pattern merging and used to extract new instances. The experimental results showed that the proposed method achieved higher accuracy than methods without semantic disambiguation and pattern merging.

Key words: Bayesian classification, entity disambiguation, pattern merging, entity relation extraction, Wikipedia
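
The Bayesian-classification disambiguation step named in the abstract could be sketched roughly as follows. This is a minimal naive Bayes illustration and not the authors' implementation: the abstract does not specify the classifier's features or form, so the candidate sense labels, the toy context words, the function names `train` and `disambiguate`, the uniform priors, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter


def train(sense_docs):
    """Count context words per candidate sense.

    sense_docs maps a (hypothetical) Wikipedia sense label to a list of
    context tokens observed for that sense.
    """
    counts = {sense: Counter(tokens) for sense, tokens in sense_docs.items()}
    vocab = {word for counter in counts.values() for word in counter}
    return counts, vocab


def disambiguate(context, counts, vocab, priors=None):
    """Return the sense maximizing log P(s) + sum_w log P(w | s).

    Word likelihoods use add-one (Laplace) smoothing; priors default to
    uniform. Both choices are assumptions for this sketch.
    """
    senses = sorted(counts)
    if priors is None:
        priors = {sense: 1.0 / len(senses) for sense in senses}
    best_sense, best_score = None, -math.inf
    for sense in senses:
        total = sum(counts[sense].values())
        score = math.log(priors[sense])
        for word in context:
            score += math.log((counts[sense][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy data (invented for illustration): two candidate senses of one entity name.
SENSE_DOCS = {
    "Apple (fruit)": ["fruit", "tree", "eat", "sweet", "juice"],
    "Apple (company)": ["company", "iphone", "computer", "technology", "product"],
}
COUNTS, VOCAB = train(SENSE_DOCS)

# Context words surrounding the ambiguous entity in the source knowledge base.
chosen = disambiguate(["company", "product", "phone"], COUNTS, VOCAB)
print(chosen)
```

In this sketch the ambiguous entity is mapped to whichever candidate sense best explains its context words; in practice the context of a HowNet entity and the text of the candidate Wikipedia entries would take the place of the toy lists.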

CLC number: TP391