Chinese entity relation extraction based on entity disambiguation

Journal of Shandong University (Engineering Science), 2014, Vol. 44, Issue (6): 32-37. doi: 10.6040/j.issn.1672-3961.1.2014.163

SHAO Fa¹, HUANG Yinge¹, ZHOU Lanjiang¹,², GUO Jianyi¹,², YU Zhengtao¹,², ZHANG Jinpeng¹

1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Received: 2013-12-29 Revised: 2014-11-19 Online: 2014-12-20 Published: 2013-12-29
Corresponding author: GUO Jianyi (1964-), female, from Kunming, Yunnan; professor, master's degree; main research interests: information extraction, machine learning, and knowledge construction. E-mail: gjade86@hotmail.com
About the author: SHAO Fa (1991-), male, from Zhoukou, Henan; master's student; main research interest: natural language processing. E-mail: 1129632620@qq.com
Funding:
National Natural Science Foundation of China (61262041, 61472168); Major Special Project of the Yunnan Provincial Department of Education Fund (KKJI201203001); Key Project of the Yunnan Provincial Department of Science and Technology (KKSD201303007)

Abstract

To address the polysemy problem in Chinese entity relation extraction from open text, a Chinese entity relation extraction method based on entity disambiguation is proposed. First, entity pairs with potential semantic relations are mined from HowNet, and the entities are mapped from HowNet to Wikipedia using a semantic disambiguation method based on Bayesian classification, so as to obtain high-quality relation instances. Then, the sentence instances in which these relation instances co-occur are extracted from the corresponding texts to construct basic extraction patterns. Finally, new patterns are generated by pattern merging and then used to extract new instances. The experimental results show that the proposed method achieves higher accuracy than methods without semantic disambiguation and pattern merging.

Key words: Bayesian classification, entity disambiguation, pattern merging, entity relation extraction, Wikipedia
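
The abstract names Bayesian classification as the means of mapping a HowNet entity to the right Wikipedia entry, but gives no details of the features, priors, or smoothing used. The following is only a minimal sketch of a generic naive Bayes sense selector under assumed conditions: each candidate Wikipedia entry is represented as a bag of tokens, the HowNet entity is represented by a few context tokens, priors are uniform, and add-one smoothing is applied. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_sense_models(candidate_texts):
    """Count tokens per candidate sense.

    candidate_texts maps a sense label to a token list (a hypothetical
    stand-in for the text of a candidate Wikipedia entry).
    """
    counts = {sense: Counter(tokens) for sense, tokens in candidate_texts.items()}
    vocab = set()
    for word_counts in counts.values():
        vocab.update(word_counts)
    return counts, vocab


def disambiguate(context_tokens, counts, vocab, priors=None):
    """Return the sense with the highest naive Bayes log-posterior."""
    best_sense, best_score = None, -math.inf
    for sense, word_counts in counts.items():
        # Assumption: uniform prior unless the caller supplies one.
        prior = priors[sense] if priors else 1.0 / len(counts)
        total = sum(word_counts.values())
        score = math.log(prior)
        for token in context_tokens:
            # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
            score += math.log((word_counts[token] + 1) / (total + len(vocab) + 1))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy data (invented for illustration only).
candidates = {
    "Apple (fruit)": ["fruit", "tree", "sweet", "eat", "orchard", "fruit"],
    "Apple Inc.": ["company", "iphone", "technology", "computer", "company"],
}
counts, vocab = train_sense_models(candidates)

# Context tokens standing in for features drawn from a HowNet definition.
print(disambiguate(["fruit", "eat", "tree"], counts, vocab))
```

In this sketch the entity pair would be kept as a relation instance only after both entities have been mapped to a Wikipedia entry; how the paper handles ties or low-confidence mappings is not stated in the abstract.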

CLC number: TP391

SHAO Fa, HUANG Yinge, ZHOU Lanjiang, GUO Jianyi, YU Zhengtao, ZHANG Jinpeng. Chinese entity relation extraction based on entity disambiguation[J]. Journal of Shandong University (Engineering Science), 2014, 44(6): 32-37.


