Journal of Shandong University (Engineering Science) ›› 2014, Vol. 44 ›› Issue (6): 32-37. doi: 10.6040/j.issn.1672-3961.1.2014.163
邵发1, 黄银阁1, 周兰江1,2, 郭剑毅1,2, 余正涛1,2, 张金鹏1
SHAO Fa1, HUANG Yinge1, ZHOU Lanjiang1,2, GUO Jianyi1,2, YU Zhengtao1,2, ZHANG Jinpeng1
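Abstract: To address word polysemy in Chinese entity relation extraction from open text, a Chinese entity relation extraction method based on entity disambiguation is proposed. First, entity pairs with potential semantic relations are mined from HowNet, and a semantic disambiguation method based on Bayesian classification maps these entities from HowNet to Wikipedia, yielding high-quality relation instances. Next, for each relation instance, the sentences in the corresponding text in which both entities co-occur are extracted as sentence instances and used to build the basic extraction patterns. Finally, new patterns are generated by pattern merging, and these new patterns are used to extract new instances. Experimental results show that the method achieves higher accuracy than methods without semantic disambiguation and pattern merging.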
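The disambiguation step can be illustrated with a minimal sketch of naive Bayes sense selection. This is not the authors' implementation: the abstract does not specify the features, training data, or smoothing they use, so the function names (`train_sense_model`, `disambiguate`), the bag-of-words context representation, the add-one smoothing, and the toy data below are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_sense_model(sense_contexts):
    """Count context words per candidate sense (hypothetical training data)."""
    word_counts = {
        sense: Counter(w for ctx in ctxs for w in ctx)
        for sense, ctxs in sense_contexts.items()
    }
    doc_counts = {sense: len(ctxs) for sense, ctxs in sense_contexts.items()}
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def disambiguate(context, model):
    """Return the sense with the highest naive Bayes log-posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, counts in word_counts.items():
        # Log prior: share of training contexts labelled with this sense.
        score = math.log(doc_counts[sense] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for w in context:
            if w not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothed log likelihood of the context word.
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy example: two candidate Wikipedia senses for one ambiguous entity.
sense_contexts = {
    "Apple (fruit)": [["tree", "fruit", "sweet"], ["fruit", "orchard", "harvest"]],
    "Apple Inc.": [["iphone", "company", "technology"], ["company", "software", "mac"]],
}
model = train_sense_model(sense_contexts)
chosen = disambiguate(["company", "iphone", "release"], model)
```

In this sketch the context words around an ambiguous entity select one of its candidate senses; in the method described above, the analogous decision links a HowNet entity to a Wikipedia entry.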