Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (2): 78-87. doi: 10.6040/j.issn.1672-3961.0.2024.050
• Machine Learning and Data Mining •
王禹鸥1,苑迎春1,2*,何振学1,王克俭1
WANG Yuou1, YUAN Yingchun1,2*, HE Zhenxue1, WANG Kejian1
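Abstract: Distantly supervised relation extraction does not make full use of high-level sentence context and is prone to noisy labels. To address these problems, a relation extraction method is proposed that combines an improved robustly optimized bidirectional encoder representations from Transformers pretraining approach (RoBERTa), multiple-instance learning (MI), and a dual attention (DA) mechanism. Whole-word dynamic masking is introduced into RoBERTa to capture textual context and obtain word-level semantic vectors. The feature vectors are fed into a bidirectional gated recurrent unit (BiGRU) to mine deep semantic representations of the text. Multiple-instance learning is introduced to narrow the range of candidate relation classes by learning instance-level features. The dual attention mechanism combines the strengths of word-level and sentence-level attention: it captures the features of entity words within each sentence and the degree of attention given to valid sentences, which strengthens the sentence representation. Experimental results show that on the public New York Times (NYT) dataset and the Google IISc distant supervision (GIDS) dataset, the method achieves F1 scores of 88.63% and 90.13%, respectively, outperforming the mainstream comparison methods. The method effectively reduces the impact of distant supervision noise on relation extraction and provides a theoretical basis for knowledge graph construction.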
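The abstract names word-level and sentence-level attention but does not give their formulas. The following is a minimal sketch of how two stacked attention poolings over a multiple-instance bag could look, assuming plain dot-product scoring against a query vector. The names `attention_pool`, `encode_bag`, `word_query`, and `sentence_query` are hypothetical, and the per-token vectors stand in for BiGRU outputs; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Score each vector against the query (assumed dot-product scoring),
    normalize the scores with softmax, and return (weighted sum, weights)."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def encode_bag(bag: Sequence[Sequence[Vector]],
               word_query: Vector,
               sentence_query: Vector) -> Tuple[Vector, List[float]]:
    """bag: all sentences mentioning one entity pair; each sentence is a
    list of per-token hidden vectors (stand-ins for BiGRU outputs)."""
    # Word-level attention: pool the tokens of each sentence into one vector.
    sentence_vectors = [attention_pool(tokens, word_query)[0] for tokens in bag]
    # Sentence-level attention: pool the sentences of the bag into one vector,
    # so that sentences scoring low against the query contribute less.
    return attention_pool(sentence_vectors, sentence_query)


# Toy bag with two sentences of two tokens each (2-dimensional hidden states).
bag = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [0.0, 2.0]],
]
bag_vector, sentence_weights = encode_bag(bag, [1.0, 0.0], [0.0, 1.0])
```

In this sketch the sentence weights sum to 1, so a sentence that matches the query poorly is down-weighted rather than discarded. In a trained model the query vectors would be learned parameters, and the bag vector would feed a relation classifier.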