Journal of Shandong University (Engineering Science), 2015, Vol. 45, Issue (2): 22-26. doi: 10.6040/j.issn.1672-3961.1.2014.259

• Machine Learning and Data Mining •

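A semi-supervised method based on tree kernel for relationship extraction

LIU Xiaoyong

  1. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
  • Received: 2014-03-26 Revised: 2014-10-15 Online: 2015-04-20 Published: 2014-03-26
  • About the author: LIU Xiaoyong (1979-), male, born in Xinyang, Henan; associate professor, Ph.D.; research interests: data mining and intelligent optimization algorithms. E-mail: lxyong420@126.com
  • Funding:
    Training Program for Outstanding Young Teachers in Higher Education Institutions of Guangdong Province (Yq2013108)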

Abstract: Traditional semi-supervised relation extraction algorithms are prone to the "semantic variation" problem. To address it, a new semi-supervised relation extraction algorithm based on tree kernel functions, named L-EC-RE, was proposed. The algorithm combines two strategies, a tree kernel function and constrained expansion of the seed set, to reduce the extraction errors caused by semantic variation and to improve the accuracy of relation recognition. Experiments on the PopBank benchmark dataset showed that the proposed semi-supervised method, which expands the seed set under a constraint mechanism, outperformed two commonly used relation extraction methods on all four evaluation metrics (Precision, Recall, F-measure and Accuracy), indicating better relation extraction performance than the methods it was compared with.

Key words: relationship extraction, semi-supervised method, semantic variation, tree kernel, support vector machine
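The abstract names a tree kernel as one of the two strategies but does not say which tree kernel is used. As an illustration only, the sketch below implements the classic Collins-Duffy subset-tree (convolution) kernel, which counts the tree fragments two parse trees share, with larger fragments down-weighted by a decay factor. The tuple-based tree encoding, the function names, and the default decay `lam = 0.4` are assumptions for this sketch, not details taken from the paper.

```python
import math
from typing import Dict, List, Tuple, Union

# Hypothetical tree encoding: a leaf (word) is a str; an internal node is a
# tuple (label, child_1, ..., child_k), e.g. ("NP", ("DT", "the"), ("NN", "dog")).
Tree = Union[str, tuple]


def node_label(t: Tree) -> str:
    return t if isinstance(t, str) else t[0]


def production(node: tuple) -> Tuple[str, ...]:
    """Grammar rule at a node: its label followed by its children's labels."""
    return (node[0],) + tuple(node_label(c) for c in node[1:])


def is_preterminal(node: tuple) -> bool:
    """True if every child is a word, e.g. ("NN", "dog")."""
    return all(isinstance(c, str) for c in node[1:])


def internal_nodes(t: Tree) -> List[tuple]:
    """All non-leaf nodes of t in pre-order."""
    if isinstance(t, str):
        return []
    nodes = [t]
    for child in t[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def tree_kernel(t1: Tree, t2: Tree, lam: float = 0.4) -> float:
    """Collins-Duffy subset-tree kernel: weighted count of common fragments.

    lam (0 < lam <= 1) down-weights large fragments; 0.4 is an arbitrary default.
    """
    memo: Dict[Tuple[int, int], float] = {}

    def delta(n1: tuple, n2: tuple) -> float:
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if production(n1) != production(n2):
            value = 0.0  # different rules: no common fragment rooted here
        elif is_preterminal(n1) and is_preterminal(n2):
            value = lam  # same POS tag over the same word
        else:
            value = lam
            for c1, c2 in zip(n1[1:], n2[1:]):
                if isinstance(c1, str) or isinstance(c2, str):
                    continue  # words root no fragments of their own
                value *= 1.0 + delta(c1, c2)
        memo[key] = value
        return value

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


def normalized_tree_kernel(t1: Tree, t2: Tree, lam: float = 0.4) -> float:
    """Cosine-style normalization so that K(t, t) is 1 (up to rounding)."""
    denom = math.sqrt(tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))
    return tree_kernel(t1, t2, lam) / denom if denom > 0 else 0.0
```

For ("NP", ("DT", "the"), ("NN", "dog")) and ("NP", ("DT", "the"), ("NN", "cat")), the shared rules are DT -> the and NP -> DT NN, so with lam = 0.4 the kernel is 0.4 + 0.4 * (1 + 0.4) = 0.96. Since the keywords mention support vector machines, such a kernel would plausibly be used by filling a Gram matrix and passing it to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`); that wiring is an assumption and is not shown here.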

CLC number: TP181
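The abstract's second strategy, constrained expansion of the seed set, is likewise not specified here. The sketch below is one hypothetical reading of it: a bootstrapping loop in which an unlabeled relation instance joins the seed set of its best-scoring relation only if its average similarity to that relation's seeds reaches a threshold and clearly beats the runner-up relation, so that ambiguous instances cannot drag a seed set toward another relation. All names (`expand_seeds`, `min_sim`, `min_margin`, `jaccard`) and both thresholds are assumptions; `jaccard` on feature sets merely stands in for a normalized tree kernel. The same block computes the four metrics named in the abstract for one relation label treated as positive, because the abstract does not say how multi-class scores were averaged.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical types: an instance is any comparable object (a parse tree,
# a feature set, ...); sim returns a similarity in [0, 1].
Similarity = Callable[[object, object], float]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Toy similarity on feature sets, standing in for a normalized tree kernel."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_scores(x: object, seeds: Dict[str, List[object]],
                 sim: Similarity) -> Dict[str, float]:
    """Average similarity of x to the current seeds of each relation label."""
    return {lab: sum(sim(x, s) for s in insts) / len(insts)
            for lab, insts in seeds.items() if insts}


def expand_seeds(seeds: Dict[str, List[object]], unlabeled: Sequence[object],
                 sim: Similarity, min_sim: float = 0.5, min_margin: float = 0.1,
                 max_rounds: int = 5) -> Tuple[Dict[str, List[object]], List[object]]:
    """Hypothetical constrained bootstrapping; returns (expanded seeds, rejected pool)."""
    seeds = {lab: list(insts) for lab, insts in seeds.items()}
    pool = list(unlabeled)
    for _ in range(max_rounds):
        accepted = []
        for x in pool:
            ranked = sorted(label_scores(x, seeds, sim).items(),
                            key=lambda kv: kv[1], reverse=True)
            if not ranked:
                continue
            best_label, best = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
            # Constraint: similar enough AND unambiguous, to limit semantic variation.
            if best >= min_sim and best - runner_up >= min_margin:
                accepted.append((x, best_label))
        if not accepted:
            break
        for x, lab in accepted:  # seed sets change only between rounds
            seeds[lab].append(x)
            pool.remove(x)
    return seeds, pool


def evaluate(gold: Sequence[str], pred: Sequence[str],
             positive: str) -> Tuple[float, float, float, float]:
    """Precision, recall, F-measure for one label; accuracy over all labels."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0
    return precision, recall, f_measure, accuracy
```

Requiring a margin over the runner-up, not only a high score, is one plausible way to stop a single borderline instance from shifting a seed set; the paper may use a different constraint.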