Journal of Shandong University (Engineering Science) ›› 2015, Vol. 45 ›› Issue (2): 22-26. doi: 10.6040/j.issn.1672-3961.1.2014.259

• Machine Learning and Data Mining •

A semi-supervised method based on tree kernel for relationship extraction

LIU Xiaoyong

  1. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
  • Received: 2014-03-26 Revised: 2014-10-15 Online: 2015-04-20 Published: 2014-03-26
  • About the author: LIU Xiaoyong (1979-), male, born in Xinyang, Henan; associate professor, Ph.D.; main research interests: data mining and intelligent optimization algorithms. E-mail: lxyong420@126.com
  • Supported by:
    Guangdong Higher Education Outstanding Young Teachers Training Program (Yq2013108)

Abstract: Traditional semi-supervised relation extraction algorithms are prone to the "semantic variation" problem. To address it, a new semi-supervised relation extraction algorithm based on a tree kernel, named L-EC-RE, is proposed. The algorithm combines two strategies, a tree kernel function and constrained expansion of the seed set, to reduce the loss of extraction accuracy caused by semantic variation and to raise the rate of correctly identified relations. Experiments on the PopBank benchmark data set show that the proposed semi-supervised method, which expands the seed set under a constraint mechanism, outperforms two commonly used relation extraction methods on all four evaluation measures (Precision, Recall, F-measure, and Accuracy).
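The four measures named in the abstract can all be derived from a confusion matrix for one relation label. The following is a minimal sketch, assuming binary per-instance decisions and the balanced F-measure (beta = 1), neither of which this page specifies; `Scores` and `evaluate` are illustrative names, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f_measure: float
    accuracy: float


def evaluate(gold, predicted, positive=True):
    """Score predictions against gold labels for one target relation label.

    Assumption: F-measure is the balanced harmonic mean of precision and recall.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Scores(precision, recall, f_measure, accuracy)


if __name__ == "__main__":
    gold = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 0, 0, 0]
    print(evaluate(gold, predicted, positive=1))
```

Accuracy counts true negatives while the other three measures do not, so on data sets where most entity pairs hold no relation it can stay high even when precision and recall are poor. Reporting all four together guards against that.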

Key words: relationship extraction, semi-supervised method, semantic variation, tree kernel, support vector machine
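This page does not define the tree kernel the algorithm uses. As an illustration of the general idea only, the sketch below implements the subset-tree convolution kernel in the style of Collins and Duffy, which scores two parse trees by counting the tree fragments they share. The nested-tuple tree encoding, the `decay` parameter, and all function names are assumptions made for this example and may differ from the paper's actual kernel.

```python
import math
from functools import lru_cache


def _label(node):
    """Label of a node: a leaf is a plain string, an internal node a tuple."""
    return node if isinstance(node, str) else node[0]


def _internal_nodes(tree):
    """All internal (non-leaf) nodes of a nested-tuple tree."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(_internal_nodes(child))
    return nodes


def _production(node):
    """Grammar production at a node: parent label plus ordered child labels."""
    return (node[0], tuple(_label(child) for child in node[1:]))


def tree_kernel(t1, t2, decay=1.0):
    """Decay-weighted count of tree fragments shared by t1 and t2."""

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Fragments rooted at both n1 and n2; zero unless productions match.
        if _production(n1) != _production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            # Leaf children add no further fragments beyond the production.
            if isinstance(c1, tuple) and isinstance(c2, tuple):
                score *= 1.0 + common(c1, c2)
        return score

    return sum(
        common(n1, n2)
        for n1 in _internal_nodes(t1)
        for n2 in _internal_nodes(t2)
    )


def normalized_tree_kernel(t1, t2, decay=1.0):
    """Scale the kernel into [0, 1] so that tree size does not dominate."""
    k11 = tree_kernel(t1, t1, decay)
    k22 = tree_kernel(t2, t2, decay)
    if not k11 or not k22:
        return 0.0
    return tree_kernel(t1, t2, decay) / math.sqrt(k11 * k22)


if __name__ == "__main__":
    a = ("S", ("NP", "Mary"), ("VP", ("V", "joined"), ("NP", "IBM")))
    b = ("S", ("NP", "John"), ("VP", ("V", "joined"), ("NP", "IBM")))
    print(tree_kernel(a, b), normalized_tree_kernel(a, b))
```

A kernel of this kind can be supplied to a support vector machine as a precomputed Gram matrix over the parse trees of candidate entity pairs, so that syntactic structure is compared directly instead of being flattened into feature vectors.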

CLC Number:

  • TP181