Journal of Shandong University (Engineering Science), 2015, Vol. 45, Issue 6: 7-15. doi: 10.6040/j.issn.1672-3961.2.2015.085
XU Qing¹ (徐庆), DUAN Liguo¹ (段利国), LI Aiping¹,² (李爱萍), YIN Guimei³ (阴桂梅)
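Abstract: To explore the role of semantic similarity in Chinese entity relation extraction, this paper proposes two new features: a TongYiCi CiLin (《同义词词林》) code tree, built from the five-level CiLin codes of the entity words, and an entity-word semantic similarity tree, built from the averages obtained by computing the similarity between each entity word in a relation instance and all entity words of each relation category. Together with two existing features, the CiLin code and the entity type, the effect of these four features on extraction performance is examined. In the single-feature experiments, the entity type feature performs best, reaching an F-score of 84.9 on relation subtypes and 83.2 on major relation types. In the combined-feature experiments, the combination of entity type and the CiLin code tree performs best, improving the F-score over the entity type feature alone by 2.5 for both major types and subtypes, whereas combining three features lowers performance instead of raising it. The results indicate that the CiLin code tree effectively complements entity type information, but too many features introduce redundant information and degrade extraction performance.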
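The abstract names the two tree-structured features but does not give their exact form. The Python sketch below shows one possible reading, under loudly stated assumptions: CiLin codes follow the 8-character layout of the extended CiLin edition (for example `Aa01A01=`: one letter, one letter, two digits, one letter, two digits, then a flag character); the code tree is a nested path from level 1 to level 5; word similarity is a toy shared-prefix score, not necessarily the measure used in the paper; and the category names and codes are made up. All function names (`cilin_levels`, `code_tree`, `prefix_similarity`, `category_similarities`, `similarity_tree`) are hypothetical.

```python
from statistics import mean

# Assumed extended-CiLin layout: (start, end) of each of the five levels.
# The 8th character ('=', '#' or '@') is a flag and is ignored here.
LEVEL_SPANS = [(0, 1), (1, 2), (2, 4), (4, 5), (5, 7)]


def cilin_levels(code: str) -> list[str]:
    """Split a CiLin code into its five cumulative level prefixes."""
    if len(code) < 7:
        raise ValueError(f"expected at least 7 characters, got {code!r}")
    return [code[:end] for _, end in LEVEL_SPANS]


def code_tree(code: str) -> str:
    """Hypothetical code tree: a bracketed path, one node per level."""
    inner = ""
    for prefix in reversed(cilin_levels(code)):
        inner = f"({prefix} {inner})" if inner else f"({prefix})"
    return f"(CILIN {inner})"


def prefix_similarity(code_a: str, code_b: str) -> float:
    """Toy similarity: fraction of the five levels the two codes share."""
    shared = 0
    for a, b in zip(cilin_levels(code_a), cilin_levels(code_b)):
        if a != b:
            break  # prefixes are cumulative, so later levels differ too
        shared += 1
    return shared / len(LEVEL_SPANS)


def category_similarities(entity_code: str,
                          category_codes: dict[str, list[str]]) -> dict[str, float]:
    """Average similarity of one entity word to all entity words of each
    category (each category's code list is assumed to be non-empty)."""
    return {
        category: mean(prefix_similarity(entity_code, c) for c in codes)
        for category, codes in category_codes.items()
    }


def similarity_tree(sims: dict[str, float]) -> str:
    """Hypothetical similarity tree: one child node per category."""
    children = " ".join(f"({cat} {score:.2f})" for cat, score in sims.items())
    return f"(SIM {children})"


# Illustrative, made-up training data: category -> entity-word CiLin codes.
training = {
    "PER": ["Aa01A01=", "Aa01B02="],
    "ORG": ["Di02A01=", "Dm01A01#"],
}

if __name__ == "__main__":
    print(code_tree("Aa01A05="))  # prints (CILIN (A (Aa (Aa01 (Aa01A (Aa01A05))))))
    sims = category_similarities("Aa01A05=", training)
    print(similarity_tree(sims))  # prints (SIM (PER 0.70) (ORG 0.00))
```

Rendering both features as bracketed trees is a design choice made here so that they could be fed to a tree kernel alongside other structured features; a flat feature vector of the per-category averages would be an equally plausible encoding.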