
Journal of Shandong University (Engineering Science) ›› 2022, Vol. 52 ›› Issue (6): 105-114. doi: 10.6040/j.issn.1672-3961.0.2021.304

• Machine Learning and Data Mining •

Named entity recognition model based on dilated convolutional block architecture

Yue YUAN1, Yanli WANG2, Kan LIU2,*

  1. Department of Information Management, Peking University, Beijing 100871, China
  2. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, Hubei, China
  • Received: 2021-06-09 Online: 2022-12-20 Published: 2022-12-23
  • Contact: Kan LIU E-mail: yuangyue@qq.com; liukan@zuel.edu.cn
  • About the first author: Yue YUAN (1996—), male, from Wuhan, Hubei, doctoral student; research interests: computational intelligence analysis, natural language processing, and deep learning. E-mail: yuangyue@qq.com
  • Funding: Fundamental Research Funds for the Central Universities, Interdisciplinary Innovation Research Project (2722021EK016)


Abstract:

Inspired by dilated convolution, a column-wise dilated convolution for two-dimensional text embeddings was proposed and a dilated convolutional block architecture was designed. A named entity recognition model based on this architecture was built for further experiments. In the named entity recognition experiment, the model surpassed the baseline models in precision, recall, and F1, reaching 0.918 7, 0.879 4, and 0.898 6, respectively, indicating that the dilated convolutional block architecture obtained text features carrying more context information and thus supported the capture of long-distance dependencies. The receptive field experiment showed that the dilation rate and the convolution kernel size needed to be adjusted jointly to reduce the "gridding effect". The proposed dilated convolutional block architecture could effectively perform the named entity recognition task.

Key words: named entity recognition, dilated convolutional block architecture, receptive field, neural network, deep learning
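The column-wise dilated convolution described above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' implementation: the kernel spans the full embedding width, slides along the token (row) axis, and samples every r-th token row, so widening the dilation rate enlarges the context window without adding parameters.

```python
import numpy as np

def column_wise_dilated_conv(X, kernel, dilation):
    """One column-wise dilated convolution over a 2-D text embedding.

    X        : (seq_len, emb_dim) matrix, one row per token.
    kernel   : (k, emb_dim) weights spanning the full embedding width,
               so each window produces a single scalar feature.
    dilation : gap r between sampled token rows.
    """
    k, _ = kernel.shape
    span = dilation * (k - 1) + 1          # effective receptive field in tokens
    n_out = X.shape[0] - span + 1
    out = np.empty(n_out)
    for i in range(n_out):
        rows = X[i : i + span : dilation]  # sample every r-th token row
        out[i] = np.sum(rows * kernel)
    return out

# 20 tokens with 8-dim embeddings, kernel of height 3, dilation rate 2
X = np.random.randn(20, 8)
w = np.random.randn(3, 8)
y = column_wise_dilated_conv(X, w, dilation=2)
print(y.shape)  # (16,)
```

With dilation 2 and kernel height 3 each output covers 5 consecutive tokens, so a 20-token sentence yields 16 features; in practice many such kernels would be applied in parallel to produce a feature map.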

CLC number: TP391

Fig.1 Example of receptive field changes when increasing the kernel size or the dilation rate

Fig.2 Receptive field growth of standard and dilated convolutions when stacking convolutional layers

Fig.3 Structure of the dilated convolutional block

Fig.4 Two-dimensional text embedding and dilated convolution for image processing

Fig.5 Example of column-wise dilated convolution

Fig.6 The dilated convolutional block architecture

Fig.7 Named entity recognition model based on the dilated convolutional block architecture

Fig.8 Named entity recognition model based on dilated convolutional layers

Table 1 Dataset statistics

News sentences | Avg. sentence length (Chinese chars) | Max length (Chinese chars) | Min length (Chinese chars) | Person tags | Location tags | Organization tags
50 729 | 47 | 1 476 | 5 | 19 588 | 39 394 | 21 902

Table 2 Named entity recognition results (location)

CRF layer | Model | P | R | F1
Without CRF | Bi-LSTM-Softmax | 0.750 2 | 0.715 0 | 0.732 2
Without CRF | IDCNN-Softmax | 0.769 9 | 0.770 9 | 0.770 4
Without CRF | DCL-Bi-LSTM-Softmax | 0.792 9 | 0.758 4 | 0.775 3
Without CRF | DCBA-Bi-LSTM-Softmax | 0.797 6 | 0.824 8 | 0.811 0
With CRF | Bi-LSTM-CRF | 0.869 9 | 0.808 5 | 0.838 0
With CRF | IDCNN-CRF | 0.900 9 | 0.824 8 | 0.861 2
With CRF | DCL-Bi-LSTM-CRF | 0.906 1 | 0.861 7 | 0.883 3
With CRF | DCBA-Bi-LSTM-CRF | 0.918 7 | 0.879 4 | 0.898 6

Table 3 Named entity recognition results (person)

CRF layer | Model | P | R | F1
Without CRF | Bi-LSTM-Softmax | 0.764 5 | 0.770 7 | 0.767 6
Without CRF | IDCNN-Softmax | 0.797 8 | 0.783 8 | 0.790 7
Without CRF | DCL-Bi-LSTM-Softmax | 0.816 9 | 0.816 5 | 0.816 7
Without CRF | DCBA-Bi-LSTM-Softmax | 0.823 0 | 0.796 9 | 0.809 7
With CRF | Bi-LSTM-CRF | 0.846 0 | 0.817 0 | 0.831 3
With CRF | IDCNN-CRF | 0.849 7 | 0.815 0 | 0.832 0
With CRF | DCL-Bi-LSTM-CRF | 0.897 4 | 0.837 7 | 0.866 5
With CRF | DCBA-Bi-LSTM-CRF | 0.889 1 | 0.824 1 | 0.855 3

Table 4 Named entity recognition results (organization)

CRF layer | Model | P | R | F1
Without CRF | Bi-LSTM-Softmax | 0.543 3 | 0.622 1 | 0.580 0
Without CRF | IDCNN-Softmax | 0.631 8 | 0.676 9 | 0.653 6
Without CRF | DCL-Bi-LSTM-Softmax | 0.633 8 | 0.652 9 | 0.643 2
Without CRF | DCBA-Bi-LSTM-Softmax | 0.681 1 | 0.733 3 | 0.706 2
With CRF | Bi-LSTM-CRF | 0.755 6 | 0.731 8 | 0.743 5
With CRF | IDCNN-CRF | 0.774 9 | 0.770 8 | 0.772 9
With CRF | DCL-Bi-LSTM-CRF | 0.835 4 | 0.777 6 | 0.805 4
With CRF | DCBA-Bi-LSTM-CRF | 0.835 3 | 0.815 2 | 0.825 1

Table 5 Named entity recognition results (overall)

CRF layer | Model | P | R | F1
Without CRF | Bi-LSTM-Softmax | 0.704 4 | 0.712 9 | 0.708 6
Without CRF | IDCNN-Softmax | 0.747 1 | 0.754 8 | 0.751 0
Without CRF | DCL-Bi-LSTM-Softmax | 0.765 0 | 0.754 4 | 0.759 6
Without CRF | DCBA-Bi-LSTM-Softmax | 0.779 0 | 0.796 2 | 0.787 5
With CRF | Bi-LSTM-CRF | 0.837 0 | 0.794 7 | 0.815 3
With CRF | IDCNN-CRF | 0.855 8 | 0.810 1 | 0.832 3
With CRF | DCL-Bi-LSTM-CRF | 0.888 3 | 0.835 9 | 0.861 3
With CRF | DCBA-Bi-LSTM-CRF | 0.891 0 | 0.847 9 | 0.868 9
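The F1 column in the tables above is consistent with the standard harmonic mean of precision and recall; a quick sanity check against the overall DCBA-Bi-LSTM-CRF row (P = 0.891 0, R = 0.847 9):

```python
def f1_score(p, r):
    """F1 as the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

# Overall DCBA-Bi-LSTM-CRF row: P = 0.891 0, R = 0.847 9
print(round(f1_score(0.8910, 0.8479), 4))  # 0.8689, matching the table
```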

Fig.9 Learning curves of the models with a CRF layer

Fig.10 Learning curves of the models with a Softmax layer

Table 6 Receptive field experiment settings and results (setting 1)

Model | r | S (height×width×number) | Receptive field | P | R | F1
DCL-Bi-LSTM-Softmax | 1 | 10×50×1 | 10 | 0.642 8 | 0.646 2 | 0.644 5
DCBA-Bi-LSTM-Softmax | 1 | 10×50×1 | 10 | 0.689 0 | 0.721 7 | 0.705 0

Table 7 Receptive field experiment settings and results (setting 2)

Model | r | S (height×width×number) | Receptive field | P | R | F1
DCL-Bi-LSTM-Softmax | 2 | 10×50×1 | 19 | 0.658 0 | 0.646 5 | 0.652 2
DCBA-Bi-LSTM-Softmax | 2 | 10×50×1 | 19 | 0.723 4 | 0.744 2 | 0.733 6

Table 8 Receptive field experiment settings and results (setting 3)

Model | r | S (height×width×number) | Receptive field | P | R | F1
DCL-Bi-LSTM-Softmax | 4 | 10×50×1 | 37 | 0.673 2 | 0.657 5 | 0.665 3
DCBA-Bi-LSTM-Softmax | 4 | 10×50×1 | 37 | 0.754 1 | 0.756 1 | 0.755 1

Table 9 Receptive field experiment settings and results (setting 4)

Model | r | S (height×width×number) | Receptive field | P | R | F1
DCL-Bi-LSTM-Softmax | 8 | 10×50×1 | 73 | 0.650 2 | 0.643 7 | 0.647 0
DCBA-Bi-LSTM-Softmax | 8 | 10×50×1 | 73 | 0.753 2 | 0.751 6 | 0.752 4
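The receptive field column in Tables 6-9 matches the usual single-layer formula for dilated convolution, RF = r·(k−1)+1, with kernel height k = 10. A quick check (an illustration of the formula, not the paper's code):

```python
def dilated_receptive_field(kernel_size, dilation):
    """Receptive field (in tokens) of a single dilated convolution layer."""
    return dilation * (kernel_size - 1) + 1

# Kernel height 10 as in Tables 6-9, dilation rates r = 1, 2, 4, 8
print([dilated_receptive_field(10, r) for r in (1, 2, 4, 8)])  # [10, 19, 37, 73]
```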