基于空洞卷积块架构的命名实体识别模型

doi:10.6040/j.issn.1672-3961.0.2021.304

摘要/Abstract

摘要：

受到空洞卷积的启发提出面向二维文本嵌入的列式空洞卷积，设计空洞卷积块架构，基于此架构提出命名实体识别模型并开展进一步试验。在命名实体识别试验中，提出的模型的精密度、召回率和F₁超越了其他基线模型，分别达到了0.918 7、0.879 4和0.898 6，表明空洞卷积块架构能够获取包含更多上下文信息的文本特征，从而支持模型对上下文长距离依赖特征的捕获和处理。感受野试验表明需要适当调整空洞率以减轻空洞卷积给模型带来的“网格效应”。提出的基于空洞卷积块架构能有效执行命名实体识别任务。

关键词: 命名实体识别, 空洞卷积块架构, 感受野, 神经网络, 深度学习

Abstract:

Inspired by the dilated convolution, a column-wise dilated convolution towards two dimensional text embedding was proposed and a dilated convolutional block architecture was designed. A named entity recognition model based on the architecture was built for further experiments. In the named entity recognition experiment, the model surpassed other baseline models in the metrics of precision, recall, and F₁ value, respectively reaching 0.918 7, 0.879 4, and 0.898 6, indicating that the dilated convolutional block architecture obtained features from context information, thereby supporting the extraction of the long-term dependency. The receptive field experiment showed that it was necessary to jointly adjust the dilation rate and the convolution kernel size to reduce the "gridding effect". The dilated convolutional block architecture proposed could effectively perform the task of named entity recognition.

Key words: named entity recognition, dilated convolutional block architecture, receptive field, neural network, deep learning

中图分类号:

TP391

袁钺,王艳丽,刘勘. 基于空洞卷积块架构的命名实体识别模型[J]. 山东大学学报 (工学版), 2022, 52(6): 105-114.

Yue YUAN,Yanli WANG,Kan LIU. Named entity recognition model based on dilated convolutional block architecture[J]. Journal of Shandong University(Engineering Science), 2022, 52(6): 105-114.

图/表 19

图1

图2

图3

图4

图5

图6

图7

图8

表1

表2

表3

表4

表5

图9

图10

表6

表7

表8

表9

参考文献 19

1	PANCHENDRARAJAN R, AMARESAN A. Bidirectional LSTM-CRF for named entity recognition[C]//Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation. Hong Kong, China: Association for Computational Linguistics, 2018: 531-540.
2	LI L , XU W , YU H . Character-level neural network model based on Nadam optimization and its application in clinical concept extraction[J]. Neurocomputing, 2020, 414 (16): 182- 190.
3	SHARMA R , MORWAL S , AGARWAL B , et al. A deep neural network-based model for named entity recognition for Hindi language[J]. Neural Computing and Applications, 2020, 32 (20): 16191- 16203. doi: 10.1007/s00521-020-04881-z
4	WU C , WU F , QI T , et al. Detecting entities of works for chinese chatbot[J]. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2020, 19 (6): 1- 13.
5	LI X , ZHANG H , ZHOU X H . Chinese clinical named entity recognition with variant neural structures based on BERT methods[J]. Journal of Biomedical Informatics, 2020, 107 (18): 103422.
6	JIA C, SHI Y, YANG Q, et al. Entity enhanced bert pre-training for chinese NER[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Punta Cana, Dominica: Association for Computational Linguistics, 2020: 6384-6396.
7	HAN Y, YAN Y, HAN Y, et al. Chinese grammatical error diagnosis based on RoBERTa-BiLSTM-CRF model[C]//Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications. Suzhou, China: Association for Computational Lingu-istics, 2020: 97-101.
8	YANG Z, DAI Z, YANG Y, et al. Xlnet: generalized autoregressive pretraining for language understanding[C]// Proceedings of Advances in Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2019: 5753-5763.
9	DEVLIN J, CHANG M W, LEE K, et al. Bert: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, USA: Association for Computational Linguistics, 2019: 4171-4186.
10	ZHANG Z, HAN X, LIU Z, et al. ERNIE: enhanced language representation with informative entities[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019: 1441-1451.
11	CHEN L C , PAPANDREOU G , KOKKINOS I , et al. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40 (4): 834- 848.
12	WANG Z , JI S . Smoothed dilated convolutions for improved dense prediction[J]. Data Mining and Knowledge Discovery, 2021, 35 (4): 1- 27.
13	MEHTA S, RASTEGARI M, CASPI A, et al. Espnet: efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018: 552-568.
14	STRUBELL E, VERGA P, BELANGER D, et al. Fast and accurate entity recognition with iterated dilated convolutions[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, 2017: 2670-2680.
15	KALCHBRENNER N, GREFENSTETTE E, BLUNSOM P. A convolutional neural network for modelling sentences[C]// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, USA: Association for Computational Linguistics, 2014: 655-665.
16	GLOROT X, BORDES A, BENGIO Y. Deep sparse rectifier neural networks[C]//Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. Ft. Lauderdale, USA: AISTATS, 2011: 315-323.
17	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
18	WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation[C]//Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Tahoe City, USA: IEEE, 2018: 1451-1460.
19	LIPPMANN R, CAMPBELL W, CAMPBELL J. An overview of the darpa data driven discovery of models (d3m) program[C]//Proceedings of 29th Conference on Neural Information Processing Systems. Barcelona, Spain: MIT Press, 2016: 1-2.

相关文章 15

[1]	李旭涛,杨寒玉,卢业飞,张玮. 基于深度学习的遥感图像道路分割[J]. 山东大学学报 (工学版), 2022, 52(6): 139-145.
[2]	王智伟,徐海超,郭相阳,马炯,褚云龙,陈前昌,卢治. 基于卷积神经网络和层次分析的新能源电源调频能力智能预测方法[J]. 山东大学学报 (工学版), 2022, 52(5): 70-76.
[3]	孟令灿,聂秀山,张雪. 基于遮挡目标去除的公交车拥挤度分类算法[J]. 山东大学学报 (工学版), 2022, 52(4): 83-88.
[4]	黄皓,周丽华,黄亚群,姜懿庭. 基于混合深度模型的虚假信息早期检测[J]. 山东大学学报 (工学版), 2022, 52(4): 89-98.
[5]	孟祥飞,张强,胡宴才,张燕,杨仁明. 欠驱动船舶自适应神经网络有限时间跟踪控制[J]. 山东大学学报 (工学版), 2022, 52(4): 214-226.
[6]	杨霄,袭肖明,李维翠,杨璐. 基于层次化双重注意力网络的乳腺多模态图像分类[J]. 山东大学学报 (工学版), 2022, 52(3): 34-41.
[7]	张学思,张婷,刘兆英,江天鹏. 基于轻量型卷积神经网络的海面红外显著性目标检测方法[J]. 山东大学学报 (工学版), 2022, 52(2): 41-49.
[8]	王心哲,邓棋文,王际潮,范剑超. 深度语义分割MRF模型的海洋筏式养殖信息提取[J]. 山东大学学报 (工学版), 2022, 52(2): 89-98.
[9]	尹旭,刘兆英,张婷,李玉鑑. 基于弱监督和半监督学习的红外舰船分割方法[J]. 山东大学学报 (工学版), 2022, 52(2): 99-106.
[10]	蒋桐雨,陈帆,和红杰. 基于非对称U型金字塔重建的轻量级人脸超分辨率网络[J]. 山东大学学报 (工学版), 2022, 52(1): 1-8, 18.
[11]	吴建清,宋修广. 同步定位与建图技术发展综述[J]. 山东大学学报 (工学版), 2021, 51(5): 16-31.
[12]	丁飞,江铭炎. 基于改进狮群算法和BP神经网络模型的房价预测[J]. 山东大学学报 (工学版), 2021, 51(4): 8-16.
[13]	尹晓敏,孟祥剑,侯昆明,陈亚潇,高峰. 一种计及空间相关性的光伏电站历史出力数据的修正方法[J]. 山东大学学报 (工学版), 2021, 51(4): 118-123.
[14]	杨修远,彭韬,杨亮,林鸿飞. 基于知识蒸馏的自适应多领域情感分析[J]. 山东大学学报 (工学版), 2021, 51(3): 15-21.
[15]	陶亮,刘宝宁,梁玮. 基于CNN-LSTM 混合模型的心律失常自动检测[J]. 山东大学学报 (工学版), 2021, 51(3): 30-36.

多维度评价

Viewed

Full text

Abstract

Cited

Shared

Discussed

新闻句子数量	平均句子长度(汉字字符)	最长句子长度(汉字字符)	最短句子长度(汉字字符)	人物标签数量	地点标签数量	组织标签数量
50 729	47	1476	5	19 588	39 394	21 902

是否有CRF层	模型	P	R	F₁
没有CRF层	Bi-LSTM-Softmax	0.750 2	0.715 0	0.732 2
	IDCNN-Softmax	0.769 9	0.770 9	0.770 4
	DCL-Bi-LSTM-Softmax	0.792 9	0.758 4	0.775 3
	DCBA-Bi-LSTM-Softmax	0.797 6	0.824 8	0.811 0
有CRF层	Bi-LSTM-CRF	0.869 9	0.808 5	0.838 0
	IDCNN-CRF	0.900 9	0.824 8	0.861 2
	DCL-Bi-LSTM-CRF	0.906 1	0.861 7	0.883 3
	DCBA-Bi-LSTM-CRF	0.918 7	0.879 4	0.898 6

是否有CRF层	模型	P	R	F₁
没有CRF层	Bi-LSTM-Softmax	0.764 5	0.770 7	0.767 6
	IDCNN-Softmax	0.797 8	0.783 8	0.790 7
	DCL-Bi-LSTM-Softmax	0.816 9	0.816 5	0.816 7
	DCBA-Bi-LSTM-Softmax	0.823 0	0.796 9	0.809 7
有CRF层	Bi-LSTM-CRF	0.846 0	0.817 0	0.831 3
	IDCNN-CRF	0.849 7	0.815 0	0.832 0
	DCL-Bi-LSTM-CRF	0.897 4	0.837 7	0.866 5
	DCBA-Bi-LSTM-CRF	0.889 1	0.824 1	0.855 3

是否有CRF层	模型	P	R	F₁
没有CRF层	Bi-LSTM-Softmax	0.543 3	0.622 1	0.580 0
	IDCNN-Softmax	0.631 8	0.676 9	0.653 6
	DCL-Bi-LSTM-Softmax	0.633 8	0.652 9	0.643 2
	DCBA-Bi-LSTM-Softmax	0.681 1	0.733 3	0.706 2
有CRF层	Bi-LSTM-CRF	0.755 6	0.731 8	0.743 5
	IDCNN-CRF	0.774 9	0.770 8	0.772 9
	DCL-Bi-LSTM-CRF	0.835 4	0.777 6	0.805 4
	DCBA-Bi-LSTM-CRF	0.835 3	0.815 2	0.825 1

是否有CRF层	模型	P	R	F₁
没有CRF层	Bi-LSTM-Softmax	0.704 4	0.712 9	0.708 6
	IDCNN-Softmax	0.747 1	0.754 8	0.751 0
	DCL-Bi-LSTM-Softmax	0.765 0	0.754 4	0.759 6
	DCBA-Bi-LSTM-Softmax	0.779 0	0.796 2	0.787 5
有CRF层	Bi-LSTM-CRF	0.837 0	0.794 7	0.815 3
	IDCNN-CRF	0.855 8	0.810 1	0.832 3
	DCL-Bi-LSTM-CRF	0.888 3	0.835 9	0.861 3
	DCBA-Bi-LSTM-CRF	0.891 0	0.847 9	0.868 9