Journal of Shandong University (Engineering Science) ›› 2021, Vol. 51 ›› Issue (3): 15-21. doi: 10.6040/j.issn.1672-3961.0.2020.249
杨修远,彭韬,杨亮*,林鸿飞
YANG Xiuyuan, PENG Tao, YANG Liang*, LIN Hongfei
Abstract: This paper proposes an adaptive multi-domain knowledge distillation framework that applies knowledge distillation to sentiment analysis, markedly accelerating inference and shrinking model parameters while preserving model performance. Knowledge distillation is carried out for each specific domain and covers several components: word-embedding-layer distillation, encoder-layer distillation (attention distillation and hidden-state distillation), and output-prediction-layer distillation. Across domains, the student model keeps the same encoder (i.e., shared weights) and fits the different teacher models through separate domain-specific output layers. Experiments on several public datasets show that single-domain knowledge distillation improves model accuracy by 2.39% on average, and multi-domain knowledge distillation improves it by a further 0.5% on average. Compared with single-domain knowledge distillation, the framework strengthens the student model's generalization ability and improves its performance.
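The layer-wise objectives and the shared-encoder/per-domain-head student described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: `distill_loss` combines a temperature-softened KL term (output-prediction distillation) with an MSE term (hidden-state distillation; an attention-map MSE would be analogous), and all names (`shared_encoder`, `domain_heads`, the domains `"books"`/`"dvd"`, `T`, `alpha`) are hypothetical.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T produces softer teacher labels.
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(s_logits, t_logits, s_hidden, t_hidden, T=2.0, alpha=0.5):
    # KL(teacher || student) on temperature-softened predictions
    # (output-prediction distillation), plus MSE between hidden
    # states (hidden-state distillation). The T**2 factor keeps the
    # soft-label gradient magnitude comparable across temperatures.
    p_t = softmax(t_logits, T)
    p_s = softmax(s_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    mse = np.mean((s_hidden - t_hidden) ** 2)
    return alpha * (T ** 2) * kl + (1.0 - alpha) * mse

# Student: one encoder shared by all domains, one output head per domain.
rng = np.random.default_rng(0)
shared_encoder = rng.normal(size=(16, 8))                       # shared weights
domain_heads = {d: rng.normal(size=(8, 2)) for d in ("books", "dvd")}

def student_forward(x, domain):
    h = np.tanh(x @ shared_encoder)        # same encoding for every domain
    return h, h @ domain_heads[domain]     # domain-specific logits
```

In this setup each domain's teacher supervises only its own output head, while every domain's distillation loss backpropagates into the one shared encoder, which is what lets multi-domain training improve generalization over single-domain students.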