Journal of Shandong University (Engineering Science) ›› 2021, Vol. 51 ›› Issue (3): 15-21. doi: 10.6040/j.issn.1672-3961.0.2020.249
杨修远,彭韬,杨亮*,林鸿飞
YANG Xiuyuan, PENG Tao, YANG Liang*, LIN Hongfei
Abstract: This paper proposes an adaptive multi-domain knowledge distillation framework that applies knowledge distillation to sentiment analysis, markedly accelerating inference and shrinking model parameters while preserving model performance. Knowledge distillation is carried out for each specific domain and covers several components: word-embedding-layer distillation, encoder-layer distillation (attention distillation and hidden-state distillation), and output-prediction-layer distillation. Across domains, the student model keeps the same encoder (i.e., shared weights) and fits the different teacher models through separate domain-specific output layers. Experiments on several public datasets show that single-domain knowledge distillation improves model accuracy by 2.39% on average, and multi-domain knowledge distillation improves it by a further 0.5% on average. Compared with single-domain knowledge distillation, the framework strengthens the student model's generalization ability and improves its performance.
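The layer-wise objectives and the shared-encoder/per-domain-head student described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: `distill_loss` combines a temperature-softened KL term (output-prediction distillation) with an MSE term (hidden-state distillation; an attention-map MSE would be analogous), and all names (`shared_encoder`, `domain_heads`, the domains `"books"`/`"dvd"`, `T`, `alpha`) are hypothetical.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T produces softer teacher labels.
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(s_logits, t_logits, s_hidden, t_hidden, T=2.0, alpha=0.5):
    # KL(teacher || student) on temperature-softened predictions
    # (output-prediction distillation), plus MSE between hidden
    # states (hidden-state distillation). The T**2 factor keeps the
    # soft-label gradient magnitude comparable across temperatures.
    p_t = softmax(t_logits, T)
    p_s = softmax(s_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    mse = np.mean((s_hidden - t_hidden) ** 2)
    return alpha * (T ** 2) * kl + (1.0 - alpha) * mse

# Student: one encoder shared by all domains, one output head per domain.
rng = np.random.default_rng(0)
shared_encoder = rng.normal(size=(16, 8))                       # shared weights
domain_heads = {d: rng.normal(size=(8, 2)) for d in ("books", "dvd")}

def student_forward(x, domain):
    h = np.tanh(x @ shared_encoder)        # same encoding for every domain
    return h, h @ domain_heads[domain]     # domain-specific logits
```

In this setup each domain's teacher supervises only its own output head, while every domain's distillation loss backpropagates into the one shared encoder, which is what lets multi-domain training improve generalization over single-domain students.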