山东大学学报 (工学版) [Journal of Shandong University (Engineering Science)], 2021, Vol. 51, Issue (3): 15-21. doi: 10.6040/j.issn.1672-3961.0.2020.249
杨修远,彭韬,杨亮*,林鸿飞
YANG Xiuyuan, PENG Tao, YANG Liang*, LIN Hongfei
Abstract: An adaptive multi-domain knowledge distillation framework is proposed, which effectively accelerates inference and reduces the number of model parameters while preserving model performance; knowledge distillation is applied to the sentiment analysis task. Distillation is carried out for each specific domain and covers several components of the model: word-embedding-layer distillation, encoding-layer distillation (attention distillation and hidden-state distillation), and output-prediction-layer distillation. Across domains, the student model keeps the same encoder, i.e., shared weights, and fits the different teacher models through domain-specific output layers. Experimental results on several public datasets show that single-domain knowledge distillation improves model accuracy by 2.39% on average, and multi-domain knowledge distillation improves model accuracy by 0.5% on average. Compared with single-domain knowledge distillation, the framework enhances the generalization ability of the student model and improves its performance.
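To make the pipeline in the abstract concrete, the sketch below shows in plain PyTorch how the layer-wise distillation losses (word embedding, attention, hidden state, output prediction) and a student with a shared encoder plus domain-specific output heads could fit together. This is a minimal illustration, not the authors' released implementation: the class and function names (ToyEncoder, MultiDomainStudent, distillation_loss), the layer mapping, the temperature, and the toy domain names are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Stand-in for a BERT-style encoder that exposes the embeddings,
    per-layer hidden states and per-layer attention maps needed below."""

    def __init__(self, vocab_size=1000, hidden=64, layers=2, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.blocks = nn.ModuleList(
            [nn.MultiheadAttention(hidden, heads, batch_first=True) for _ in range(layers)]
        )

    def forward(self, ids):
        emb = self.embed(ids)
        hiddens, attns, x = [], [], emb
        for blk in self.blocks:
            x, attn = blk(x, x, x, need_weights=True, average_attn_weights=False)
            hiddens.append(x)
            attns.append(attn)
        return emb, hiddens, attns


class MultiDomainStudent(nn.Module):
    """One shared encoder, one domain-specific output layer per domain."""

    def __init__(self, encoder, hidden, domains, num_labels=2):
        super().__init__()
        self.encoder = encoder  # weights shared by every domain
        self.heads = nn.ModuleDict({d: nn.Linear(hidden, num_labels) for d in domains})

    def forward(self, ids, domain):
        emb, hiddens, attns = self.encoder(ids)
        logits = self.heads[domain](hiddens[-1][:, 0])  # first-token pooling
        return logits, emb, hiddens, attns


def distillation_loss(student_out, teacher_out, layer_map, T=2.0):
    """Word-embedding, attention, hidden-state and output-prediction distillation."""
    s_logits, s_emb, s_hid, s_att = student_out
    t_logits, t_emb, t_hid, t_att = teacher_out
    mse = nn.MSELoss()
    loss = mse(s_emb, t_emb.detach())                        # embedding-layer distillation
    for si, ti in layer_map:                                 # encoding-layer distillation
        loss = loss + mse(s_att[si], t_att[ti].detach())     #   attention maps
        loss = loss + mse(s_hid[si], t_hid[ti].detach())     #   hidden states
    soft_targets = F.softmax(t_logits.detach() / T, dim=-1)  # prediction-layer distillation
    loss = loss + F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                           soft_targets, reduction="batchmean") * T * T
    return loss


if __name__ == "__main__":
    domains = ["books", "dvd", "electronics", "kitchen"]
    # In the paper each domain has its own fine-tuned teacher; a deeper toy
    # encoder with per-domain heads stands in for those teachers here.
    teacher = MultiDomainStudent(ToyEncoder(layers=4), 64, domains)
    student = MultiDomainStudent(ToyEncoder(layers=2), 64, domains)
    ids = torch.randint(0, 1000, (8, 16))
    loss = distillation_loss(student(ids, "books"), teacher(ids, "books"),
                             layer_map=[(0, 1), (1, 3)])  # map 2 student layers onto 4 teacher layers
    loss.backward()
    print(float(loss))
```

In a realistic setting each teacher would be a separately fine-tuned pre-trained model, and a linear projection would be inserted wherever the student's hidden width differs from the teacher's; the sketch keeps both widths equal so the MSE terms apply directly.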