Journal of Shandong University (Engineering Science) ›› 2021, Vol. 51 ›› Issue (3): 15-21. doi: 10.6040/j.issn.1672-3961.0.2020.249

Adaptive multi-domain sentiment analysis based on knowledge distillation

YANG Xiuyuan, PENG Tao, YANG Liang*, LIN Hongfei

  1. College of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China
  • Online: 2021-06-20  Published: 2021-06-24
  • About the authors: YANG Xiuyuan (1995— ), female, from Nanyang, Henan, is a master's student whose main research interest is sentiment analysis. E-mail: liang@dlut.edu.cn. *Corresponding author: YANG Liang (1986— ), male, from Dalian, Liaoning, is a lecturer and Ph.D. whose main research interests are sentiment analysis and opinion mining. E-mail: 815754134@qq.com
  • Supported by:
    the National Key Research and Development Program of China (2018YFC0830604), the National Natural Science Foundation of China (61702080, 61806038), the Fundamental Research Funds for the Central Universities (DUT19RC(4)016), and the China Postdoctoral Science Foundation (2018M631788)

Abstract: An adaptive multi-domain knowledge distillation framework was proposed, which effectively accelerated inference and reduced the number of model parameters while preserving model performance; knowledge distillation was applied to the sentiment analysis problem. Distillation was carried out for each specific domain and covered several parts of the model: word embedding layer distillation, encoder layer distillation (attention distillation and hidden state distillation), and output prediction layer distillation, so that the student learned every aspect of the domain-specific teacher model. Across domains, the student kept the same encoder, i.e., shared weights, and fitted the different teacher models through domain-specific output layers; in addition, selectively learning how important each domain's teacher model was for the data was proposed, which further improved prediction accuracy. Experimental results on several public datasets showed that single-domain knowledge distillation improved model accuracy by an average of 2.39%, and multi-domain knowledge distillation improved it by a further 0.5% on average. Compared with single-domain knowledge distillation, the proposed framework enhanced the generalization ability of the student model and improved its performance.

Key words: knowledge distillation, adaptive, multi-domain, sentiment analysis, deep learning
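The abstract names three mechanisms: layer-wise distillation from each domain teacher (word embedding, attention, hidden state, and output prediction layers), a student that shares one encoder across domains while fitting each teacher through a domain-specific output head, and selective weighting of the domain teachers' importance. The sketch below illustrates these ideas in PyTorch under stated assumptions; every class, function, and interface here (MultiDomainStudent, distillation_loss, an encoder returning states of shape (batch, seq_len, hidden)) is illustrative, not the authors' implementation.

```python
# A minimal PyTorch sketch, assuming a Transformer-style encoder that returns
# hidden states of shape (batch, seq_len, hidden). All names are illustrative
# assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiDomainStudent(nn.Module):
    """Student with one shared encoder and a separate output head per domain."""

    def __init__(self, encoder: nn.Module, hidden_size: int,
                 num_domains: int, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder  # weights shared across all domains
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, num_classes) for _ in range(num_domains)]
        )

    def forward(self, input_ids: torch.Tensor, domain: int):
        hidden = self.encoder(input_ids)           # (batch, seq_len, hidden)
        logits = self.heads[domain](hidden[:, 0])  # classify from first token
        return logits, hidden


def distillation_loss(student_out, teacher_out, temperature: float = 1.0):
    """Combine embedding-, attention-, hidden-state- and prediction-layer losses.

    Each argument is a tuple (embeddings, attentions, hidden_states, logits);
    teacher tensors are assumed detached. Layer mapping and dimension
    projections for a smaller student are omitted for brevity.
    """
    s_emb, s_att, s_hid, s_logits = student_out
    t_emb, t_att, t_hid, t_logits = teacher_out

    loss_emb = F.mse_loss(s_emb, t_emb)    # word embedding layer distillation
    loss_att = F.mse_loss(s_att, t_att)    # attention distillation
    loss_hid = F.mse_loss(s_hid, t_hid)    # hidden state distillation
    loss_pred = F.kl_div(                  # soft output prediction distillation
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return loss_emb + loss_att + loss_hid + loss_pred


def adaptive_multi_teacher_loss(student_out, teacher_outs, gate_logits,
                                temperature: float = 1.0):
    """Weight each domain teacher's distillation loss by a learned gate."""
    weights = F.softmax(gate_logits, dim=-1)      # importance of each teacher
    losses = torch.stack([distillation_loss(student_out, t, temperature)
                          for t in teacher_outs])
    return (weights * losses).sum()
```

When the student encoder is narrower or shallower than the teacher, distillation frameworks of this kind typically insert a learned linear projection before each MSE term and map student layers to selected teacher layers; those details, and the exact form of the paper's adaptive gate, are omitted or assumed here.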

CLC number: TP391
[1] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]//Proceedings of NAACL-HLT. Stroudsburg, USA: Association for Computational Linguistics, 2018: 2227-2237.
[2] DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of NAACL-HLT. Stroudsburg, USA: Association for Computational Linguistics, 2019: 4171-4186.
[3] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]//Proceedings of NeurIPS. New York, USA: MIT Press, 2019: 5753-5763.
[4] JOSHI M, CHEN D, LIU Y, et al. SpanBERT: improving pre-training by representing and predicting spans[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 64-77.
[5] WANG A, SINGH A, MICHAEL J, et al. GLUE: a multi-task benchmark and analysis platform for natural language understanding[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). Brussels, Belgium: ACL, 2018: 353-355.
[6] DING M, ZHOU C, CHEN Q, et al. Cognitive graph for multi-hop reading comprehension at scale[C]//Proceedings of the 57th Conference of the Association for Computational Linguistics. Florence, Italy: ACL, 2019: 2694-2703.
[7] KOVALEVA O, ROMANOV A, ROGERS A, et al. Revealing the dark secrets of BERT[C]//Proceedings of EMNLP-IJCNLP. Hong Kong, China: ACL, 2019: 4355-4365.
[8] RASTEGARI M, ORDONEZ V, REDMON J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks[C]//European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016: 525-542.
[9] HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural network[C]//Proceedings of Neural Information Processing Systems (NeurIPS). New York, USA: MIT Press, 2015: 1135-1143.
[10] LI J, ZHAO R, HUANG J, et al. Learning small-size DNN with output-distribution-based criteria[C]//Proceedings of Interspeech. Singapore: ISCA, 2014: 1910-1914.
[11] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017: 4700-4708.
[12] YIM J, JOO D, BAE J, et al. A gift from knowledge distillation: fast optimization, network minimization and transfer learning[C]//Proceedings of CVPR. Hawaii, USA: IEEE, 2017: 7130-7138.
[13] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018: 3-19.
[14] FURLANELLO T, LIPTON Z, TSCHANNEN M, et al. Born-again neural networks[C]//Proceedings of ICML. Stockholm, Sweden: ACM, 2018: 1602-1611.
[15] YANG C, XIE L, SU C, et al. Snapshot distillation: teacher-student optimization in one generation[C]//Proceedings of CVPR. Long Beach, USA: IEEE, 2019: 2859-2868.
[16] XU T, LIU C. Data-distortion guided self-distillation for deep neural networks[C]//Proceedings of AAAI. Hawaii, USA: AAAI, 2019: 5565-5572.
[17] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of NIPS. New York, USA: MIT Press, 2017: 5998-6008.
[18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016: 770-778.
[19] QIU X, SUN T, XU Y, et al. Pre-trained models for natural language processing: a survey[J]. Science China Technological Sciences, 2020, 63(10): 1872-1897.
[20] BA L, CARUANA R. Do deep nets really need to be deep?[C]//Proceedings of Neural Information Processing Systems. New York, USA: MIT Press, 2014: 2654-2662.
[21] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: AAAI, 2010: 249-256.
[22] ISOLA P, ZHU J Y, ZHOU T, et al. Image-to-image translation with conditional adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017: 1125-1134.
[23] SARIKAYA R, HINTON G, DEORAS A. Application of deep belief networks for natural language understanding[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(4): 778-784.
[24] LIU P, QIU X, HUANG X. Recurrent neural network for text classification with multi-task learning[C]//Proceedings of IJCAI. New York, USA: AAAI, 2016: 168-175.