Journal of Shandong University (Engineering Science) ›› 2021, Vol. 51 ›› Issue (3): 15-21. doi: 10.6040/j.issn.1672-3961.0.2020.249


Adaptive multi-domain sentiment analysis based on knowledge distillation

YANG Xiuyuan, PENG Tao, YANG Liang*, LIN Hongfei   

  1. College of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China
  • Online: 2021-06-20  Published: 2021-06-24

Abstract: An adaptive multi-domain knowledge distillation framework was proposed, which effectively accelerated inference and reduced model parameters while maintaining model performance. The knowledge distillation method was applied to sentiment analysis problems. When performing knowledge distillation for each specific domain, model distillation covered word embedding layer distillation, encoding layer distillation (attention distillation and hidden state distillation), output prediction layer distillation, and other aspects, so that the student could learn knowledge of all aspects from the domain-specific teacher model. Selectively learning the importance of the teacher models of different domains to the data was proposed, which further improved the accuracy of the prediction results. The experimental results on multiple public datasets showed that single-domain knowledge distillation increased model accuracy by an average of 2.39%, while multi-domain knowledge distillation increased model accuracy by a further 0.5% on average. Compared with single-domain knowledge distillation, this framework enhanced the generalization ability of the student model and improved its performance.
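The layered objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer names (`embed`, `attn`, `hidden`, `logits`), the use of mean squared error for every layer, and the softmax weighting of per-domain relevance scores are all assumptions made for demonstration.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def softmax(scores):
    """Turn per-domain relevance scores into normalized teacher weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def distill_loss(student, teacher):
    """Sum the layer-wise distillation terms named in the abstract:
    word embedding, attention, hidden state, and output prediction."""
    return (mse(student["embed"], teacher["embed"])
            + mse(student["attn"], teacher["attn"])
            + mse(student["hidden"], teacher["hidden"])
            + mse(student["logits"], teacher["logits"]))

def multi_domain_loss(student, teachers, relevance):
    """Weight each domain teacher's distillation loss by the softmax
    of its relevance score for the current input, so the student
    selectively learns from the most relevant domains."""
    weights = softmax(relevance)
    return sum(w * distill_loss(student, t)
               for w, t in zip(weights, teachers))
```

Under this sketch, a teacher whose domain is judged highly relevant to the current input dominates the combined loss, which is one way to realize the "selective learning of teacher importance" that the abstract describes.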

Key words: knowledge distillation, adaptive, multi-domain, sentiment analysis, deep learning

CLC Number: 

  • TP391
[1] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C] // Proceedings of NAACL-HLT. Stroudsburg, USA: Association for Computational Linguistics, 2018: 2227-2237.
[2] DEVLIN J, CHANG M, LEE K, et al. Bert: pre-training of deep bidirectional transformers for language understanding[C] //Proceedings of NAACL-HLT. Stroudsburg, USA: Association for Computational Linguistics, 2019: 4171-4186.
[3] YANG Z, DAI Z, YANG Y, et al. Xlnet: generalized autoregressive pretraining for language understanding[C] //Proceedings of NeurIPS. New York, USA: MIT Press, 2019: 5753-5763.
[4] JOSHI M, CHEN D, LIU Y, et al. Spanbert: improving pre-training by representing and predicting spans[J]. Transactions of the Association for Computational Linguistics, 2019, 8(1): 64-77.
[5] WANG A, AMANPREET S, JULIAN M, et al. Glue: a multi-task benchmark and analysis platform for natural language understanding[C] //Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing(EMNLP). Brussels, Belgium: ACL, 2018: 353-355.
[6] DING M, ZHOU C, CHEN Q, et al. Cognitive graph for multi-hop reading comprehension at scale [C] //Proceedings of the 57th Conference of the Association for Computational Linguistics. Florence, Italy: ACL, 2019: 2694-2703.
[7] KOVALEVA O, ROMANOV A, ROGERS A, et al. Revealing the dark secrets of bert[C] //Proceedings of EMNLP-IJCNLP. Hong Kong, China: ACL, 2019: 4355-4365.
[8] RASTEGARI M, ORDONEZ V, REDMON J, et al. Xnor-net: imagenet classification using binary convolutional neural networks[C] //European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016: 525-542.
[9] HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural network[C] //Proceedings of Neural Information Processing Systems(NeurIPS). New York, USA: MIT Press, 2015: 1135-1143.
[10] LI J, ZHAO R, HUANG J, et al. Learning small-size DNN with output-distribution-based criteria[C] //Proceedings of Interspeech. Lyon, France: Interspeech, 2014: 1910-1914.
[11] HUANG G, LIU Z, VAN D, et al. Densely connected convolutional networks[C] //Proceedings of the IEEE Conference on Computer vision and Pattern Recognition. Hawaii, USA: IEEE, 2017: 4700-4708.
[12] YIM J, JOO D, BAE J, et al. A gift from knowledge distillation: fast optimization, network minimization and transfer learning[C] //Proceedings of CVPR. Hawaii, USA: IEEE, 2017: 7130-7138.
[13] WOO S, PARK J, LEE J Y, et al. Cbam: convolutional block attention module[C] //Proceedings of the European Conference on Computer Vision(ECCV). Munich, Germany: Springer, 2018: 3-19.
[14] FURLANELLO T, LIPTON Z, TSCHANNEN M, et al. Born-again neural networks[C] //Proceedings of ICML. Stockholm, Sweden: ACM, 2018: 1602-1611.
[15] YANG C, XIE L, SU C, et al. Snapshot distillation: teacher-student optimization in one generation[C] //Proceedings of CVPR. Long Beach, USA: IEEE, 2019: 2859-2868.
[16] XU T, LIU C. Data-distortion guided self-distillation for deep neural networks[C] //Proceedings of AAAI. Hawaii, USA: AAAI, 2019: 5565-5572.
[17] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C] //Proceedings of NIPS. New York, USA: MIT Press, 2017: 5998-6008.
[18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
[19] QIU X, SUN T, XU Y, et al. Pre-trained models for natural language processing: a survey[J]. Science China Technological Sciences, 2020, 29(2): 1-26.
[20] BA L, CARUANA R. Do deep nets really need to be deep?[C] //Proceedings of Neural Information Processing Systems. New York, USA: MIT Press, 2013: 2654-2662.
[21] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks [C] //Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: AAAI, 2010: 249-256.
[22] ISOLA P, ZHU J Y, ZHOU T, et al. Image-to-image translation with conditional adversarial networks[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017: 1125-1134.
[23] SARIKAYA R, HINTON G, DEORAS A. Application of deep belief networks for natural language understanding[J]. IEEE/ACM Transactions on Audio, Speech & Language Processing, 2014, 22(4): 778-784.
[24] LIU P, QIU X, HUANG X. Recurrent neural network for text classification with multi-task learning[C] //Proceedings of IJCAI. New York, USA: AAAI, 2016: 168-175.