Journal of Shandong University(Engineering Science) ›› 2025, Vol. 55 ›› Issue (4): 56-71. doi: 10.6040/j.issn.1672-3961.0.2024.055

• Machine Learning & Data Mining •

Review of knowledge distillation based on generative adversarial networks

YANG Jucheng, LU Kaikui, WANG Yuan   

  1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
  • Published: 2025-08-31

Abstract: To summarize the application of generative adversarial networks (GANs) in knowledge distillation and to explore their collaborative mechanisms and optimization potential, a review of GAN-based knowledge distillation was conducted. Research progress was surveyed across four categories of knowledge distillation, namely methods based on output features, intermediate features, relational features, and structural features, and the advantages and disadvantages of each approach were analyzed. The classification and development of GAN-based knowledge distillation methods were then introduced in detail. Finally, the limitations of these techniques were identified, and potential directions for optimization and broader application were proposed.
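To make the GAN-KD collaboration described above concrete, the following is a minimal, hedged sketch in Python/PyTorch of one common pattern: classic soft-label distillation (Hinton et al. [1]) combined with an adversarial term in which a discriminator tries to tell teacher outputs from student outputs. It is not the specific method of any paper surveyed here; the network sizes, temperature, and loss weights are illustrative assumptions.

```python
# Illustrative sketch only: soft-label KD plus an adversarial (GAN-style) term.
# All module sizes, the temperature T, and the weights alpha/beta are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 4.0                  # distillation temperature (assumed)
alpha, beta = 0.7, 0.1   # KD and adversarial loss weights (assumed)

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))  # stand-in teacher
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))    # smaller student
discriminator = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))  # judges logits

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

x = torch.randn(16, 32)              # dummy input batch
y = torch.randint(0, 10, (16,))      # dummy labels

with torch.no_grad():
    t_logits = teacher(x)            # teacher outputs serve as fixed targets

# Discriminator step: "real" = teacher logits, "fake" = student logits.
s_logits = student(x).detach()
d_loss = F.binary_cross_entropy_with_logits(discriminator(t_logits), torch.ones(16, 1)) + \
         F.binary_cross_entropy_with_logits(discriminator(s_logits), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Student step: task loss + soft-label KD loss + adversarial loss (fool the discriminator).
s_logits = student(x)
kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
              F.softmax(t_logits / T, dim=1), reduction="batchmean") * T * T
ce = F.cross_entropy(s_logits, y)
adv = F.binary_cross_entropy_with_logits(discriminator(s_logits), torch.ones(16, 1))
s_loss = (1 - alpha) * ce + alpha * kd + beta * adv
opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```

The same adversarial pattern generalizes to the other categories discussed in the review: the discriminator can be applied to intermediate feature maps, pairwise relations, or structural representations instead of output logits.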

Key words: knowledge transfer, model compression, knowledge distillation, model lightweighting, generative adversarial networks

CLC Number: TP391

References:
[1] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL].(2015-05-09)[2024-04-20]. https://arxiv.org/abs/1503.02531v1
[2] SUCHOLUTSKY I, SCHONLAU M. Soft-label dataset distillation and text dataset distillation[C] //Proceedings of the 2021 International Joint Conference on Neural Networks(IJCNN). Shenzhen, China: IEEE, 2021: 1-8.
[3] MALIK S M, HAIDER M U, THARANI M, et al. Teacher-class network: a neural network compression mechanism[EB/OL].(2021-10-29)[2024-04-30]. https://arxiv.org/abs/2004.03281v3
[4] PARK W, KIM D, LU Y, et al. Relational knowledge distillation[C] //Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Long Beach, USA: IEEE, 2019: 3962-3971.
[5] ROMERO A, BALLAS N, KAHOU S E, et al. FitNets: hints for thin deep nets[C] //Proceedings of the 3rd International Conference on Learning Representations. Washington, D.C., USA: ICLR, 2015: 1-13.
[6] YIM J, JOO D, BAE J, et al. A gift from knowledge distillation: fast optimization, network minimization and transfer learning[C] //Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Honolulu, USA: IEEE, 2017: 7130-7138.
[7] BUDNIK M, AVRITHIS Y. Asymmetric metric learning for knowledge transfer[C] //Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Nashville, USA: IEEE, 2021: 8228-8238.
[8] LI X J, WU J L, FANG H Y, et al. Local correlation consistency for knowledge distillation[C] //Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 18-33.
[9] TAO X Y, HONG X P, CHANG X Y, et al. Few-shot class-incremental learning[C] //Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle, USA: IEEE, 2020: 12183-12192.
[10] CHO J H, HARIHARAN B. On the efficacy of knowledge distillation[C] //Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision(ICCV). Seoul: IEEE, 2019: 4793-4801.
[11] ZHOU Z D, ZHUGE C R, GUAN X W, et al. Channel distillation: channel-wise attention for knowledge distillation[EB/OL].(2020-06-02)[2024-04-25]. https://arxiv.org/abs/2006.01683v1
[12] YUE K Y, DENG J F, ZHOU F. Matching guided distillation[C] //Computer Vision-ECCV 2020. Glasgow, UK: Springer, 2020: 312-328.
[13] CHEN S Y, WANG W Y, PAN S J, et al. Cooperative pruning in cross-domain deep neural network compression[C] //Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: ACM, 2019: 2102-2108.
[14] LE D H, VO T N, THOAI N. Paying more attention to snapshots of iterative pruning: improving model compression via ensemble distillation[EB/OL].(2020-08-14)[2024-04-25]. https://arxiv.org/abs/2006.11487v3
[15] XIE J, LIN S H, ZHANG Y C, et al. Compressing convolutional neural networks with cheap convolutions and online distillation[J]. Displays, 2023, 78: 102428.
[16] XU K R, RUI L, LI Y S, et al. Feature normalized knowledge distillation for image classification[C] //Computer Vision-ECCV 2020. Glasgow, UK: Springer, 2020: 664-680.
[17] CHEN W C, CHANG C C, LEE C R. Knowledge distillation with feature maps for image classification[C] //Computer Vision-ACCV 2018. Perth, Australia: Springer, 2019: 200-215.
[18] HUANG Z Y, ZOU Y, BHAGAVATULA V, et al. Comprehensive attention self-distillation for weakly-supervised object detection[C] //Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: ACM, 2020: 16797-16807.
[19] LI M, HALSTEAD M, MCCOOL C. Knowledge distillation for efficient instance semantic segmentation with transformers[C] //Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle, USA: IEEE, 2024: 5432-5439.
[20] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C] //Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM, 2014: 2672-2680.
[21] 张俊三, 程俏俏, 万瑶, 等. MIRGAN: 一种基于GAN的医学影像报告生成模型[J]. 山东大学学报(工学版), 2021, 51(2): 9-18. ZHANG Junsan, CHENG Qiaoqiao, WAN Yao, et al. MIRGAN: a medical image report generation model based on GAN[J]. Journal of Shandong University(Engineering Science), 2021, 51(2): 9-18.
[22] 张月芳, 邓红霞, 呼春香, 等. 融合残差块注意力机制和生成对抗网络的海马体分割[J]. 山东大学学报(工学版), 2020, 50(6): 76-81. ZHANG Yuefang, DENG Hongxia, HU Chunxiang, et al. Hippocampal segmentation combining residual attention mechanism and generative adversarial networks[J]. Journal of Shandong University(Engineering Science), 2020, 50(6): 76-81.
[23] GOU J P, YU B S, MAYBANK S J, et al. Knowledge distillation: a survey[J]. International Journal of Computer Vision, 2021, 129: 1789-1819.
[24] 黄震华, 杨顺志, 林威, 等. 知识蒸馏研究综述[J]. 计算机学报, 2022, 45(3): 624-653. HUANG Zhenhua, YANG Shunzhi, LIN Wei, et al. Knowledge distillation: a survey[J]. Chinese Journal of Computers, 2022, 45(3): 624-653.
[25] BUCILUĂ C, CARUANA R, NICULESCU-MIZIL A. Model compression[C] //Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, USA: ACM, 2006: 535-541.
[26] BA L J, CARUANA R. Do deep nets really need to be deep? [C] //Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM, 2014: 2654-2662.
[27] LI J Y, ZHAO R, HUANG J T, et al. Learning small-size DNN with output-distribution-based criteria[C] //Proceedings of the 15th Annual Conference of the International Speech Communication Association. Singapore: ISCA, 2014: 1910-1914.
[28] TANG Z Y, WANG D, ZHANG Z Y. Recurrent neural network training with dark knowledge transfer[C] //Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Shanghai, China: IEEE, 2016: 5900-5904.
[29] YANG C L, XIE L X, QIAO S Y, et al. Training deep neural networks in generations: a more tolerant teacher educates better students[C] //Proceedings of the 2019 AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI, 2019: 5628-5635.
[30] YUAN L, TAY F E, LI G L, et al. Revisiting knowledge distillation via label smoothing regularization[C] //Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle, USA: IEEE, 2020: 3902-3910.
[31] ZHAO B R, CUI Q, SONG R J, et al. Decoupled knowledge distillation[C] //Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New Orleans, USA: IEEE, 2022: 11943-11952.
[32] XIE Q Z, LUONG M T, HOVY E, et al. Self-training with noisy student improves ImageNet classification[C] //Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle, USA: IEEE, 2020: 10684-10695.
[33] GUPTA S, HOFFMAN J, MALIK J. Cross modal distillation for supervision transfer[C] //Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Las Vegas, USA: IEEE, 2016: 2827-2836.
[34] KOOHPAYEGANI A S, TEJANKAR A, PIRSIAVASH H. CompRess: self-supervised learning by compressing representations[EB/OL].(2020-10-28)[2024-04-25]. https://arxiv.org/abs/2010.14713v1
[35] YUN S, PARK J, LEE K, et al. Regularizing class-wise predictions via self-knowledge distillation[C] //Procee-dings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle, USA: IEEE, 2020: 13873-13882.
[36] WU G L, GONG S G. Peer collaborative learning for online knowledge distillation[EB/OL].(2021-03-03)[2024-04-25]. https://arxiv.org/abs/2006.04147v2
[37] PENG B Y, JIN X, LI D S, et al. Correlation congruence for knowledge distillation[C] //Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision(ICCV). Seoul: IEEE, 2019: 5007-5016.
[38] ZAGORUYKO S, KOMODAKIS N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer[EB/OL].(2017-02-12)[2024-04-25]. https://arxiv.org/abs/1612.03928v3
[39] TUNG F, MORI G. Similarity-preserving knowledge distillation[C] //Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision(ICCV). Seoul: IEEE, 2019: 1365-1374.
[40] PASSALIS N, TEFAS A. Learning deep representations with probabilistic knowledge transfer[C] //Computer Vision-ECCV 2018. Munich, Germany: Springer, 2018: 283-299.
[41] GUAN Y S, ZHAO P Y, WANG B X, et al. Differentiable feature aggregation search for knowledge distillation[C] //Computer Vision-ECCV 2020. Glasgow, UK: Springer, 2020: 469-484.
[42] LEE S H, KIM D H, SONG B C. Self-supervised knowledge distillation using singular value decomposition[C] //Computer Vision-ECCV 2018. Munich, Germany: Springer, 2018: 339-354.
[43] HEO B, LEE M, YUN S, et al. Knowledge transfer via distillation of activation boundaries formed by hidden neurons[C] //Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence. Palo Alto, USA: ACM, 2019: 3779-3787.
[44] KIM J, PARK S, KWAK N, et al. Paraphrasing complex network: network compression via factor transfer[C] //Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: ACM, 2018: 2765-2774.
[45] AHN S, HU S X, DAMIANOU A, et al. Variational information distillation for knowledge transfer[C] //Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Long Beach, USA: IEEE, 2019: 9163-9171.
[46] YIM J, JOO D, BAE J, et al. A gift from knowledge distillation: fast optimization, network minimization and transfer learning[C] //Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Honolulu, USA: IEEE, 2017: 7130-7138.
[47] SRINIVAS S, FLEURET F. Knowledge transfer with Jacobian matching[EB/OL].(2018-03-01)[2024-04-25]. https://arxiv.org/abs/1803.00443v1
[48] PARK W, KIM D, LU Y, et al. Relational knowledge distillation[C] //Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Long Beach, USA: IEEE, 2019: 3962-3971.
[49] LASSANCE C, BONTONOU M, HACENE G B, et al. Deep geometric knowledge distillation with graphs[C] //ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Barcelona, Spain: IEEE, 2020: 8484-8488.
[50] LIU Y F, CAO J J, LI B, et al. Knowledge distillation via instance relationship graph[C] //Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Long Beach, USA: IEEE, 2019: 7089-7097.
[51] XU X X, ZOU Q, LIN X, et al. Integral knowledge distillation for multi-person pose estimation[J]. IEEE Signal Processing Letters, 2020, 27: 436-440.
[52] HOU Y N, MA Z, LIU C X, et al. Inter-region affinity distillation for road marking segmentation[C] //Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle, USA: IEEE, 2020: 12483-12492.
[53] CHEN X, ZHANG Y F, XU H T, et al. Adversarial distillation for efficient recommendation with external knowledge[J]. ACM Transactions on Information Systems, 2018, 37(1): 1-28.
[54] XU Z, HSU Y C, HUANG J. Training student networks for acceleration with conditional adversarial networks[C] //Proceedings of the 2018 British Machine Vision Conference(BMVC). Newcastle, UK: BMVA, 2018: 61.
[55] ZHANG T C, LIU Y X. MTUW-GAN: a multi-teacher knowledge distillation generative adversarial network for underwater image enhancement[J]. Applied Sciences, 2024, 14(2): 529.
[56] WANG Y H, XU C, XU C, et al. Adversarial learning of portable student networks[C] //Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, USA: ACM, 2018: 4260-4267.
[57] BELAGIANNIS V, FARSHAD A, GALASSO F. Adversarial network compression[C] //Computer Vision-ECCV 2018. Munich, Germany: Springer, 2019: 431-449.
[58] AGUINALDO A, CHIANG P Y, GAIN A, et al. Compressing GANs using knowledge distillation[EB/OL].(2019-02-01)[2024-04-30]. https://arxiv.org/abs/1902.00159v1
[59] WANG X, ZHANG R, SUN Y, et al. KDGAN: knowledge distillation with generative adversarial networks[C] //Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: ACM, 2018: 783-794.
[60] REN Y X, WU J, XIAO X F, et al. Online multi-granularity distillation for GAN compression[C] //Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision(ICCV). Montreal, Canada: IEEE, 2021: 6773-6783.
[61] HU T, LIN M B, YOU L Z, et al. Discriminator-cooperated feature map distillation for GAN compression[C] //Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Vancouver, Canada: IEEE, 2023: 20351-20360.
[62] VO D M, SUGIMOTO A, NAKAYAMA H. PPCD-GAN: progressive pruning and class-aware distillation for large-scale conditional GANs compression[C] //Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision(WACV). Waikoloa, USA: IEEE, 2022: 1422-1430.
[63] CHEN H T, WANG Y H, XU C, et al. Data-free learning of student networks[C] //Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision(ICCV). Seoul: IEEE, 2019: 3513-3521.
[64] YE J W, JI Y X, WANG X C, et al. Data-free knowledge amalgamation via group-stack dual-GAN[C] //Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle, USA: IEEE, 2020: 12513-12522.
[65] 张晶, 鞠佳良, 任永功. 基于双生成器网络的Data-Free知识蒸馏[J]. 计算机研究与发展, 2023, 60(7): 1615-1627. ZHANG Jing, JU Jialiang, REN Yonggong. Double-generators network for Data-Free knowledge distillation[J]. Journal of Computer Research and Development, 2023, 60(7): 1615-1627.
[66] MICAELLI P, STORKEY A. Zero-shot knowledge transfer via adversarial belief matching[EB/OL].(2019-11-25)[2024-04-30]. https://arxiv.org/abs/1905.09768v4
[67] FANG G F, SONG J, SHEN C C, et al. Data-free adversarial distillation[EB/OL].(2020-03-02)[2024-04-30]. https://arxiv.org/abs/1912.11006v3
[68] YU X Y, YAN L, YANG Y, et al. Conditional generative data-free knowledge distillation[J]. Image and Vision Computing, 2023, 131: 104627.
[69] DO K, LE T H, NGUYEN D, et al. Momentum adversarial distillation: handling large distribution shifts in data-free knowledge distillation[EB/OL].(2022-09-21)[2024-04-30]. https://arxiv.org/abs/2209.10359v1
[70] CHOI Y, CHOI J, EL-KHAMY M, et al. Data-free network quantization with adversarial knowledge distillation[C] //Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops(CVPRW). Seattle, USA: IEEE, 2020: 710-711.
[71] CUI K W, YU Y C, ZHAN F N, et al. KD-DLGAN: data limited image generation via knowledge distillation[C] //Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Vancouver, Canada: IEEE, 2023: 3872-3882.
[72] CHEN H T, WANG Y H, SHU H, et al. Distilling portable generative adversarial networks for image translation[C] //Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI, 2020: 3585-3592.
[73] GAO T W, LONG R J. Accumulation knowledge distillation for conditional GAN compression[C] //Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops(ICCVW). Paris, France: IEEE, 2023: 1294-1303.
[74] CHUNG I, PARK S, KIM J, et al. Feature-map-level online adversarial knowledge distillation[EB/OL].(2020-06-05)[2024-04-30]. https://arxiv.org/abs/2002.01775v3
[75] WANG W W, HONG W, WANG F, et al. GAN-knowledge distillation for one-stage object detection[J]. IEEE Access, 2020, 8: 60719-60727.
[76] ARJOVSKY M, CHINTALA S, BOTTOU L. Wasserstein generative adversarial networks[C] //Proceedings of the 2017 International Conference on Machine Learning(ICML). Sydney, Australia: JMLR, 2017: 214-223.
[77] MIRZA M, OSINDERO S. Conditional generative adversarial nets[EB/OL].(2014-11-06)[2024-04-30]. https://arxiv.org/abs/1411.1784v1
[78] CHEN P G, LIU S, ZHAO H S, et al. Distilling knowledge via knowledge review[C] //Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Nashville, USA: IEEE, 2021: 5006-5015.
[79] HUANG Z Z, LIANG M F, QIN J H, et al. Understanding self-attention mechanism via dynamical system perspective[C] //Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision(ICCV). Paris, France: IEEE, 2023: 1412-1422.
[80] 兰治, 严彩萍, 李红, 等. 混合双注意力机制生成对抗网络的图像修复模型[J]. 中国图象图形学报, 2023, 28(11): 3440-3452. LAN Zhi, YAN Caiping, LI Hong, et al. HDA-GAN: hybrid dual attention generative adversarial network for image inpainting[J]. Journal of Image and Graphics, 2023, 28(11): 3440-3452.
[81] 黄仲浩, 杨兴耀, 于炯, 等. 基于多阶段多生成对抗网络的互学习知识蒸馏方法[J]. 计算机科学, 2022, 49(10): 169-175. HUANG Zhonghao, YANG Xingyao, YU Jiong, et al. Mutual learning knowledge distillation based on multi-stage multi-generative adversarial network[J]. Computer Science, 2022, 49(10): 169-175.
[82] 钱亚冠, 马骏, 何念念, 等. 面向边缘智能的两阶段对抗知识迁移方法[J]. 软件学报, 2022, 33(12): 4504-4516. QIAN Yaguan, MA Jun, HE Niannian, et al. Two-stage adversarial knowledge transfer for edge intelligence[J]. Journal of Software, 2022, 33(12): 4504-4516.
[83] SHI Y, TANG A D, NIU L F, et al. Sparse optimization guided pruning for neural networks[J]. Neurocomputing, 2024, 574: 127280.