Journal of Shandong University (Engineering Science) ›› 2026, Vol. 56 ›› Issue (3): 144-155. doi: 10.6040/j.issn.1672-3961.0.2025.105

• Machine Learning & Data Mining •

A few-shot imitation learning method by improving generalization with meta-learning

WEI Long, FENG Xiang, YU Huiqun

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Published: 2026-06-09
  • About the author: WEI Long (born 2000), male, from Binyang, Guangxi, master's student; main research interest: imitation learning. E-mail: 1603386185@qq.com
  • Funding:
    Key Program of the National Natural Science Foundation of China (62136003); General Program of the National Natural Science Foundation of China (62276097, 62372174)

Abstract: To address the poor training performance and insufficient generalization of most classical imitation learning methods in few-shot scenarios, both caused by data scarcity, a meta-learning based generative adversarial imitation learning (Meta-GAIL) method was proposed. By introducing a meta-learning mechanism, the policy network first accumulated experience across diverse tasks that share characteristics with the target task. The generative adversarial imitation learning (GAIL) algorithm was then used to fine-tune the policy on the small number of demonstrations provided for the target task, achieving rapid adaptation to new tasks. To validate the method, systematic experiments were conducted on the MuJoCo physics simulation platform, in which the Meta-GAIL method was compared against baseline algorithms. The experimental results showed that, by drawing on the cross-task knowledge representations acquired during the meta-learning phase, the Meta-GAIL method adapted more quickly to unseen but similar tasks and consistently outperformed the baseline algorithms under few-shot settings.

Key words: few-shot learning, imitation learning, generative adversarial imitation learning, meta-learning, generalization

CLC Number:

  • TP18
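
As a reading aid, the two stages described in the abstract (meta-learning across related tasks, then GAIL fine-tuning on a few target demonstrations) can be written out roughly as below. This is a sketch under stated assumptions, not the paper's own formulation: the abstract does not say which meta-learning algorithm is used, so the first equation assumes a MAML-style objective purely for illustration. The second equation is the standard GAIL objective of Ho and Ermon (2016). All symbols ($\theta$, $\alpha$, $\lambda$, $p(\mathcal{T})$, $\mathcal{L}_{\mathcal{T}}$, $\pi_E$, $D$) are notation introduced here, not taken from the paper.

```latex
% Stage 1 (ASSUMPTION: MAML-style meta-objective; the abstract does not
% specify the meta-learning algorithm). Learn an initialization \theta_0
% that performs well after one inner gradient step on each training task.
\begin{align}
  \theta_0 &= \arg\min_{\theta}
    \sum_{\mathcal{T}_i \sim p(\mathcal{T})}
    \mathcal{L}_{\mathcal{T}_i}\!\bigl(\theta - \alpha \nabla_{\theta}
    \mathcal{L}_{\mathcal{T}_i}(\theta)\bigr) \\
% Stage 2: starting from \pi_{\theta_0}, fine-tune on the few target-task
% demonstrations with the standard GAIL saddle-point objective, where
% \pi_E is the expert policy, D the discriminator, H the causal entropy.
  \min_{\pi_{\theta}} \max_{D} \;
    & \mathbb{E}_{\pi_{\theta}}\bigl[\log D(s,a)\bigr]
    + \mathbb{E}_{\pi_E}\bigl[\log\bigl(1 - D(s,a)\bigr)\bigr]
    - \lambda H(\pi_{\theta})
\end{align}
```

Under this reading, the few-shot setting affects only the second expectation, which is estimated from the small demonstration set, while the initialization $\theta_0$ carries the cross-task knowledge the abstract refers to.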