A few-shot imitation learning method by improving generalization with meta-learning

doi:10.6040/j.issn.1672-3961.0.2025.105

Abstract: Most classical imitation learning methods train poorly and generalize insufficiently in few-shot scenarios because demonstration data are scarce. To address this problem, a meta-learning based generative adversarial imitation learning (Meta-GAIL) method was proposed. A meta-learning mechanism allowed the policy network to accumulate prior knowledge in advance from diverse tasks that share characteristics with the target task. The generative adversarial imitation learning (GAIL) algorithm then fine-tuned the network on the few demonstrations provided for the target task, enabling rapid adaptive transfer to the new task. To validate the method, systematic experiments were conducted on the MuJoCo physics simulation platform, comparing Meta-GAIL against baseline algorithms. The experimental results showed that, by integrating the cross-task knowledge representations acquired during the meta-learning phase, Meta-GAIL adapted more quickly to unseen but similar tasks and consistently outperformed the baseline algorithms in the few-shot setting.

Key words: few-shot learning, imitation learning, generative adversarial imitation learning, meta-learning, generalization
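The abstract describes two stages: meta-learning a policy initialization across similar tasks, then fine-tuning it on a few target-task demonstrations. The sketch below illustrates only that general idea on a toy problem; it is not the authors' Meta-GAIL implementation. Several choices are assumptions: the meta-learning rule is Reptile (a first-order meta-learning method; the abstract does not name the algorithm used), the tasks are hypothetical 1-D linear experts a = w·s with slopes drawn from U(2.5, 3.5), and a squared-error behavior cloning loss stands in for the GAIL fine-tuning objective, which would additionally need a discriminator and a reinforcement learning policy update. All function names (`sample_task`, `demos`, `bc_loss`, `adapt`, `reptile`) are hypothetical.

```python
import random

# Hypothetical toy setting, not the paper's MuJoCo tasks: every task is a 1-D
# linear expert a = w * s, and the policy is a = theta * s with a scalar theta.
# Tasks count as "similar" because their slopes w share one distribution.


def sample_task(rng):
    """Draw a task slope from the assumed task family U(2.5, 3.5)."""
    return rng.uniform(2.5, 3.5)


def demos(w, rng, n):
    """Return n expert (state, action) pairs for the task with slope w."""
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(s, w * s) for s in states]


def bc_loss(theta, data):
    """Mean squared action error (assumed stand-in for the GAIL objective)."""
    return sum((theta * s - a) ** 2 for s, a in data) / len(data)


def bc_grad(theta, data):
    """Derivative of bc_loss with respect to theta."""
    return sum(2.0 * (theta * s - a) * s for s, a in data) / len(data)


def adapt(theta, data, steps, lr):
    """Inner loop / fine-tuning: a few gradient steps on one task's demos."""
    for _ in range(steps):
        theta -= lr * bc_grad(theta, data)
    return theta


def reptile(rng, meta_iters=2000, inner_steps=5, inner_lr=0.5,
            meta_lr=0.1, shots=10):
    """Reptile (assumed meta-learner): move theta toward each task's adapted weights."""
    theta = 0.0
    for _ in range(meta_iters):
        w = sample_task(rng)
        phi = adapt(theta, demos(w, rng, shots), inner_steps, inner_lr)
        theta += meta_lr * (phi - theta)
    return theta


rng = random.Random(0)
meta_theta = reptile(rng)            # meta-learned initialization

new_w = 3.2                          # an unseen task from the same family
few = demos(new_w, rng, 3)           # only three demonstrations
held_out = demos(new_w, rng, 100)    # used only for evaluation

loss_meta = bc_loss(adapt(meta_theta, few, 3, 0.5), held_out)
loss_scratch = bc_loss(adapt(0.0, few, 3, 0.5), held_out)
print(f"meta init {meta_theta:.2f}: loss {loss_meta:.4f} vs scratch {loss_scratch:.4f}")
```

Both runs fine-tune on the same three demonstrations, so the parameter error shrinks by the same factor at every step; the meta-learned start therefore ends with lower held-out loss whenever it lies closer to the new slope than the zero initialization does. In this heavily reduced form, that mirrors the claim that cross-task prior knowledge speeds up few-shot adaptation.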

CLC number: TP18

WEI Long, FENG Xiang, YU Huiqun. A few-shot imitation learning method by improving generalization with meta-learning[J]. Journal of Shandong University (Engineering Science), 2026, 56(3): 144-155.
