Journal of Shandong University (Engineering Science), 2026, Vol. 56, Issue (3): 144-155. doi: 10.6040/j.issn.1672-3961.0.2025.105
WEI Long (韦龙), FENG Xiang (冯翔), YU Huiqun (虞慧群)
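Abstract: In few-shot settings, most classical imitation learning methods suffer from data scarcity, which leaves the trained models weak and poor at generalizing. To address this problem, a meta-learning based generative adversarial imitation learning (Meta-GAIL) method is proposed. A meta-learning mechanism lets the policy network first accumulate experience across diverse tasks whose characteristics resemble those of the target task. The generative adversarial imitation learning (GAIL) algorithm then fine-tunes the policy on the small number of demonstrations provided for the target task, so that the policy adapts and transfers quickly to the new task. To verify the method's effectiveness, systematic experiments were conducted on the MuJoCo physics simulation platform, comparing Meta-GAIL against baseline algorithms. The results show that Meta-GAIL effectively incorporates the cross-task knowledge representations acquired in the meta-learning stage. As a result, it adapts more quickly to unseen but similar tasks and consistently outperforms the baseline algorithms in the few-shot setting.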
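The abstract does not say which meta-learning algorithm, discriminator architecture, or policy optimizer Meta-GAIL uses. The following is therefore only a minimal Python sketch of the two-stage idea under stated assumptions, not the authors' implementation. It uses a hypothetical toy task family: the expert for task `k` plays the action `a = k * s`, the policy has a single parameter (`a = theta * s`), and a two-weight logistic discriminator `D(s, a) = sigmoid(w * a + v * s)` supplies the reward `log D`. Reptile, a first-order meta-learning algorithm, stands in for the unspecified meta-learner. The policy is updated by differentiating `log D` through its deterministic action, which simplifies the policy-gradient (TRPO) step used in the original GAIL. All function names, task slopes, and hyperparameters are hypothetical.

```python
import math

# Hypothetical toy setup (not from the paper): a few positive scalar states.
STATES = [0.5, 1.0, 1.5]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expert_action(k, s):
    # Hypothetical task family: the expert for task k plays a = k * s.
    return k * s


def gail_finetune(theta, k, policy_steps, disc_steps=50,
                  disc_lr=0.2, policy_lr=0.5, l2=0.5):
    """GAIL-style adaptation of the toy policy a = theta * s to task k.

    The discriminator D(s, a) = sigmoid(w * a + v * s) is trained to output 1
    on expert pairs and 0 on policy pairs (only w is L2-regularized). The
    policy then ascends the reward log D(s, theta * s), differentiated through
    its deterministic action (a simplification of GAIL's policy-gradient step).
    """
    w, v = 0.0, 0.0  # fresh discriminator for every adaptation run
    expert = [(s, expert_action(k, s)) for s in STATES]  # the few demonstrations
    n = len(STATES)
    for _ in range(policy_steps):
        policy = [(s, theta * s) for s in STATES]
        # 1) Discriminator: gradient descent on the logistic (GAN) loss.
        for _ in range(disc_steps):
            gw, gv = 2.0 * l2 * w, 0.0
            for s, a in expert:  # d/dz of -log sigmoid(z) is sigmoid(z) - 1
                g = (sigmoid(w * a + v * s) - 1.0) / n
                gw += g * a
                gv += g * s
            for s, a in policy:  # d/dz of -log(1 - sigmoid(z)) is sigmoid(z)
                g = sigmoid(w * a + v * s) / n
                gw += g * a
                gv += g * s
            w -= disc_lr * gw
            v -= disc_lr * gv
        # 2) Policy: d/dtheta of log D(s, theta * s) is (1 - D) * w * s.
        grad = sum((1.0 - sigmoid(w * theta * s + v * s)) * w * s
                   for s in STATES) / n
        theta += policy_lr * grad
    return theta


def reptile_meta_train(train_slopes, meta_iters=30, inner_steps=10,
                       meta_lr=0.2, theta=0.0):
    """Reptile (stand-in meta-learner): nudge the initialization toward each
    task's adapted parameter, cycling through the training tasks."""
    for i in range(meta_iters):
        k = train_slopes[i % len(train_slopes)]
        adapted = gail_finetune(theta, k, policy_steps=inner_steps)
        theta += meta_lr * (adapted - theta)
    return theta


# Stage 1: meta-train on similar (hypothetical) tasks.
theta_meta = reptile_meta_train([1.8, 2.0, 2.2])

# Stage 2: adapt to an unseen but similar task with only a few GAIL steps.
target_k = 2.4
few_steps = 5
theta_from_meta = gail_finetune(theta_meta, target_k, few_steps)
theta_from_scratch = gail_finetune(0.0, target_k, few_steps)
print(f"meta-init theta = {theta_meta:.3f}")
print(f"after {few_steps} steps: from meta-init {theta_from_meta:.3f}, "
      f"from scratch {theta_from_scratch:.3f} (target {target_k})")
```

Both fine-tuning calls use the same five adversarial steps on the unseen slope `2.4`. Starting from the meta-learned initialization, which lands near the center of the training slopes, leaves a much smaller error than starting from `0.0`. The toy only illustrates the mechanism and says nothing about the paper's MuJoCo results.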