
Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (3): 34-45. doi: 10.6040/j.issn.1672-3961.0.2024.065

• Transportation Engineering: Smart Transportation Special Topic •

  • Author biography: GAO Junjian (1999-), male, from Yiyang, Hunan, China; master's student; main research interest: intelligent transportation. E-mail: junjiangao@hnust.edu.cn
  • Corresponding author: LIAO Zhuhua (1977-), male, from Zhuzhou, Hunan, China; Ph.D., associate professor, master's supervisor; main research interests: data mining, intelligent transportation, distributed computing. E-mail: zhliao@hnust.edu.cn
  • Funding:
    Natural Science Foundation of Hunan Province (2024JJ5163)

Hierarchical multi-agent reinforcement learning based route guidance method combining personalization and signal control

GAO Junjian, LIAO Zhuhua*, LIU Yizhi, ZHAO Yijiang   

  1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, Hunan, China
  • Published:2025-06-05


Abstract: To further alleviate traffic congestion and improve road network efficiency, this study proposed an urban vehicle route guidance method integrating personalized routing strategies and traffic signal control based on hierarchical multi-agent reinforcement learning (MARL). Route guidance agents and traffic signal control agents were deployed at intersections to provide personalized routing policies and optimize traffic light control, thereby balancing urban traffic flow. To overcome the limitations of predefined graph structures in representing dynamic traffic state features, the traffic signal control agents employed an adaptive graph convolutional network to autonomously capture spatial correlations among peer agents. Concurrently, the route guidance agents integrated a mean-field game formulation to analyze aggregated vehicle actions, effectively capturing inter-vehicle interactions for coordinated decision-making while delivering destination-specific routing strategies. To prevent local congestion and severe traffic imbalance, a multi-agent proximal policy optimization (MAPPO) algorithm was adopted, enabling centralized training and decentralized execution for cooperative signal control agents to implement directional flow restriction. A hierarchical reinforcement learning framework facilitated information sharing and collaboration among heterogeneous agents. Extensive experiments were conducted on the SUMO simulation platform using multiple real-world open-source traffic datasets, with comparisons against baseline methods. Results demonstrated that the proposed method reduced average travel time by at least 11.05% and decreased average delay time by at least 19.90%, significantly enhancing urban traffic efficiency.
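The adaptive graph convolution that the signal control agents use to replace a predefined road-network adjacency can be sketched as follows. This is an illustrative NumPy toy following the general AGCRN idea of ref. [21], not the authors' code; the shapes, variable names, and random placeholder embeddings `E` are assumptions for the sketch (in training, `E` would be learned parameters).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_graph_conv(X, E, W):
    """One adaptive graph convolution layer.

    Instead of a fixed adjacency from road-network topology, the adjacency
    is inferred from learnable node embeddings E:
        A = softmax(relu(E @ E.T))
    X: (N, F_in) per-intersection traffic features
    E: (N, d)    node embeddings (learned; random placeholder here)
    W: (F_in, F_out) layer weights
    Returns (N, F_out) features aggregated over the learned adjacency.
    """
    A = softmax(np.maximum(E @ E.T, 0.0), axis=1)  # row-normalized learned adjacency
    return A @ X @ W

rng = np.random.default_rng(0)
N, F_in, F_out, d = 4, 3, 2, 5          # 4 intersections, toy feature sizes
X = rng.normal(size=(N, F_in))
E = rng.normal(size=(N, d))
W = rng.normal(size=(F_in, F_out))
H = adaptive_graph_conv(X, E, W)
print(H.shape)  # (4, 2)
```

Because each row of `A` is a softmax, every intersection aggregates a convex combination of its peers' states, so spatial correlations emerge from data rather than from the fixed road graph.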

Key words: reinforcement learning, route guidance, signal control, mean field game, adaptive graph convolution
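The mean-field coordination named in the keywords reduces pairwise vehicle interactions to a single interaction with the average neighbor. A minimal sketch of that averaging step, under assumed discrete routing actions (the function name and action encoding are illustrative, following the mean-field MARL idea of ref. [24]):

```python
import numpy as np

def mean_field_action(neighbor_actions, n_actions):
    """Average one-hot neighbor actions into a mean action distribution.

    In mean-field MARL, each agent's value function is conditioned on this
    average, Q(s, a, a_bar), so interactions with many neighboring vehicles
    collapse into one interaction with the 'mean' vehicle.
    """
    one_hot = np.eye(n_actions)[neighbor_actions]  # (k, n_actions)
    return one_hot.mean(axis=0)

# Toy example: 3 routing choices at a junction (e.g. left / straight / right);
# four neighboring vehicles chose actions 0, 2, 2, 1.
a_bar = mean_field_action(np.array([0, 2, 2, 1]), n_actions=3)
print(a_bar)  # [0.25 0.25 0.5 ]
```

The route guidance agent would feed `a_bar` alongside its own state and the vehicle's destination, which is how destination-specific (personalized) policies can still account for surrounding traffic at constant cost in the number of neighbors.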

CLC number: U121
[1] ZHOU B, SONG Q, ZHAO Z, et al. A reinforcement learning scheme for the equilibrium of the in-vehicle route choice problem based on congestion game[J]. Applied Mathematics and Computation, 2020, 371: 124895.
[2] ZHOU Xiaoxin, LIAO Zhuhua, LIU Yizhi, et al. Signal control method integrating history and current traffic flow[J]. Journal of Shandong University(Engineering Science), 2023, 53(4): 48-55.
[3] TANG C, HU W, HU S, et al. Urban traffic route guidance method with high adaptive learning ability under diverse traffic scenarios[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(5): 2956-2968.
[4] WEI H, ZHENG G, YAO H, et al. IntelliLight: a reinforcement learning approach for intelligent traffic light control[C] //Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, UK: Association for Computing Machinery, 2018: 2496-2505.
[5] VEZHNEVETS A S, OSINDERO S, SCHAUL T, et al. FeUdal networks for hierarchical reinforcement learning[C] //Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR, 2017: 3540-3549.
[6] WU Libing, FAN Jing, NIE Lei, et al. A collaborative routing method with internet of vehicles for city cars[J]. Chinese Journal of Computers, 2017, 40(7): 1600-1613.
[7] HALL R W. The fastest path through a network with random time-dependent travel times[J]. Transportation Science, 1986, 20(3): 182-188.
[8] MAO C, SHEN Z. A reinforcement learning framework for the adaptive routing problem in stochastic time-dependent network[J]. Transportation Research Part C: Emerging Technologies, 2018, 93: 179-197.
[9] KOH S, ZHOU B, FANG H, et al. Real-time deep reinforcement learning based vehicle navigation[J]. Applied Soft Computing, 2020, 96: 106694.
[10] ARASTEH F, SHEIKHGARGAR S, PAPAGELIS M. Network-aware multi-agent reinforcement learning for the vehicle navigation problem[C] //Proceedings of the 30th International Conference on Advances in Geographic Information Systems. Seattle, USA: Association for Computing Machinery, 2022: 1-4.
[11] SHOU Z, CHEN X, FU Y, et al. Multi-agent reinforcement learning for Markov routing games: a new modeling paradigm for dynamic traffic assignment[J]. Transportation Research Part C: Emerging Technologies, 2022, 137: 103560.
[12] WANG Y, XU T, NIU X, et al. STMARL: a spatio-temporal multi-agent reinforcement learning approach for cooperative traffic light control[J]. IEEE Transactions on Mobile Computing, 2020, 21(6): 2228-2242.
[13] VARAIYA P. Max pressure control of a network of signalized intersections[J]. Transportation Research Part C: Emerging Technologies, 2013, 36: 177-195.
[14] WEI H, CHEN C, ZHENG G, et al. PressLight: learning max pressure control to coordinate traffic signals in arterial network[C] //Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Anchorage, USA: Association for Computing Machinery, 2019: 1290-1298.
[15] CHU T, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(3): 1086-1095.
[16] WEI H, XU N, ZHANG H, et al. CoLight: learning network-level cooperation for traffic signal control[C] //Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Beijing, China: Association for Computing Machinery, 2019: 1913-1922.
[17] ZHU H, WANG Z, YANG F, et al. Intelligent traffic network control in the era of internet of vehicles[J]. IEEE Transactions on Vehicular Technology, 2021, 70(10): 9787-9802.
[18] SUN Q, ZHANG L, YU H, et al. Hierarchical reinforcement learning for dynamic autonomous vehicle navigation at intelligent intersections[C] //Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Long Beach, USA: Association for Computing Machinery, 2023: 4852-4861.
[19] ZHANG L, WU Q, SHEN J, et al. Expression might be enough: representing pressure and demand for reinforcement learning based traffic signal control[C] //International Conference on Machine Learning. Baltimore, USA: PMLR, 2022: 26645-26654.
[20] HUANG H, HU Z, LU Z, et al. Network-scale traffic signal control via multiagent reinforcement learning with deep spatiotemporal attentive network[J]. IEEE Transactions on Cybernetics, 2021, 53(1): 262-274.
[21] BAI L, YAO L, LI C, et al. Adaptive graph convolutional recurrent network for traffic forecasting[J]. Advances in Neural Information Processing Systems, 2020, 33: 17804-17815.
[22] MUKHUTDINOV D, FILCHENKOV A, SHALYTO A, et al. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system [J]. Future Generation Computer Systems, 2019, 94: 587-600.
[23] TANAKA T, NEKOUEI E, PEDRAM A R, et al. Linearly solvable mean-field traffic routing games[J]. IEEE Transactions on Automatic Control, 2020, 66(2): 880-887.
[24] YANG Y, LUO R, LI M, et al. Mean field multi-agent reinforcement learning[C] //Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR, 2018: 5567-5576.
[25] WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C] //Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York, USA: JMLR, 2016: 1995-2003.
[26] YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative multi-agent games[J]. Advances in Neural Information Processing Systems, 2022, 35: 24611-24624.
[27] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[DB/OL].(2017-08-28)[2025-04-03]. https://doi.org/10.48550/arXiv.1707.06347
[28] LOPEZ P A, BEHRISCH M, BIEKER-WALZ L, et al. Microscopic traffic simulation using SUMO[C] //2018 21st International Conference on Intelligent Transportation Systems(ITSC). Maui, USA: IEEE, 2018: 2575-2582.
[29] SU H, ZHONG Y D, CHOW J Y J, et al. EMVLight: a multi-agent reinforcement learning framework for an emergency vehicle decentralized routing and traffic signal control system[J]. Transportation Research Part C: Emerging Technologies, 2023, 146: 103955.