Journal of Shandong University (Engineering Science) ›› 2025, Vol. 55 ›› Issue (3): 34-45. doi: 10.6040/j.issn.1672-3961.0.2024.065
• Transportation Engineering: Smart Transportation Special Topic •
高君健,廖祝华*,刘毅志,赵肄江
GAO Junjian, LIAO Zhuhua*, LIU Yizhi, ZHAO Yijiang
Abstract: To further alleviate traffic congestion and improve road capacity, this study proposed an urban vehicle route guidance method that jointly performed personalized guidance and traffic signal control, based on hierarchical multi-agent reinforcement learning. Route-guidance agents and signal-control agents were placed at intersections to provide personalized route-guidance policies and to optimize signal control, balancing urban traffic flow. To overcome the limitations of a predefined graph structure in representing dynamic traffic-state features, the signal-control agents used an adaptive graph convolutional network to mine the spatial correlations among agents at the same level. The route-guidance agents incorporated mean-field games: by analyzing the mean action of vehicles, they effectively captured vehicle-to-vehicle interactions, achieved coordination among vehicles, and provided each vehicle with a personalized route-guidance policy according to its destination. To prevent local congestion and severe traffic imbalance, cooperation among signal-control agents was achieved through centralized training and decentralized execution based on the MAPPO (multi-agent proximal policy optimization) algorithm, enabling directional flow limiting during route guidance. A hierarchical reinforcement learning scheme allowed the heterogeneous agents to share and exchange information, promoting collaboration between them. To verify the effectiveness of the proposed method, experiments were conducted on the SUMO simulation platform using several real-world open-source traffic datasets and compared against multiple baseline methods. The results showed that the proposed method reduced the average vehicle travel time by at least 11.05% and the average delay by at least 19.90%, effectively improving urban traffic efficiency.
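The abstract does not give implementation details of the mean-field component. As a rough, illustrative sketch (all names here are hypothetical, not from the paper): in mean-field multi-agent RL, each agent conditions its value function not on every neighbor's action individually but on the empirical distribution of neighboring actions, which keeps the input size constant as the number of vehicles grows.

```python
import numpy as np

def mean_field_action(neighbor_actions, n_actions):
    # One-hot encode each neighbor's discrete action, then average.
    # The resulting distribution is the "mean action" that an agent's
    # Q-function receives alongside its own state and action.
    onehot = np.eye(n_actions)[np.asarray(neighbor_actions)]
    return onehot.mean(axis=0)

# Example: four neighboring vehicles choose among 3 candidate routes.
mean_a = mean_field_action([0, 1, 1, 2], n_actions=3)
# mean_a is the local action distribution: [0.25, 0.5, 0.25]
```

The design point is scalability: a vehicle interacting with k neighbors sees a fixed-length vector of size `n_actions` rather than k separate actions, which is what makes coordination tractable in dense traffic.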
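The adaptive graph convolution mentioned in the abstract replaces a fixed, road-network adjacency with one inferred from learnable agent embeddings. A minimal numerical sketch of that idea (the embedding dimensions and initialization are assumptions for illustration; in training the embeddings would be optimized by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, emb_dim, feat_dim = 4, 8, 3

# Learnable per-intersection embeddings (randomly initialized here).
E = rng.normal(size=(n_agents, emb_dim))
# Current traffic-state features at each intersection.
X = rng.normal(size=(n_agents, feat_dim))

# Infer adjacency from embedding similarity instead of the fixed road graph:
# ReLU keeps only positive affinities, row-wise softmax normalizes them.
logits = np.maximum(E @ E.T, 0.0)
A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# One aggregation step: each agent mixes its neighbors' features,
# weighted by the learned adjacency.
H = A @ X
```

Because A is produced from embeddings rather than hard-coded, the effective neighborhood of each signal-control agent can shift as training adapts the embeddings to observed traffic dynamics.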