Modeling Car Following Behavior of Autonomous Driving Vehicles Based on Deep Reinforcement Learning
Abstract: To improve the car-following performance of autonomous vehicles and mitigate the negative effects of traffic oscillations, a car-following model for automated driving based on deep reinforcement learning is investigated. The existing reward function design is extended to account for energy consumption, with the energy-related term built on the VT-Micro model. The driving-efficiency term, which is based on the time gap between vehicles, is also refined: a virtual speed is added to the time-gap calculation to avoid computational overflow and unrealistically short following distances in traffic oscillation scenarios. To overcome the limitation of earlier oscillation-mitigation studies, which trained only on closed ring roads with simulated vehicle trajectories, the training environment is built from human driver behavior observed during traffic oscillations in the NGSIM trajectory data. The twin delayed deep deterministic policy gradient (TD3) algorithm is then applied to train a multi-objective car-following model. An evaluation framework is established to compare the TD3 model with traditional models in two test scenarios: car following and traffic oscillation.
Results for the car-following scenario show that the TD3 model and the traditional adaptive cruise control (ACC) model perform similarly in comfort and driving efficiency, and both outperform human drivers. In safety, the TD3 model reduces safety hazards by 53.65% relative to the traditional ACC model and by 36.24% relative to human drivers. In energy consumption, the TD3 model uses 6.73% less than the traditional ACC model and 15.65% less than human drivers. Results for the traffic oscillation scenario show that the TD3 model effectively reduces the negative impacts of traffic oscillations. At a 100% TD3 penetration rate, compared with a fully human-driven environment, driving discomfort decreases by 55.95%, driving efficiency increases by 8.82%, safety hazards decrease by 73.21%, and fuel consumption drops by 5.97%.
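The abstract does not give the formula for the time gap with virtual speed, so the following is only a minimal sketch of one plausible reading: the virtual speed is added to the follower's speed in the denominator, which keeps the time gap finite when the vehicle is stopped during an oscillation. The function name and the 1.0 m/s default are hypothetical, chosen purely for illustration.

```python
def time_gap_with_virtual_speed(gap_m, speed_mps, virtual_speed_mps=1.0):
    """Time gap in seconds, computed as gap / (speed + virtual speed).

    Assumption: the virtual speed enters the denominator. The default
    of 1.0 m/s is a placeholder, not a value taken from the paper.
    """
    if gap_m < 0 or speed_mps < 0 or virtual_speed_mps <= 0:
        raise ValueError("gap and speed must be >= 0, virtual speed > 0")
    return gap_m / (speed_mps + virtual_speed_mps)


# A plain gap / speed would divide by zero for a stopped follower;
# with the virtual speed the result stays finite.
stopped = time_gap_with_virtual_speed(gap_m=10.0, speed_mps=0.0)
cruising = time_gap_with_virtual_speed(gap_m=20.0, speed_mps=9.0)
print(stopped, cruising)
```

With a plain time gap, a near-zero speed yields a very large value even at a small spacing, which a reward term could mistake for a comfortable gap; bounding the denominator is one way to avoid both the overflow and that incentive to close in.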
Table 1. Model hyperparameters
Parameter                                   Value
Actor network learning rate                 0.0001
Critic network learning rate                0.0002
Batch size                                  512
Replay buffer size                          50 000
Discount factor                             0.95
Soft update rate                            0.01
Actor network delayed update frequency      2
α0                                          5
α1                                          -120
α2                                          0.05
α3                                          0.4
α4                                          0.1
α5                                          -1.2
α6                                          1
α7                                          -0.3
t0                                          0.5
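As an illustration of how two of the Table 1 values are typically used in TD3, the sketch below applies the soft update rate (0.01) and the actor delayed update frequency (2) to a toy parameter list. This is the generic TD3 target-update pattern, not the authors' code; the parameter lists stand in for network weights, and the gradient steps are omitted.

```python
TAU = 0.01          # soft update rate (Table 1)
POLICY_DELAY = 2    # actor network delayed update frequency (Table 1)


def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Toy stand-ins for network weights (illustrative values only).
online = [1.0, 1.0]
target = [0.0, 1.0]

soft_updates = 0
for step in range(1, 5):
    # A critic gradient step would run here on every iteration.
    if step % POLICY_DELAY == 0:
        # The actor step and the target soft updates run every second step.
        target = soft_update(target, online)
        soft_updates += 1

print(soft_updates, target)
```

A small rate such as 0.01 makes the target networks trail the online networks slowly, which is the usual way to stabilize the bootstrapped critic targets.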
Table 2. Comparison of safety and fuel consumption
Penetration rate (%)   Mean iTTC (s)   Relative change (%)   Mean fuel consumption (mL)   Relative change (%)
0                      32.22           0                     247.49                       0
20                     26.21           -18.65                246.18                       -0.52
40                     22.10           -31.41                243.68                       -1.54
60                     16.37           -49.19                238.77                       -3.52
80                     10.12           -68.59                233.18                       -5.78
100                    8.63            -73.21                232.71                       -5.97
Table 3. Comparison of traffic efficiency and comfort
Penetration rate (%)   Mean speed during 100~200 s (m/s)   Relative change (%)   Mean sum of absolute jerk (m/s^3)   Relative change (%)
0                      7.59                                0                     51.81                               0
20                     7.71                                1.58                  45.57                               -12.04
40                     7.82                                3.03                  39.68                               -23.41
60                     8.06                                6.19                  33.35                               -35.63
80                     8.21                                8.16                  24.76                               -52.21
100                    8.26                                8.82                  22.82                               -55.95
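The relative-change columns in Tables 2 and 3 appear to be percentage changes against the 0% penetration baseline. The short sketch below recomputes the Table 3 columns from the rounded table entries under that assumption. Because the inputs are already rounded, a recomputed value can differ from the published one in the last digit.

```python
def relative_change_pct(value, baseline):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Rounded entries copied from Table 3, keyed by penetration rate (%).
mean_speed = {0: 7.59, 20: 7.71, 40: 7.82, 60: 8.06, 80: 8.21, 100: 8.26}
abs_jerk = {0: 51.81, 20: 45.57, 40: 39.68, 60: 33.35, 80: 24.76, 100: 22.82}

for rate in sorted(mean_speed):
    speed_change = relative_change_pct(mean_speed[rate], mean_speed[0])
    jerk_change = relative_change_pct(abs_jerk[rate], abs_jerk[0])
    print(f"{rate:3d}%  speed {speed_change:+.2f}%  jerk {jerk_change:+.2f}%")
```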