

Title:
A COMPUTER-IMPLEMENTED METHOD FOR FORECASTING TRAFFIC INDICATORS IN A ROAD NETWORK
Document Type and Number:
WIPO Patent Application WO/2024/068527
Kind Code:
A1
Abstract:
In order to improve traffic indicator prediction and route planning, a forecasting system (10) is proposed. The forecasting system (10) comprises a sensor means or database means that obtains for each link (L1, L2, …) and/or node (N1, N2, …) of the road network (100) traffic data and external factor data. The system (10) has a feature convolution aggregator (12) means that is configured to generate aggregated data from the traffic data and the external factor data. The aggregated data are fed to a first machine learning model (14) that includes at least one graph attention network (16) layer that is configured to determine spatial dependency output data (18) that is indicative of the spatial dependency of the aggregated data based on the aggregated data. The output of the first machine learning model (14) is fed to a second machine learning model (20) that includes an encoder-decoder style informer layer (22) that is trained to determine traffic indicator data for each link (L1, L2, …) and/or node (N1, N2, …) based on the spatial dependency output data (18), wherein the informer layer (22) includes at least one encoder (24) and at least one decoder (26), wherein the encoder (24) determines a concatenated feature map (30) based on the spatial dependency output data (18), wherein the concatenated feature map (30) is fed into each decoder (26). A fully connected output layer generates the traffic indicator timeseries for each link (L1, L2, …) and/or node (N1, N2, …) from the traffic indicator data.

Inventors:
SU, Rong, Prof. (SG)
LUO, Ruikang (SG)
SONG, Yaofeng (SG)
HUANG, Liping, Dr. (SG)
Application Number:
PCT/EP2023/076362
Publication Date:
April 04, 2024
Filing Date:
September 25, 2023
Assignee:
CONTINENTAL AUTOMOTIVE TECH GMBH (DE)
UNIV NANYANG TECH (SG)
International Classes:
G01C21/36; G01C21/34; G06N3/042; G06N3/0455; G06N3/0464; G08G1/01; G08G1/052; G08G1/065; G08G1/0968
Foreign References:
US 2012/0109519 A1 (2012-05-03)
Other References:
RUIKANG LUO ET AL: "AST-GIN: Attribute-Augmented Spatial-Temporal Graph Informer Network for Electric Vehicle Charging Station Availability Forecasting", arXiv.org, Cornell University Library, 7 September 2022 (2022-09-07), XP091312631
HAOYI ZHOU ET AL: "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting", arXiv.org, Cornell University Library, 14 December 2020 (2020-12-14), XP081837541
BUI KHAC-HOAI NAM ET AL: "Spatial-temporal graph neural network for traffic forecasting: An overview and open research issues", Applied Intelligence, vol. 52, no. 3, 21 June 2021 (2021-06-21), pages 2763-2774, XP037687650, ISSN: 0924-669X, DOI: 10.1007/S10489-021-02587-W
Attorney, Agent or Firm:
KASTEL PATENTANWÄLTE PARTG MBB (DE)
Claims:
CLAIMS

1. A computer-implemented method for forecasting a traffic indicator of a link (L1, L2, ...) and/or node (N1, N2, ...) of a road network (100) that is represented by road network structure data, the method comprising: a) for each link (L1, L2, ...) and/or node (N1, N2, ...) of the road network (100), obtaining traffic data that are indicative of a traffic indicator at different points in time and external factor data that are non-indicative of traffic indicators but influential on a traffic indicator; b) feeding the traffic data and the external factor data to a feature convolution aggregator (12) to obtain aggregated data; c) feeding the aggregated data and the road network structure data to a first machine learning model (14) that includes at least one graph attention network (16) layer that is configured to determine spatial dependency output data (18) that is indicative of the spatial dependency of the aggregated data; d) feeding the spatial dependency output data (18) to a second machine learning model (20) that includes an informer layer (22) that is trained to determine traffic indicator data for each link (L1, L2, ...) and/or node (N1, N2, ...) based on the spatial dependency output data (18), wherein the informer layer (22) includes at least one encoder (24) and at least one decoder (26), wherein the encoder (24) determines a concatenated feature map (30) based on the spatial dependency output data (18), wherein the concatenated feature map (30) is fed into each decoder (26); and e) feeding the traffic indicator data to a fully connected output layer that generates a traffic indicator timeseries for each link (L1, L2, ...) and/or node (N1, N2, ...) from the traffic indicator data.

2. The method according to claim 1, characterized in that each graph attention network (16) layer includes an attention mechanism that includes a softmax normalization layer and/or a leaky ReLU activation function.

3. The method according to any of the preceding claims, characterized in that in step d) the informer layer (22) includes a first encoder and a last encoder, wherein the output of step c) is fed to the first encoder and the output of the first encoder is fed to another encoder, wherein the concatenated feature map (30) is output by the last encoder.

4. The method according to any of the preceding claims, characterized in that in step d) the informer layer (22) includes a first decoder and a last decoder, wherein the output of the last decoder is fed to the fully connected output layer.

5. The method according to any of the preceding claims, characterized in that in step d) the spatial dependency output data (18) are partially masked in a consecutive manner and the partially consecutively masked data are fed into the decoder (26) or the first decoder (26).

6. The method according to any of the preceding claims, characterized in that in step d) each encoder (24) includes a first self-attention pyramid (27) and a second self-attention pyramid (28), wherein the second self-attention pyramid (28) has fewer inputs than the first self-attention pyramid (27), and each self-attention pyramid (27, 28) generates a semi-feature map (32, 44), the semi-feature maps being concatenated into the concatenated feature map (30).

7. The method according to claim 6, characterized in that the second self-attention pyramid (28) has half the inputs of the first self-attention pyramid (27).

8. The method according to claim 6 or 7, characterized in that the first self-attention pyramid (27) includes at least one attention block (34, 38, 42) and at least one distilling layer (36, 40) that alternate along the data flow, wherein the last element of the first self-attention pyramid (27) is an attention block (34, 38, 42) that outputs a first semi-feature map (32).

9. The method according to any of the claims 6 to 8, characterized in that the second self-attention pyramid (28) includes at least one attention block (46, 50) and at least one distilling layer (48) that alternate along the data flow, wherein the last element of the second self-attention pyramid (28) is an attention block (46, 50) that outputs a second semi-feature map (44), wherein preferably the second self-attention pyramid (28) includes fewer elements along the data flow than the first self-attention pyramid (27).

10. A computer-implemented method for planning a route of a vehicle (101), the method comprising: a) determining an initial route from a starting position to a destination; b) determining a route length, a route driving time and/or a route fuel consumption of the initial route and performing a method according to any of the preceding claims in order to obtain a traffic indicator for each node (N1, N2, ...) and/or link (L1, L2, ...) that is covered by the initial route; c) modifying the route length, the route driving time and/or the route fuel consumption based on the traffic indicator; d) modifying the initial route so as to improve upon route length, route driving time and/or route fuel consumption in order to obtain an alternative route; and e) generating a control signal that causes the vehicle (101) to inform a driver of the alternative route or that causes the vehicle (101) to follow the alternative route.

11. A computer program that includes instructions that, when executed by a computer, cause the computer to perform one, some, or all steps of a method according to any of the preceding claims.

12. A computer readable storage medium or a data carrier signal that includes the computer program according to claim 11.

13. A forecasting system (10) configured for forecasting a traffic indicator of a link (L1, L2, ...) and/or node (N1, N2, ...) within a road network (100) that is represented by road network structure data, the system comprising: a) a sensor means or database means that is configured, for each link (L1, L2, ...) and/or node (N1, N2, ...) of the road network (100), to obtain traffic data that are indicative of a traffic indicator at different points in time and external factor data that are non-indicative of traffic indicators but influential on a traffic indicator; b) a feature convolution aggregator (12) means that is configured to generate aggregated data from the traffic data and the external factor data; c) a first machine learning model (14) that includes at least one graph attention network (16) layer that is configured to determine spatial dependency output data (18) that is indicative of the spatial dependency of the aggregated data based on the aggregated data; d) a second machine learning model (20) that includes an informer layer (22) that is trained to determine traffic indicator data for each link (L1, L2, ...) and/or node (N1, N2, ...) based on the spatial dependency output data (18), wherein the informer layer (22) includes at least one encoder (24) and at least one decoder (26), wherein the encoder (24) determines a concatenated feature map (30) based on the spatial dependency output data (18), wherein the concatenated feature map (30) is fed into each decoder (26); and e) a fully connected output layer that is configured to generate a traffic indicator timeseries for each link (L1, L2, ...) and/or node (N1, N2, ...) from the traffic indicator data.

14. A vehicle route planning device for a vehicle (101), the device being configured to plan a vehicle route and comprising a forecasting system (10) according to claim 13 that is used in vehicle route planning.

15. A vehicle (101), preferably a land vehicle, comprising a forecasting system (10) according to claim 13 and/or a vehicle route planning device according to claim 14.

Description:
DESCRIPTION

A computer-implemented method for forecasting traffic indicators in a road network

TECHNICAL FIELD

The invention relates to a computer-implemented method for forecasting a traffic indicator of a link and/or node in a road network.

BACKGROUND

Intelligent traffic systems are a significant part of the modern city. Traffic congestion alleviation is one of the challenges faced by traffic management, traffic planning and traffic control. Traffic information usually includes flow, speed, density and facility usage conditions. Long-term accurate traffic forecasting is critical for traffic scheduling and road condition assessment, which are two major components of an intelligent traffic system. As the number of deployed data sensors steadily increases, a huge amount of real-time data can be collected to help in traffic planning. At the same time, the requirements on data processing technology are also increasing.

However, predicting traffic has become more difficult as a result of its complicated spatial-temporal connections. Firstly, the shifting patterns in driving behavior are typically unknown in the temporal dimension. A traffic parameter may vary on a periodic basis and is relatively easy to describe in the long run. Traditional techniques, on the other hand, heavily rely on stationary circumstances, and sudden events such as accidents and emergencies make it difficult for such systems to respond quickly to the resulting changes. Secondly, the complete spatial relationships between roads and cars contribute significantly to the future state. For instance, if a traffic accident occurs on a link, congestion will occur; upstream vehicles may alter their paths, and nearby links may experience more diversion in subsequent stages. The hidden pattern cannot be extracted purely from temporal dependencies. Numerous existing techniques for addressing these issues may be classified into two categories: model-driven methods, which employ dynamic mathematical models to replicate driving behavior, and data-driven methods, such as the Auto-Regressive Integrated Moving Average (ARIMA) model, Kalman filtering, Gated Recurrent Unit (GRU) networks and Graph Neural Networks (GNN).

Among these models, recurrent neural network (RNN) models have been extensively applied to train learning models from sequential data. However, RNN models treat the data of different roads independently without considering spatial connectivity. To learn the spatial features of a given network, some studies propose Convolutional Neural Networks (CNN) for spatial dependency modeling; however, the performance of CNNs on non-Euclidean data is typically unsatisfying. Thus, GNN-based methods have recently been proposed to capture spatial-temporal information by taking in time-series road data together with the connections between roads. These models usually combine RNN and GNN to adapt to the complicated forecasting task.

GNN-based spatial-temporal models have been proven to perform better on some forecasting problems than pure RNN or GNN models, yet there are still limitations. RNNs extract temporal dependencies by traversing the information over a long recurrent chain bidirectionally, which makes it difficult to obtain information from distant data and may also suffer from the vanishing gradient issue. Moreover, traffic parameters do not just depend sequentially on past and current data: the periodicity can be disturbed continuously by emergencies. Thus, it is important to understand the global dependencies over time.

Additional reference is made to the following documents:

[1] Y. Xiong, J. Gan, B. An, C. Miao, and A. L. Bazzan, "Optimal electric vehicle fast charging station placement based on game theoretical framework", IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 8, pp. 2493-2504, 2017;

[2] R. Luo and R. Su, "Traffic signal transition time prediction based on aerial captures during peak hours", in 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV). IEEE, 2020, pp. 104-110;

[3] C. Meng, X. Yi, L. Su, J. Gao, and Y. Zheng, "City-wide traffic volume inference with loop detector data and taxi trajectories", in Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2017, pp. 1-10;

[4] A. Theofilatos, "Incorporating real-time traffic and weather data to explore road accident likelihood and severity in urban arterials", Journal of Safety Research, vol. 61, pp. 9-21, 2017;

[5] S. Siami-Namini, N. Tavakoli, and A. S. Namin, "A comparison of arima and lstm in forecasting time series", in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 1394-1401;

[6] Y. Li, R. Yu, C. Shahabi, and Y. Liu, "Diffusion convolutional recurrent neural network: Data-driven traffic forecasting", arXiv preprint arXiv:1707.01926, 2017;

[7] M. S. Ahmed and A. R. Cook, Analysis of freeway traffic time-series data by using Box-Jenkins techniques, 1979, no. 722;

[8] M. M. Hamed, H. R. Al-Masaeid, and Z. M. B. Said, "Short-term prediction of traffic volume in urban arterials", Journal of Transportation Engineering, vol. 121, no. 3, pp. 249-254, 1995;

[9] I. Okutani and Y. J. Stephanedes, "Dynamic prediction of traffic volume through kalman filtering theory", Transportation Research Part B: Methodological, vol. 18, no. 1, pp. 1-11, 1984;

[10] Z. Lv, J. Xu, K. Zheng, H. Yin, P. Zhao, and X. Zhou, "Lc-rnn: A deep learning model for traffic speed prediction", in IJCAI, 2018, pp. 3470-3476;

[11] D. Zhang and M. R. Kabuka, "Combining weather condition data to predict traffic flow: a gru-based deep learning approach", IET Intelligent Transport Systems, vol. 12, no. 7, pp. 578-585, 2018;

[12] J. Zhang, Y. Zheng, and D. Qi, "Deep spatio-temporal residual networks for citywide crowd flows prediction", in Thirty-first AAAI Conference on Artificial Intelligence, 2017;

[13] X. Cao, Y. Zhong, Y. Zhou, J. Wang, C. Zhu, and W. Zhang, "Interactive temporal recurrent convolution network for traffic prediction in data centers", IEEE Access, vol. 6, pp. 5276-5289, 2017;

[14] J. T. Connor, R. D. Martin, and L. E. Atlas, "Recurrent neural networks and robust time series prediction", IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 240-254, 1994;

[15] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach", IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, 2014;

[16] H. Zhao, H. Yang, Y. Wang, D. Wang, and R. Su, "Attention based graph bi-lstm networks for traffic forecasting", in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1-6;

[17] P. Wei, Y. Cao, and D. Sun, "Total unimodularity and decomposition method for large-scale air traffic cell transmission model", Transportation Research Part B: Methodological, vol. 53, pp. 1-16, 2013;

[18] X.-Y. Xu, J. Liu, H.-Y. Li, and J.-Q. Hu, "Analysis of subway station capacity with the use of queueing theory", Transportation Research Part C: Emerging Technologies, vol. 38, pp. 28-43, 2014;

[19] F.-F. Xu, Z.-C. He, and Z. Sha, "Impacts of traffic management measures on urban network microscopic fundamental diagram", Journal of Transportation Systems Engineering and Information Technology, vol. 13, no. 2, pp. 185-190, 2013;

[20] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., "Mastering the game of go with deep neural networks and tree search", Nature, vol. 529, no. 7587, pp. 484-489, 2016;

[21] Z.-S. Yao, C.-F. Shao, and Y.-L. Gao, "Research on methods of short-term traffic forecasting based on support vector regression", Journal of Beijing Jiaotong University, vol. 30, no. 3, pp. 19-22, 2006;

[22] X. Luo, D. Li, and S. Zhang, "Traffic flow prediction during the holidays based on dft and svr", Journal of Sensors, vol. 2019, 2019;

[23] H. Sun, C. Zhang, and B. Ran, "Interval prediction for traffic time series using local linear predictor", in Proceedings. The 7th International IEEE Conference on Intelligent Transportation Systems (IEEE Cat. No. 04TH8749). IEEE, 2004, pp. 410-415;

[24] G. Dudek, "Pattern-based local linear regression models for short-term load forecasting", Electric Power Systems Research, vol. 130, pp. 139-147, 2016;

[25] W. Huang, G. Song, H. Hong, and K. Xie, "Deep architecture for traffic flow prediction: deep belief networks with multitask learning", IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191-2201, 2014;

[26] S. Hochreiter and J. Schmidhuber, "Long short-term memory", Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997;

[27] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using rnn encoder-decoder for statistical machine translation", arXiv preprint arXiv:1406.1078, 2014;

[28] R. Fu, Z. Zhang, and L. Li, "Using lstm and gru neural network methods for traffic flow prediction", in 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, 2016, pp. 324-328;

[29] Z. Cui, R. Ke, Z. Pu, and Y. Wang, "Deep bidirectional and unidirectional lstm recurrent neural network for network-wide traffic speed prediction", arXiv preprint arXiv:1801.02143, 2018;

[30] C.-W. Huang, C.-T. Chiang, and Q. Li, "A study of deep learning networks on mobile traffic forecasting", in 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC). IEEE, 2017, pp. 1-6;

[31] J. Yang, H. Chen, Y. Xu, Z. Shi, R. Luo, L. Xie, and R. Su, "Domain adaptation for degraded remote scene classification", in 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV). IEEE, 2020, pp. 111-117;

[32] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering", Advances in Neural Information Processing Systems, vol. 29, pp. 3844-3852, 2016;

[33] J. Ke, H. Zheng, H. Yang, and X. M. Chen, "Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach", Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591-608, 2017;

[34] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, "A comprehensive survey on graph neural networks", IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4-24, 2020;

[35] W. Jiang and J. Luo, "Graph neural network for traffic forecasting: A survey", arXiv preprint arXiv:2101.11174, 2021;

[36] J. Atwood and D. Towsley, "Diffusion-convolutional neural networks", in Advances in Neural Information Processing Systems, 2016, pp. 1993-2001;

[37] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks", arXiv preprint arXiv:1609.02907, 2016;

[38] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks", arXiv preprint arXiv:1710.10903, 2017;

[39] Z. Pan, W. Zhang, Y. Liang, W. Zhang, Y. Yu, J. Zhang, and Y. Zheng, "Spatio-temporal meta learning for urban traffic prediction", IEEE Transactions on Knowledge and Data Engineering, 2020;

[40] P. J. Liu, M. Saleh, E. Pot, B. Goodrich, R. Sepassi, L. Kaiser, and N. Shazeer, "Generating Wikipedia by summarizing long sequences", arXiv preprint arXiv:1801.10198, 2018;

[41] N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, N. Shazeer, A. Ku, and D. Tran, "Image transformer", in International Conference on Machine Learning. PMLR, 2018, pp. 4055-4064;

[42] X. Wang, Y. Ma, Y. Wang, W. Jin, X. Wang, J. Tang, C. Jia, and J. Yu, "Traffic flow prediction via spatial temporal graph neural network", in Proceedings of The Web Conference 2020, 2020, pp. 1082-1092;

[43] L. Cai, K. Janowicz, G. Mai, B. Yan, and R. Zhu, "Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting", Transactions in GIS, vol. 24, no. 3, pp. 736-755, 2020;

[44] S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, and X. Yan, "Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting", Advances in Neural Information Processing Systems, vol. 32, pp. 5243-5253, 2019;

[45] Y. Li and J. M. Moura, "Forecaster: A graph transformer for forecasting spatial and time-dependent data", arXiv preprint arXiv:1909.04019, 2019;

[46] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting", in Proceedings of AAAI, 2021;

[47] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li, "T-gcn: A temporal graph convolutional network for traffic prediction", IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 9, pp. 3848-3858, 2019;

[48] R. Luo, Y. Zhang, Y. Zhou, H. Chen, L. Yang, J. Yang, and R. Su, "Deep learning approach for long-term prediction of electric vehicle (ev) charging station availability", in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 3334-3339;

[49] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks", stat, vol. 1050, p. 20, 2017;

[50] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need", in Advances in Neural Information Processing Systems, 2017, pp. 5998-6008;

[51] Y.-H. H. Tsai, S. Bai, M. Yamada, L.-P. Morency, and R. Salakhutdinov, "Transformer dissection: A unified understanding of transformer's attention via the lens of kernel", arXiv preprint arXiv:1908.11775, 2019; and

[52] T. Van Erven and P. Harremos, "Renyi divergence and kullback-leibler divergence", IEEE Transactions on Information Theory, vol. 60, no. 7, pp. 3797-3820, 2014.

SUMMARY OF THE INVENTION

It is the object of the invention to improve route planning in road networks. The object is achieved by the subject-matter of the independent claims. Preferred embodiments are subject-matter of the dependent claims.

The invention provides a computer-implemented method for forecasting a traffic indicator of a link and/or node of a road network that is represented by road network structure data, the method comprising: a) for each link and/or node of the road network, obtaining traffic data that are indicative of a traffic indicator at different points in time and external factor data that are non-indicative of traffic indicators but influential on a traffic indicator; b) feeding the traffic data and the external factor data to a feature convolution aggregator to obtain aggregated data; c) feeding the aggregated data and the road network structure data to a first machine learning model that includes at least one graph attention network layer that is configured to determine spatial dependency output data that is indicative of the spatial dependency of the aggregated data; d) feeding the spatial dependency output data to a second machine learning model that includes an informer layer that is trained to determine traffic indicator data for each link based on the spatial dependency output data, wherein the informer layer includes at least one encoder and at least one decoder, wherein the encoder determines a concatenated feature map based on the spatial dependency output data, wherein the concatenated feature map is fed into each decoder; and e) feeding the traffic indicator data to a fully connected output layer that generates a traffic indicator timeseries for each link and/or node from the traffic indicator data.

Preferably, each graph attention network layer includes an attention mechanism that includes a softmax normalization layer and/or a leaky ReLU activation function.

Preferably, in step d) the informer layer includes a first encoder and a last encoder. Preferably, the output of step c) is fed to the first encoder and the output of the first encoder is fed to another encoder. Preferably, the concatenated feature map is output by the last encoder. Preferably, in step d) the informer layer includes a first decoder and a last decoder. Preferably, the output of the last decoder is fed to the fully connected output layer.

Preferably, in step d) the spatial dependency output data are partially masked in a consecutive manner and the partially consecutively masked data are fed into the decoder or the first decoder.

Preferably, in step d) each encoder includes a first self-attention pyramid and a second self-attention pyramid. Preferably, the second self-attention pyramid has fewer inputs than the first self-attention pyramid. Preferably, each self-attention pyramid generates a semi-feature map that are concatenated into the concatenated feature map.

Preferably, the second self-attention pyramid has half the inputs of the first self-attention pyramid.

Preferably, the first self-attention pyramid includes at least one attention block and at least one distilling layer that alternate along the data flow. Preferably, the last element of the first self-attention pyramid is an attention block that outputs a first semi-feature map.

Preferably, the second self-attention pyramid includes at least one attention block and at least one distilling layer that alternate along the data flow. Preferably, the last element of the second self-attention pyramid is an attention block that outputs a second semi-feature map. Preferably, the second self-attention pyramid includes fewer elements along the data flow than the first self-attention pyramid.

The invention provides a computer-implemented method for planning a route of a vehicle, the method comprising: a) determining an initial route from a starting position to a destination; b) determining a route length, a route driving time and/or a route fuel consumption of the initial route and performing a preferred method in order to obtain a traffic indicator for each link that is covered by the initial route; c) modifying the route length, the route driving time and/or the route fuel consumption based on the traffic indicator; d) modifying the initial route so as to improve upon route length, route driving time and/or route fuel consumption in order to obtain an alternative route; and e) generating a control signal that causes the vehicle to inform a driver of the alternative route or that causes the vehicle to follow the alternative route.
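Purely as an illustration of how such a planner could consume the forecast, the following minimal Python sketch re-plans a route using forecast-adjusted travel times. The disclosure does not prescribe a routing algorithm; Dijkstra's algorithm is used here only as an example, and `forecast_speed`, `static_time`, `link_length` and the graph encoding are hypothetical stand-ins (the forecast itself would come from the STGIN system described below).

```python
import heapq

def shortest_route(graph, start, dest, weight):
    """Dijkstra over graph = {node: {neighbor: link_id}}; weight(link_id) -> cost."""
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dest:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, link in graph[u].items():
            nd = d + weight(link)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    route, node = [dest], dest  # walk the predecessor map backwards
    while node != start:
        node = prev[node]
        route.append(node)
    return list(reversed(route))

def plan_route(graph, start, dest, static_time, link_length, forecast_speed):
    # a) initial route from static (e.g. free-flow) travel times
    initial = shortest_route(graph, start, dest, lambda l: static_time[l])
    # b)/c) forecast a traffic indicator (here: link speed) for the covered
    #       links and correct their travel-time estimates accordingly
    adjusted = dict(static_time)
    for u, v in zip(initial, initial[1:]):
        link = graph[u][v]
        adjusted[link] = link_length[link] / max(forecast_speed(link), 1e-3)
    # d)/e) re-plan with the corrected costs to obtain the alternative route
    return shortest_route(graph, start, dest, lambda l: adjusted[l])
```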

The invention provides a computer program that includes instructions that, when executed by a computer, cause the computer to perform one, some, or all steps of a preferred method.

The invention provides a computer readable storage medium or a data carrier signal that includes the preferred computer program.

The invention provides a forecasting system configured for forecasting a traffic indicator of a link and/or node within a road network that is represented by road network structure data, the system comprising: a) a sensor means or database means that is configured, for each link and/or node of the road network, to obtain traffic data that are indicative of a traffic indicator at different points in time and external factor data that are non-indicative of traffic indicators but influential on a traffic indicator; b) a feature convolution aggregator means that is configured to generate aggregated data from the traffic data and the external factor data; c) a first machine learning model that includes at least one graph attention network layer that is configured to determine spatial dependency output data that is indicative of the spatial dependency of the aggregated data based on the aggregated data; d) a second machine learning model that includes an informer layer that is trained to determine traffic indicator data for each link and/or node based on the spatial dependency output data, wherein the informer layer includes at least one encoder and at least one decoder, wherein the encoder determines a concatenated feature map based on the spatial dependency output data, wherein the concatenated feature map is fed into each decoder; and e) a fully connected output layer that is configured to generate a traffic indicator timeseries for each link and/or node from the traffic indicator data.

Preferably, each graph attention network layer includes an attention mechanism that includes a softmax normalization layer and/or a leaky ReLU activation function.

Preferably, the informer layer includes a first encoder and a last encoder. Preferably, the output of the first machine learning model is fed to the first encoder and the output of the first encoder is fed to another encoder. Preferably, the concatenated feature map is output by the last encoder.

Preferably, the informer layer includes a first decoder and a last decoder. Preferably, the output of the last decoder is fed to the fully connected output layer.

Preferably, the spatial dependency output data are partially masked in a consecutive manner and the partially consecutively masked data are fed into the decoder or the first decoder.

Preferably, each encoder includes a first self-attention pyramid and a second self-attention pyramid. Preferably, the second self-attention pyramid has fewer inputs than the first self-attention pyramid. Preferably, each self-attention pyramid generates a semi-feature map, the semi-feature maps being concatenated into the concatenated feature map.

Preferably, the second self-attention pyramid has half the inputs of the first self-attention pyramid.

Preferably, the first self-attention pyramid includes at least one attention block and at least one distilling layer that alternate along the data flow. Preferably, the last element of the first self-attention pyramid is an attention block that outputs a first semi-feature map.

Preferably, the second self-attention pyramid includes at least one attention block and at least one distilling layer that alternate along the data flow. Preferably, the last element of the second self-attention pyramid is an attention block that outputs a second semi-feature map. Preferably, the second self-attention pyramid includes fewer elements along the data flow than the first self-attention pyramid.

The invention provides a vehicle route planning device for a vehicle, the device being configured to plan a vehicle route and comprising a preferred forecasting system that is used in vehicle route planning.

The invention provides a vehicle, preferably a land vehicle, comprising a preferred forecasting system and/or a preferred vehicle route planning device.

Compared with known approaches, this disclosure proposes a novel spatial-temporal transformer-based model for forecasting long-series traffic information. It combines a graph attention network (GAT) layer with an informer layer, which was recently introduced for long sequence forecasting via an attention mechanism, to extract the complicated topological road structure and global temporal relationships more effectively.

The spatial-temporal graph informer (STGIN) model, containing the GAT layer and the informer layer, is proposed to better capture adjacent road information and global temporal dependencies for long sequence traffic forecasting. Based on existing knowledge, the proposed method can attend beyond sequential connections and also achieve a global understanding of hidden traffic patterns.

The proposed measures were studied in an extensive analysis of long sequence predictive ability over different horizons. The forecasting results of the STGIN show steady performance and a clear advantage over other baselines as the target sequence grows longer.

Two real-world traffic speed datasets and one traffic facility usage condition dataset are deployed for performance evaluation. The results show over 85% predictive accuracy and an accuracy improvement of around 10% over baseline algorithms. The feasibility has thus been verified.

Existing traffic prediction methods can be summarized as model-driven and data-driven methods [16]. Model-driven methods use stationary mathematical models, such as the cell transmission model [17], the queuing theory model [18] and the microscopic fundamental diagram model [19], to express instantaneous traffic relationships. However, a wealth of prior knowledge is typically required to predict the traffic condition precisely. Furthermore, these methods are usually not able to take unexpected events or external factors into account.

Data-driven approaches [20], which are the basis for the disclosed methods, can be further divided into classic machine learning models and neural network-based models.

For a better understanding of the invention, four related classes of methods will be discussed in more detail: autoregressive models, RNN models, GNN models and transformer-based models.

Representative traditional autoregressive models include vector autoregression (VAR), support vector regression (SVR) [21] and ARIMA models [22]. VAR and SVR require stationary sequential data, and ARIMA was commonly used early on to process non-stationary time-series data [23], but it does not perform well on long-term prediction [24].

Since deep learning models were first applied to forecast traffic in 2014 [25], RNNs and their variants such as LSTM networks [26] and GRU networks [27] have been popular for learning comprehensive dynamic schemes from sequential traffic data and other scenarios. In some publications, authors compared different RNN models and found that GRU performed better than LSTM on the detailed traffic forecasting problem [28].

Further, to solve the problem of capturing forward temporal dependencies, some authors proposed a stacked bidirectional and unidirectional LSTM structure [29]. But the inherent limitations of losing distant information and neglecting spatial dependencies were, at the time of filing, still unsolved.

To extract spatial correlations in traffic prediction, CNNs and GNNs have been widely used. Though CNNs show good performance in capturing topological dependencies in images, videos and other Euclidean data [30][31], road networks are typically non-Euclidean data, which CNNs do not process well [32].

Some research converted network-wide traffic matrices into images for CNN application [33]; however, the overall effect was inefficient. GNNs directly process the road network as a graph with edges and nodes to extract spatial dependencies [34][35]. Recent developments can be categorized into Diffusion Graph Convolution (DGC) [36], Graph Convolutional Networks (GCN) [37] and Graph Attention Networks (GAT) [38].

Among them, GAT provides good insight for capturing spatial information and is commonly combined with different temporal extraction layers to form spatial-temporal models for various scenarios. For example, ST-MetaNet+ [39] combines GAT and RNN to predict traffic speed information with fixed relationships between road nodes.

DGC models the spatial dependencies as a diffusion process to further adapt to real road conditions. In GAT, the attention mechanism is used to learn the real road network connections instead of predefining relative weights between two nodes. But it still has the limitation that the temporal attention mechanism proposed there is typically unable to fully consider the complex realities of the road.

Recently, the transformer with its self-attention mechanism has attracted wide interest as a new deep learning architecture for sequence modeling and has been verified to be more effective than RNNs in some scenarios. It applies the attention mechanism to score the connections between each pair of input and output positions without information loss [40][41].

Several studies have proposed utilizing the transformer for time-series forecasting problems [42][43]. Specifically, in [44], the authors added a convolutional self-attention layer to enhance performance on local contexts and proved its effectiveness in extracting long-term dependencies. However, it ignored the spatial dependencies of the road structure, and the training data requirements are huge. Authors have also applied the transformer idea to taxi ride-hailing forecasting problems, using sparse linear layers to better capture spatial dependencies [45]. Lately, in [46], the authors proposed a novel transformer-based architecture, called informer, to address the long sequence time-series forecasting problem by using ProbSparse self-attention. The model has been verified on diverse datasets and achieved better prediction accuracy and time complexity than other transformer variants. But, similarly, spatial dependency extraction was ignored, and network-wide traffic forecasting cases had not been verified.

In contrast, the method disclosed herein is based on a novel spatial-temporal transformer-based network that explicitly models the spatial correlations and the periodicity of the traffic network.

The purpose of this disclosure is to predict the traffic information in the long-term based on historical records and topological road network data. Traffic information on roads is a general concept and could be link volume, vehicle density, link speed, and facility usage status. These are collectively called traffic indicators herein.

The road network structure can be denoted as a directed graph G = {O, R, A}. Each location recording data is represented by a node together with its features; O is the node set; R is the edge set connecting the nodes in the road graph/road network; and A ∈ R^(N×N) is the adjacency matrix, which represents pairwise node connectivity, where N is the number of nodes.

Instead of assigning elements of 0 and 1 to indicate connections between two nodes, the adjacency matrix elements are calculated from the real road distances between nodes using a Gaussian kernel weighting function [47]:

$$A_{ab} = \begin{cases} \exp\left(-\dfrac{\mathrm{len}(v_a, v_b)^2}{\sigma^2}\right), & \exp\left(-\dfrac{\mathrm{len}(v_a, v_b)^2}{\sigma^2}\right) \geq \kappa \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where len(v_a, v_b) is the actual road length from node v_a to node v_b, instead of the Euclidean distance; σ is the standard deviation of the road lengths; and κ is the preset threshold to filter negligible elements. The feature matrix represents the historical attributes of the N nodes at time t on the road graph and can be written as X_t = {x_{1,t}, x_{2,t}, ..., x_{N,t}}, where X_t ∈ R^(N×T), x_a ∈ R^T is the a-th node's input feature vector and T is the feature dimension of the historical data.
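As an illustration only, the following sketch builds such a thresholded Gaussian-kernel adjacency matrix; the function name and the convention of marking unconnected node pairs with infinity are assumptions, not part of the disclosure.

```python
import numpy as np

def gaussian_adjacency(road_dist, kappa):
    """Thresholded Gaussian-kernel adjacency matrix of equation (1).

    road_dist: (N, N) array of real road lengths len(v_a, v_b); unconnected
               node pairs are assumed to be marked with np.inf.
    kappa:     preset threshold below which weights are treated as negligible.
    """
    sigma = road_dist[np.isfinite(road_dist)].std()  # std of the road lengths
    A = np.exp(-np.square(road_dist / sigma))        # Gaussian kernel weights
    A[A < kappa] = 0.0                               # filter negligible elements
    return A
```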

Now the problem can be treated as follows: given the E known historical and current observations {X_{t-E+1}, ..., X_t} as input and the constructed traffic network graph G, we need to find a mapping function f that predicts the traffic information for the future F time steps {Y_{t+1}, ..., Y_{t+F}}, where Y_{t+F} designates the traffic information estimated at time t + F in the future:

$$\{Y_{t+1}, \ldots, Y_{t+F}\} = f\left(G; \{X_{t-E+1}, \ldots, X_t\}\right) \tag{2}$$

Before introducing the STGIN framework, appropriate processing of the traffic data and the external factors is required to prepare for information aggregation. Our design uses multiple convolutional operators that treat every input link separately to aggregate a traffic indicator profile, e.g., a traffic speed profile, together with all external factors.

Correspondingly, there are multiple GAT networks that take in the aggregated data generated by the convolutional networks. This linear data-transformation structure is called the Feature Convolution Aggregator (FCA).
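A minimal sketch of what such an aggregator could look like, assuming PyTorch and a single shared 1-D convolution over the stacked indicator and external-factor channels (the disclosure itself leaves the exact operator layout open):

```python
import torch
import torch.nn as nn

class FeatureConvolutionAggregator(nn.Module):
    """Sketch of the FCA: one 1-D convolution fusing the traffic-indicator
    profile of each link/node with its external-factor channels."""

    def __init__(self, num_external, hidden_dim, kernel_size=3):
        super().__init__()
        # 1 indicator channel + E external-factor channels in, hidden_dim out
        self.conv = nn.Conv1d(1 + num_external, hidden_dim,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, traffic, external):
        # traffic:  (N, T)    one indicator timeseries per link/node
        # external: (N, E, T) E external-factor series per link/node
        x = torch.cat([traffic.unsqueeze(1), external], dim=1)  # (N, 1+E, T)
        return self.conv(x)                                     # (N, hidden, T)
```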

CNNs show good performance when dealing with Euclidean problems, such as image recognition; however, a real urban traffic network is typical non-Euclidean data in the form of a graph structure [48]. GAT retains the attention mechanism to indicate the importance between every two nodes.

In general, the network G contains the relationships among roads. Neighboring roads are likely to affect each other and share similar attributes. For example, the traffic volume on the downstream link in the next time step is highly dependent on the current traffic volume on the upstream link. Hence, to capture the spatial dependence, the GAT model is used as the first layer to transform and propagate information.

The GAT layer builds on the FCA to project the input features into high-dimensional features over the nodes of the graph and thereby capture spatial information. The linear transformation can be expressed by a weight matrix W ∈ R^(T'×T) that is applied over the nodes. The attention coefficients can then be computed as [49]:

$$e_{ab} = \mathrm{att}\left(W x_a, W x_b\right) \tag{3}$$

where T' is the number of output features. As stated, the coefficients indicate the correlation between every two nodes.

However, in the real traffic network, not every two links are connected. Thus, the adjacency matrix is used to select the topological neighborhood in the graph for attention coefficient computation. In this disclosure, a suitable normalization (softmax) and activation function (leaky ReLU) are applied to generate the full attention mechanism:

$$\alpha_{ab} = \mathrm{softmax}_b(e_{ab}) = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top} \left[W x_a \,\|\, W x_b\right]\right)\right)}{\sum_{c \in N_a} \exp\left(\mathrm{LeakyReLU}\left(a^{\top} \left[W x_a \,\|\, W x_c\right]\right)\right)} \tag{4}$$

where N_a is the topological neighborhood of node a in the graph, ⊤ represents the transposition operation and || is the concatenation operation.

Further, the computed coefficients are used to find the final output features for every node. Similar to the multi-head attention operation in the transformer [50], the average over the K attention heads serves as the final output:

$$x'_a = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{b \in N_a} \alpha_{ab}^{k} W^{k} x_b\right) \tag{5}$$
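For illustration, a minimal single-head GAT layer implementing equations (3) and (4) might look as follows in PyTorch; the multi-head averaging of equation (5) would wrap K such heads. The sigmoid output nonlinearity and the dense pairwise score computation are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head GAT layer following equations (3)-(4)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.W = nn.Linear(in_features, out_features, bias=False)  # W of eq. (3)
        self.a = nn.Linear(2 * out_features, 1, bias=False)        # attention vector

    def forward(self, x, adj):
        # x: (N, in_features) node features; adj: (N, N) adjacency matrix,
        # assumed to contain self-loops so every node attends to itself
        h = self.W(x)                                              # (N, F')
        N = h.size(0)
        pairs = torch.cat([h.repeat_interleave(N, dim=0),
                           h.repeat(N, 1)], dim=1).view(N, N, -1)  # all (a, b)
        e = F.leaky_relu(self.a(pairs)).squeeze(-1)                # e_ab, eq. (3)
        e = e.masked_fill(adj == 0, float("-inf"))                 # topological mask
        alpha = torch.softmax(e, dim=1)                            # eq. (4)
        return torch.sigmoid(alpha @ h)                            # aggregated output
```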

Acquiring the global temporal dependency is a key step in forecasting traffic information. Currently, neural network-based methods are frequently utilized to analyze and predict sequence data; however, the deficiencies discussed in the preceding section limit the long-term forecasting performance. Thus, a transformer-based model, the informer, is used to collect global temporal information [46]. The informer, like the canonical transformer model, employs an encoder-decoder architecture. In a transformer [50], each element is paired with all other input sequence elements. The purpose of this exercise is to determine the distribution of attention and the most closely connected pairings. The self-attention mechanism executes a scaled dot-product on all nodes' queries, keys and values and calculates attention scores for each location using a softmax function on the aggregated data. The attention is calculated as follows:

$$\mathcal{A}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_{in}}}\right) V \tag{6}$$

where Q ∈ R^(L_Q × d_in), K ∈ R^(L_K × d_in), V ∈ R^(L_V × d_in), and d_in is the dimension of the input sequence.

Let q_i, k_i, v_i represent the i-th row of Q, K, V, respectively. Then the i-th query's attention is defined [51] as:

$$\mathcal{A}(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)} v_j \tag{7}$$

where the pairwise score is computed with the asymmetric exponential kernel

$$k(q_i, k_j) = \exp\left(\frac{q_i k_j^{\top}}{\sqrt{d_{in}}}\right). \tag{8}$$
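A compact sketch of the canonical scaled dot-product attention of equation (6), for reference:

```python
import torch

def scaled_dot_product_attention(Q, K, V):
    """Canonical attention of equation (6): softmax(Q K^T / sqrt(d_in)) V."""
    d_in = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_in ** 0.5  # pairwise query-key scores
    return torch.softmax(scores, dim=-1) @ V        # weighted sum of the values
```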

Even though the dot-product operations are executed for every element, in most real-world scenarios only a few significant dot-product pairs determine the dominant attention. A selective mechanism that can distinguish these dot-product pairs therefore has the potential to improve the prediction capacity.

From the above derivations, the i-th query's attention towards the keys can be notated as a probability distribution p(k_j | q_i). Dominant dot-product pairs tend to show a separation between the uniform distribution q(k_j | q_i) = 1/L_K and the corresponding query's attention probability distribution p(k_j | q_i). The Kullback-Leibler divergence [52] is widely utilized to evaluate the discrepancy between the distributions q and p:

$$KL(q \,\|\, p) = \ln \sum_{l=1}^{L_K} e^{q_i k_l^{\top} / \sqrt{d_{in}}} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d_{in}}} - \ln L_K \tag{9}$$

Based on equation (9), dropping the constant term, the i-th query's sparsity measurement is defined as:

$$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^{\top} / \sqrt{d_{in}}} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d_{in}}} \tag{10}$$

Further, since we have bounds for each query over the key set K, we obtain the approximate measurement:

$$\bar{M}(q_i, K) = \max_{j} \left\{ \frac{q_i k_j^{\top}}{\sqrt{d_{in}}} \right\} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d_{in}}} \tag{11}$$

If M̄(q_i, K) is larger, the i-th query's attention probability is more diverse and is therefore likely to contain dominant dot-product pairs. Using this measurement, ProbSparse self-attention lets each key be paired only with the dominant queries:

$$\mathcal{A}(Q, K, V) = \mathrm{softmax}\left(\frac{\bar{Q} K^{\top}}{\sqrt{d_{in}}}\right) V \tag{12}$$

where Q̄ has the same size as Q but contains only the Top-u queries under the sparsity measurement M̄(q_i, K); the remaining entries are preferably filled with zero. At this stage, the ProbSparse self-attention has been constructed with a more sensitive ability to detect potentially dominant dot-product pairs.
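The following sketch illustrates the max-mean measurement of equation (11) and the Top-u selection of equation (12). Unlike the sampling-based approximation used in the informer paper [46], it scores every query against every key for clarity, and it fills the non-selected rows with zero as stated above.

```python
import torch

def probsparse_attention(Q, K, V, u):
    """Top-u ProbSparse self-attention following equations (11) and (12)."""
    d_in = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_in ** 0.5       # (L_Q, L_K)
    # M_bar(q_i, K) = max_j s_ij - mean_j s_ij, eq. (11)
    m_bar = scores.max(dim=-1).values - scores.mean(dim=-1)
    top = m_bar.topk(u).indices                          # dominant queries
    out = Q.new_zeros(Q.size(0), V.size(-1))             # lazy queries -> zero
    out[top] = torch.softmax(scores[top], dim=-1) @ V    # eq. (12)
    return out
```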

The self-attention distilling operation is designed to extract dominating attention and construct a focused feature map. The distilling operation from m-th layer to (m + 1 )-th layer is defined as:

$X_{m+1} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left(\left[X_m\right]_{AB}\right)\right)\right)$ (13)

where Conv1d performs a one-dimensional convolution on the time dimension, $[\cdot]_{AB}$ represents the multi-head ProbSparse self-attention operations, and ELU is an exponential linear unit. The max-pooling layer halves the sequence length from one layer to the next. Halving replicas can be stacked to improve the robustness. One self-attention layer is reduced at a time to ensure the aligned output dimension of the concatenated feature map, which is the hidden representation of the encoder.
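A sketch of one distilling step of equation (13) under assumed choices (kernel size 3, stride-2 max-pooling); the ProbSparse attention sublayer that produces $[X_m]_{AB}$ is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillingLayer(nn.Module):
    """One distilling step of equation (13): Conv1d -> ELU -> MaxPool."""
    def __init__(self, d_model):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):             # x: (batch, seq_len, d_model)
        x = x.transpose(1, 2)         # Conv1d expects (batch, channels, time)
        x = self.pool(F.elu(self.conv(x)))
        return x.transpose(1, 2)      # sequence length halved

x = torch.randn(2, 96, 32)
print(DistillingLayer(32)(x).shape)   # torch.Size([2, 48, 32])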

For the decoder of the informer, a similar structure as in the classical transformer is applied, which contains two multi-head attention layers. The first layer takes the concatenated vector of a start token $X_{token}$ and a placeholder $X_0$ as the input, $X_{de} = \mathrm{Concat}(X_{token}, X_0)$.

The masking operation is applied to avoid auto-regressively using future steps. The second layer takes the concatenated feature map and the outputs from the first layer as its inputs. In the end, a fully connected layer produces the final sequence outputs. In particular, an $L_{token}$-long input sequence immediately ahead of the target prediction window is chosen as $X_{token}$. The length of $X_0$ is exactly the length of the target sequence time stamps.

The decoder proposed in the informer can execute the prediction in one forward pass, which further improves the computational efficiency compared with a classical transformer model.
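The following sketch shows how the decoder input could be assembled so that the whole target window is predicted in one forward pass; the lengths and variable names are illustrative assumptions.

import torch

L_token, L_pred, d_model = 48, 24, 32
history = torch.randn(1, 96, d_model)            # encoder-side history
x_token = history[:, -L_token:, :]               # start token X_token
x_zero = torch.zeros(1, L_pred, d_model)         # placeholder X_0
x_de = torch.cat([x_token, x_zero], dim=1)       # decoder input, (1, 72, d)
# A masked first attention layer then prevents each position from
# attending to later (future) positions during this single pass.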

The disclosed STGIN includes 1) the spatial GAT layers, which capture the spatial relations among traffic road networks; and 2) the informer layers, which capture the long sequence temporal dependence.

The GAT layers are utilized to propagate spatial information through the traffic road network. The outputs of the GAT layers for T historical instances, each of which contains the slice information of all N nodes, are fed as the inputs of the informer layers. There are N informer units used to process the historical sequential inputs of the N-node structure, as sketched below. Finally, the outputs of the N informer units form the final prediction result in the shape of N nodes by T' future time steps. Each hidden representation for one past time stamp is expressed by a combination of slices whose number equals the node number N; d is the embedding dimension, which can be set freely.
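A shape-level sketch of this data flow, with stand-in functions in place of the actual GAT and informer layers; only the tensor shapes follow the text above.

import torch

N, T, T_pred, d = 156, 12, 4, 32         # nodes, history length, horizon T', dim

def gat_stage(x):                        # stand-in for the spatial GAT layers
    return x                             # (T, N, d) spatial dependency output

def informer_unit(seq):                  # stand-in for one per-node informer
    return torch.zeros(T_pred)           # maps a (T, d) history to T' values

x = torch.randn(T, N, d)                 # aggregated inputs for T instants
spatial = gat_stage(x)                   # one GAT pass per historical instant
per_node = spatial.permute(1, 0, 2)      # (N, T, d): one sequence per node
y = torch.stack([informer_unit(s) for s in per_node])   # (N, T_pred) result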

BRIEF DESCRIPTION OF THE DRAWINGS Embodiments of the invention are described in more detail with reference to the accompanying schematic drawings.

Fig. 1 depicts an example of a road network;

Fig. 2 depicts an embodiment of a forecasting system;

Fig. 3 depicts an embodiment of a GAT layer;

Fig. 4 depicts an embodiment of an encoder-decoder informer layer;

Fig. 5 depicts an embodiment of self-attention pyramids;

Fig. 6 depicts a table of experimental results for the SZ-taxi dataset; and

Fig. 7 depicts a table of experimental results for the Los-Loop dataset.

DETAILED DESCRIPTION OF EMBODIMENTS

Referring to Fig. 1, an exemplary road network 100 is depicted. The road network 100 comprises a plurality of links L1, L2, ..., L5 that connect a plurality of nodes N1, N2, ..., N6.

The road network 100 is used by vehicles 101, for example. Each link L1, L2, ..., L5 or node N1, N2, ..., N6 can have one or more traffic indicators associated with it. Typical traffic indicators include, but are not limited to, link volume, vehicle density, link speed, or facility usage status. The term "facility" as used herein refers to infrastructure components that directly or indirectly support the usage of the link, e.g., fuel stations, charging stations, amenities, and the like.

For each link L1, L2, ..., L5, traffic data that is indicative of or influences the traffic indicator may be obtained through a variety of approaches. For example, the traffic data may be measured by stationary sensors that are installed on the corresponding link L1, L2, ..., L5 or node N1, N2, ..., N6, such as speed sensors, vehicle counting sensors and the like. In another approach, cell phone data or data measured by the vehicles 101 using the links L1, L2, ..., L5 or nodes N1, N2, ..., N6 can be used as traffic data.

It should be noted that, for the sake of brevity and ease of understanding, the invention is described with respect to only one exemplary traffic indicator that is associated with each node N1, N2, ..., N6 or link L1, L2, ..., L5. However, the ideas disclosed herein are also applicable to other traffic indicators.

Referring to Fig. 2, a forecasting system 10 is schematically depicted. The forecasting system 10 is configured for forecasting a traffic indicator, e.g., traffic speed, for each node N1, N2, ..., N6. The forecasting system 10 is configured to take in traffic data Xt as well as external factor data EN1, EN2, ..., EN6 and to generate from these a predicted time series of the traffic indicator for each node N1, N2, ..., N6.

External factor data EN1, EN2, ..., EN6 are not directly indicative of the traffic indicator but have an influence on it. External factor data EN1, EN2, ..., EN6 include, but are not limited to, accident data, construction site data, facility usage, etc.

The forecasting system 10 comprises a feature convolution aggregator 12. The traffic data Xt and the external factor data EN1, EN2, ..., EN6 are fed into the feature convolution aggregator 12, which aggregates the traffic data Xt together with all external factor data EN1, EN2, ..., EN6.

The feature convolution aggregator 12 comprises a plurality of convolutional operators, e.g., convolutional networks. Preferably, there is at least one convolutional operator for each link L1, L2, ..., L5. By applying the convolutional operators to the traffic data Xt and the external factor data EN1, EN2, ..., EN6, these are aggregated into aggregated data.
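A minimal sketch of how such a convolutional operator could aggregate the traffic data with the external factor data by stacking them as input channels; the channel counts and kernel size are assumptions.

import torch
import torch.nn as nn

T = 12                                     # historical time steps
traffic = torch.randn(1, 1, T)             # traffic data Xt (one indicator)
external = torch.randn(1, 3, T)            # e.g., accidents, roadworks, usage
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
aggregated = conv(torch.cat([traffic, external], dim=1))   # (1, 8, T)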

The forecasting system 10 comprises a first machine learning model 14. The first machine learning model 14 includes a graph attention network (GAT) 16 that takes in the aggregated data.

An example of a GAT 16 is depicted in Fig. 3 in more detail. As depicted, each node N1, N2, ..., N7 has its own time series of features x_a, where a designates the node N1, N2, etc. Nodes that have a spatial distance from each other that is larger than the cutoff K are disregarded, i.e., in the adjacency matrix A_ab the entry for those nodes is set to zero, as for the nodes N6 and N7. Likewise, the attention e_ab given to these nodes is zero. The remaining nodes receive an entry in the adjacency matrix according to equation (1) and an attention α_ab according to equation (4). The GAT 16 determines the spatial dependency output data 18 according to equation (5). The spatial dependency output data 18 include abstract feature data for each node N1, N2, ..., N5 (as N6 and N7 are disregarded). Due to the attention mechanism of the GAT 16, the influence of topologically neighboring nodes is included in the spatial dependency output data 18. As a result, the spatial dependency output data 18 are indicative of the spatial dependency of the aggregated data.
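A small sketch of this cutoff rule for the adjacency matrix A_ab, using synthetic distances; the exact distance metric and cutoff value are assumptions.

import torch

num_nodes, cutoff = 7, 2.0                        # cutoff K from the text
dist = torch.rand(num_nodes, num_nodes) * 4.0     # synthetic pairwise distances
dist = (dist + dist.T) / 2                        # symmetrize
adj = (dist <= cutoff).float()                    # A_ab = 0 beyond the cutoff
adj.fill_diagonal_(1.0)                           # keep self-connections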

Referring back to Fig. 2, the forecasting system 10 comprises a second machine learning model 20. The second machine learning model 20 includes at least one informer layer 22, preferably one for each node N1, N2, ... Each informer layer 22 comprises at least one encoder 24 and at least one decoder 26.

Referring to Fig. 4 for more detail, the encoder 24 comprises a first self-attention pyramid 27 and a second self-attention pyramid 28 that has half the inputs of the first self-attention pyramid 27. Both self-attention pyramids 27, 28 are configured as multi-head ProbSparse self-attention pyramids. The outputs of the self-attention pyramids 27, 28 are concatenated into a concatenated feature map 30. The ProbSparse self-attention is determined by equations (8) to (12).

Referring to Fig. 5, the self-attention pyramids 27, 28 are described in more detail. The first self-attention pyramid 27 is fed with the full-length spatial dependency output data 18 and generates a first semi-feature map 32.

The first self-attention pyramid 27 comprises a first attention block 34 that receives the spatial dependency output data 18 and determines the attention according to the ProbSparse self-attention mechanism. The output of the first attention block 34 is fed into a first distilling layer 36. The first distilling layer 36 performs a one-dimensional convolution operation along the time dimension and a max-pooling operation, such that the length of the input data is halved.

The output of the first distilling layer 36 is preferably fed to a second attention block 38 and thereafter to a second distilling layer 40. The output of the second distilling layer 40 may be fed to a third attention block 42. The further attention and distilling layers 38-42 are configured similarly to the first attention and distilling layers 34, 36, respectively.

The output of the third attention block 42 is the first semi-feature map 32.

The second self-attention pyramid 28 is fed with the half-length spatial dependency output data 18 and generates a second semi-feature map 44.

The second self-attention pyramid 28 comprises a fourth attention block 46 that receives the half-length spatial dependency output data 18 and determines the attention according to the ProbSparse self-attention mechanism. The output of the fourth attention block 46 is fed into a third distilling layer 48. The third distilling layer 48 performs a one-dimensional convolution operation along the time dimension and a max-pooling operation, such that the length of the input data is halved.

The output of the third distilling layer 48 is preferably fed to a fifth attention block 50. The fifth attention block 50 is configured similarly to the fourth attention block 46.

The output of the fifth attention block 50 is the second semi-feature map 44.

As depicted in Fig. 4, the first and second semi-feature maps 32, 44 are combined by concatenation into the concatenated feature map 30.

Still referring to Fig. 4, the concatenated feature map 30 is fed into a multi-head attention block 52 of the decoder 26. The multi-head attention block 52 also receives the output of a masked self-attention pyramid 54 that is fed with masked traffic data 56. The decoder 26 has an output layer that normalizes the decoder output so as to output traffic indicator data for each node N1, N2, ... and/or each link L1, L2, ...

The final output of the forecasting system 10 is an expected traffic indicator timeseries that includes, for each point i within a predetermined future time frame, e.g., from 30 minutes to 120 minutes in 30-minute increments, a prediction of the traffic indicator Yi of each node N1, N2, ... and/or link L1, L2, ... in the road network 100. The output of the forecasting system 10 may be fed to a route planning system (not shown) that is known per se and capable of planning a route based on a route length, a route driving time and/or a route fuel consumption. In addition, the route is also determined based on the traffic indicators Yi.

The route planning system may determine an initial route from a starting position to a destination and also determine a route length, a route driving time and/or a route fuel consumption of the initial route.

The route planning system may perform the method as previously described and use it to determine at least one traffic indicator for each node and/or link that belongs to the initial route.

Then, the route length, the route driving time and/or the route fuel consumption are modified based on the traffic indicator. Furthermore, the initial route is modified so as to improve upon route length, route driving time and/or route fuel consumption in order to obtain an alternative route.
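As a hedged illustration of this step, predicted link speeds could modify the driving time of the initial route as follows; the link lengths, names, and interface are hypothetical, not the patent's actual route planning API.

# Hypothetical link lengths of an initial route and STGIN speed predictions
link_length_km = {"L1": 1.2, "L2": 0.8, "L3": 2.0}
predicted_speed_kmh = {"L1": 30.0, "L2": 45.0, "L3": 60.0}

# Modified route driving time: sum of per-link travel times
driving_time_h = sum(link_length_km[l] / predicted_speed_kmh[l]
                     for l in link_length_km)
print(f"updated route driving time: {driving_time_h * 60:.1f} min")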

Finally, a control signal is generated that causes the vehicle to inform a driver of the alternative route or that causes the vehicle to follow the alternative route.

The inventors conducted experiments to further demonstrate the predictive performance of the disclosed STGIN model. The selected datasets are introduced first, and then the evaluation metrics are stated. Next, the experiment settings and baseline models are described. In the end, the experiment results and a detailed discussion are given to provide a deeper understanding of the framework.

Two real-world vehicle speed datasets are utilized for the experimental evaluation: the SZ-taxi dataset and the Los-loop dataset, both obtainable at https://github.com/lehaifeng/T-GCN/tree/master/data and also via https://bigscity-libcity-docs.readthedocs.io/en/latest/user_guide/data/raw_data.html#los-loop and https://bigscity-libcity-docs.readthedocs.io/en/latest/user_guide/data/raw_data.html#sz-taxi following the respective link. SZ-taxi: This dataset was collected from taxi vehicles in the Luohu District, Shenzhen. 156 major roads are selected, and taxi speed values are recorded from Jan. 1 to Jan. 31, 2015. The records are aggregated every 5 minutes. Apart from the feature matrix, the dataset also contains the adjacency matrix, which shows the spatial dependence between the 156 roads.

Los-loop: The Los-loop dataset was collected by loop detectors on the highways of Los Angeles County. Among them, 207 sensors are selected, and traffic speed values are recorded from Mar. 1 to Mar. 7, 2012. Records are aggregated every 5 minutes. The corresponding adjacency matrix, which is derived from the road length between sensors, is included. Suitable techniques, such as linear interpolation, were utilized to fill missing data.

For the disclosed STGIN model, 80% of the data was used as the training set and the rest of the data was used as the testing set. The sequence samples are generated by sliding a window of size T + T', in which the first T steps form the input sequence and the following T' steps form the ground-truth labels, as sketched below. Since different horizons are selected for the experiments, the input sequence length was manually set to fit each requirement. The mean squared error (MSE) is selected as the loss function. The attention heads are set to 4, the batch size to 32, the hidden dimension to 32, and the training iterations to 500. For the baseline models, the settings are kept unchanged according to the papers that reported them.
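A sketch of the sliding-window sample generation under these settings; the array names and the SZ-taxi sizing (31 days of 5-minute records) are illustrative.

import numpy as np

def make_samples(series, T, T_pred):
    """series: (total_steps, num_nodes) speed matrix."""
    xs, ys = [], []
    for start in range(len(series) - T - T_pred + 1):
        xs.append(series[start:start + T])                # input sequence
        ys.append(series[start + T:start + T + T_pred])   # ground-truth labels
    return np.stack(xs), np.stack(ys)

speeds = np.random.rand(8928, 156)        # e.g., SZ-taxi: 31 days x 288 slots
split = int(0.8 * len(speeds))            # 80% training, rest testing
x_train, y_train = make_samples(speeds[:split], T=12, T_pred=4)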

There are three commonly used criteria applied here to evaluate the prediction effectiveness of the disclosed STGIN model and baselines:

(1) Root Mean Squared Error (RMSE):

$RMSE = \sqrt{\frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \left(y_i^j - \hat{y}_i^j\right)^2}$

RMSE is the standard deviation of the prediction errors.

(2) Mean Absolute Error (MAE):

$MAE = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \left| y_i^j - \hat{y}_i^j \right|$

MAE is a measure of the errors between the ground truths and the predicted values. The smaller the value of RMSE or MAE, the better the predictive performance.

(3) Accuracy:

$Accuracy = 1 - \frac{\left\| Y - \hat{Y} \right\|_F}{\left\| Y \right\|_F}$

Accuracy is the most important index as it identifies the prediction precision.

In the above equations, $y_i^j$ and $\hat{y}_i^j$ represent the ground truth traffic information and the predicted value of the $j$-th time instant at the $i$-th location. $M$ is the number of time instants; $N$ is the number of nodes; $Y$ and $\hat{Y}$ represent the sets of $y_i^j$ and $\hat{y}_i^j$, respectively.
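For illustration, the three criteria can be sketched in NumPy as follows, with y_true and y_pred as (M, N) arrays of ground truths and predictions.

import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def accuracy(y_true, y_pred):
    # 1 - ||Y - Y_hat||_F / ||Y||_F (Frobenius norm form given above)
    return 1 - np.linalg.norm(y_true - y_pred) / np.linalg.norm(y_true)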

Seven representative baselines containing RNN and transformer models are utilized for comparison to demonstrate the performance of the proposed model. The baselines are described as follows:

The Auto-regressive Integrated Moving Average (ARIMA) model is used to solve the prediction problem by fitting time series data to estimate future data.

The Support Vector Regression (SVR) model is a typical statistical regression method used for traffic sequence regression under stationarity assumptions.

GRU: Together with LSTM, these are two popular variants of the RNN, which have been proven effective for traffic prediction problems and can alleviate the problems of gradient explosion and vanishing. GRU has a simpler structure than LSTM and a faster training speed.

STGCN: STGCN combines a GCN layer and a GRU layer for forecasting.

Transformer: The classic Transformer model with the self-attention mechanism.

GCN+Transformer: The combination of GCN and classic Transformer to extract spatial-temporal dependencies.

Informer: A recent Transformer variant proposed to process the long sequence prediction issue, without spatial dependency extraction.

Fig. 6 and Fig. 7 show the results for the SZ-taxi and Los-loop datasets, respectively. For each dataset, comparisons between the baseline models and the disclosed model use four different forecasting horizons of 15 min, 30 min, 45 min and 60 min to indicate the prediction performance. In general, the STGIN model according to the invention obtains the best performance under most metrics over all horizons, and the following observations can be made from the results:

It can be found that deep neural network methods, from GRU to Informer, have better prediction performance than traditional regressive models such as ARIMA and SVR, which further verifies the significance of temporal dependency extraction. It is difficult for regressive methods to forecast based on nonstationary and complex time-series data. For instance, in the 15-min forecasting task on the Los-loop dataset, the invention achieves a 9.7% prediction accuracy improvement over ARIMA, and the RMSE error of the invention is reduced by 49.7% compared with the ARIMA model.

The invention achieves the best prediction accuracy over all four horizons, and the performance remains stable as the prediction length is extended. The longest window even reaches 60 min, which has great reference value in real traffic monitoring. From the results, it is clear that the disclosed method can be utilized reliably for both short-term and long-term prediction.

To show the significance of spatial dependency extraction for forecasting, the inventors' experiments include an internal comparison for each main temporal extraction model. For both datasets, clear prediction accuracy improvements can be observed for the invention compared to Informer, for GCN+Transformer compared to Transformer, and for STGCN compared to GRU over all four horizons. This may indicate that the GNN layer helps to capture spatial dependencies from sequential data and achieves a better understanding of the hidden pattern.

A case study was carried out to examine how the model according to the invention works on long sequence traffic prediction. As an example, one node of the SZ-taxi dataset was selected. Prediction results for one short and one long horizon (15 min and 60 min ahead) are discussed.

For the 15-min ahead case, it can be observed that the prediction performance is limited at the early stage (before 2 epochs). Most data-changing trends are not reliably captured, and only the central or average values of the fluctuation range are reflected. However, the STGIN according to the invention perceives the periodicity and data shift rapidly; only 2 epochs later, i.e., at 4 epochs, the method according to the invention can catch the changing curve, even though some abrupt changes are not fitted well.

As more training epochs are executed, the prediction performance improves continuously, and after 16 epochs the STGIN fits the ground truth reliably and accurately.

For the 60-min ahead case, which is a challenge in real-world applications, it is observed that the STGIN according to the invention can still learn and capture the features quickly, e.g., between 20 and 40 epochs. At about 80 epochs, the method according to the invention can predict the variations of the traffic indicator reliably and accurately enough for real-world applications. The results further prove the long-term prediction capability of the disclosed model.

This disclosure proposes a novel spatial-temporal transformer-based framework called STGIN to address the challenge of long sequence traffic forecasting. The GAT and informer layers are integrated to capture the spatial and temporal relationships of the traffic information. Through the use of the attention mechanism, the informer layer enables acquiring global temporal information when processing long sequence time series data. Compared against extensively used baseline models such as GRU, transformer, and other validated frameworks, the disclosed STGIN demonstrates its efficacy on two real-world datasets. The inventive STGIN delivers better prediction performance over a range of horizons and datasets while maintaining a high degree of stability. Additionally, the capacity for long-term forecasting has been shown. In summary, the disclosed STGIN model successfully performs long-term accurate traffic forecasting by collecting spatial-temporal characteristics, indicating that it has great promise for real-world traffic parameter forecasting applications, e.g., to suggest a potential traffic flow trend or to provide key parameters towards a better traffic control system.

REFERENCE SIGNS

10 forecasting system

12 feature convolution aggregator

14 first machine learning model

16 graph attention network (GAT)

18 spatial dependency output data

20 second machine learning model

22 informer layer

24 encoder

26 decoder

27 first self-attention pyramid

28 second self-attention pyramid

30 concatenated feature map

32 first semi-feature map

34 first attention block

36 first distilling layer

38 second attention block

40 second distilling layer

42 third attention block

44 second semi-feature map

46 fourth attention block

48 third distilling layer

50 fifth attention block

52 multi-head attention block

54 masked self-attention pyramid

56 masked traffic data

100 road network

101 vehicle

L1, L2, ... link

N1, N2, ... node