Title:
TRAINING OF A CONVOLUTIONAL NEURAL NETWORK
Document Type and Number:
WIPO Patent Application WO/2021/008798
Kind Code:
A1
Abstract:
The present invention is related to a method, a computer program code, and an apparatus for training a convolutional neural network for an autonomous driving system. The invention is further related to a convolutional neural network, to an autonomous driving system comprising a neural network, and to an autonomous or semi-autonomous vehicle comprising such an autonomous driving system. For training the convolutional neural network, in a first step real-world driving data are selected (10) as training data. Furthermore, synthetic driving data are generated (11) as training data. The convolutional neural network is then trained (12) on the selected real-world driving data and the generated synthetic driving data using a genetic algorithm.

Inventors:
GRIGORESCU SORIN MIHAI (DE)
TRASNEA BOGDAN (DE)
VASILCOI ANDREI (DE)
Application Number:
PCT/EP2020/066660
Publication Date:
January 21, 2021
Filing Date:
June 17, 2020
Assignee:
ELEKTROBIT AUTOMOTIVE GMBH (DE)
International Classes:
G06N3/00; B60W60/00; G06N3/04; G06N3/08; G08G1/16; G06N7/00
Domestic Patent References:
WO2019053052A1, 2019-03-21
Other References:
SORIN GRIGORESCU ET AL: "NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 26 June 2019 (2019-06-26), XP081384330
PREPRINT ET AL: "GridSim: A Vehicle Kinematics Engine for Deep Neuroevolutionary Control in Autonomous Driving", 21 January 2019 (2019-01-21), XP055723137, Retrieved from the Internet [retrieved on 20200817]
GRIGORESCU SORIN M: "Generative One-Shot Learning (GOL): A Semi-Parametric Approach to One-Shot Learning in Autonomous Vision", 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE, 21 May 2018 (2018-05-21), pages 7127 - 7134, XP033403581, DOI: 10.1109/ICRA.2018.8461174
V. MNIH ET AL.: "Human-level control through deep reinforcement learning", NATURE, vol. 518, 2015, pages 529 - 533, XP055283401, DOI: 10.1038/nature14236
S. HOCHREITER ET AL.: "Long short-term memory", NEURAL COMPUTATION, vol. 9, 1997, pages 1735 - 1780, XP055232921, DOI: 10.1162/neco.1997.9.8.1735
B. PADEN ET AL.: "A survey of motion planning and control techniques for self-driving urban vehicles", IEEE TRANS. INTELLIGENT VEHICLES, vol. 1, 2016, pages 33 - 55, XP011617380, DOI: 10.1109/TIV.2016.2578706
Attorney, Agent or Firm:
THIES, Stephan (DE)
Claims:
Patent claims

1. A method for training a convolutional neural network (CNN) for an autonomous driving system, the method comprising:

- selecting (10) real-world driving data (X) as training data;

- generating (11) synthetic driving data (X̂) as training data; and

- training (12) the convolutional neural network on the selected real-world driving data (X) and the generated synthetic driving data (X̂) using a genetic algorithm.

2. The method according to claim 1, wherein the training data are represented by paired sequences of occupancy grids and behavioral labels (Y, Ŷ).

3. The method according to claim 2, wherein the behavioral labels (Y, Ŷ) are composed of driving trajectories, steering angles and velocities.

4. The method according to claim 2 or 3, wherein the sequences of occupancy grids representing the real-world driving data (X) and the synthetic driving data (X̂) are processed in parallel by a set of convolutional layers before being stacked.

5. The method according to claim 4, wherein the stacked processed occupancy grids are fed to an LSTM network via a fully connected layer.

6. The method according to one of the preceding claims, wherein the synthetic driving data (X̂) are obtained using a generative process, which models the behavior of an ego vehicle and of other traffic participants.

7. The method according to claim 6, wherein the generative process uses a single-track kinematic model of a robot for generating artificial motion sequences of virtual agents.

8. The method according to claim 7, wherein variables controlling the behavior of the virtual agents are, for each virtual agent, the longitudinal velocity and the rate of change of the steering angle.

9. A computer program code comprising instructions, which, when executed by at least one processor, cause the at least one processor to perform the method of any of claims 1 to 8 for training a convolutional neural network (CNN) for an autonomous driving system.

10. An apparatus (20) for training a convolutional neural network (CNN) for an autonomous driving system, the apparatus (20) comprising:

- a selecting unit (22) for selecting (10) real-world driving data (X) as training data;

- a processing unit (23) for generating (11) synthetic driving data (X̂) as training data; and

- a training unit (24) for training (12) the convolutional neural network on the selected real-world driving data (X) and the generated synthetic driving data (X̂) using a genetic algorithm.

11. A convolutional neural network (CNN) for an autonomous driving system, characterized in that the convolutional neural network has been trained in accordance with a method according to any of claims 1 to 8.

12. A computer program code comprising instructions, which, when executed by at least one processor, cause the at least one processor to implement the convolutional neural network (CNN) of claim 11.

13. An autonomous driving system configured to select a driving strategy, characterized in that the autonomous driving system comprises a convolutional neural network (CNN) according to claim 11.

14. An autonomous or semi-autonomous vehicle (40), characterized in that the autonomous or semi-autonomous vehicle (40) comprises an autonomous driving system according to claim 13.

Description:
Description

Training of a convolutional neural network

The present invention is related to a method, a computer program code, and an apparatus for training a convolutional neural network for an autonomous driving system. The invention is further related to a convolutional neural network, to an autonomous driving system comprising a neural network, and to an autonomous or semi-autonomous vehicle comprising such an autonomous driving system.

Learning human-like driving behaviors for autonomous cars is still an open challenge. The ability of an autonomous car to steer itself in a human-like fashion has become mainstream research in the quest for autonomous driving. An autonomous vehicle is an intelligent agent which observes its environment, makes decisions and performs actions based on these decisions. In order to implement these functions, sensory input needs to be mapped to control output.

Currently, the main approach to controlling autonomous vehicles is the so-called Perception-Planning-Action pipeline. The driving problem is divided into smaller sub-problems, where separate components are responsible for environment perception, path planning and motion control. The output of each component serves as input to the following module. The vehicle's sensors are used for building a comprehensive world model of the driving environment. Although the components are relatively easy to interpret due to their modularity, they are often constructed on manually chosen rules, which are unsuitable for learning complex driving strategies.

It is to be expected that in the field of autonomous driving, the traditional modular pipeline will migrate towards deep learning approaches, where sensory data is mapped to a driving behavioral strategy using monolithic deep neural systems.

Current deep learning techniques for autonomous driving are end2end and deep reinforcement learning, where the next best driving action is estimated through learning from driving recordings, or from exploring a simulated environment, respectively.

End2end [1] learning directly maps raw input data to control signals. The training data, often in the form of images from a front-facing camera, is collected together with time-synchronized steering angles recorded from a human driver. A convolutional neural network is then trained to output steering commands given input images of the road ahead. End2end systems are faced with the challenge of learning a very complex mapping in a single step. Although end2end behaves well in certain low-speed situations, a high capacity model together with a large amount of training data is required to learn corner cases with multi-path situations, such as a T-point or road intersection.

Deep Reinforcement Learning [2] is a type of machine learning algorithm, where agents are taught actions by interacting with their environment. The system does not have access to training data, but maximizes a cumulative reward, which is positive if the vehicle is able to maintain its direction without collisions, and negative otherwise. The reward is used as a pseudo label for training a deep neural network, which is then used to estimate a Q-value function approximating the next best driving action, given the current state. This is in contrast with end2end learning, where labelled training data is provided. The main challenge here is the training, since the agent has to explore its environment, usually through learning from collisions. Such systems perform well when deployed in the same simulation environment, but have a decrease in performance when ported to a real-world vehicle. This is because systems trained solely on simulated data tend to learn a biased version of the driving environment.

It is an object of the present invention to provide a solution for improving the training procedure of a convolutional neural network such that a more human-like driving behavior is achieved.

This object is achieved by a method for training a convolutional neural network for an autonomous driving system according to claim 1, by a computer program code according to claim 9, and by an apparatus for training a convolutional neural network for an autonomous driving system according to claim 10. The dependent claims include advantageous further developments and improvements of the present principles as described below.

According to a first aspect, a method for training a convolutional neural network for an autonomous driving system comprises:

- selecting real-world driving data as training data;

- generating synthetic driving data as training data; and

- training the convolutional neural network on the selected real-world driving data and the generated synthetic driving data using a genetic algorithm.

Similarly, a computer program code comprises instructions, which, when executed by at least one processor, cause the at least one processor to train a convolutional neural network for an autonomous driving system by performing the steps of:

- selecting real-world driving data as training data;

- generating synthetic driving data as training data; and

- training the convolutional neural network on the selected real-world driving data and the generated synthetic driving data using a genetic algorithm.

The term computer has to be understood broadly. In particular, it also includes workstations and other processor-based data processing devices.

The computer program code can, for example, be made available for electronic retrieval or stored on a computer-readable storage medium.

According to a further aspect, an apparatus for training a convolutional neural network for an autonomous driving system comprises:

- a selecting unit for selecting real-world driving data as training data;

- a processing unit for generating synthetic driving data as training data; and

- a training unit for training the convolutional neural network on the selected real-world driving data and the generated synthetic driving data using a genetic algorithm.

Known techniques tend to generalize on specific driving scenarios, e.g. highway driving, or they require learning through simulations which often are not accurate enough for real-world use cases. In addition, it is very difficult to monitor their functional safety, since these systems predict the next best driving action, without estimating the vehicle's behaviour over a longer time horizon. To address these issues, the proposed solution introduces a multi-objective neuroevolutionary approach to autonomous driving, based on the reformulation of autonomous driving as a behaviour arbitration problem for an artificial agent. During training of a population of deep neural networks, each network individual is evaluated against a multi-objective fitness vector, with the purpose of establishing the so-called Pareto front of deep nets. For the training data a hybrid approach is used, which uses both synthetic data and real-world information.

In an advantageous embodiment, the training data are represented by paired sequences of occupancy grids and behavioral labels. For example, the behavioral labels may be composed of driving trajectories, steering angles and velocities. In contrast to other approaches, the present solution replicates the way in which a human drives a car. For this purpose the desired behavior is encoded in a three-element fitness vector, which describes the vehicle's travel path, lateral velocity and longitudinal speed. Using these elements as behavioral labels during training helps to calculate the driving behavior in a human-like fashion during runtime.

In an advantageous embodiment, the sequences of occupancy grids representing the real-world driving data and the synthetic driving data are processed in parallel by a set of convolutional layers before being stacked. The convolutional layers decrease the feature space of the raw input and thus simplify the input to the further processing units.

In an advantageous embodiment, the stacked processed occupancy grids are fed to an LSTM network (LSTM: Long short-term memory) via a fully connected layer. Details on LSTM networks can be found in [3]. LSTM networks are particularly good at predicting time sequences, such as the sequence of occupancy grids.

In an advantageous embodiment, the synthetic driving data are obtained using a generative process, which models the behavior of an ego vehicle and of other traffic participants. Using a generative process for obtaining synthetic driving data makes it possible to learn driving behaviors for corner cases, which typically appear rarely during the driving process. Preferably, the generative process uses a single-track kinematic model of a robot for generating artificial motion sequences of virtual agents. Using a single-track kinematic model helps to reduce the complexity of the kinematic model.

In an advantageous embodiment, variables controlling the behavior of the virtual agents are, for each virtual agent, the longitudinal velocity and the rate of change of the steering angle. These variables allow controlling the virtual agents such that they replicate the way in which a human drives a car.

Advantageously, a convolutional neural network for an autonomous driving system is trained in accordance with a method according to the invention. Such a convolutional neural network is preferably used in an autonomous driving system, which may be comprised in an autonomous or semi-autonomous vehicle, e.g. for selecting a driving strategy. The convolutional neural network may be provided as a computer program code comprising instructions, which, when executed by at least one processor, cause the at least one processor to implement the convolutional neural network.

The term computer has to be understood broadly. In particular, it also includes embedded devices and other processor-based data processing devices.

The computer program code can, for example, be made available for electronic retrieval or stored on a computer-readable storage medium.

Further features of the present invention will become apparent from the following description and the appended claims in conjunction with the figures.

Figures

Fig. 1 schematically illustrates a method for training a convolutional neural network for an autonomous driving system;

Fig. 2 schematically illustrates a first embodiment of an apparatus for training a convolutional neural network for an autonomous driving system;

Fig. 3 schematically illustrates a second embodiment of an apparatus for training a convolutional neural network for an autonomous driving system;

Fig. 4 illustrates the problem space of the proposed behavior arbitration learning;

Fig. 5 depicts a deep neural network architecture for learning optimal driving behaviors; and

Fig. 6 illustrates the overall concept of evolutionary behavior arbitration learning for autonomous driving.

Detailed description

The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure.

All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read only memory (ROM) for storing software, random access memory (RAM), and nonvolatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a combination of circuit elements that performs that function or software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

Fig. 1 schematically illustrates a method for training a convolutional neural network for an autonomous driving system. In a first step real-world driving data are selected 10 as training data. Furthermore, synthetic driving data are generated 11 as training data. For example, the synthetic driving data may be obtained using a generative process, which models the behavior of an ego vehicle and of other traffic participants. For this purpose, the generative process may use a single-track kinematic model of a robot for generating artificial motion sequences of virtual agents. The behavior of the virtual agents may be controlled by a number of variables, e.g. the longitudinal velocity and the rate of change of the steering angle. The convolutional neural network is then trained 12 on the selected real-world driving data and the generated synthetic driving data using a genetic algorithm. Preferably, the training data are represented by paired sequences of occupancy grids and behavioral labels. For example, the behavioral labels may be composed of driving trajectories, steering angles and velocities. Advantageously, the sequences of occupancy grids representing the real-world driving data and the synthetic driving data are processed in parallel by a set of convolutional layers before being stacked. The stacked processed occupancy grids may then be fed to an LSTM network via a fully connected layer.

Fig. 2 schematically illustrates a block diagram of a first embodiment of an apparatus 20 for training a convolutional neural network CNN for an autonomous driving system. The apparatus 20 has an input 21 for receiving data, e.g. driving data. A selecting unit 22 selects real-world driving data as training data.

Furthermore, a processing unit 23 generates synthetic driving data as training data. For example, the processing unit 23 may generate the synthetic driving data using a generative process, which models the behavior of an ego vehicle and of other traffic participants. For this purpose, the generative process may use a single-track kinematic model of a robot for generating artificial motion sequences of virtual agents. The behavior of the virtual agents may be controlled by a number of variables, e.g. the longitudinal velocity and the rate of change of the steering angle. The apparatus 20 further has a training unit 24 for training the neural network CNN on the selected real-world driving data and the generated synthetic driving data via an interface 27. For this purpose the training unit 24 preferably uses a genetic algorithm. The interface 27 may also be combined with the input 21 into a single bidirectional interface. Data generated by the apparatus 20 can likewise be stored in a local storage unit 26. Preferably, the training data are represented by paired sequences of occupancy grids and behavioral labels. For example, the behavioral labels may be composed of driving trajectories, steering angles and velocities. Advantageously, the sequences of occupancy grids representing the real-world driving data and the synthetic driving data are processed in parallel by a set of convolutional layers before being stacked. The stacked processed occupancy grids may then be fed to an LSTM network via a fully connected layer.

The selecting unit 22, the processing unit 23, and the training unit 24 may be controlled by a controller 25. A user interface 28 may be provided for enabling a user to modify settings of the selecting unit 22, the processing unit 23, the training unit 24, or the controller 25. The selecting unit 22, the processing unit 23, the training unit 24, and the controller 25 can be embodied as dedicated hardware units. Of course, they may likewise be fully or partially combined into a single unit or implemented as software running on a processor.

A block diagram of a second embodiment of an apparatus 30 for training a convolutional neural network for an autonomous driving system is illustrated in Fig. 3. The apparatus 30 comprises a processing device 31 and a memory device 32. For example, the apparatus 30 may be a computer or an electronic control unit. The memory device 32 has stored instructions that, when executed by the processing device 31, cause the apparatus 30 to perform steps according to one of the described methods. The instructions stored in the memory device 32 thus tangibly embody a program of instructions executable by the processing device 31 to perform program steps as described herein according to the present principles. The apparatus 30 has an input 33 for receiving data. Data generated by the processing device 31 are made available via an output 34. In addition, such data may be stored in the memory device 32. The input 33 and the output 34 may be combined into a single bidirectional interface.

The processing device 31 as used herein may include one or more processing units, such as microprocessors, digital signal processors, or a combination thereof.

The local storage unit 26 and the memory device 32 may include volatile and/or non-volatile memory regions and storage devices such as hard disk drives, optical drives, and/or solid-state memories.

In the following, a more detailed description of the present approach for training a convolutional neural network for an autonomous driving system shall be given with reference to Fig. 4 to Fig. 6.

The present approach to human-like autonomous driving is to reformulate behavior arbitration as a cognitive learning task, where the robust temporal predictions of LSTMs are combined with a generative model. The generative model allows learning driving behaviors for corner cases, which typically appear rarely during the driving process.

An illustration of the problem space is shown in Fig. 4. Given a sequence of 2D occupancy grids $X^{\langle t-\tau_i,\,t \rangle}$, the position of the ego-vehicle $p_{ego}^{\langle t \rangle} \in \mathbb{R}^2$ and the destination coordinates $p_{dest}^{\langle t \rangle} \in \mathbb{R}^2$ in occupancy grid space at time $t$, the task is to learn a behavior arbitration strategy for navigating the ego-vehicle to the destination coordinates $p_{dest}^{\langle t+\tau_o \rangle}$. Here, $\tau_i$ is the length of the input occupancy grid sequence, while $\tau_o$ is the number of time steps for which the motion of the ego-vehicle is planned. In other words, with $p_0^{\langle t \rangle}$ being a coordinate in the current occupancy grid observation $X^{\langle t \rangle}$, a human-like navigation strategy of the ego-vehicle is sought from any arbitrary starting point $p_0^{\langle t \rangle}$ to $p_{dest}^{\langle t+\tau_o \rangle}$. The navigation strategy is subject to certain constraints. First, the travelled path shall be minimal. Second, the lateral velocity $v_\delta$, given by the rate of change of the steering angle, is minimal, that is, a minimal value for $v_\delta$ over the interval $[t, t+\tau_o]$. Third, the forward speed, or longitudinal velocity, $v_f$ is maximal and bounded to an acceptable range $[v_{min}, v_{max}]$.

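The above problem can be modelled as a Markov Decision Process $M = (S, A, T, L)$, where:

• $S$ represents a finite set of states, where $s^{\langle t \rangle} \in S$ is the state of the agent at time $t$. To encode the location of the agent in the driving occupancy grid space at time $t$, $s^{\langle t \rangle} = X^{\langle t-\tau_i,\,t \rangle}(p_{ego}^{\langle t-\tau_i,\,t \rangle})$ is defined, which denotes an axis-aligned discrete grid sequence in the interval $[t-\tau_i, t]$, centered on the ego-vehicle positions $p_{ego}^{\langle t-\tau_i,\,t \rangle}$.

• $A$ represents a finite set of behavioral action sequences allowing the agent to navigate through the environment defined by $X^{\langle t-\tau_i,\,t \rangle}$, where $Y^{\langle t+1,\,t+\tau_o \rangle} \in A$ is the future predicted optimal behavior the agent should perform in the next time interval $[t+1, t+\tau_o]$. A behavior $Y^{\langle t+1,\,t+\tau_o \rangle}$ is defined as a collection of estimated trajectory set-points:

$$Y^{\langle t+1,\,t+\tau_o \rangle} = \left[ y^{\langle t+1 \rangle}, y^{\langle t+2 \rangle}, \ldots, y^{\langle t+\tau_o \rangle} \right] \qquad (1)$$

• $T: S \times A \times S \to [0,1]$ is a stochastic transition function, where $T\left(s^{\langle t \rangle}, Y^{\langle t+1,\,t+\tau_o \rangle}, s^{\langle t+\tau_o \rangle}\right)$ describes the probability of arriving in state $s^{\langle t+\tau_o \rangle}$ after performing the behavioral actions $Y^{\langle t+1,\,t+\tau_o \rangle}$ in state $s^{\langle t \rangle}$.

• $L: S \times A \times S \to \mathbb{R}^3$ is a multi-objective fitness vector function, which quantifies the behavior of the ego-vehicle:

$$L\left(s^{\langle t \rangle}, Y^{\langle t+1,\,t+\tau_o \rangle}, s^{\langle t+\tau_o \rangle}\right) = \left[ l_1^{\langle t+\tau_o \rangle},\; l_2^{\langle t+\tau_o \rangle},\; l_3^{\langle t+\tau_o \rangle} \right] \qquad (2)$$

The elements in Equation (2) are defined as:

$$l_1^{\langle t+\tau_o \rangle} = \sum_{i=1}^{\tau_o} \left\| p_{ego}^{\langle t+i \rangle} - p_{dest}^{\langle t+\tau_o \rangle} \right\|_2^2 \qquad (3)$$

$$l_2^{\langle t+\tau_o \rangle} = \sum_{i=1}^{\tau_o} v_\delta^{\langle t+i \rangle} \qquad (4)$$

$$l_3^{\langle t+\tau_o \rangle} = \sum_{i=1}^{\tau_o} v_f^{\langle t+i \rangle} \in [v_{min}, v_{max}] \qquad (5)$$

where $p_{ego}$ and $p_{dest}$ denote the ego-vehicle position and the destination coordinates, $v_\delta$ the lateral velocity, $v_f$ the longitudinal velocity, and $\tau_o$ the number of planned time steps.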

Intuitively, the first element $l_1$ represents a distance-based feedback, which is smaller if the car follows a minimal energy trajectory to the destination coordinates and large otherwise. The second element $l_2$ quantifies hazardous motions and passenger discomfort by summing up the lateral velocity of the vehicle. The third element, the feedback function $l_3$, is the moving longitudinal velocity of the ego-vehicle, bounded to speeds appropriate for different road sectors, e.g. [80 km/h, 130 km/h] for the case of highway driving.
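The verbal description of the three fitness elements can be turned into a small sketch. The exact formulas are not reproduced here, so the concrete forms below are assumptions: a summed squared distance to the destination for the first element, a summed magnitude of the steering-angle rate for the second, and a sum of speed samples clipped to the allowed range for the third. The function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def fitness_vector(
    ego_path: Sequence[Point],
    destination: Point,
    steering_rates: Sequence[float],
    speeds: Sequence[float],
    v_min: float,
    v_max: float,
) -> Tuple[float, float, float]:
    """Return (l1, l2, l3) for one planned motion of the ego-vehicle."""
    # l1: distance-based feedback (assumed: summed squared distance to the destination).
    l1 = sum((x - destination[0]) ** 2 + (y - destination[1]) ** 2
             for x, y in ego_path)
    # l2: discomfort term (assumed: summed magnitude of the steering-angle rate).
    l2 = sum(abs(rate) for rate in steering_rates)
    # l3: longitudinal velocity, each sample clipped to [v_min, v_max] (assumed clipping).
    l3 = sum(min(max(v, v_min), v_max) for v in speeds)
    return l1, l2, l3
```

With the highway bounds of 80 km/h and 130 km/h mentioned above, speed samples outside that range contribute only their clipped value to the third element.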

Considering the proposed behavior arbitration scheme, the goal is to train an optimal approximator, defined here by a deep network, which can predict the optimal behavior strategy of the ego-vehicle, given a sequence of occupancy grid observations and the multi-objective fitness vector from Equation 2.

For computing the behavior strategy of the ego-vehicle, a deep network has been designed, which is illustrated in Fig. 5. The occupancy grid sequences are processed by a set of convolutional layers, before feeding them to an LSTM network. In order to train the deep network on as many corner cases as possible, a training dataset has been constructed based on real-world occupancy grid samples $X$ as well as on synthetic sequences $\hat{X}$. As explained in the next section, synthetic data is generated in an occupancy grid simulation environment based on a kinematic model of simulated traffic participants. As network parameters $\Theta$, both the weights of the LSTM network as well as the weights of the convolutional layers are considered. As shown in Fig. 5, the synthetic and real-world data streams are processed in parallel by two convolutional networks, before being stacked upon each other and fed to the LSTM network via a fully connected layer of 256 units. The first convolutional layer in each network has 48 kernel filters, with a size of 9 x 9. The second layer consists of 96 kernels of size 5 x 5. During runtime, the optimal driving behaviour strategy is computed solely from real-world occupancy grid sequences.
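The layer sizes given above determine how many features each convolutional stream hands to the fully connected layer once the input grid size is fixed. The following arithmetic sketch assumes single-channel occupancy grids, stride 1 and no padding; none of these is stated in the text, and the 64 x 64 grid size in the usage note is purely illustrative.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_parameter_count(in_channels: int, out_channels: int, kernel: int) -> int:
    """Number of weights plus one bias per output channel."""
    return out_channels * (in_channels * kernel * kernel + 1)


def stream_feature_count(grid_size: int) -> int:
    """Flattened feature count of one convolutional stream.

    Layer 1 has 48 kernels of 9 x 9, layer 2 has 96 kernels of 5 x 5
    (both from the text); stride 1 and zero padding are assumptions.
    """
    after_first = conv_output_size(grid_size, kernel=9)
    after_second = conv_output_size(after_first, kernel=5)
    return 96 * after_second * after_second
```

For an assumed 64 x 64 grid, `stream_feature_count(64)` gives 259584 features per stream, so the stacked synthetic and real-world streams would feed 519168 values into the 256-unit fully connected layer under these assumptions.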

For training the deep network of Fig. 5, the training dataset has been composed of synthetic and real-world occupancy grid sequences, where the synthetic sequences are drawn from an occupancy grid and vehicle behavior data generation distribution.

The generative process uses the single-track kinematic model of a robot for generating artificial motion sequences of virtual agents [4]. The single-track model, which is also known as the car-like robot, or the bicycle model, consists of two wheels connected by a rigid link. The wheels are restricted to move in a 2D plane coordinate system. In order to generate these sequences, a driving simulator has been built based on an occupancy grid sensor model. Different traffic participants, also called agents, are added within the simulation with different driving behaviors.
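As a rough illustration of an occupancy grid sensor model, the sketch below rasterizes the positions of simulated agents into a binary, axis-aligned grid centered on the ego-vehicle. The grid size, the cell resolution and the function name are assumptions made for this example; the simulator described here may model the sensor differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def occupancy_grid(
    ego: Point,
    agents: Sequence[Point],
    size: int = 9,
    cell: float = 1.0,
) -> List[List[int]]:
    """Binary grid of size x size cells centered on the ego-vehicle position."""
    half = size // 2
    grid = [[0] * size for _ in range(size)]
    for ax, ay in agents:
        # Round the offset from the ego-vehicle to the nearest cell index.
        col = math.floor((ax - ego[0]) / cell + 0.5) + half
        row = math.floor((ay - ego[1]) / cell + 0.5) + half
        # Agents outside the sensed area are ignored.
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 1
    return grid
```

Calling this once per simulation step, with the ego position updated each time, yields a sequence of ego-centered grids of the kind used as network input.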

In the considered kinematic model, the variables controlling the behavior of the virtual agents are, for each agent, the longitudinal velocity $v_f$, bounded to a velocity interval $v_f \in [v_{min}, v_{max}]$, and the steering angle's velocity $v_\delta$.
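A minimal sketch of one integration step of a single-track (bicycle) model driven by these two control variables is given below. It uses the standard kinematic bicycle equations with explicit Euler integration; the wheelbase, the velocity bounds, the time step and all names are assumptions chosen for illustration, not values from this description.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentState:
    x: float = 0.0         # position in the 2D plane [m]
    y: float = 0.0
    heading: float = 0.0   # yaw angle [rad]
    steering: float = 0.0  # steering angle [rad]


def step(
    state: AgentState,
    v_f: float,
    v_delta: float,
    dt: float,
    wheelbase: float = 2.7,  # assumed distance between the two wheels [m]
    v_min: float = 0.0,      # assumed velocity interval [m/s]
    v_max: float = 36.0,
) -> AgentState:
    """Advance a virtual agent by dt using longitudinal velocity and steering rate."""
    v = min(max(v_f, v_min), v_max)  # keep the velocity inside its interval
    return AgentState(
        x=state.x + v * math.cos(state.heading) * dt,
        y=state.y + v * math.sin(state.heading) * dt,
        heading=state.heading + v / wheelbase * math.tan(state.steering) * dt,
        steering=state.steering + v_delta * dt,
    )
```

Repeatedly calling `step` with a sequence of control values produces an artificial motion sequence for one virtual agent.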

Since the multi-objective loss vector of Equation 2 is used to quantify the response of the deep network of Fig. 5, it has been chosen to learn its weights $\Theta$ via evolutionary computation, as described in the following. The training aims to compute optimal weights for a collection of deep neural networks by simultaneously optimizing the elements of the fitness vector indicated in Equations 3 to 5.

Traditional training approaches use algorithms such as backpropagation and a scalar loss function to compute the optimal weight values of a single network. In the present approach, evolutionary computation is used to train a collection of deep networks. The three functions of Equations 3 to 5 give a quantitative measure of the network's response, thus forming a multi-objective loss, which is used to find the weights of the networks. The training procedure does not search for a fixed set of weights, or even a finite set, but for a Pareto-optimal collection of weights, where each element in the collection represents a Pareto-optimal deep neural network.
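To make the idea of evolving a collection of networks more concrete, the following sketch evolves a population of flat weight vectors against a multi-objective fitness function. The specific operators (keeping all non-dominated individuals, uniform crossover, Gaussian mutation) and all names are assumptions for illustration; the text does not specify which genetic operators are used. All objectives are treated as quantities to be minimized, so an objective that should be maximized would have to be negated first.

```python
import random
from typing import Callable, List, Sequence, Tuple

Weights = List[float]
Fitness = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def evolve(
    population: List[Weights],
    fitness_fn: Callable[[Weights], Fitness],
    generations: int,
    rng: random.Random,
    sigma: float = 0.1,
) -> List[Weights]:
    """Evolve weight vectors; non-dominated individuals survive each generation."""
    size = len(population)
    for _ in range(generations):
        scored = sorted(((fitness_fn(w), w) for w in population), key=lambda s: s[0])
        # Elite: individuals whose fitness vector is not dominated by any other.
        elite = [w for f, w in scored
                 if not any(dominates(g, f) for g, _ in scored)]
        children: List[Weights] = []
        while len(elite) + len(children) < size:
            mother, father = rng.choice(elite), rng.choice(elite)
            # Uniform crossover followed by Gaussian mutation of every weight.
            children.append([rng.choice(genes) + rng.gauss(0.0, sigma)
                             for genes in zip(mother, father)])
        population = elite + children
    return population
```

In a real setting `fitness_fn` would run a network with the given weights on the training sequences and return the three fitness elements; here it is left as a caller-supplied function.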

Fig. 6 illustrates the overall concept of evolutionary behavior arbitration learning for autonomous driving. The synthetic driving sequences are calculated from a generative process model, which mimics the behaviour of the ego-vehicle and of the traffic participants. The generative system is based on the single-track kinematic model of a non-holonomic robot. The model generates both synthetic occupancy grid sequences $\hat{X}$ as well as behavioral labels $\hat{Y}$. Both the synthetic data $\hat{X}$, $\hat{Y}$ and the real-world driving sequences $X$, $Y$ are drawn from the real-world probability distributions $P(X)$ and $P(Y)$. The combined data is used to evolve a population of neural networks by learning their weights using genetic algorithms. The learning aims to minimize a multi-objective fitness vector in the multidimensional objective space $L$, where each coordinate axis represents a fitness value. The best performing networks lie on the so-called Pareto front in objective space.
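The Pareto front mentioned above can be extracted from a set of fitness vectors with a simple non-dominated filter. The sketch assumes that every coordinate is to be minimized, in line with the minimization stated above; a quantity that should be maximized, such as the longitudinal velocity, would be negated beforehand. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Fitness = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(fitness_vectors: Sequence[Fitness]) -> List[Fitness]:
    """Return the fitness vectors that are not dominated by any other vector."""
    return [f for f in fitness_vectors
            if not any(dominates(g, f) for g in fitness_vectors)]
```

Each vector returned by `pareto_front` corresponds to one network for which no other network in the population is at least as good in all three objectives and strictly better in one.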

References

[1] https://en.wikipedia.org/wiki/End-to-end_reinforcement_learning

[2] V. Mnih et al.: “Human-level control through deep reinforcement learning”, Nature, Vol. 518 (2015), pp. 529-533.

[3] S. Hochreiter et al.: “Long short-term memory”, Neural Computation, Vol. 9 (1997), pp. 1735-1780.

[4] B. Paden et al.: “A survey of motion planning and control techniques for self-driving urban vehicles”, IEEE Trans. Intelligent Vehicles, Vol. 1 (2016), pp. 33-55.