

Title:
REDUCING ADVERSE ENVIRONMENTAL INFLUENCES IN A CAMERA IMAGE
Document Type and Number:
WIPO Patent Application WO/2023/001809
Kind Code:
A1
Abstract:
According to a computer-implemented method for training an ANN, a training image (Xa) is provided, wherein each of a set of adverse environmental influence factors is either present or absent in the training image (Xa). A set of features (z) is generated by applying a generator encoder module (Ge) to the training image (Xa). A predefined set of reference attributes (b) is provided, each specifying an intended absence of an adverse environmental influence factor. An improved training image (Xb) is generated by applying a generator decoder module (Gd) to the set of features (z) depending on the set of reference attributes (b). A discriminator module (D) is applied to the improved training image (Xb) and an adversarial loss (14a) is computed depending on an output of the discriminator module (D) for adapting the generator module (G).

Inventors:
DAS ARINDAM (IN)
HURYCH DAVID (CZ)
YOGAMANI SENTHIL KUMAR (IE)
Application Number:
PCT/EP2022/070177
Publication Date:
January 26, 2023
Filing Date:
July 19, 2022
Assignee:
CONNAUGHT ELECTRONICS LTD (IE)
International Classes:
G06T11/00; G06V20/56; G06V10/774; G06V10/82; G06V10/98
Other References:
"Pattern Recognition : 5th Asian Conference, ACPR 2019, Auckland, New Zealand, November 26-29, 2019, Revised Selected Papers, Part II", vol. 12348, 3 December 2020, SPRINGER INTERNATIONAL PUBLISHING, Cham, ISBN: 978-3-030-41298-2, ISSN: 0302-9743, article ZHENG ZIQIANG ET AL: "ForkGAN: Seeing into the Rainy Night : 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part III", pages: 155 - 170, XP055978012, DOI: 10.1007/978-3-030-58580-8_10
YOO JAECHANG ET AL: "Image-To-Image Translation Using a Cross-Domain Auto-Encoder and Decoder", APPLIED SCIENCES, vol. 9, no. 22, 8 November 2019 (2019-11-08), pages 4780, XP055978217, DOI: 10.3390/app9224780
M. URICAR ET AL.: "SoilingNet: Soiling Detection on Automotive Surround-View Cameras", ARXIV:1905.01492V2, 2019
K. DE ET AL.: "Image sharpness measure for blurred images in frequency domain", PROCEDIA ENGINEERING, vol. 64, 2013, pages 149 - 158, XP028772692, DOI: 10.1016/j.proeng.2013.09.086
Attorney, Agent or Firm:
JAUREGUI URBAHN, Kristian (DE)
Claims:

1. Computer-implemented method for training an artificial neural network, ANN, to reduce adverse environmental influences in a camera image (Ya), wherein the ANN comprises a generative adversarial network, GAN, which comprises a discriminator module (D) and a generator module (G), which comprises a generator encoder module (Ge) and a generator decoder module (Gd), characterized in that
- a predetermined training image (Xa) is provided, wherein each of a predefined set of adverse environmental influence factors is either present or absent in the training image (Xa);
- a set of features (z) concerning the training image (Xa) is generated by applying the generator encoder module (Ge) to the training image (Xa);
- a predefined set of reference attributes (b) is provided, each reference attribute specifying an intended absence of a corresponding adverse environmental influence factor;
- an improved training image (Xb) is generated by applying the generator decoder module (Gd) to the set of features (z) depending on the set of reference attributes (b);
- the discriminator module (D) is applied to the improved training image (Xb) and an adversarial loss (14a) is computed depending on an output of the discriminator module (D); and
- the generator module (G) is adapted depending on the adversarial loss (14a).

2. Computer-implemented method according to claim 1, characterized in that
- a predefined set of training attributes (a) is provided, each training attribute specifying an actual presence or absence of the corresponding adverse environmental influence factor in the training image (Xa);
- the ANN comprises an attribute classificator module (C), which is applied to the training image (Xa) to generate a set of inferred attributes (a) for the training image (Xa);
- an attribute consistency loss (14b) is computed depending on the set of training attributes (a) and the set of inferred attributes (a); and
- the attribute classificator module (C) is adapted depending on the attribute consistency loss (14b).

3. Computer-implemented method according to claim 1, characterized in that
- a predefined set of training attributes (a) is provided, each training attribute specifying an actual presence or absence of the corresponding adverse environmental influence factor in the training image (Xa);
- a reproduced image (Xa) is generated by applying the generator decoder module (Gd) to the set of features (z) depending on the set of training attributes (a);
- a cycle loss (14c) is computed depending on the training image (Xa) and the reproduced image (Xa); and
- the generator module (G) is adapted depending on the cycle loss (14c).

4. Computer-implemented method for reducing adverse environmental influences in a camera image (Ya), wherein an artificial neural network, ANN, is trained using a computer-implemented method according to one of the preceding claims and, after the training is completed,
- a set of features (z') concerning the camera image (Ya) is generated by applying the generator encoder module (Ge) to the camera image (Ya);
- a severity of the adverse environmental influences in the camera image (Ya) is determined; and
- an improved camera image (Yb) is generated by applying the generator decoder module (Gd) to the set of features (z') concerning the camera image (Ya) depending on the severity.

5. Computer-implemented method according to claim 4, characterized in that
- the ANN is trained using a computer-implemented method according to claim 2;
- a set of inferred attributes (a') for the camera image (Ya) is generated by applying the attribute classificator module (C) to the camera image (Ya); and
- the severity of the adverse environmental influences in the camera image (Ya) is determined depending on the set of inferred attributes (a').

6. Computer-implemented method according to claim 5, characterized in that
- the improved camera image (Yb) is generated by applying the generator decoder module (Gd) to the set of features concerning the camera image (Ya) depending on the set of inferred attributes (a'), if the severity is equal to or smaller than a predefined maximum severity; and/or
- the improved camera image (Yb) is generated by applying the generator decoder module (Gd) to the set of features concerning the camera image (Ya) depending on the set of reference attributes (b), if the severity is greater than the maximum severity.

7. Method for automatic visual perception, wherein a camera image (Ya) is generated by a camera system (6) and the presence of adverse environmental influences in the camera image (Ya) is determined by at least one computing unit (4), characterized in that
- a computer-implemented method for reducing the adverse environmental influences in the camera image (Ya) is carried out according to one of claims 4 to 6; and
- an algorithm for automatic visual perception is applied to the improved camera image (Yb) by the at least one computing unit (4).

8. Method according to claim 7, characterized in that
- the algorithm for automatic visual perception comprises a perception encoder module (7), a first perception decoder module (8) and a second perception decoder module (9a, 9b);
- for determining the adverse environmental influences in the camera image (Ya), the perception encoder module (7) is applied to the camera image (Ya) to generate a first result and the first perception decoder module (8) is applied to the first result of the perception encoder module (7); and
- the perception encoder module (7) is applied to the improved camera image (Yb) to generate a second result and the second perception decoder module (9a, 9b) is applied to the second result of the perception encoder module (7).

9. Method according to one of claims 7 or 8, characterized in that
- an image quality measure of the improved camera image (Yb) is determined by the at least one computing unit (4); and
- the perception encoder module (7) is applied to the improved camera image (Yb) only if the image quality measure is equal to or greater than a predefined minimum quality measure.

10. Method for guiding a vehicle (1) at least in part automatically, characterized in that
- a method for automatic visual perception according to one of claims 7 to 9 is carried out, wherein the camera system (6) is mounted to the vehicle (1) such that the camera image (Ya) represents an environment of the vehicle (1); and
- at least one control signal for guiding the vehicle (1) at least in part automatically is generated by a control unit of the vehicle (1) depending on a result of applying the algorithm for automatic visual perception to the improved camera image (Yb).

11. System (2) for reducing adverse environmental influences in a camera image (Ya), characterized in that
- the system (2) comprises a memory device (5) storing an ANN, which has been trained using a computer-implemented method according to one of claims 1 to 3; and
- the system (2) comprises at least one computing unit (4), which is configured to

- generate a set of features (z') concerning the camera image (Ya) by applying the generator encoder module (Ge) to the camera image (Ya);

- determine a severity of the adverse environmental influences in the camera image (Ya); and

- generate an improved camera image (Yb) by applying the generator decoder module (Gd) to the set of features (z') concerning the camera image (Ya) depending on the severity.

12. Electronic vehicle guidance system (3) for a vehicle (1), comprising a camera system (6), which is configured to generate a camera image (Ya), which represents an environment of the vehicle (1), and a control unit, which is configured to generate at least one control signal for guiding the vehicle (1) at least in part automatically, characterized in that
- the electronic vehicle guidance system (3) comprises a system (2) for reducing adverse environmental influences in a camera image (Ya) according to claim 11;
- the at least one computing unit (4) is configured to apply an algorithm for automatic visual perception to the improved camera image (Yb); and
- the control unit is configured to generate the at least one control signal depending on a result of applying the algorithm for automatic visual perception to the improved camera image (Yb).

13. Vehicle (1) comprising an electronic vehicle guidance system (3) according to claim 12.

14. Computer program comprising instructions, wherein
- when the instructions are carried out by a system (2) for reducing adverse environmental influences in a camera image (Ya) according to claim 11, the instructions cause the system (2) for reducing adverse environmental influences in a camera image (Ya) to carry out a computer-implemented method according to one of claims 1 to 6; or
- when the instructions are carried out by an electronic vehicle guidance system (3) according to claim 12, the instructions cause the electronic vehicle guidance system (3) to carry out a method according to claim 10.

15. Computer readable storage medium storing a computer program according to claim 14.

Description:
Reducing adverse environmental influences in a camera image

The present invention is directed to a computer-implemented method for training an artificial neural network, ANN, to reduce adverse environmental influences in a camera image, wherein the ANN comprises a generative adversarial network, GAN, which comprises a discriminator module and a generator module, wherein the generator module comprises a generator encoder module and a generator decoder module. The invention is further directed to a corresponding computer-implemented method for reducing adverse environmental influences in a camera image, to a method for automatic visual perception, to a method for guiding a vehicle at least in part automatically, to a system for reducing adverse environmental influences in a camera image, to an electronic vehicle guidance system, to a vehicle, to a computer program and to a computer readable storage medium.

Systems and functions for guiding a motor vehicle automatically or in part automatically can make use of camera images from cameras mounted to the vehicle as a basis for algorithms for automatic visual perception, such as for example object detection, object tracking, semantic segmentation et cetera. The camera images may be subject to adverse environmental influences such as soiling or adverse weather influences. Such adverse environmental influences may impair the reliability or functionality of the visual perception algorithm or the guidance of the vehicle. In consequence, comfort and/or safety while driving may be reduced.

To reduce the negative influences of soiling, the camera lens may be cleaned by a corresponding cleaning system, using for example water or another liquid. However, such cleaning systems increase the complexity of the camera system, require assembly space and may be subject to degradation. Furthermore, the negative influences of adverse weather conditions are not necessarily removable by means of cleaning.

In the publication by M. Uricar et al.: "SoilingNet: Soiling Detection on Automotive Surround-View Cameras", arXiv:1905.01492v2 (2019), a convolutional neural network based architecture is presented, which can be combined with an existing object detection task in a multi-task learning framework to detect soiling on cameras. Generative adversarial networks are used to generate additional images for data augmentation for training. The publication by K. De et al.: "Image sharpness measure for blurred images in frequency domain", Procedia Engineering, vol. 64, 2013, pages 149-158, describes how to compute a frequency domain image blur measure for an image.

It is an objective of the present invention to reduce adverse environmental influences in a camera image, in particular automatically and without requiring a physical cleaning system to clean the camera.

This objective is achieved by the respective subject matter of the independent claims. Further implementations and preferred embodiments are subject matter of the dependent claims.

The invention is based on the idea to train an artificial neural network, ANN, to reduce adverse environmental influences in the camera image. Therein, based on a predefined training image, which contains adverse environmental influences, an improved training image is generated depending on a set of reference attributes by a generator module of a GAN, each reference attribute specifying an intended absence of a corresponding adverse environmental influence factor. The generator module is trained depending on the output of a discriminator module of the GAN.

According to a first aspect of the invention, a computer-implemented method for training an artificial neural network, ANN, to reduce adverse environmental influences in a camera image is provided. The ANN comprises a generative adversarial network, GAN, which comprises a discriminator module and a generator module. The generator module comprises a generator encoder module and a generator decoder module. A predetermined training image is provided, wherein each of a predefined set of adverse environmental influence factors is either present or absent in the training image. A set of features, in particular training image features, concerning the training image is generated by applying the generator encoder module to the training image. A predefined set of reference attributes is provided, wherein each reference attribute of the set of reference attributes specifies an intended absence of a corresponding adverse environmental influence factor of the predefined set of adverse environmental influence factors. An improved training image is generated by applying the generator decoder module to the set of features depending on the set of reference attributes. The discriminator module is applied to the improved training image to generate a respective output and an adversarial loss is computed depending on the output of the discriminator module. The generator module is adapted depending on the adversarial loss in order to train the ANN, in particular to train the generator module.
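As an illustration only, the following Python sketch shows how one such generator training step could look; the module objects, tensor shapes and the optimizer setup are assumptions for the sketch, not part of the application.

```python
import torch
import torch.nn.functional as F

def generator_training_step(Ge, Gd, D, optimizer_g, x_a, b):
    """One generator update, assuming Ge, Gd, D are torch.nn.Module instances."""
    z = Ge(x_a)            # set of features z concerning the training image Xa
    x_b = Gd(z, b)         # improved training image Xb, conditioned on b
    d_out = D(x_b)         # discriminator output in [0, 1]
    # Adversarial loss: large when D classifies Xb as fake (output near 0).
    adv_loss = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
    optimizer_g.zero_grad()
    adv_loss.backward()
    optimizer_g.step()
    return adv_loss.item()
```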

If not stated otherwise, all method steps of a computer-implemented method for training an ANN may be carried out by at least one training computing unit.

The adverse environmental influences in the camera image may be understood as extrinsic influences, which may potentially impair a subsequent automatic visual perception algorithm carried out with the respective camera image as an input. Adverse environmental influences may be caused by soiling of a camera lens, for example by grass, mud, dirt, oil et cetera on the camera lens, damages of the camera lens, for example cracks or dents, or adverse weather conditions, such as rain, fog, snow et cetera.

The invention treats the adverse environmental influences as a set of attributes, also denoted as an attribute vector, wherein each of the attributes is related to a corresponding predefined adverse environmental influence factor. For each given camera image or training image, each of the adverse environmental influence factors is necessarily either present or absent. Therefore, a corresponding attribute vector, which may be a binary vector, can be defined for each camera image or training image. The adverse environmental influence factors may for example correspond to the presence of rain, fog or snow in the environment, the presence of water drops on the lens, the presence of opaque soiling or transparent or semi-transparent soiling on the camera lens et cetera. For a hypothetical camera image, which is not influenced at all by any of the predefined adverse environmental influence factors, all entries of its attribute vector indicate the actual absence of the corresponding factor. The set of reference attributes corresponds to such a hypothetical situation. For example, if the respective attribute is 0 in case the corresponding factor is absent and 1 in case the corresponding factor is present in the considered image, the reference attributes may all be equal to 0. However, different choices are also possible.
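To make the attribute vector concrete, here is a hypothetical example; the factor names and their order are assumptions chosen for illustration, since the application does not fix a particular set.

```python
# Hypothetical set of adverse environmental influence factors.
FACTORS = ["rain", "fog", "snow", "water_drops",
           "opaque_soiling", "semi_transparent_soiling"]

a = [0, 1, 0, 1, 0, 0]  # training attributes: fog and water drops present in Xa
b = [0, 0, 0, 0, 0, 0]  # reference attributes: intended absence of every factor
```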

According to the invention, the generator decoder module does not only receive the low dimensional representation of the training image in the form of the set of features generated by the encoder module but also the set of reference attributes as an input for generating the improved training image. The discriminator module, however, does not obtain the set of reference attributes but may decide whether the improved training image matches the respective underlying distribution of a set of clean reference images without adverse environmental influences. In particular, the discriminator module may be operated in the usual way known from the training of a GAN. In particular, the discriminator module is a trainable or trained module of the GAN. However, when the method steps described above, and in particular the adaptation of the generator module of the GAN, are carried out, the discriminator module may already be trained. In other words, the discriminator module may be a pre-trained discriminator module.

In particular, a GAN may be trained in two phases, wherein in the first phase the generator module is frozen, while the discriminator module is trained to distinguish real from fake images. After the training of the discriminator module is completed, the discriminator module is frozen, and the generator module is trained as described above.
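A minimal sketch of this two-phase schedule, assuming PyTorch modules G and D and leaving the data handling and loss details out:

```python
def set_trainable(module, flag):
    """Freeze or unfreeze a torch.nn.Module by toggling requires_grad."""
    for p in module.parameters():
        p.requires_grad = flag

# Phase 1: generator frozen, discriminator learns to tell real from fake.
set_trainable(G, False)
set_trainable(D, True)
# ... run discriminator training epochs ...

# Phase 2: trained discriminator frozen, generator adapted via the adversarial loss.
set_trainable(D, False)
set_trainable(G, True)
# ... run generator training epochs ...
```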

By training the generator module, the generator decoder module learns to generate the improved training image depending on the set of reference attributes such that the adverse environmental influences in the training image are reduced, in particular are partially or fully removed. In other words, the set of reference attributes instructs the generator decoder module how to process the set of features concerning the training image in order to reduce the adverse environmental influences.

The output of the discriminator module may in some implementations be binary such that, for example, 0 corresponds to the detection of a fake image and 1 corresponds to the detection of a real image or vice versa. However, the output of the discriminator module may also comprise intermediate values between 0 and 1 or alternative minimum and maximum values. The output of the discriminator module directly defines the adversarial loss. In particular, if the output of the discriminator module indicates a fake image, the adversarial loss is large, and if the output of the discriminator module indicates a real image, the adversarial loss is small.

The training of the ANN, in particular the generator module, may be terminated, once the adversarial loss settles to a predefined target value and/or another termination condition, which depends on the adversarial loss, is fulfilled.

Once the training of the ANN is completed, the ANN may effectively generate improved camera images based on input camera images, wherein the adverse environmental influences in the improved camera images are reduced with respect to the respective input camera images. Subsequent algorithms for automatic visual perception may then take the improved image instead of the camera image as an input, which may improve the reliability and performance of the visual perception algorithm. An output of the visual perception algorithm, for example a segmented image or a classified image or one or more bounding boxes for objects in the image, may then be used by an electronic vehicle guidance system to guide a vehicle at least in part automatically. Due to the improved reliability of the visual perception algorithm, reliability, comfort and/or safety when guiding the vehicle may be improved.

According to several implementations of the computer-implemented method for training an ANN, a predefined set of training attributes is provided, wherein each training attribute of the set of training attributes specifies an actual presence or absence of the corresponding adverse environmental influence factor in the training image. The ANN comprises an attribute classificator module, which is applied to the training image to generate a set of inferred attributes for the training image. An attribute consistency loss is computed depending on the set of training attributes and the set of inferred attributes.

The attribute classificator module is adapted depending on the attribute consistency loss, and, in particular, depending on the adversarial loss, to train the ANN. The attribute consistency loss may, for example, decrease as the set of inferred attributes more closely matches the set of training attributes.
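One possible formulation of such a loss, assuming the classificator outputs per-factor probabilities and the training attributes are given as a float tensor; the binary cross entropy used here is an assumed choice, not one mandated by the application:

```python
import torch.nn.functional as F

def attribute_consistency_loss(C, x_a, a):
    # C maps an image to per-factor probabilities in [0, 1]; a is the binary
    # ground-truth attribute vector. The loss is small when the prediction
    # matches the training attributes.
    a_inferred = C(x_a)
    return F.binary_cross_entropy(a_inferred, a)
```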

While the set of reference attributes describes a hypothetical or intended situation, wherein none of the adverse environmental influence factors is present, the set of training attributes describes the actual presence or absence of the corresponding factors in the training image. In other words, the set of training attributes corresponds to a ground truth for training the attribute classificator module.

By generating the set of inferred attributes, the attribute classificator module tries to reproduce the set of training attributes. Consequently, after the training of the attribute classificator module is completed and, in particular after the training of the ANN is completed, the attribute classificator module is able to predict the corresponding set of inferred attributes for a given camera image.

According to several implementations, a reproduced image is generated by applying the generator decoder module to the set of features depending on the set of training attributes and a cycle loss is computed depending on the training image and the reproduced image. The generator module is adapted depending on the cycle loss, and, in particular, depending on the adversarial loss and, in particular, depending on the attribute consistency loss, to train the ANN. The cycle loss may decrease as the reproduced image more closely matches the training image. In such implementations, the generator decoder module generates two different images based on the set of features, namely the reproduced image and the improved image. For the reproduced image, the generator decoder module uses the set of training attributes, or, in other words, the actual adverse environmental influence factors of the training image, and for the improved image, it uses the set of reference attributes or, in other words, assumes the absence of all adverse environmental influence factors.
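A sketch of the cycle loss; the L1 distance is an assumption, as the application does not fix a specific image distance:

```python
import torch.nn.functional as F

def cycle_loss(Ge, Gd, x_a, a):
    # Decode the features with the actual training attributes a; the result
    # should reproduce the original training image Xa.
    z = Ge(x_a)
    x_reproduced = Gd(z, a)
    return F.l1_loss(x_reproduced, x_a)  # small when the reproduction matches Xa
```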

By taking the cycle loss into account for training the generator module, the accuracy and robustness of the trained ANN may be further improved.

According to a further aspect of the invention, a computer-implemented method for reducing adverse environmental influences in a camera image is provided. Therein, an ANN is trained using a computer-implemented method for training an ANN to reduce adverse environmental influences in a camera image according to the invention. After the training of the ANN is completed, a set of features concerning the camera image is generated by applying the generator encoder module to the camera image. A severity of the adverse environmental influences in the camera image is determined and an improved camera image is generated by applying the generator decoder module to the set of features concerning the camera image depending on the severity.

If not stated otherwise, all method steps of a computer-implemented method for reducing adverse environmental influences in a camera image may be carried out by one or more computing units, in particular of a vehicle, or, in other words, may be carried out online during use of the vehicle.

For each implementation of a computer-implemented method for reducing adverse environmental influences according to the invention, a corresponding implementation of a method for reducing adverse environmental influences in a camera image, which is not completely computer-implemented, is obtained directly, wherein the method comprises an additional step of generating the camera image by a camera system, in particular of the vehicle.

According to several implementations of the computer-implemented method for reducing adverse environmental influences, a set of inferred attributes of the camera image is generated by applying the attribute classificator module to the camera image, in particular after the training of the ANN is completed. The severity of the adverse environmental influences in the camera image is determined depending on the set of inferred attributes.

For example, the severity may depend on the number of adverse environmental influence factors, which are present in the camera image according to the inferred attributes. However, not all of the influence factors necessarily need to be weighted in the same way. In other words, some influence factors may have a larger impact on the severity than other influence factors.
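A hypothetical weighted severity along these lines; the factor names, weights and 0/1 encoding are illustrative assumptions:

```python
# Opaque soiling counts more heavily than rain in this illustrative weighting.
WEIGHTS = {"rain": 1.0, "fog": 1.5, "snow": 1.5,
           "water_drops": 1.0, "opaque_soiling": 3.0}

def severity(inferred):  # inferred: dict mapping factor name -> 0 or 1
    return sum(WEIGHTS[f] * v for f, v in inferred.items())
```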

Since the attribute classificator module has been trained to accurately predict the inferred set of attributes, the severity may be determined with a high reliability and robustness. In particular, by generating the improved camera image depending on the severity, the reliability of the improved camera image may be increased.

According to several implementations, the improved camera image is generated by applying the generator decoder module to the set of features concerning the camera image depending on the set of inferred attributes, if, in particular if and only if, the severity is equal to or smaller than a predefined maximum severity.

In other words, if the severity is equal to or smaller than the maximum severity, the generator decoder module is provided with the set of inferred attributes to generate the improved camera image depending on the set of features concerning the camera image.

According to several implementations, the improved camera image is generated by applying the generator decoder module to the set of features concerning the camera image depending on the set of reference attributes, if, in particular if and only if, the severity is greater than the maximum severity.
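Taken together, the two cases amount to the following selection rule, sketched here with an assumed threshold value (the rule is restated in prose below):

```python
MAX_SEVERITY = 4.0  # predefined maximum severity; the value is an assumption

def select_attributes(a_inferred, b_reference, sev):
    # Low severity: trust the inferred attributes; high severity: fall back
    # to the reference attributes (intended absence of all factors).
    return a_inferred if sev <= MAX_SEVERITY else b_reference
```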

In other words, if it is found depending on the inferred attributes that the severity is small enough, the generator decoder module uses the inferred attributes, and otherwise the inferred attributes are replaced by the set of reference attributes for generating the improved camera image. Thus, the reliability of the improved camera image is increased.

According to a further aspect of the invention, a method for automatic visual perception is provided. A camera image is generated by a camera system, in particular of a vehicle, and the presence of adverse environmental influences in the camera image is determined by at least one computing unit, in particular of the vehicle. A computer-implemented method for reducing the adverse environmental influences in the camera image according to the invention is carried out, in particular by the at least one computing unit. An algorithm for automatic visual perception is applied to the improved camera image by the at least one computing unit.

In case the computer-implemented method for reducing the adverse environmental influences comprises an implementation of a computer-implemented method for training the ANN, the method steps for training the ANN and the remaining method steps of the computer-implemented method may be carried out by different computing units or training computing units, respectively. In particular, the training method may be carried out offline and the remaining steps of the computer-implemented method may be carried out online.

Computer vision algorithms, which may also be denoted as machine vision algorithms or algorithms for automatic visual perception, may be considered as computer algorithms for performing a visual perception task automatically. A visual perception task, also denoted as a computer vision task, may for example be understood as a task for extracting information from image data. In particular, the visual perception task may in principle be performed by a human, who is able to visually perceive an image corresponding to the image data. In the present context, however, visual perception tasks are performed automatically without requiring the support of a human.

For example, a computer vision algorithm may be understood as an image processing algorithm or an algorithm for image analysis, which is trained using machine learning and may for example be based on an artificial neural network, in particular a convolutional neural network.

For example, the computer vision algorithm may include an object detection algorithm, an obstacle detection algorithm, an object tracking algorithm, a classification algorithm, and/or a segmentation algorithm.

The output of a visual perception algorithm depends on the specific underlying perception task. For example, an output of an object detection algorithm may include one or more bounding boxes defining a spatial location and, optionally, orientation of one or more respective objects in the environment and/or corresponding object classes for the one or more objects. An output of a semantic segmentation algorithm applied to a camera image may include a pixel-level class for each pixel of the camera image. The pixel-level classes may, for example, define a type of object the respective pixel or point belongs to.

According to several implementations of the method for automatic visual perception, the algorithm for automatic visual perception comprises a perception encoder module, a first perception decoder module and a second perception decoder module. For determining the adverse environmental influences in the camera image, the perception encoder module is applied to the camera image to generate a first result and the first perception decoder module is applied to the first result of the perception encoder module. The perception encoder module is applied to the improved camera image to generate a second result and the second perception decoder module is applied to the second result of the perception encoder module.

In other words, the perception encoder module is used in combination with the first perception decoder module to classify the camera image with respect to the adverse environmental influences. The first perception decoder module outputs one of at least two classes, wherein one of the classes corresponds to the presence of the adverse environmental influences. Then, if the camera image is classified as comprising the adverse environmental influences, the computer-implemented method for reducing the adverse environmental influences according to the invention is carried out and the second result of the perception encoder module is provided to the second perception decoder module to carry out the respective perception algorithm, for example an object detection algorithm, an object tracking algorithm, a semantic segmentation algorithm et cetera. Otherwise, in case no adverse environmental influences are determined, the first result may be provided to the second perception decoder module to carry out the respective perception algorithm.
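The conditional flow just described, sketched with the modules passed in as callables, since their concrete interfaces are assumptions here:

```python
def perceive(image, encoder, influence_decoder, task_decoders, reduce_influences):
    r1 = encoder(image)                        # first result
    if influence_decoder(r1):                  # adverse influences detected?
        improved = reduce_influences(image)    # improved camera image Yb
        r2 = encoder(improved)                 # second result
        return [dec(r2) for dec in task_decoders]
    return [dec(r1) for dec in task_decoders]  # reuse the first result directly
```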

In such implementations, the available computing resources are used in a particularly efficient way while it is still ensured that the result of the second perception decoder module is robust and reliable.

According to several implementations, an image quality measure, for example a frequency domain image blur measure, of the improved camera image is determined by the at least one computing unit. The perception encoder module is applied to the improved camera image only if the image quality measure is equal to or greater than a predefined minimum quality measure.

In other words, if the image quality measure is smaller than the minimum quality measure, the improved camera image may not be safe to use for guiding the vehicle at least in part automatically. In such cases, a cleaning system of the vehicle may be used to clean the camera. In this way, the safety of the automatic or partly automatic guidance of the vehicle may be improved.
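A sketch of this quality gate; the threshold value, the quality measure (for example the frequency domain sharpness measure detailed later in the description) and the cleaning interface are assumptions:

```python
MIN_QUALITY = 0.02  # predefined minimum quality measure; value is illustrative

def quality_gate(improved, quality_measure, encoder, request_cleaning):
    if quality_measure(improved) >= MIN_QUALITY:
        return encoder(improved)   # proceed with the perception encoder
    request_cleaning()             # e.g. activate the camera cleaning system
    return None                    # image not used for vehicle guidance
```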

According to a further aspect of the invention, a method for guiding a vehicle, in particular a motor vehicle, at least in part automatically is provided. A method for automatic visual perception according to the invention is carried out, in particular at least in part by at least one computing unit of the vehicle, wherein the camera system is mounted to the vehicle such that the camera image represents an environment of the vehicle. At least one control signal for guiding the vehicle at least in part automatically is generated by a control unit of the vehicle, for example of the at least one computing unit, depending on a result of applying the algorithm for automatic visual perception to the improved camera image.

In particular, the vehicle may be guided based on the at least one control signal, for example by means of one or more respective actuators of the vehicle.

For example, the vehicle may comprise an electronic vehicle guidance system, which comprises the at least one computing unit and/or the control unit and/or the actuators for carrying out the method for guiding the vehicle at least in part automatically. The electronic vehicle guidance system may also comprise the camera system.

An electronic vehicle guidance system may be understood as an electronic system, configured to guide a vehicle in a fully automated or a fully autonomous manner and, in particular, without a manual intervention or control by a driver or user of the vehicle being necessary. The vehicle carries out all required functions, such as steering maneuvers, deceleration maneuvers and/or acceleration maneuvers as well as monitoring and recording the road traffic and corresponding reactions automatically. In particular, the electronic vehicle guidance system may implement a fully automatic or fully autonomous driving mode according to level 5 of the SAE J3016 classification. An electronic vehicle guidance system may also be implemented as an advanced driver assistance system, ADAS, assisting a driver for partially automatic or partially autonomous driving. In particular, the electronic vehicle guidance system may implement a partly automatic or partly autonomous driving mode according to levels 1 to 4 of the SAE J3016 classification. Here and in the following, SAE J3016 refers to the respective standard dated June 2018.

Guiding the vehicle at least in part automatically may therefore comprise guiding the vehicle according to a fully automatic or fully autonomous driving mode according to level 5 of the SAE J3016 classification. Guiding the vehicle at least in part automatically may also comprise guiding the vehicle according to a partly automatic or partly autonomous driving mode according to levels 1 to 4 of the SAE J3016 classification.

According to a further aspect of the invention, a system for reducing adverse environmental influences in a camera image is provided. The system comprises a memory device storing an ANN, which has been trained using an implementation of a computer-implemented method for training an artificial neural network to reduce adverse environmental influences in a camera image according to the invention. The system comprises at least one computing unit, which is configured to generate a set of features concerning the camera image by applying the generator encoder module to the camera image. The at least one computing unit is configured to determine a severity of the adverse environmental influences in the camera image and to generate an improved camera image by applying the generator decoder module to the set of features concerning the camera image depending on the severity.

Further implementations of the system according to the invention follow directly from the various implementations of the computer-implemented method for reducing adverse environmental influences in a camera image according to the invention. In particular, a system according to the invention is configured to carry out a computer-implemented method according to the invention or carries out such a computer-implemented method.

According to a further aspect of the invention, a system for automatic visual perception is provided, which comprises a system for reducing adverse environmental influences in a camera image according to the invention and is further configured to carry out a method for automatic visual perception according to the invention.

According to a further aspect of the invention, an electronic vehicle guidance system for a vehicle is provided. The electronic vehicle guidance system comprises a camera system, in particular to be mounted to the vehicle, which is configured to generate a camera image, which represents an environment of the vehicle, in particular when the camera system is mounted to the vehicle. The electronic vehicle guidance system comprises a control unit, which is configured to generate at least one control signal for guiding the vehicle at least in part automatically. The electronic vehicle guidance system comprises a system for reducing adverse environmental influences according to the invention and the at least one computing unit is configured to apply an algorithm for automatic visual perception to the improved camera image. The control unit is configured to generate the at least one control signal depending on a result of applying the algorithm for automatic visual perception to the improved camera image.

Further implementations of the electronic vehicle guidance system follow directly from the various implementations of a method for guiding a vehicle at least in part automatically and vice versa. In particular, an electronic vehicle guidance system according to the invention is configured to carry out a method for guiding a vehicle at least in part automatically according to the invention or carries out such a method.

According to a further aspect of the invention, a vehicle, in particular a motor vehicle, is provided, which comprises an electronic vehicle guidance system according to the invention.

According to a further aspect of the invention, a first computer program comprising first instructions is provided. When the first instructions are carried out by a system for reducing adverse environmental influences in a camera image according to the invention, in particular by the at least one computing unit of the system, the first instructions cause the system for reducing adverse environmental influences in a camera image to carry out a computer-implemented method for training an artificial neural network according to the invention and/or a computer-implemented method for reducing adverse environmental influences in a camera image according to the invention.

According to a further aspect of the invention, a second computer program comprising second instructions is provided. When the second instructions are carried out by an electronic vehicle guidance system according to the invention, in particular by the at least one computing unit of the electronic vehicle guidance systems, the second instructions cause the electronic vehicle guidance system to carry out a method for guiding a vehicle at least in part automatically according to the invention. According to a further aspect of the invention, a computer readable storage medium storing a first computer program and/or a second computer program according to the invention is provided.

If, in the context of the present disclosure, it is mentioned that a component of the system for reducing adverse environmental influences according to the invention or the electronic vehicle guidance system according to the invention, in particular the at least one computing unit, is adapted, configured or designed to, et cetera, to perform or realize a certain function, to achieve a certain effect or to serve a certain purpose, this can be understood such that the component, beyond being usable or suitable for this function, effect or purpose in principle or theoretically, is concretely and actually capable of executing or realizing the function, achieving the effect or serving the purpose by a corresponding adaptation, programming, physical design and so on.

A computing unit may in particular be understood as a data processing device. The computing unit can therefore in particular process data to perform computing operations. This may also include operations to perform indexed accesses to a data structure, for example a look-up table, LUT.

In particular, the computing unit may include one or more computers, one or more microcontrollers, and/or one or more integrated circuits, for example, one or more application-specific integrated circuits, ASIC, one or more field-programmable gate arrays, FPGA, and/or one or more systems on a chip, SoC. The computing unit may also include one or more processors, for example one or more microprocessors, one or more central processing units, CPU, one or more graphics processing units, GPU, and/or one or more signal processors, in particular one or more digital signal processors, DSP. The computing unit may also include a physical or a virtual cluster of computers or other of said units.

In various embodiments, the computing unit includes one or more hardware and/or software interfaces and/or one or more memory units.

A memory unit may be implemented as a volatile data memory, for example a dynamic random access memory, DRAM, a static random access memory, SRAM, or as a non-volatile data memory, for example a read-only memory, ROM, a programmable read-only memory, PROM, an erasable read-only memory, EPROM, an electrically erasable read-only memory, EEPROM, a flash memory or flash EEPROM, a ferroelectric random access memory, FRAM, a magnetoresistive random access memory, MRAM, or a phase-change random access memory, PCRAM.

Further features of the invention are apparent from the claims, the figures and the figure description. The features and combinations of features mentioned above in the description as well as the features and combinations of features mentioned below in the description of figures and/or shown in the figures may be comprised by the invention not only in the respective combination stated, but also in other combinations. In particular, embodiments and combinations of features, which do not have all the features of an originally formulated claim, are also comprised by the invention. Moreover, embodiments and combinations of features which go beyond or deviate from the combinations of features set forth in the recitations of the claims are comprised by the invention.

In the figures:

Fig. 1 shows schematically a vehicle with an exemplary implementation of an electronic vehicle guidance system according to the invention;

Fig. 2 shows a schematic flow diagram of an exemplary implementation of a method for automatic visual perception according to the invention;

Fig. 3 shows schematically a first training phase for training a generative adversarial network;

Fig. 4 shows schematically a second training phase for training a generative adversarial network;

Fig. 5 shows a schematic flow diagram of an exemplary implementation of a computer-implemented method for training an artificial neural network according to the invention; and

Fig. 6 shows a schematic flow diagram of an exemplary implementation of a computer-implemented method for reducing adverse environmental influences in a camera image according to the invention.

Fig. 1 shows schematically a motor vehicle 1, which comprises an exemplary implementation of an electronic vehicle guidance system 3 according to the invention. The electronic vehicle guidance system 3 comprises an exemplary implementation of a system 2 for reducing adverse environmental influences in a camera image according to the invention.

The system 2 comprises a memory device 5, which stores an artificial neural network, ANN, which has been trained according to a computer-implemented method according to the invention. The system 2 also comprises a computing unit 4, which may for example be part of an electronic control unit, ECU, of the vehicle 1 or may comprise one or more ECUs of the vehicle 1. The electronic vehicle guidance system 3 further comprises a camera system 6 mounted to the vehicle 1, for example a front-facing camera.

Furthermore, the electronic vehicle guidance system 3, for example the computing unit 4, comprises a control unit (not shown), which is configured to generate at least one control signal for guiding the vehicle 1 at least in part automatically. The control unit is, in particular, configured to generate the at least one control signal depending on a result of an algorithm for automatic visual perception, which is carried out as a part of a method for automatic visual perception according to the invention.

In particular, the electronic vehicle guidance system 3 is configured to carry out a method for automatic visual perception according to the invention. An exemplary flow diagram of such a method is shown schematically in Fig. 2.

In step S1 of the method, a camera image is generated by the camera system 6, wherein the camera image represents an environment of the vehicle 1. In step S2 of the method, the camera image is supplied as an input to a perception encoder module 7 of the algorithm for automatic visual perception. The perception encoder module 7 generates a first result depending on the camera image, wherein the first result corresponds to a set of image features of the camera image. The first result is used in step S3 of the method as an input for a first perception decoder module 8 of the algorithm for classifying the camera image depending on adverse environmental influences in the camera image. In particular, the classification may assign the camera image to one of two classes, one of the classes corresponding to the presence of adverse environmental influences in the camera image and the other class corresponding to the absence of adverse environmental influences. However, one or more additional classes may also be used for a more refined classification. In any case, the result of the first perception decoder module 8 makes it possible to distinguish whether or not the adverse environmental influences are present in the camera image. Consequently, in step S4 of the method, it is determined depending on the output of the first perception decoder module 8, whether or not the adverse environmental influences are present in the camera image.

If it is found in step S4 that adverse environmental influences are not present, the first result of the perception encoder module 7 may be provided as an input to one or more second perception decoder modules 9a, 9b, which carry out respective visual perception tasks such as object detection, semantic segmentation et cetera.

On the other hand, if it is found in step S4 that adverse environmental influences are present, an implementation of a computer-implemented method for reducing adverse environmental influences in the camera image according to the invention is carried out in step S5. More details regarding the computer-implemented method for reducing adverse environmental influences are provided with respect to Fig. 3 to Fig. 6. As a result of the method step S5, an improved image is generated, wherein the adverse environmental influences in the improved image are reduced compared to the original camera image.

In step S9, the improved image may be provided as another input to the perception encoder module 7, which generates a second result based on the improved image. Then, the one or more second perception decoder modules 9a, 9b may be applied to the second result of the perception encoder module 7 instead of the first result. In this way, the reliability and robustness of the output of the second perception decoder modules 9a, 9b is improved.

The perception encoder module 7 may for example comprise a number of convolutional layers for feature extraction and a number of pooling layers for feature space reduction. The perception encoder module 7 may be generic and may be trained to generate mostly task agnostic features. Features in the upper layers may, however, also be slightly task specific. Together with the perception decoder modules 8, 9a, 9b, it forms a multi-task network. Each of the perception decoder modules 8, 9a, 9b is trained for a specific task such as object detection, segmentation, scene analysis et cetera.

Steps S6 to S8 of the method are optional. In step S6, an image quality measure of the improved camera image may be determined by the computing unit 4 and in step S7, the image quality measure is compared to a predefined minimum quality measure by the computing unit 4. If the image quality measure is equal to or greater than the minimum quality measure, the method proceeds with step S9 as described above. On the other hand, if it is found in step S7 that the image quality measure is smaller than the minimum quality measure, a cleaning system (not shown) of the vehicle 1 may be activated in step S8, in order to clean a lens unit of the camera system 6. In this case, neither the camera image nor the improved image is used for guiding the vehicle 1 at least in part automatically.

For determining the image quality measure, the sharpness or blurriness of the camera image may for example be determined and evaluated. In particular, the image quality measure may be denoted as FM and may be given by FM = T_H / (M × N). Therein, M × N corresponds to the size of the improved image. For computing FM, the Fourier transform of the improved image is computed. T_H may then be determined as the total number of pixels in the Fourier transform of the improved image whose absolute value is greater than a predefined threshold.
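A numpy sketch of this measure for a grayscale image; the threshold heuristic used here is the one described in the following paragraph:

```python
import numpy as np

def sharpness_measure(image, fraction=1 / 1000):
    """FM = T_H / (M x N) for a grayscale image of size M x N."""
    M, N = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # centered Fourier transform
    magnitude = np.abs(spectrum)
    threshold = fraction * magnitude.max()          # heuristic threshold
    t_h = np.count_nonzero(magnitude > threshold)   # pixels above the threshold
    return t_h / (M * N)
```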

The threshold may for example be determined by calculating the centered Fourier transform by shifting the origin of the Fourier transform of the improved image accordingly and then calculating the absolute value of the centered Fourier transform. Then a maximum value of the absolute value of the centered Fourier transform is determined and the threshold value may be given as a predefined fraction of the maximum value, for example 1/1000 of the maximum value. However, other methods for fixing the threshold value or other approaches to compute a suitable image quality measure may also be used.

As depicted schematically in Fig. 3 and Fig. 4, a GAN has basically two components, namely a generator module G and a discriminator module D. The generator module G is a neural network that generates new data instances and the discriminator module D evaluates the authenticity of the instances. The generator module G may take random input and try to generate a sample of data. This data is fed to the discriminator module D. The task of the discriminator module D is to take input from either the real data set or from the generator module G and to try to predict whether the input is real or generated by the generator module G. It may then solve a binary classification problem using for example a Sigmoid function giving output in the range of 0 to 1.

For training the GAN, the generator module G may be frozen during a first training phase, during which the discriminator module D is trained for a number of training epochs, as indicated in Fig. 3. In Fig. 3, the discriminator module D takes either real input 10 or input 11 generated by the generator module G. In the second training phase, sketched in Fig. 4, the trained discriminator module D is used to train the generator module G.

Fig. 5 shows schematically a flow diagram of an exemplary implementation of a computer-implemented method for training an ANN comprising a GAN according to the invention and Fig. 6 shows a schematic flow diagram of applying such a trained artificial neural network ANN to generate the improved image as described with respect to step S5 of Fig. 2.

The GAN comprises a generator module, which contains a generator encoder module Ge and a generator decoder module Gd. Furthermore, the GAN comprises a discriminator module D. In addition, the ANN may comprise an attribute classificator network C, which comprises a classificator encoder module Ce and a classificator decoder module Cd. Therein, the classificator encoder module Ce may for example be shared with the discriminator module D in some implementations.

For training the ANN, a predetermined training image Xa is provided, and the generator encoder module Ge is applied to the training image Xa to generate a set of features z concerning the training image Xa. Furthermore, a predefined set of reference attributes b is provided, wherein each reference attribute specifies an intended absence of a corresponding adverse environmental influence factor in the improved training image Xb to be generated. The generator decoder module takes the set of features z and the reference attributes b as an input and generates the improved training image Xb. Then, the trained discriminator module D is applied to the improved training image Xb to compute an adversarial loss 14a. The generator module Ge, Gd is adapted depending on the adversarial loss 14a.

Optionally, a predefined set of training attributes a is provided, wherein each training attribute specifies an actual presence or absence of the corresponding adverse environmental influence factor in the training image Xa. The attribute classificator module C is applied to the training image Xa to generate a set of inferred attributes for the training image Xa. By comparing the training attributes a with the inferred attributes, an attribute consistency loss 14b may be determined and the classificator module and/or the generator module Ge, Gd may be adapted depending on the attribute consistency loss 14b.

Optionally, the generator decoder module Gd is applied to the set of features z depending on the set of training attributes a to generate a reproduced image Xa. By comparing the training image Xa with the reproduced image Xa, a cycle loss 14c may be computed. The generator module Ge, Gd may be adapted depending on the cycle loss 14c.

When the training of the ANN is completed, the camera image Ya is used as an input for the generator encoder module Ge to generate a set of features z' concerning the camera image Ya, as shown in Fig. 6. Furthermore, the attribute classificator module C is applied to the camera image Ya to generate a set of inferred attributes a' for the camera image Ya. A severity estimation module 15, which may for example be a non-trainable module, may determine the severity of the adverse environmental influences in the camera image Ya depending on the set of inferred attributes a'. The severity estimation module 15 may compare the severity to a predefined maximum severity and, if the severity is equal to or smaller than the maximum severity, provide the set of inferred attributes a' as an input to the generator decoder module Gd. Then, the generator decoder module Gd is applied to the set of features z' depending on the set of inferred attributes a' to generate the improved image Yb. On the other hand, if the severity estimation module 15 finds that the severity is greater than the maximum severity, it may provide the set of reference attributes b to the generator decoder module Gd instead of the set of inferred attributes a'. Then, the generator decoder module Gd is applied to the set of features z' depending on the set of reference attributes b to generate the improved image Yb.
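The inference flow of Fig. 6, condensed into one sketch; the module interfaces and the severity function are assumptions carried over from the earlier snippets:

```python
import torch

def improve_camera_image(Ge, Gd, C, severity_fn, y_a, b, max_severity):
    with torch.no_grad():                # inference only, after training
        z = Ge(y_a)                      # features z' for the camera image Ya
        a_inferred = C(y_a)              # inferred attributes a' for Ya
        if severity_fn(a_inferred) <= max_severity:
            return Gd(z, a_inferred)     # keep the inferred attributes
        return Gd(z, b)                  # fall back to the reference attributes
```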

As described, in particular with respect to the figures, the invention provides a way to reduce adverse environmental influences in camera images.

For training a respective ANN, adverse environmental influences such as soiling or adverse weather effects may be treated in the framework of an attribute editing problem. It is assumed that the affected camera image may be described by a fixed set of soiling and adverse weather attributes, which correspond, for example, to the presence of rain, opaque or transparent soiling et cetera. Also the severity of the image quality decrease may be encoded. All such attributes may be described by a binary encoded attribute vector, where each dimension represents one particular attribute. To modify the image, a GAN with an encoder-decoder architecture may be used. For example, by means of semi-supervised learning, the generator encoder module, which extracts the lower dimensional representation of the scenery visible to the cameras, may be trained jointly with the generator decoder module, which, besides the lower dimensional representation of the scene, takes a binary vector of the attributes as an input. Simultaneously, an attribute classificator module may be trained under full supervision, which, in some implementations, may have a common encoder with the discriminator module of the GAN. Besides the adversarial loss, the training may also take into account a cycle loss and/or an attribute consistency loss.
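
For illustration, such a binary encoded attribute vector may look as follows; the concrete attribute set is an assumption, as the application only gives examples.

```python
# Hypothetical attribute set; each dimension encodes one influence factor.
ATTRIBUTES = ["rain", "fog", "opaque_soiling", "transparent_soiling", "high_severity"]
a = torch.tensor([1., 0., 1., 0., 1.])  # rain and opaque soiling present, high severity
b = torch.zeros_like(a)                 # reference attributes: all influences absent
```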

During runtime, that is after training is completed, the input image may be fed to the attribute classification module and to the generator encoder module of the GAN. Based on an output of the attribute classification module, it may be decided whether the image is affected by the adverse environmental influences at a level that is still below a threshold. In the positive case, the labels of the acquired attribute vector may be swapped in such a way that the resulting attribute vector contains no soiling or adverse weather influences, as sketched below. Together with the lower dimensional representation of the input image, the new attribute vector may be fed to the generator decoder module, which in turn provides the improved image as an output.
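
The label swapping may be as simple as zeroing the adverse dimensions; `adverse_idx` is a hypothetical index list marking which dimensions encode soiling or weather influences.

```python
def normalize_attributes(a_hat, adverse_idx):
    """Swap the labels of the adverse attributes to 'absent'."""
    a_norm = a_hat.clone()
    a_norm[..., adverse_idx] = 0.0  # resulting vector of normalized attributes
    return a_norm
```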

The generator module and the attribute classification module may be defined within one statically built graph. The training process may aim at the minimization of an error defined by a complex loss function reflecting the distance of the annotation from the classification. The minimization may be carried out by means of a backpropagation algorithm, which adapts the weights along the gradients in the direction of minimization. The complex loss function may consist of several simpler losses, namely the cycle loss, evaluating the reconstruction quality of the generator decoder module, the attribute consistency loss, evaluating the success of the attribute classification, and the adversarial loss, aiming at a Nash equilibrium of a minimax game, in which the generator module tries to match the real world distribution by fooling the discriminator module, whose task is to distinguish samples from the real world distribution from the generated samples. To improve GAN training stability, a further loss term, denoted as gradient penalty, may be added to the complex loss function.
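
One assumed concrete form of the gradient penalty is the WGAN-GP variant sketched below; the application names the term but not its formula, and the loss weights are likewise illustrative.

```python
def gradient_penalty(D, real, fake, lam=10.0):
    """WGAN-GP style penalty: push the gradient norm of D towards 1 on interpolates."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)
    return lam * ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def complex_loss(adv, cyc, attr, gp, w=(1.0, 10.0, 1.0, 1.0)):
    """Weighted sum of the named loss terms; the weights w are assumptions."""
    return w[0] * adv + w[1] * cyc + w[2] * attr + w[3] * gp
```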

The runtime of the proposed solution may consist of a parallel run of the attribute classification module and the generator encoder module. The former predicts the current setup of the scene parameters, that is the presence or absence of the adverse environmental influences, while the latter extracts the lower dimensional representation of the scene. The predicted attributes are subsequently modified in such a way that unwanted attributes are removed. The respective modified vector may also be denoted as a vector of normalized attributes. Then, the lower dimensional representation of the scene and the normalized attribute vector may be fed as an input to the generator decoder module. The output of the generator decoder module is then an improved image.

In further implementations, an epistemic uncertainty measure may be used to provide a confidence metric for the quality of the improved image. This may be posed as a regression problem, where the uncertainty is learnt by providing supervised data with and without adverse environmental influences. This may, for example, be done using synthetic data sets, and domain adaptation may then be used to transfer the learnt regression model. The algorithm is independent of the object detection decoder.
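
A hypothetical regression head for such a confidence metric is sketched below; its placement on the features z and its training target, a supervised quality score, are assumptions beyond the text.

```python
class UncertaintyHead(nn.Module):
    """Hypothetical regression head estimating a confidence for the improved image."""
    def __init__(self, feat_ch=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_ch, 1), nn.Sigmoid(),  # confidence in [0, 1]
        )

    def forward(self, z):
        return self.net(z)  # trained, e.g., with an MSE loss against quality labels
```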

In further implementations, spatial and/or temporal extensions may be considered. For example, adverse weather like rain or fog may uniformly affect all the cameras of the vehicle. Thus, the encoders assigned to each of the cameras may be combined by a 1x1 depth-wise filter to produce a fused encoding, which can then be fed to a single decoder that recovers the cleaning parameters for the images. A similar encoder fusion may be performed temporally to leverage temporal continuity. Both spatial and temporal fusion may be performed simultaneously, or spatial fusion may be performed first, followed by temporal fusion.
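
The spatial fusion may be sketched as follows; a pointwise 1x1 convolution over the stacked per-camera features is used here as one plausible reading of the "1x1 depth-wise filter", since the exact operator is not fixed by the text.

```python
class SpatialFusion(nn.Module):
    """Fuse per-camera encoder features into one encoding via a 1x1 convolution."""
    def __init__(self, n_cams=4, feat_ch=256):
        super().__init__()
        self.fuse = nn.Conv2d(n_cams * feat_ch, feat_ch, kernel_size=1)

    def forward(self, feats):
        # feats: list of per-camera feature maps z with identical shapes
        return self.fuse(torch.cat(feats, dim=1))
```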

Further implementations may also extend the invention to video restoration. In particular, the encodings of previous frames may be used in a rolling buffer fashion to minimize additional computations. Any scene influenced by adverse environmental influences may have approximately the same soiling pattern for consecutive camera frames. Therefore, the pattern may be learnt in the first few frames, and the performance of the restoration may be significantly improved, as the occluded part remains the same over time. In this way, the adversarial learning mechanism may quantize the loss per scene rather than per frame. Also, the loss may decrease after the first few frames. This strategy may be helpful in case the soiling pattern is very complex but tends to maintain almost the same pattern for at least some time.
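
A minimal sketch of such a rolling buffer is given below; the averaging fusion rule and the buffer length are assumptions, as the text only specifies that previous frame encodings are reused.

```python
from collections import deque

class RollingEncoderBuffer:
    """Keep the encodings of the last n frames to exploit the static soiling pattern."""
    def __init__(self, n=4):
        self.buf = deque(maxlen=n)

    def push(self, z):
        self.buf.append(z.detach())  # no gradients through past frames

    def fused(self):
        # Simple average over the buffered encodings; the fusion rule is an assumption.
        return torch.stack(list(self.buf)).mean(dim=0)
```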

By including an implementation of the invention in an automotive system, in particular for partially or fully automatic driving, the safety of autonomous or semi-autonomous driving may be improved.