Title:
IMPROVING THE RECOGNITION CAPABILITY OF A RECOGNITION NETWORK
Document Type and Number:
WIPO Patent Application WO/2020/169413
Kind Code:
A1
Abstract:
The invention discloses an automated method for improving the recognition capability of a recognition network (3) for a target object (6), providing: input image data (1a) containing an image of the target object (6); a recognition network (3) alterable by modifiable recognition network parameters, wherein the recognition network (3) is designed to recognize the target object (6) in image data by means of an object classifier (5); and a deception network (4) alterable by modifiable deception network parameters, wherein the deception network is designed to generate deceived image data (1b) of the input image data (1a) by modifying the deception network parameters and wherein the deceived image data (1b) are forwarded to the recognition network (3); characterized by the following steps: a) generating (S1) the deceived image data (1b) by modifying the deception network parameters such that the uncertainty of the object classifier (5) determined by the recognition network (3) is increased, b) using (S2) the deceived image data (1b) of step a) concatenated with the input image data (1a) as input for the recognition network (3), wherein the recognition network parameters are modified such that the uncertainty of the object classifier (5) is reduced, and c) repeating steps a) and b). A use of the method, a computational device, a computer software product and a computer-readable medium are disclosed as well.

Inventors:
ZAKHAROV SERGEY (DE)
ILIC SLOBODAN (DE)
HUTTER ANDREAS (DE)
Application Number:
PCT/EP2020/053452
Publication Date:
August 27, 2020
Filing Date:
February 11, 2020
Assignee:
SIEMENS AG (DE)
International Classes:
G06V10/764; G06V20/00
Other References:
YAROSLAV GANIN ET AL: "Domain-adversarial training of neural networks", JOURNAL OF MACHINE LEARNING RESEARCH, MIT PRESS, CAMBRIDGE, MA, US, vol. 17, no. 1, 1 January 2016 (2016-01-01), pages 2096 - 2030, XP058261862, ISSN: 1532-4435
BENJAMIN PLANCHE ET AL: "Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition", 9 October 2018 (2018-10-09), XP055561666, Retrieved from the Internet [retrieved on 20190806]
SHIWEI SHEN ET AL: "APE-GAN: Adversarial Perturbation Elimination with GAN", ARXIV:1707.05474V2 [CS.CV], 14 September 2017 (2017-09-14), XP055537418, Retrieved from the Internet [retrieved on 20181220]
KONSTANTINOS BOUSMALIS ET AL: "Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 16 December 2016 (2016-12-16), XP080744869, DOI: 10.1109/CVPR.2017.18
K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, D. Krishnan: "Unsupervised pixel-level domain adaptation with generative adversarial networks", IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2017
M. Rad, M. Oberweger, V. Lepetit: "Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images", ARXIV:1712.03904 [CS], 2017
Ganin, Yaroslav, Victor Lempitsky: "Unsupervised Domain Adaptation by Backpropagation", INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2015, pages 1180 - 1189, XP055349552
Claims:

1. Automated method for improving the recognition capability of a recognition network (3) for a target object (6), providing:

input image data (1a) containing an image of the target object (6),

a recognition network (3) alterable by modifiable recognition network parameters, wherein the recognition network (3) is designed to recognize the target object (6) in image data by means of an object classifier (5), and

a deception network (4) alterable by modifiable deception network parameters, wherein the deception network is designed to generate deceived image data (1b) of the input image data (1a) by modifying the deception network parameters and wherein the deceived image data (1b) are forwarded to the recognition network (3),

characterized by the following steps:

a) generating (S1) the deceived image data (1b) by modifying the deception network parameters such that the uncertainty of the object classifier (5) determined by the recognition network (3) is increased, wherein the background or the texture of the target object (6) is varied to modify the deception network parameters,

b) using (S2) the deceived image data (1b) of step a) concatenated with the input image data (1a) as input for the recognition network (3), wherein the recognition network parameters are modified such that the uncertainty of the object classifier (5) is reduced, and

c) repeating steps a) (S1) and b) (S2) until a reduction of the uncertainty of the object classifier (5) is below a predefined threshold.

2. Method according to claim 1, wherein the object classifier (5) is a predefined class or predefined pose of the target object (6).

3. Method according to one of the previous claims, wherein the deception network (4) and/or the recognition network (3) comprises an artificial neural network.

4. Method according to one of the previous claims, wherein the input image data (1a) are based on CAD data.

5. Using the recognition algorithm of the recognition network (3) improved by a method according to one of the previous claims to recognize a target object (6) in image data (1a) generated by an image sensor (1).

6. Using the recognition algorithm according to claim 5, wherein the image sensor is a camera or a laser system.

7. Computational device (2) designed to perform a method according to one of the claims 1 to 4.

8. A computer program product comprising instructions which, when the program is executed by a computational device (2), cause the computational device (2) to carry out the steps of the method according to one of the claims 1 to 4.

9. A computer-readable storage medium comprising instructions which, when executed by a computational device (2), cause the computational device (2) to carry out the steps of the method according to one of the claims 1 to 4.

Description:

Improving the recognition capability of a recognition network

Field of the Invention

The present invention relates to an automated method for improving the recognition capability of a recognition network for a target object, a computational device, a computer program, a computer-readable storage medium and a use of the method for improving the capability of a recognition system.

Background of the Invention

Training on synthetic data is the holy grail of computer vision, but the domain gap between synthetic renderings and the real world is big. If one could find a mapping from one domain to another, it would allow for vast advances of computer vision applications in industry. The reason for this is that there is often no access to real physical objects to acquire the training data required for deep-learning methods, i.e. annotated sequences of images with real objects.

Nevertheless, what often is available are 3D CAD models of the objects of interest (= target objects). They can easily be taken and large amounts of the required data can be rendered. The problem, however, is that these rendered images, despite the apparent visual similarity, are going to look different from the real ones coming from a sensor. This means that a deep-learning method trained on perfect renderings is going to perform poorly on real data.

Possible sources of such discrepancy are manifold:

- a general difference between a 3D model and the actual object,

- image formation: synthetic images exhibit clean edges, unrealistic shading and aliasing issues, whereas real images exhibit sensor noise, Bayer mosaicing and compression artifacts; the variety of real images (textures, colours) leads to more generic filters, whereas training on synthetic images leads to a poor filter variety.

Various domain adaptation works put their efforts into bridging the gap between the domains. One possible solution is to use a small unlabelled subset of real data to improve the realism of the synthetic data. For example, in "K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, Unsupervised pixel-level domain adaptation with generative adversarial networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017" it is proposed to use deep learning to learn the mapping from synthetic images to real ones. It works quite well for a chosen domain of interest, but to train such a net one needs real data.

As an alternative, one could try to learn domain-invariant features that work well for both real and synthetic domains. One of the recent examples is the method described in "M. Rad, M. Oberweger, and V. Lepetit, Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images, arXiv:1712.03904 [cs], 2017".

But what if one does not have real data available? In this case, the answer is domain randomization. Domain randomization is a popular approach whose goal is to randomize the parts of the domain that we do not want our algorithm to be sensitive to. For example, in the case of object classification, one could randomize the background of the object since it should not play a role in recognizing the object itself.

In addition, one could also randomize the texture of the object if no data on its appearance is available. This sort of parametrization makes it possible to learn features that are invariant to properties of the domain. Nevertheless, the main question remains unsolved: what is the main cause of confusion given the domain change? Domain randomization tries to target all possible scenarios, but we do not really know which of them are useful to bridge the domain gap. Moreover, it is almost impossible to cover all the possible variations present in the real world by applying simple augmentations.
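A minimal sketch of the background randomization described above, assuming a rendered image plus a binary object mask as input (the function name and the noise-based background are illustrative choices, not a prescribed implementation):

```python
import numpy as np

def randomize_background(image, mask, rng):
    """Domain randomization of the background: every pixel outside the
    object mask is replaced with random noise; the object itself, which
    should drive recognition, is left untouched."""
    background = rng.random(image.shape)            # random replacement pixels
    return np.where(mask[..., None] == 1, image, background)

rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 0.5)                     # toy rendering
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1         # object occupies the centre

augmented = randomize_background(image, mask, rng)
# object pixels are preserved, background pixels are randomized
```

Texture randomization works analogously with the mask inverted, replacing the object-surface pixels instead of the background.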

Summary of the Invention

The objective of the present invention is to provide a solution for improving the recognition capability of a recognition network.

To accomplish the objective, the present invention provides a solution according to the independent claims. Advantageous embodiments are provided in the dependent claims.

According to a first aspect of the invention, instead of the commonly used domain randomization technique, i.e. augmenting the image by manually adding random backgrounds and lightness parameters, the recognition network itself is used to produce augmentations that maximize the uncertainty of the output. By solving a minimax optimization problem, much more robust mappings are achieved that scale well to different domains without any target data available.

The invention claims an automated method for improving the recognition capability of a recognition network for a target object, providing:

- input image data containing an image of the target object,

- a recognition network alterable by modifiable recognition network parameters, wherein the recognition network is designed to recognize the target object in image data by means of an object classifier, and

- a deception network alterable by modifiable deception network parameters, wherein the deception network is designed to generate deceived image data of the input image data by modifying the deception network parameters and wherein the deceived image data are forwarded to the recognition network,

characterized by the following steps:

a) generating the deceived image data by modifying the deception network parameters such that the uncertainty of the object classifier determined by the recognition network is increased, wherein the background or the texture of the target object is varied to modify the deception network parameters,

b) using the deceived image data of step a) concatenated with the input image data as input for the recognition network, wherein the recognition network parameters are modified such that the uncertainty of the object classifier is reduced, and

c) repeating steps a) and b) until a reduction of the uncertainty of the object classifier is below a predefined threshold. Thus the repetition of the steps comes to a defined end.
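Steps a) to c) can be sketched as a toy minimax loop. The one-parameter "networks", the numerical gradients, and the clipping constraint below are illustrative stand-ins under stated assumptions, not the claimed architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, lr = 1.0, 0.1     # clean input of the true class, learning rate
w, t = 1.0, 0.0      # recognition parameter w, deception parameter t

def loss(w, t):
    # cross-entropy of the true class: high loss = high classifier uncertainty
    return -np.log(sigmoid(w * (x + t)))

prev = loss(w, t)
for _ in range(100):
    # step a): the deception network perturbs the input to RAISE the uncertainty
    grad_t = (loss(w, t + 1e-6) - loss(w, t)) / 1e-6
    t = np.clip(t + lr * grad_t, -0.9, 0.9)   # a constraint keeps the game stable
    # step b): the recognition network adapts to LOWER the uncertainty again
    grad_w = (loss(w + 1e-6, t) - loss(w, t)) / 1e-6
    w -= lr * grad_w
    # step c): repeat until the per-iteration reduction is below a threshold
    cur = loss(w, t)
    if abs(prev - cur) < 1e-4:
        break
    prev = cur
```

The deception step drives t away from the clean input while the recognition step compensates by adjusting w, mirroring the alternating updates of the two networks.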

The term "recognition network parameter" is used broadly and can refer to any modifiable part of the recognition network. For example, recognition network parameters can be "weights" adjustable during learning in an artificial neural network.

The advantage of the invention is that instead of augmenting the input with random augmentations, the method of interest is used directly to learn the augmentation that maximizes the error of the method. As a result, the achieved mapping is highly invariant to changes in the input data and generalizes well to real data without ever having had a single glimpse of it.

The method allows for a consistent domain generalization improvement for deep learning-based recognition algorithms, while requiring no target domain data. A clever, network-specific pipeline makes it possible to explicitly supervise the algorithm of interest to learn domain-invariant features by defining learning constraints.

Methods that generalize well to different target domains are desired in industry since it is often impossible to acquire real data sequences of each new model and each new environment for training.

In a further embodiment the object classifier is a predefined class or predefined pose of the target object. The class can, for example, be a specific working tool; the pose can be a specific orientation of the target object.

In a further embodiment the deception network and/or the recognition network comprises an artificial neural network, for example a convolutional neural network.

An artificial neural network is a computational model based on the structure and functions of biological neural networks. Information that flows through the network affects the structure of the artificial neural network because a neural network changes - or learns, in a sense - based on that input and output.

Artificial neural networks are considered nonlinear statistical data modelling tools with which the complex relationships between inputs and outputs are modelled or patterns are found. An artificial neural network is also simply known as a neural network.

In a further embodiment the input image data are based on CAD data. No real image data are necessary for the invention.

According to a second aspect of the invention a target object should be recognized by the trained recognition network.

The invention claims using the recognition algorithm improved by a method according to the invention to recognize a target object in image data generated by an image sensor.

In a further embodiment the image sensor is a camera or a laser system.

In a third aspect of the invention a device performing the invention should be claimed.

The invention claims a computational device designed to perform a method according to the invention.

In a fourth aspect of the invention a computer program and a computer-readable medium should be claimed.

The invention claims a computer program product comprising instructions which, when the program is executed by a computational device, cause the computational device to carry out the steps of the method according to the invention.

The invention further claims a computer-readable storage medium comprising instructions which, when executed by a computational device, cause the computational device to carry out the steps of the method according to the invention.

Further benefits and advantages of the present invention will become apparent after a careful reading of the detailed description with appropriate reference to the accompanying drawings.

Brief Description of the Drawings

Fig. 1 shows a block diagram of the computational device, fig. 2 shows a flow chart of the method and fig. 3 shows a block diagram of a recognition system.

Detailed Description of the Invention

As an example of the invention, fig. 1 shows a block diagram of the computational device 2. The device consists of the following two components: a recognition network 3 designed for a specific task, e.g. classification, pose estimation, detection, etc., and a deception network 4 designed to modify input image data to maximize the uncertainty of the recognition network. The deception network comprises an encoder and a decoder.

The two networks 3 and 4 are trained competitively in a minimax game fashion, as shown in the flow chart of fig. 2. A single training iteration consists of two steps:

First step S1: A forward pass through the deception network 4 that outputs a batch of images of the same size as the input (= deceived image data 1b). The deceived image data 1b of the deception network 4 is then fed to the recognition network 3 to maximize the uncertainty of its output (= object classifier 5). Backpropagation is then applied to update the deception network's 4 parameters only. To reverse the loss function of the recognition network 3 in order to maximize its uncertainty, a gradient reversal layer is used, following the work of "Ganin, Yaroslav, and Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation, International Conference on Machine Learning, pp. 1180-1189, 2015".
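The sign flip performed by a gradient reversal layer can be illustrated with explicit numpy backprop (a conceptual sketch; real implementations hook the sign flip into an autodiff framework):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies the gradient by -lambda
    in the backward pass, so the layers *before* it are updated to
    maximize the loss that the layers after it minimize."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                         # activations pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output   # flip the sign of the gradient

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)                                # identical to x
g = grl.backward(np.array([0.1, 0.1, 0.1]))       # → [-0.05, -0.05, -0.05]
```

Placed between the deception network 4 and the recognition network 3, this single layer lets one backward pass train the deception network to maximize the very loss the recognition network minimizes.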

To achieve the desired output transformations (= deceived image data 1b), constraints of the deception network 4 are defined. The constraints are task-specific: in the case of having textured objects, any pixel changes on the object surface are penalized (= not allowed) and background changes are allowed; when no texture is available, gradient changes of the object on the output are penalized.
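One of these task-specific constraints, the penalty on pixel changes over the object surface, could look as follows (a hedged sketch; the mean-squared form and all names are assumptions):

```python
import numpy as np

def surface_change_penalty(original, deceived, object_mask):
    """Penalize pixel changes on the object surface (mask == 1) while
    leaving background changes (mask == 0) unpenalized, as required for
    textured objects."""
    squared_diff = (deceived - original) ** 2
    return (squared_diff * object_mask[..., None]).mean()

original = np.zeros((4, 4, 3))
mask = np.zeros((4, 4)); mask[0, 0] = 1                # single object pixel

bg_change = original.copy(); bg_change[3, 3] = 1.0     # allowed: background
obj_change = original.copy(); obj_change[0, 0] = 1.0   # penalized: surface

penalty_bg = surface_change_penalty(original, bg_change, mask)    # 0.0
penalty_obj = surface_change_penalty(original, obj_change, mask)  # > 0
```

Added to the deception network's objective, such a term steers its transformations toward task-irrelevant regions of the image.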

RGB:

1. Background augmentation

2. Object lighting augmentation based on the Phong lighting model

a. An additional lighting decoder is used to output the light source direction and the light colour

b. Given the normal map of the object and the output of the lighting decoder, i.e. the 3D light vector and the 3D light colour, we generate:

i. Diffuse light: simulates the directional impact a light source has on the object

ii. Specular light: simulates the bright light spot that appears on shiny objects

Depth:

1. Background depth augmentation
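The two Phong terms listed above can be computed from the lighting decoder's assumed outputs (light direction and colour) and the object's normal map; this numpy sketch evaluates a single surface point, with all names and the shininess value being illustrative:

```python
import numpy as np

def phong_terms(normal, light_dir, view_dir, light_color, shininess=32):
    """Diffuse and specular terms of the Phong lighting model for one
    surface point; all direction vectors are assumed unit-length."""
    # Diffuse: directional impact of the light source on the surface.
    diffuse = max(np.dot(normal, light_dir), 0.0) * light_color
    # Specular: bright spot on shiny surfaces, via the reflection vector.
    reflect = 2.0 * np.dot(normal, light_dir) * normal - light_dir
    spec = max(np.dot(reflect, view_dir), 0.0) ** shininess
    specular = spec * light_color
    return diffuse, specular

n = np.array([0.0, 0.0, 1.0])    # surface normal from the normal map
l = np.array([0.0, 0.0, 1.0])    # light shining straight along the normal
v = np.array([0.0, 0.0, 1.0])    # viewer on the same axis
c = np.array([1.0, 1.0, 1.0])    # white light colour

diffuse, specular = phong_terms(n, l, v, c)   # both at full strength here
```

Varying the light vector and colour per training batch yields the lighting augmentations the deception network can exploit.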

Second step S2: A forward pass through the recognition network 3 providing the output of the deception network 4 as input. The unmodified input (= image data 1a) is kept and concatenated with the transformed one for stability reasons. The objective here is to minimize the recognition network's 3 loss (= uncertainty of the object classifier 5) given the modified input. In this case, the network parameters are updated only for the recognition network 3.

Constant repetition of the described steps S1 and S2 results in a minimax game where the deception network 4 tries to degrade the recognition network's 3 performance by modifying its input in a way that should be irrelevant for a given task. On the other hand, the recognition network 3 learns features that are invariant to the transformations introduced by the deception network 4, making it increasingly more generalizable and robust. Advantageously, the repetition ends when a predefined threshold of the uncertainty of the object classifier is reached.

Fig. 3 shows a block diagram of a recognition system with a computational device 2 comprising the recognition network 3 which has been trained by the aforementioned method. An image of the target object 6 is captured by an image sensor 1 and fed into the trained recognition network 3. The object classifier 5 of the target object 6 is outputted by the recognition network 3. The object classifier 5 is for example a class and/or a pose of the target object 6. The recognition network 3 is for example a convolutional neural network.

Although the invention has been explained in relation to its preferred embodiments as mentioned above, it is to be understood that many other possible modifications and variations can be made without departing from the scope of the present invention. It is, therefore, contemplated that the appended claim or claims will cover such modifications and variations that fall within the true scope of the invention.

List of Reference Signs

1 image sensor

1a image data

1b deceived image data

2 computational device

3 recognition network

4 deception network

5 object classifier

6 target object