**METHOD AND SYSTEM FOR PROCESSING INPUT DATA AND PROPAGATING VARIANCE IN A NEURAL NETWORK**

SHEKHOVTSOV, Oleksandr (Nedvezska 2231/18, PRAGUE, 100 00, CZ)

FLACH, Boris (Tetschener Straße 28, DRESDEN, 01277, DE)

CZECH TECHNICAL UNIVERSITY IN PRAGUE (Faculty of Electrical Engineering, Czech Technical University, Praha, Technicka 2, 166 27, CZ)


**G06N3/04**

**G06N3/08**

WO2016145516A1 | 2016-09-22

US20160217368A1 | 2016-07-28

"JOURNAL OF MACHINE LEARNING RESEARCH", vol. 28 (2), 17 June 2013, MIT PRESS, CAMBRIDGE, MA, US, ISSN: 1532-4435, article SIDA I WANG ET AL: "Fast dropout training", pages: 118 - 126, XP055515749

YARIN GAL ET AL: "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", 4 October 2016 (2016-10-04), XP055467519, Retrieved from the Internet

SRIVASTAVA ET AL.: "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JOURNAL OF MACHINE LEARNING RESEARCH, vol. 15, 2014, pages 1929 - 1958, XP055193568

WANG, S; MANNING, C: "Fast dropout training", PROCEEDINGS OF THE 30TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING, PMLR, vol. 28, no. 2, 2013, pages 118 - 126


CLAIMS

1. A method for processing input data using a neural network comprising a plurality of consecutive neural network layers, the method comprising:

- obtaining (E01) at least a first variance value associated with the input data,

- processing (E02, E04) the input data, at least the first variance value, through the neural network, processing the input data comprising, by each neural network layer of the plurality of consecutive neural network layers, a processing,

wherein the method comprises, after each processing by a neural network layer of the plurality of consecutive neural network layers, a determination (E03, E05) of at least an intermediate variance value.

2. The method of claim 1, wherein the determination of at least an intermediate variance value is performed through analytical calculations.

3. The method of claim 2, wherein the analytical calculations use at least an intermediate variance value determined after a processing by the preceding neural network layer in the plurality of consecutive neural network layers.

4. The method according to claim 2 or 3, wherein each neural network layer has a type, and the analytical calculations take the type of the preceding neural network layer into account.

5. The method according to any one of claims 1 to 4, wherein the input data has a plurality of input components each associated with a first variance value.

6. The method according to any one of claims 1 to 5, wherein each neural network layer receives a respective first number of input components and a first or intermediate variance value for each input component, and wherein each neural network layer delivers a respective second number of output components, and an intermediate variance value for each output component is determined.

7. The method according to any one of claims 1 to 6, wherein the method is performed during a testing phase and the first variance value is predetermined or the first variance values are predetermined.

8. The method according to claim 7 taken in its combination with claim 5, wherein each input component of the input data is associated with a respective predetermined first variance value.

9. The method according to any one of claims 1 to 6, wherein the method is performed during a training phase of the neural network, the input data belonging to a set of training input data, and wherein the first variance value relates to a variance over the set of training input data or the first variance value is predetermined.

10. The method according to claim 9, wherein at least a first mean value associated with the input data is obtained, the first mean value relating to a mean over the set of training input data.

11. The method according to any one of claims 1 to 10, wherein the input data is an image.

12. A system comprising a neural network for processing input data, the neural network comprising a plurality of consecutive neural network layers, the system being configured to:

- obtain at least a first variance value associated with the input data,

- process the input data, at least the first variance value, through the neural network, processing the input data comprising, by each neural network layer of the plurality of consecutive neural network layers, a processing, and

- after each processing by a neural network layer of the plurality of consecutive neural network layers, determine at least an intermediate variance value.

13. A computer program including instructions for executing the steps of a method according to any one of claims 1 to 11 when said program is executed by a computer.

14. A recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method according to any one of claims 1 to 11.

Field of the disclosure

The present disclosure is related to the field of data processing using neural networks, for example image processing using neural networks.

Description of the Related Art

It has been proposed to use neural networks, for example convolutional neural networks or feed-forward neural networks, to process data.

By way of example, it has been proposed to process images through such neural networks, for example to detect objects on images. Typically, in a training phase, known images are inputted to the neural network and a scoring system is used to adjust the neural network so that it behaves as expected on these known images. The neural networks are then used in a phase called a testing phase on actual images without any knowledge of the expected output.

The expression "neural network" used in the present application can cover a combination of a plurality of known networks.

Current neural network models are deterministic systems which map inputs (for example images) to outputs (for example regressions or class probabilities). Thus, the presence of noise at the input is critical.

It has been proposed to use a method called "dropout" to take noise into consideration during the training time. This method has been disclosed in document "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" by Srivastava et al. (Journal of Machine Learning Research 15 (2014) 1929-1958), and in document "Fast dropout training" by Wang, S and Manning, C (Proceedings of the 30th International Conference on Machine Learning, PMLR 28(2):118-126, 2013).

In these methods, the noise is replaced with its expectation inside each layer of the neural network.

More specifically, document "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" describes a method for regularizing a neural network by randomly zeroing neuron activations (i.e. "randomly sampling which neurons to update") during each training iteration. The zeroing of the activations effectively changes the topology of the network, and the outcome of the training process can therefore be seen as the "expectation over an ensemble of networks". Training the network in this way usually improves the test-time accuracy; however, this comes at the cost of significantly slower training.

Document "Fast dropout training" describes a way to approximate the effect of dropout analytically, thus obviating the need for the random sampling which causes the slowdown during training.

These methods do not provide a way to model the noise of the input data (e.g. the noise in the input image), i.e. there is no equivalent of the "variance propagation" proposed in this invention.

The methods of the prior art are too specific and not suitable for various applications. Also, the methods of the prior art do not provide a way to model the noise in the input data.

Thus, there is a need for improved methods to process data using neural networks.

Summary of the disclosure

The present disclosure overcomes one or more deficiencies of the prior art by proposing a method for processing input data using a neural network (preferably a feed-forward neural network) comprising a plurality of consecutive neural network layers, the method comprising:

- obtaining at least a first variance value associated with the input data,

- processing the input data, at least the first variance value, through the neural network, processing the input data comprising, by each neural network layer of the plurality of consecutive neural network layers, a processing,

wherein the method comprises, after each processing by a neural network layer of the plurality of consecutive neural network layers, a determination of at least an intermediate variance value.

The invention therefore proposes to propagate variance values through the neural network, by performing a determination of at least an intermediate variance value after each processing by a neural network layer of the plurality of neural network layers.

By using a variance value inputted to the neural network, it is possible to obtain an improved expectation of the neurons.

In other words, when the input is uncertain, it is possible to obtain a better approximation of the actual output.

An uncertain input may be caused by sensor noise if a sensor is used to acquire the input data. It may also be an input from a computational sensor (for example a depth map computed inside a structured light sensor), or an input from another neural network.

Thus, the noise at the input can be modelled using the above method because the first variance value is an approximation of this noise.

By way of example, if the neural network layers are of the sigmoid type, then using the variance of the input may improve the stability or the robustness of the neural network.

According to an embodiment, the determination of at least an intermediate variance value is performed through analytical calculations.

According to an embodiment, the analytical calculations use at least an intermediate variance value determined after a processing by the preceding neural network layer in the plurality of consecutive neural network layers (i.e. the neural network layer right before the one where the calculation is performed).

Thus, the variance is calculated using previously obtained variance values so as to propagate through the neural network until reaching the output where the variance is associated with the output.

According to an embodiment, each neural network layer has a type, and the analytical calculations take the type of the preceding (the one that has just processed data before the calculation is carried out) neural network layer into account.

For example, a different formula may be used for each type of neural network layer.

This allows obtaining a modular approach, wherein each layer is seen as a module associated with a specific calculation depending on the type of layer. Thus, it is possible to use the above defined method on various known neural networks, such as known convolutional neural networks or known feed-forward neural networks.

According to an embodiment, the input data has a plurality of input components each associated with a first variance value.

According to an embodiment, each neural network layer receives a respective first number of input components (for example from the preceding neural network layer if there is a preceding neural network layer) and a first or intermediate variance value for each input component,

and wherein each neural network layer delivers a respective second number of output components, and an intermediate variance value for each output component is determined. It should be noted that the number of components at the input may differ from the number of components at the output for a given layer.

According to an embodiment, the method is performed during a testing phase and the first variance value is predetermined or the first variance values are predetermined.

This testing phase is defined as the phase during which input data is inputted without a priori knowledge of the expected output of the neural network. It is the phase which succeeds the training phase which is well known to the person skilled in the art.

By using a predetermined first variance value, it is possible to set a variance value which represents the input noise. This predetermined first variance value may be obtained during a calibration step.

By way of example, in the case of an image inputted to the neural network, the noise estimate (in other words, the variance value) could be obtained by varying the exposure and gain settings of the sensor used to collect this image while observing a calibration pattern. The exposure and gain settings have a direct effect on pixel noise, which could be determined from the recorded images so as to obtain this variance value.
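The calibration step can be illustrated by a minimal sketch. It assumes that several frames of a static calibration pattern are recorded at one fixed exposure and gain setting, so that frame-to-frame variation of a pixel is attributable to noise; the procedure would be repeated for each setting of interest. The function name `per_pixel_variance` and the flat-list image layout are hypothetical and are not taken from the disclosure.

```python
def per_pixel_variance(frames):
    """Unbiased per-pixel variance across repeated captures of a static scene.

    frames: list of images, each a flat list of pixel values of equal length
    (assumption: all frames share one exposure and gain setting).
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")
    num_pixels = len(frames[0])
    # Per-pixel temporal mean over the recorded frames.
    means = [sum(f[p] for f in frames) / n for p in range(num_pixels)]
    # Per-pixel sample variance, used as the predetermined first variance value.
    return [
        sum((f[p] - means[p]) ** 2 for f in frames) / (n - 1)
        for p in range(num_pixels)
    ]


# Three captures of a two-pixel pattern: the first pixel is noisy, the second is not.
frames = [[10.0, 20.0], [12.0, 20.0], [14.0, 20.0]]
first_variances = per_pixel_variance(frames)
```

Each returned value could then serve as the predetermined first variance value of the corresponding pixel.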

According to an embodiment, each input component of the input data is associated with a respective predetermined first variance value.

In other words, if the input has a plurality of components, it is possible to apply a predetermined variance on each component.

By way of example, if the input data is an image, each pixel value of this image is a component of the input data, and it is possible to apply a different variance value to each pixel according to, for example, the type of sensor used.

According to an embodiment, the method is performed during a training phase of the neural network, the input data belonging to a set of training input data,

and wherein the first variance value relates to a variance over the set of training input data or the first variance value is predetermined.

In the training phase, a set of known training input data is inputted to the neural network so as to observe the output of the neural network and adjust the neural network.

If the input data has a plurality of components, the variance for each component may be the variance for all the possible values of this component in the set of training input data. Alternatively, the first variance value is predetermined, for example set at a constant value in a prior step. This value may be chosen to correspond to sensor noise (if a sensor is used to acquire the input data) or the input variance of a system/sensor that provides an estimate of the component of the input data.

According to an embodiment, at least a first mean value associated with the input data is obtained, the first mean value relating to a mean over the set of training input data.

The mean and variance over the set of training input data can be inputted to the neural network in order to estimate the statistics of inner neurons over the dataset. Such statistics are not used for recognition or classification of a specific image, but may be needed in a subsequent normalization (to initialize and better condition the network training). In this embodiment, if the input data has a plurality of components, the mean for each component may be the mean for all the possible values of this component in the set of training input data.

It should be noted that when the mean is used, during training, all the above defined embodiments (except the ones relating to testing) which disclose how variances are calculated also apply to mean values.

For example, the method may comprise, after each processing by a neural network layer of the plurality of consecutive neural network layers, a determination of at least an intermediate mean value. That is to say, a propagation of the mean value is obtained.

The intermediate mean value may be determined through analytical calculations, for example using at least an intermediate mean value determined after a processing by the preceding neural network layer in the plurality of neural network layers. These analytical calculations may also take the type of the preceding neural network layer into account.

Each input component may be associated with a first mean value.

Also, in the testing phase, the same method may apply with a different first mean value which may be the sensed input if a sensor is used. Using a first variance value with this mean implies that the noise is taken into account, this noise being represented by the first variance value. Thus, the mean is propagated along with the variance in a similar manner.

In the prior art techniques, there is no variance value and only the sensed input is used, which assumes a noise-free acquisition of input data.

Each neural network layer may receive a respective first number of input components and a first or intermediate mean value for each input component, and each neural network layer may deliver a respective second number of output components, and an intermediate mean value for each output component is determined.

According to an embodiment, the input data is an image.

By way of example, the image can be a color image such as an RGB (Red-Green-Blue) image known to the skilled person.

According to a second aspect, the invention also relates to a system, for example implemented by one or more computers, comprising a neural network for processing input data, the neural network comprising a plurality of consecutive neural network layers, the system being configured to:

- obtain at least a first variance value associated with the input data,

- process the input data, at least the first variance value, through the neural network, processing the input data comprising, by each neural network layer of the plurality of consecutive neural network layers, a processing, and

- after each processing by a neural network layer of the plurality of consecutive neural network layers, determine at least an intermediate variance value.

This system can be configured to carry out all the embodiments of the method for processing data described above.

In one particular embodiment, the steps of the method for processing input data are determined by computer program instructions.

Consequently, the invention is also directed to a computer program for executing the steps of a method as described above when this program is executed by a computer.

This program can use any programming language and take the form of source code, object code or a code intermediate between source code and object code, such as a partially compiled form, or any other desirable form.

The invention is also directed to a computer-readable information medium containing instructions of a computer program as described above.

The information medium can be any entity or device capable of storing the program. For example, the medium can include storage means such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or magnetic storage means, for example a diskette (floppy disk) or a hard disk.

Alternatively, the information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.

Brief description of the drawings

How the present disclosure may be put into effect will now be described by way of example with reference to the appended drawings, in which:

- figure 1 is a block diagram of an exemplary method for processing data according to an example,

- figure 2 is a block diagram of an exemplary system for processing data according to an example,

- figure 3 is a representation of a neural network,

- figure 4 is a representation of a neural network layer,

- figure 5 shows the output of a rectifier linear unit layer, and

- figure 6 is an example of neural network and of its output.

Description of the embodiments

An exemplary method and system for processing data using neural networks will be described hereinafter. In the following examples, the data may be an image or a set of images.

The invention is however not limited to processing images and it may also relate to processing sounds or other types of data which can be processed using neural networks.

Figure 1 is a block diagram of an exemplary method for processing input data, which may be data acquired by a sensor during a testing phase, or input data from a set of input data used during a training phase.

A first step E01 is carried out in which first mean and variance values are obtained. In this example, the method of figure 1 is performed during a training phase. The input data may be an image belonging to a set of training images. This image comprises a plurality of pixels each having a pixel value, and the mean is calculated for each pixel over the entire set of training images. The variance is also calculated for each pixel on the basis of the value of this pixel in each training image.
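Step E01 during training can be sketched as follows. This is only an illustration: it uses a one-pass (Welford) update, which is a choice made for this sketch and not a technique stated in the disclosure, and it computes the population variance of each pixel over the set. The function name `dataset_mean_variance` and the flat-list image layout are hypothetical.

```python
def dataset_mean_variance(images):
    """One-pass (Welford) per-pixel mean and population variance over a set."""
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations, per pixel
    for image in images:
        if mean is None:
            mean = [0.0] * len(image)
            m2 = [0.0] * len(image)
        count += 1
        for p, value in enumerate(image):
            delta = value - mean[p]
            mean[p] += delta / count
            m2[p] += delta * (value - mean[p])
    if count == 0:
        raise ValueError("empty training set")
    variance = [s / count for s in m2]
    return mean, variance


# Three training images of two pixels each.
training_images = [[0.0, 2.0], [2.0, 2.0], [4.0, 2.0]]
first_means, first_variances = dataset_mean_variance(training_images)
```

The one-pass form avoids holding the whole training set in memory; a two-pass computation would give the same values.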

It should be noted that after training, during testing, instead of the mean value, the actual pixel value is used. During testing, the first variance values can be set at predetermined values, for example one value per pixel.

The input data is inputted to the neural network after step E01. The first mean and variance values may also be inputted to the neural network. Even if the first mean and variance values are not used by the layers of the neural network, they will be used in intermediate calculations carried out between the layers of the neural network.

When the variance is zero, the propagation method gives a result identical to the standard propagation of the mean values. If the variance is positive, the propagation approximates the effect of propagating through the layer multiple samples that are normally distributed with the input mean and variance.

In step E02, processing is performed by a first neural network layer of a neural network, for example a feed-forward neural network. The neural network comprises a plurality of consecutive neural network layers.

The first layer may be, for example, a linear layer, or even a rectifier linear unit. This first layer processes the input data and transmits the processed input data, along with the first mean and variance values, to the following neural network layer.

Before performing processing by the following neural network layer, a calculation of intermediate variance and mean values is carried out in step E03 (during testing, only the intermediate variances may be calculated). This calculation may be an analytical calculation and the formulas to be used may depend on the type of the preceding neural network layer (the first layer in this case). The calculation uses the first mean and variance values.

After each processing, a similar calculation is carried out so as to propagate the variance values and the mean values (during training).

The final processing by a neural network layer is performed in step E04 and a final calculation of intermediate variances and means is carried out in step E05. This calculation takes into account the type of neural network layer of the final neural network layer and it also uses the variance and mean values, which have been determined in relation to the neural network layer which precedes the final neural network layer.

It should be noted that in the above method, each neural network layer receives a respective first number of input components and, for each input component, a first or intermediate variance value (only the first neural network layer will receive a first variance value, the following neural network layers will receive intermediate variance values) and a first or intermediate mean value.

Also, each neural network layer delivers a respective second number of output components, and in the subsequent calculation step, an intermediate variance value and an intermediate mean value are determined for each output component.
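The layer-by-layer determination described above can be sketched as a loop that selects a calculation according to the layer type. This is a minimal illustration and not the applicants' implementation: the registry `RULES`, the function names and the example weights are hypothetical, and only a linear rule is registered. The linear rule assumes independent input components, so correlations created by earlier layers are ignored.

```python
def propagate_linear(weights, mean, var):
    """Mean and variance of Y = W X, assuming independent input components."""
    out_mean = [sum(w * m for w, m in zip(row, mean)) for row in weights]
    out_var = [sum(w * w * v for w, v in zip(row, var)) for row in weights]
    return out_mean, out_var


# Hypothetical registry: one propagation rule per layer type.
RULES = {"linear": propagate_linear}


def propagate(layers, first_mean, first_var):
    """Determine intermediate mean and variance values after each layer."""
    mean, var = first_mean, first_var
    history = []
    for layer_type, params in layers:
        mean, var = RULES[layer_type](params, mean, var)
        history.append((mean, var))  # intermediate values after this layer
    return history


layers = [
    ("linear", [[1.0, 2.0], [0.0, -1.0], [3.0, 0.0]]),  # 2 inputs -> 3 outputs
    ("linear", [[1.0, 1.0, 1.0]]),                      # 3 inputs -> 1 output
]
history = propagate(layers, first_mean=[1.0, 2.0], first_var=[0.5, 0.25])
zero_var = propagate(layers, first_mean=[1.0, 2.0], first_var=[0.0, 0.0])
```

With zero first variance values, the propagated means coincide with an ordinary forward pass and the variances stay at zero. Other layer types would be supported by registering further rules, and the number of components may change from one layer to the next, as in the example.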

The steps of the method described in reference to figure 1 can be determined by computer instructions. These instructions can be executed on one or more computers. In the example of figure 2, a single computer is used to perform the method.

On this figure, a system 200 has been represented. This system comprises an image sensor 201 which acquires input data in the form of images when a neural network 203 comprised in the system is to be tested.

When the system is used during training, a module 201' is used to obtain an image from a set of training images.

Module 202 is used to obtain one or more variance values and one or more mean values (especially during training).

The modules 201, 201' and 202 can be implemented by computer instructions which can be executed by processor 210.

The neural network 203 comprises a first neural network layer 204 and a module 205 for calculating intermediate variance and mean values associated with this first neural network layer 204.

The neural network 203 may comprise a plurality of additional layers and calculation modules which have not been represented for the sake of conciseness. The final neural network layer 206 has been represented with the associated module for calculating intermediate variance and mean values.

Figure 3 shows a Bayesian network representation of a neural network wherein the inputs/outputs of neural network layers are denoted by $X^k$, with each component being represented by a circle (with $k$ an index indicating the layer). Each neural network layer is represented by the probability of having a corresponding output based on the input of the layer, namely $p(X^{k+1} \mid X^k)$.

The probability $p(X^{k+1} \mid X^k)$ may define a deterministic relation, for example if:

$$X^k = w\,x^{k-1}$$

The probability $p(X^{k+1} \mid X^k)$ may also define a stochastic relation, for example if:

Wherein $Z \sim \mathcal{N}(0, 1)$ means that $Z$ is a random variable that has a standard normal distribution.

The inventors of the present invention have noted that this representation allows expressing the probability $p(X_i^k \mid x^0)$ for a given output component noted with the index $i$ of a given neural network layer through a similar probability of the preceding layer, leading to the recurrent (which may also be called feed-forward) calculation:

Wherein $\mathbb{E}$ is the expectation, and capital $X^k$ denotes a random variable, while small $x^k$ denotes a specific value taken by $X^k$, identified also with the event that that variable takes this value.

The inventors have observed that it is possible to consider a factorized approximation of the posterior distribution $p(X^k \mid x^0)$ which is:

The inventors have observed that knowing the mean and variance of $X^k$ fully specifies the normal distribution for continuous variables, which may be used to approximate $q(X^k)$ in the case of $X^k$ being continuous, and the Bernoulli distribution for discrete binary variables, which defines $q(X^k)$ in a precise manner in the case of $X^k$ being a discrete binary variable. This allows for a universal treatment of both cases using mean and variance statistics. In the case when $X^k$ is multinomial (for example categorical), which may be used in the last layer of a classification network, the full non-parametric form of $q(X^k)$ may be computed.

Factorized approximations are commonly used in variational inference methods. The approximation is possible, because in many recognition or regression problems, when the network input $x^0$ is specified, the corresponding class label or the regression value is unique. When the input $x^0$ is specified with some uncertainty, the states of variables in the hidden layers tend to have unimodal distributions concentrated around their mean values. According to the present disclosure, the factorized approximation $q$ can be computed layer by layer by substituting the already approximated quantities in the exact recurrent expression above. The following equation is obtained:
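By way of illustration, and assuming that the layers form a Markov chain as in the Bayesian network representation of figure 3, the recurrent calculation, the factorized approximation and the layer-by-layer substitution may be sketched as follows. The notation $q(X_i^k)$ for the marginal of component $i$ is an assumption of this sketch, and the expressions are a reconstruction from the surrounding definitions rather than a reproduction of the original equations.

```latex
% Recurrent (feed-forward) calculation for output component i of layer k
p(X_i^k \mid x^0) = \mathbb{E}_{X^{k-1} \mid x^0}\left[ p(X_i^k \mid X^{k-1}) \right]

% Factorized approximation of the posterior
p(X^k \mid x^0) \approx q(X^k) = \prod_i q(X_i^k)

% Layer-by-layer substitution of the approximation into the recurrence
q(X_i^k) = \mathbb{E}_{q(X^{k-1})}\left[ p(X_i^k \mid X^{k-1}) \right]
```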

In the case of $X^k$ being a discrete Bernoulli variable, this leads to the following equations for the mean $\mu_i$ and the variance $\sigma_i^2$:

$$\sigma_i^2 = \mu_i (1 - \mu_i)$$

For continuous variables $X^k$ the mean $\mu_i$ and the variance $\sigma_i^2$ become:

When $p(X^k \mid X^{k-1})$ takes the form of a deterministic mapping $X^k = f(X^{k-1})$ as in neural networks, the mean $\mu_i$ and the variance $\sigma_i^2$ can be simplified into:
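Consistently with the expectations of $f$ and of $f^2$ referred to in this description, the simplified expressions for a deterministic mapping may be sketched as below. The subscript notation $\mathbb{E}_{q(X^{k-1})}$, meaning an expectation under the factorized approximation of the preceding layer, is an assumption of this sketch.

```latex
\mu_i = \mathbb{E}_{q(X^{k-1})}\left[ f_i(X^{k-1}) \right]

\sigma_i^2 = \mathbb{E}_{q(X^{k-1})}\left[ f_i(X^{k-1})^2 \right] - \mu_i^2
```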

Thus, it is possible to approximate the mean and variance between each layer of a neural network. The expectations of $f$ and of $f^2$ mentioned above, known as moments, may be computed analytically under the assumption that $X^k$ is normally distributed with a known mean and variance, as has been assumed for continuous random variables. The skilled person will know how to obtain or approximate these moments for different functions $f$.

Figure 4 is a schematic illustration of a neural network layer. This layer receives four input components, noted with the index $i$, each having a mean $\mu_i$ and a variance $\sigma_i^2$.

On the output, the neural network layer delivers four output components each having a mean $\mu'_j$ and a variance $\sigma'^2_j$.

By way of example, the inventors of the present invention have observed that the mean and variance calculations can be written as follows, according to the type of neural network layer. In the following equations, $X$ is a vector of all the input components and $Y$ is a vector of all the output components of the neural network layer.

For a linear layer of coefficient $W$ (a matrix of components $w_{ji}$, with $j$ and $i$ indexes corresponding to components):
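For the linear layer $Y = WX$, and under the assumption made by the factorized approximation that the input components are independent, the standard result for the output mean and variance is the following (a sketch consistent with the notation of this description, with $j$ indexing output components and $i$ indexing input components):

```latex
\mu'_j = \sum_i w_{ji}\,\mu_i

\sigma'^2_j = \sum_i w_{ji}^2\,\sigma_i^2
```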

For a Heaviside layer $Y = \mathbb{1}[X \ge 0]$:

$$\mu' = \Phi\left(\frac{\mu}{\sigma}\right), \qquad \sigma'^2 = \mu'(1 - \mu')$$

Where $\Phi$ is the cumulative distribution function of the standard normal distribution.

For a rectifier linear unit $Y = \max(0, X)$:

$$\mu' = \mu\,\Phi(c) + \sigma\,\phi(c)$$

$$\sigma'^2 = \sigma^2 K(c)$$

Where $c = \mu/\sigma$, $\phi$ is the probability density function of the standard normal distribution, and

$$K(c) = c\,\phi(c) + (c^2 + 1)\,\Phi(c) - \big(c\,\Phi(c) + \phi(c)\big)^2$$
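The Heaviside and rectifier linear unit rules can be sketched for a single component as follows. This is a minimal illustration assuming a normally distributed input with a strictly positive standard deviation; the function names are hypothetical, and the standard normal functions are written with the Python standard library.

```python
import math


def normal_pdf(c):
    """Probability density function of the standard normal distribution."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)


def normal_cdf(c):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


def heaviside_moments(mu, sigma):
    """Mean and variance of Y = 1 if X >= 0 else 0, with X ~ N(mu, sigma^2)."""
    mean = normal_cdf(mu / sigma)
    return mean, mean * (1.0 - mean)


def relu_moments(mu, sigma):
    """Mean and variance of Y = max(0, X), with X ~ N(mu, sigma^2), sigma > 0."""
    c = mu / sigma
    pdf, cdf = normal_pdf(c), normal_cdf(c)
    mean = mu * cdf + sigma * pdf
    k = c * pdf + (c * c + 1.0) * cdf - (c * cdf + pdf) ** 2
    return mean, sigma * sigma * k


step_mean, step_var = heaviside_moments(0.0, 1.0)
relu_mean, relu_var = relu_moments(0.0, 1.0)
```

Both rules depend on the input only through the ratio of the mean to the standard deviation (and, for the rectifier linear unit, a scaling by the standard deviation), which keeps the per-component cost to a few scalar function evaluations.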

On figure 5, the output of a rectifier linear unit has been represented with the calculated mean and the variance.

This figure illustrates that the propagation of mean and variance through a rectified linear unit, which is a mapping of 2 input values into 2 output values, can be efficiently calculated using only two functions of a scalar variable, the ratio $c = \mu/\sigma$. These functions, plotted in the figure, define the propagation rules.

On figure 6, an exemplary embodiment of neural network has been represented in which a final Softmax layer well known to the skilled person is used for classification purposes.

On this figure, two output components have been also represented in graphs in which:

- AP1 is the result of feed-forward propagation of a clean image with no noise.

- AP2 is the result of the propagation of the same image with noise using the method of the present invention.

- MC is the result of a Monte-Carlo simulation obtained by inputting several times (for example 10000 times) this image with random noise each time without using the method of the invention. The results are sampled each time and the output is represented.

It can be seen that the method of the invention provides distributions (Gaussian using the mean and variances) which correspond to MC.

The histogram shows the probability of observing a particular value of the neuron output in a network with noisy input. It can be seen that AP2 provides usable estimates of this probability distribution because it is close to MC.
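The Monte-Carlo reference (MC) can be sketched as follows. This is a minimal illustration, assuming Gaussian input noise and a hypothetical one-layer network (a linear layer followed by a rectifier linear unit); the function names, the seed and the network are not taken from the disclosure.

```python
import random


def forward(x, weights):
    """Deterministic forward pass: one linear layer followed by a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def monte_carlo_moments(x, noise_std, weights, num_samples=10000, seed=0):
    """Sample mean and variance of each output under Gaussian input noise."""
    rng = random.Random(seed)
    sums = [0.0] * len(weights)
    sq_sums = [0.0] * len(weights)
    for _ in range(num_samples):
        # Draw one noisy copy of the input and run the ordinary forward pass.
        noisy = [xi + rng.gauss(0.0, s) for xi, s in zip(x, noise_std)]
        for j, y in enumerate(forward(noisy, weights)):
            sums[j] += y
            sq_sums[j] += y * y
    means = [s / num_samples for s in sums]
    variances = [q / num_samples - m * m for q, m in zip(sq_sums, means)]
    return means, variances


# Single unit-weight ReLU fed with a zero input and unit-variance noise.
mc_mean, mc_var = monte_carlo_moments(x=[0.0], noise_std=[1.0], weights=[[1.0]])
```

For this single-unit example the estimates should approach the analytical values $\phi(0) \approx 0.399$ for the mean and $1/2 - 1/(2\pi) \approx 0.341$ for the variance, at the cost of 10000 forward passes instead of one.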

Thus, the invention provides a good approximation of the output of a neural network by propagating the variance and, during training, the mean. This method may be used to perform normalisation as disclosed in the international patent application filed by the same applicants on the same day as the present application and titled "Method and system for processing input data using a neural network and normalizations", which is incorporated in its entirety into the present application.
