**METHOD AND SYSTEM FOR PROCESSING INPUT DATA USING A NEURAL NETWORK AND NORMALIZATIONS**

SHEKHOVTSOV, Oleksandr (Nedvezska 2231/18, Prague, 100 00, CZ)

FLACH, Boris (Tetschener Straße 28, Dresden, 01277, DE)

CZECH TECHNICAL UNIVERSITY IN PRAGUE (Faculty of Electrical Engineering, Czech Technical University Praha, Technicka 2, 166 27, CZ)


**G06N3/08**; *G06N3/04*

US20160217368A1 | 2016-07-28


JIMMY LEI BA ET AL: "Layer Normalization", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 21 July 2016 (2016-07-21), XP080714068

CLAIMS

1. A method for processing input data using a neural network comprising a plurality of consecutive neural network layers, each neural network layer delivering output components, the method comprising:

- for each output component of each neural network layer, obtaining (S02, S05) an intermediate mean value and an intermediate variance value associated with this output component of the neural network layer,

- for each output component of each neural network layer, normalizing (S03, S06) this output component using the obtained intermediate mean and variance values associated with this output component of the neural network layer.

2. The method according to claim 1, wherein the output component is normalized using the formula:

$$\hat{x}_{ij} = \frac{x_{ij} - \mu_{ij}}{\sigma_{ij}}$$

wherein $x_{ij}$ is the output component of index j of the neural network layer of index i, $\hat{x}_{ij}$ is the normalized output component, and $\mu_{ij}$ and $\sigma_{ij}^2$ are the associated intermediate mean and variance values.

3. The method according to claim 1 or 2, wherein the method is performed during a training phase of the neural network using a set of training data each comprising a plurality of components respectively corresponding to input components of the first neural network layer, the method further comprising:

- obtaining a first mean value and a first variance value for each input component of the first neural network layer, wherein each first mean value is calculated as the mean of all the values of this component for all the training data of the training data set, and wherein each first variance value is calculated as the variance over all the values of this component for all the training data of the training data set,

- for each output component of the first neural network layer, calculating the intermediate mean value and the intermediate variance value using at least a first variance value and a first mean value,

- for each output component of each neural network layer after the first neural network layer, calculating the intermediate mean value and the intermediate variance value using at least an intermediate variance value and an intermediate mean value associated with the preceding neural network layer.

4. The method according to claim 3, wherein calculating intermediate mean values and intermediate variance values is performed using an analytical calculation.

5. The method according to claim 4, wherein the analytical calculation depends on the type of neural network layer associated with the mean and variance values.

6. The method according to any one of claims 3 to 5, wherein the method is performed during a testing phase following the training phase, and the intermediate variance and mean values determined during training are used to perform the normalizations.

7. The method according to any one of claims 1 to 6, wherein the input data is an image.

8. A system comprising a neural network for processing input data, the neural network comprising a plurality of consecutive neural network layers, each neural network layer delivering output components, the system being configured to:

- for each output component of each neural network layer, obtain an intermediate mean value and an intermediate variance value associated with this output component of the neural network layer,

- for each output component of each neural network layer, normalize this output component using the obtained intermediate mean and variance values associated with this output component of the neural network layer.

9. A computer program including instructions for executing the steps of a method according to any one of claims 1 to 7 when said program is executed by a computer.

10. A recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method according to any one of claims 1 to 7.

Field of the disclosure

The present disclosure is related to the field of data processing using neural networks, for example image processing using neural networks.

Description of the Related Art

It has been proposed to use neural networks, for example convolutional neural networks or feed-forward neural networks, to process data.

By way of example, it has been proposed to process images through such neural networks, for example to detect objects on images. Typically, in a training phase, known images are inputted to the neural network and a scoring system is used to adjust the neural network so that it behaves as expected on these known images. The neural networks are then used, in a phase called the testing phase, on actual images for which the expected output is not known.

The expression "neural network" used in the present application can cover a combination of a plurality of known networks.

It has been proposed in document US 2016/0217368 to perform normalizations between the layers of a neural network. Normalizations compensate for the internal covariate shift during training and also address the issue of vanishing/exploding gradients.

The solution of document US 2016 0217368 presents the following drawbacks:

- The testing phase requires an up-to-date set of statistics used for normalization, and it is difficult to keep track of the training/validation error during the training phase.

- The application of this method to recurrent networks is problematic because one has to unfold running statistics over the recurrence.

- The computation overhead is not negligible. The behavior of this solution is also more stochastic than other methods.

- The effectiveness depends on the size of the training batch.

- Application in generative adversarial networks has been observed to destabilize the training.

Thus, there is a need for improved methods to process data using neural networks.

Summary of the disclosure

The present disclosure overcomes one or more deficiencies of the prior art by proposing a method for processing input data using a neural network comprising a plurality of consecutive neural network layers, each neural network layer delivering output components, the method comprising:

- for each output component of each neural network layer, obtaining an intermediate mean value and an intermediate variance value associated with this output component of the neural network layer,

- for each output component of each neural network layer, normalizing this output component using the obtained intermediate mean and variance values associated with this output component of the neural network layer.

The invention therefore proposes to obtain intermediate mean and variance values which may differ for each output component of a neural network layer, thereby improving the normalization step.

The normalization can re-initialize the neural network to a point where gradients neither vanish nor explode, so that if the neural network is subsequently trained, the training can start efficiently.

The normalization also compensates for the internal covariate shift during training, and it preconditions the training by choosing a normalized parameterization of the model.

The normalization also introduces an internal scale-free and bias-free representation of hidden units that can be used for scale-free regularization, such as injecting noise with a fixed distribution. Normally, if additive noise is injected in a neural network layer, its effect depends on the initial scaling of the data or on the learned weights of the network. If, however, additive noise is injected after the normalization layer, its effect is independent of the global scaling of the input data and of the layer's linear coefficients.
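The scale-invariance argument can be checked numerically. The sketch below is illustrative only: the unit values, the scale factor `c` and the noise level are hypothetical, and the statistics are computed empirically here rather than propagated analytically. Rescaling the data by `c` rescales its mean and standard deviation by the same factor, so the normalized values, and hence the effect of noise injected after normalization, do not change.

```python
import math
import random
import statistics

def normalize(values, mean, std):
    """Normalize values with the given mean and standard deviation."""
    return [(v - mean) / std for v in values]

# Hypothetical pre-activations of one hidden unit over a few inputs.
x = [0.5, 1.5, -2.0, 3.0, 0.0]
c = 100.0  # hypothetical global rescaling of the data or of the layer weights
x_scaled = [c * v for v in x]

norm = normalize(x, statistics.fmean(x), statistics.pstdev(x))
norm_scaled = normalize(x_scaled, statistics.fmean(x_scaled), statistics.pstdev(x_scaled))

# The same draw of noise with a fixed distribution, injected after normalization.
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.1) for _ in x]
noisy = [n + e for n, e in zip(norm, noise)]
noisy_scaled = [n + e for n, e in zip(norm_scaled, noise)]

# Identical up to floating-point rounding, whatever the value of c.
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(noisy, noisy_scaled)))  # prints True
```

Injecting the same noise into `x` and `x_scaled` before normalization would instead perturb the two versions by very different relative amounts.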

It should be noted that while the normalization steps may be implemented by layers inserted in the neural network, in the present application the expression "neural network layer" designates the layers located between two consecutive normalization layers (or between a normalization layer and the input or the output of the neural network). Thus, a neural network layer may comprise one or more sub-layers, each associated with a specific function.

It should be noted that a pair formed by an intermediate mean value and an intermediate variance value may be called a neuron statistic. According to an embodiment, the output component is normalized using the formula:

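$$\hat{x}_{ij} = \frac{x_{ij} - \mu_{ij}}{\sigma_{ij}}$$

wherein $x_{ij}$ is the output component of index j of the neural network layer of index i, $\hat{x}_{ij}$ is the normalized output component, and $\mu_{ij}$ and $\sigma_{ij}^2$ are the associated intermediate mean and variance values.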
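A minimal sketch of this per-component normalization, with plain Python lists standing in for tensors; the function name `normalize_layer_output` is hypothetical. Each output component is shifted by its own intermediate mean and divided by the square root of its own intermediate variance, so two components of the same layer can be normalized quite differently.

```python
import math
from typing import List, Sequence

def normalize_layer_output(x: Sequence[float], mu: Sequence[float],
                           var: Sequence[float]) -> List[float]:
    """Return (x[j] - mu[j]) / sqrt(var[j]) for every output component j."""
    if not (len(x) == len(mu) == len(var)):
        raise ValueError("x, mu and var need one entry per output component")
    return [(xj - mj) / math.sqrt(vj) for xj, mj, vj in zip(x, mu, var)]

out = normalize_layer_output([3.0, -1.0], mu=[1.0, 0.0], var=[4.0, 0.25])
print(out)  # [1.0, -2.0]
```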

According to an embodiment, the method is performed during a training phase of the neural network using a set of training data (sometimes called a batch, whose batch size is equal to the number of input data items in the training data, for example the number of images), each training data item comprising a plurality of components respectively corresponding to input components of the first neural network layer, the method further comprising:

- obtaining a first mean value and a first variance value for each input component of the first neural network layer (the one where the input data is inputted), wherein each first mean value is calculated as the mean of all the values of this component for all the training data of the training data set, and wherein each first variance value is calculated as the variance over all the values of this component for all the training data of the training data set,

- for each output component of the first neural network layer, calculating the intermediate mean value and the intermediate variance value using at least a first variance value and a first mean value,

- for each output component of each neural network layer after the first neural network layer, calculating the intermediate mean value and the intermediate variance value using at least an intermediate variance value and an intermediate mean value associated with the preceding neural network layer.
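The three steps above can be sketched as follows, assuming each training data item is a flat list of components and each layer's analytical calculation is supplied as a callable. The names `first_statistics`, `propagate_all`, `double` and `shift` are hypothetical, and the two toy rules merely stand in for real layer calculations. The statistics of every layer are derived from those of the preceding layer, starting from the first mean and variance values.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Stats = Tuple[List[float], List[float]]  # (means, variances), one entry per component
Rule = Callable[[Stats], Stats]          # a layer's analytical calculation

def first_statistics(training_data: Sequence[Sequence[float]]) -> Stats:
    """First mean and variance of each input component over the whole training set."""
    columns = list(zip(*training_data))  # one tuple of values per input component
    return ([statistics.fmean(col) for col in columns],
            [statistics.pvariance(col) for col in columns])

def propagate_all(first: Stats, rules: Sequence[Rule]) -> List[Stats]:
    """Intermediate statistics of every layer, each computed from the preceding ones."""
    stats, per_layer = first, []
    for rule in rules:
        stats = rule(stats)
        per_layer.append(stats)
    return per_layer

# Hypothetical coordinate-wise rules standing in for real layer calculations.
def double(stats: Stats) -> Stats:  # y = 2x
    return [2.0 * m for m in stats[0]], [4.0 * v for v in stats[1]]

def shift(stats: Stats) -> Stats:   # y = x + 1
    return [m + 1.0 for m in stats[0]], list(stats[1])

data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]  # three training items, two components
first = first_statistics(data)                 # means [3.0, 6.0]; variances 8/3 and 32/3
layers = propagate_all(first, [double, shift])
print(layers[-1])                              # means [7.0, 13.0]; variances 32/3 and 128/3
```

No batch of data is propagated here: once the first statistics are known, only the statistics themselves travel through the layers.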

This embodiment uses a set of training data, in other words a plurality of known input data for the neural network which can be used for training.

There is no dependency on the batch size.

Also, the above method imposes a very low overhead on propagation. For example, in the case of image processing by a network consisting of convolutional layers and coordinate-wise non-linear layers, the normalization statistics can be calculated by propagating an input image of size 1 pixel carrying the input data mean and the input data variance, which is very fast in comparison with processing full images.
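One way to picture this single-pixel computation, assuming spatially stationary statistics (one mean and one variance per channel), mutually independent inputs, and ignoring image borders: the statistics of a convolution's output then depend only on per-channel sums of the kernel weights and of their squares, so the cost does not grow with the image size. The kernel layout, the `bias` parameter and the function name are illustrative assumptions, not taken from the disclosure.

```python
from typing import List, Sequence, Tuple

def conv_stats_one_pixel(
    kernels: Sequence[Sequence[Sequence[float]]],  # kernels[o][c]: flattened kernel taps
    bias: Sequence[float],                         # hypothetical per-output-channel bias
    mu_in: Sequence[float],                        # stationary mean of each input channel
    var_in: Sequence[float],                       # stationary variance of each input channel
) -> Tuple[List[float], List[float]]:
    """Per-output-channel mean and variance of a convolution (independence assumed)."""
    mu_out, var_out = [], []
    for o, per_channel in enumerate(kernels):
        # Mean: every tap sees the same channel mean, so the taps can be summed.
        mu_out.append(bias[o] + sum(sum(taps) * mu_in[c]
                                    for c, taps in enumerate(per_channel)))
        # Variance: independent terms add, each scaled by its squared weight.
        var_out.append(sum(sum(t * t for t in taps) * var_in[c]
                           for c, taps in enumerate(per_channel)))
    return mu_out, var_out

# One output channel, one input channel, a 1x3 kernel [1, 2, 1].
print(conv_stats_one_pixel([[[1.0, 2.0, 1.0]]], [0.5], [2.0], [3.0]))  # ([8.5], [18.0])
```

In real images neighbouring pixels are correlated, so the variance obtained this way is only as good as the independence assumption.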

This embodiment allows propagating the mean and variance by calculating new values which take into account the preceding values of mean and variance. It has been observed that this allows obtaining better normalization results.

It should be noted that if the input data is an image, the set of training data is a set of training images. The first mean value is the mean, for a given pixel at a location, of all the pixel values of the pixels at this location on all the images of the set of training images. The first variance value is the variance, for a given pixel at a location, over all the pixel values of the pixels at this location on all the images of the set of training images. If the network is not convolutional then the first mean and variance values at different pixels may be used in the method without averaging them into a single pixel representation of these statistics as is suitable in the case of convolutional networks.
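A sketch of these image statistics, assuming each image is stored as a list of pixel locations, each pixel being a list of channel values, and that all training images have the same size; the function names are hypothetical. The first function gives the per-location first mean and variance values; the second follows the text literally for the convolutional case by averaging those per-pixel values over all locations.

```python
import statistics
from typing import List, Sequence, Tuple

# image[p][c]: value of channel c at pixel location p (all images the same size).
Image = Sequence[Sequence[float]]
Table = List[List[float]]  # table[p][c]

def per_pixel_statistics(images: Sequence[Image]) -> Tuple[Table, Table]:
    """First mean and variance of each pixel location and channel over the training images."""
    num_pixels, num_channels = len(images[0]), len(images[0][0])
    means = [[statistics.fmean([img[p][c] for img in images]) for c in range(num_channels)]
             for p in range(num_pixels)]
    variances = [[statistics.pvariance([img[p][c] for img in images])
                  for c in range(num_channels)] for p in range(num_pixels)]
    return means, variances

def single_pixel_statistics(means: Table, variances: Table) -> Tuple[List[float], List[float]]:
    """Convolutional case: average the per-pixel values over all locations, per channel."""
    num_channels = len(means[0])
    mu = [statistics.fmean([m[c] for m in means]) for c in range(num_channels)]
    var = [statistics.fmean([v[c] for v in variances]) for c in range(num_channels)]
    return mu, var

# Two hypothetical grayscale training images with two pixels each.
images = [[[0.0], [2.0]], [[2.0], [4.0]]]
means, variances = per_pixel_statistics(images)   # [[1.0], [3.0]] and [[1.0], [1.0]]
print(single_pixel_statistics(means, variances))  # ([2.0], [1.0])
```

Averaging per-pixel variances ignores the spread of the per-pixel means across locations; pooling all values of a channel into one variance would be an alternative choice.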

According to an embodiment, calculating intermediate mean values and intermediate variance values is performed using an analytical calculation.

According to an embodiment, the analytical calculation depends on the type of neural network layer associated with the mean and variance values.

In this embodiment, each neural network layer has a type, and the analytical calculations take the type of the preceding neural network layer into account. For example, a different formula may be used for each type of neural network layer.

This allows obtaining a modular approach, wherein each layer is seen as a module associated with a specific calculation depending on the type of layer. Thus, it is possible to use the above defined method on various known neural networks, such as known convolutional neural networks or known feed-forward neural networks.
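The modular approach can be sketched as a table of propagation rules keyed by layer type. This is a minimal sketch under assumptions: the two types shown (`scale_shift`, a coordinate-wise $a x + b$, and `relu`, for which each input component is assumed Gaussian) and all names are hypothetical, and the ReLU rule uses the standard moments of a rectified Gaussian rather than any formula from the disclosure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Stats = Tuple[List[float], List[float]]  # (means, variances), one entry per component
Rule = Callable[[dict, Stats], Stats]

def _pdf(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def scale_shift_rule(params: dict, stats: Stats) -> Stats:
    """Coordinate-wise y = a*x + b (exact for any input distribution)."""
    a, b = params["a"], params["b"]
    mu, var = stats
    return ([ai * m + bi for ai, m, bi in zip(a, mu, b)],
            [ai * ai * v for ai, v in zip(a, var)])

def relu_rule(params: dict, stats: Stats) -> Stats:
    """y = max(0, x), assuming each input component is Gaussian with variance > 0."""
    means, variances = [], []
    for m, v in zip(*stats):
        s = math.sqrt(v)
        t = m / s
        mean = m * _cdf(t) + s * _pdf(t)
        second_moment = (m * m + v) * _cdf(t) + m * s * _pdf(t)
        means.append(mean)
        variances.append(max(second_moment - mean * mean, 0.0))  # guard against rounding
    return means, variances

# One module per layer type; supporting a new type only needs a new entry.
RULES: Dict[str, Rule] = {"scale_shift": scale_shift_rule, "relu": relu_rule}

def propagate(layers: Sequence[Tuple[str, dict]], stats: Stats) -> List[Stats]:
    """Apply the rule matching each layer's type, in order."""
    per_layer = []
    for layer_type, params in layers:
        stats = RULES[layer_type](params, stats)
        per_layer.append(stats)
    return per_layer

net = [("scale_shift", {"a": [2.0], "b": [0.0]}), ("relu", {})]
print(propagate(net, ([0.0], [0.25]))[-1])  # ReLU of N(0, 1): mean about 0.399, variance about 0.341
```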

According to an embodiment, the method is further performed during a testing phase following the training phase (i.e., the phase defined above, in which the method has already been performed to calculate intermediate means and variances), and the intermediate variance and mean values determined during training are used to perform the normalizations.

During the training phase, intermediate mean and variance values have been determined; during the testing phase these values may be used again.

Using these values during the testing ensures that the testing performance of the network matches the training validation performance.
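A minimal sketch of reusing the statistics at test time, assuming a normalization layer object that simply stores the values computed during training; the class and method names are hypothetical. Because the test-time normalization does not depend on the test batch, a given input is normalized identically in both phases.

```python
import math
from typing import List, Optional, Sequence, Tuple

Stats = Tuple[List[float], List[float]]  # (means, variances), one entry per component

class StoredStatisticsNormalization:
    """Hypothetical normalization layer keeping its neuron statistics."""

    def __init__(self) -> None:
        self.stats: Optional[Stats] = None

    def set_statistics(self, stats: Stats) -> None:
        """Training phase: store the analytically computed mean and variance values."""
        self.stats = (list(stats[0]), list(stats[1]))

    def __call__(self, x: Sequence[float]) -> List[float]:
        """Training or testing phase: normalize with the stored values only."""
        if self.stats is None:
            raise RuntimeError("statistics must be computed during training first")
        mu, var = self.stats
        return [(xj - mj) / math.sqrt(vj) for xj, mj, vj in zip(x, mu, var)]

norm = StoredStatisticsNormalization()
norm.set_statistics(([1.0, 0.0], [4.0, 1.0]))  # end of the training phase
print(norm([5.0, -2.0]))                       # testing phase: [2.0, -2.0]
```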

According to an embodiment, the input data is an image.

By way of example, the image can be a color image such as an RGB (Red-Green-Blue) image known to the skilled person.

According to a second aspect, the invention also relates to a system comprising a neural network for processing input data, the neural network comprising a plurality of consecutive neural network layers, each neural network layer delivering output components, the system being configured to:

- for each output component of each neural network layer, obtain an intermediate mean value and an intermediate variance value associated with this output component of the neural network layer,

- for each output component of each neural network layer, normalize this output component using the obtained intermediate mean and variance values associated with this output component of the neural network layer.

This system can be configured to carry out all the embodiments of the method for processing input data as described above.

In one particular embodiment, the steps of the method for processing input data are determined by computer program instructions.

Consequently, the invention is also directed to a computer program for executing the steps of a method as described above when this program is executed by a computer.

This program can use any programming language and take the form of source code, object code or a code intermediate between source code and object code, such as a partially compiled form, or any other desirable form.

The invention is also directed to a computer-readable information medium containing instructions of a computer program as described above.

The information medium can be any entity or device capable of storing the program. For example, the medium can include storage means such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or magnetic storage means, for example a diskette (floppy disk) or a hard disk.

Alternatively, the information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.

Brief description of the drawings

How the present disclosure may be put into effect will now be described by way of example with reference to the appended drawings, in which:

- figure 1 illustrates the initialization issue according to the prior art,

- figure 2 is a block diagram of an exemplary method for processing data according to an example,

- figure 3 is a block diagram of an exemplary system for processing data according to an example,

- figure 4 is a representation of a neural network used during a training phase, and

- figure 5 is a representation of the same neural network used during a testing phase.

Description of the embodiments

Figure 1 illustrates a neural network according to the prior art which is configured to process input data in the form of images such as image 100 which comprises characters. The neural network of this figure may be configured to detect these characters.

A first neural network layer 101 processes the image 100 and delivers output components to a second neural network layer 102 which delivers an output 103.

During training, several images are inputted to the neural network formed by neural network layer 101 and neural network layer 102.

It can happen that all the input components to a neuron of a neural network layer are in a saturated part of the activation function of the following neuron. This is shown as the distribution 104 which is in a saturated part of a sigmoid function which may be the activation function of the following neuron in neural network layer 102.

This prevents (or at best significantly slows down) the optimization of the neural network parameters by stochastic gradient descent methods or similar methods because the gradient in the illustrated neuron's input is close to zero for all training examples.
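The effect on the gradient can be illustrated with a sigmoid activation, whose derivative $s(z)(1 - s(z))$ is at most 0.25 and becomes tiny far from zero. The pre-activation values below are hypothetical stand-ins for distribution 104: when every training example falls in the saturated part, every per-example gradient through this neuron is close to zero.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid: s(z) * (1 - s(z)), maximal (0.25) at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical pre-activations of one neuron over four training examples.
saturated = [8.0, 9.5, 10.0, 12.0]  # all in the saturated part, like distribution 104
centered = [-1.0, -0.2, 0.3, 1.0]   # in the non-saturated regime

print(max(sigmoid_grad(z) for z in saturated))  # below 1e-3
print(max(sigmoid_grad(z) for z in centered))   # close to 0.25
```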

Thus, normalization is needed both to allow a better initialization (starting in a non-saturated regime) and as a re-parametrization which better conditions the gradient descent used to adjust the neural network.

An exemplary method and system for processing data using neural networks will be described hereinafter. In the following examples, the data may be an image or a set of images. The invention is however not limited to processing images and it may also relate to processing sounds or other types of data which can be processed using neural networks.

Figure 2 is a block diagram of an exemplary method for processing data according to an example. The steps described hereinafter in relation to figure 2 are performed during a training phase of the neural network. Thus, the input data (here an image) belongs to a set of training data.

In a first step S01, the first neural network layer processes the components of the input data.

Then, in a second step S02, intermediate mean and variance values are obtained by an analytical calculation which depends on the type of the first neural network layer. The calculated mean and variance values are associated with the first neural network layer and each output component of this first neural network layer is associated with a variance value and a mean value.

This calculation uses first mean and variance values which are, for each component of the input data:

- The mean value of this component over the set of training data (for example the mean value for a specific pixel in each training image), and

- The variance value of this component over the set of training data (for example the variance value of a specific pixel over each training image).

By way of example, if the first neural network layer is a linear layer of coefficient $W_1$ having components $w_{1ji}$, the output component of index j of the first neural network layer will have the following mean $\mu_{1j}$ and variance $\sigma_{1j}^2$ values:

For a linear layer of coefficient $W_1$ (a matrix of components $w_{1ji}$, with j and i being the indexes of the output and input components respectively):

A linear layer may also be a convolutional layer.

wherein $\mu_{0i}$ is the first mean value for component i and $\sigma_{0i}^2$ is the first variance value for component i.
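The mean and variance formulas for the linear layer do not survive in this text. A plausible reconstruction, assuming the input components are mutually independent (an assumption not stated in this passage) and that the layer has no bias term, is the following; the mean follows from linearity alone, whereas the variance formula relies on the independence assumption.

$$\mu_{1j} = \sum_i w_{1ji}\,\mu_{0i}, \qquad \sigma_{1j}^2 = \sum_i w_{1ji}^2\,\sigma_{0i}^2$$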

Other formulas may be used in accordance with the type of neural network layer.

For example, the method for determining mean and variance values associated with a neural network layer may be the one disclosed in the international patent application filed by the same applicants on the same day as the present application and titled "Method and system for processing input data and propagating variance in a neural network", which is incorporated herein by reference in its entirety.

Afterwards, a step S03 is performed in which the output components of the first neural network layer are normalized using the following formula for the component of index j:

$$\hat{x}_{1j} = \frac{x_{1j} - \mu_{1j}}{\sigma_{1j}}$$

wherein $x_{1j}$ is the output component of index j of the first neural network layer and $\hat{x}_{1j}$ is the normalized output component.

Steps S01, S02 and S03 are performed again for each subsequent neural network layer.

Steps S04, S05 and S06 respectively correspond to steps S01, S02 and S03 but relate to the final neural network layer.

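In step S05, the intermediate mean $\mu_{nj}$ and variance $\sigma_{nj}^2$ values of the final neural network layer (of index n) will be calculated using the following values for component j, associated with the preceding neural network layer:

$$\mu_{n-1,j}, \qquad \sigma_{n-1,j}^2$$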

In step S06, the following formula will be used to normalize the output of the final neural network layer (of index n):

$$\hat{x}_{nj} = \frac{x_{nj} - \mu_{nj}}{\sigma_{nj}}$$
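Steps S01 to S06 can be sketched end to end for a toy network made only of linear layers, using plain Python lists. All names are hypothetical, and the linear-layer variance rule assumes independent inputs. The sketch also assumes that, under the propagated statistics, each normalized component has mean 0 and variance 1, which is how the statistics of one layer feed the calculation for the next. In practice the statistics would be computed once per parameter update rather than for every input.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Stats = Tuple[Vector, Vector]  # (means, variances), one entry per component

def linear(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Linear layer: output component j is sum_i W[j][i] * x[i]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def linear_stats(W: Sequence[Sequence[float]], stats: Stats) -> Stats:
    """Analytical mean and variance of a linear layer, assuming independent inputs."""
    mu, var = stats
    return linear(W, mu), [sum(w * w * v for w, v in zip(row, var)) for row in W]

def normalize(x: Sequence[float], stats: Stats) -> Vector:
    mu, var = stats
    return [(xj - mj) / math.sqrt(vj) for xj, mj, vj in zip(x, mu, var)]

def forward(weights: Sequence[Sequence[Sequence[float]]], x: Sequence[float],
            first_stats: Stats) -> Vector:
    """Steps S01 to S03 for each layer; the last iteration performs S04 to S06."""
    stats = first_stats
    for W in weights:
        x = linear(W, x)                # S01 / S04: the layer processes its input
        stats = linear_stats(W, stats)  # S02 / S05: intermediate mean and variance
        x = normalize(x, stats)         # S03 / S06: per-component normalization
        # Assumed: a normalized component has mean 0 and variance 1, and these
        # become the input statistics of the next layer.
        stats = ([0.0] * len(x), [1.0] * len(x))
    return x

print(forward([[[2.0]], [[3.0]]], [3.0], ([1.0], [1.0])))  # [2.0]
```

A real network would interleave non-linear sub-layers, each with its own propagation rule.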

The steps of the method described in reference to figure 2 can be determined by computer instructions. These instructions can be executed on one or more computers. In the example of figure 3, a single computer is used to perform the method.

On this figure, a system 300 has been represented. This system comprises an image sensor 301 which acquires input data in the form of images when a neural network 304 comprised in the system is to be tested.

When the system is used during training, a module 302 is used to obtain an image from a set of training images.

Module 303 of system 300 is used to obtain first variance values and first mean values.

The modules 301, 302 and 303 can be implemented by computer instructions which can be executed by processor 311 of the system 300. The neural network 304 comprises a first neural network layer 305, a module 306 for calculating intermediate variance and mean values associated with this first neural network layer 305, and a module 307 for normalizing the output components of the first neural network layer 305. The layer 305 and the modules 306 and 307 are configured to perform steps S01 to S03 described in reference to figure 2.

Not all the neural network layers and modules have been represented in figure 3, for the sake of conciseness. The final neural network layer 308 has been represented along with a module 309 to calculate intermediate variance and mean values associated with this final neural network layer and a module 310 for normalizing the output components of the final neural network layer 308. The layer 308 and the modules 309 and 310 are configured to perform steps S04 to S06 described in reference to figure 2.

Figure 4 is a partial representation of a neural network used during a training phase. Two neural network layers L1 and L2 have been represented.

For the sake of simplicity, indexes indicating components have not been used on the figure.

In this example, the input image is referenced 400 and it is inputted to a first neural network layer 401 which performs a linear function with coefficient $W_1$.

The first mean value $\mu_0$ and first variance value $\sigma_0^2$ are inputted to the neural network, but it should be noted that these are not to be considered as inputs to the first neural network layer.

These first mean and variance values will be used to compute the intermediate mean $\mu_1$ and variance $\sigma_1^2$ values.

The normalization is then implemented by an intermediate layer 402 of the neural network which is implemented between the two neural network layers L1 and L2.

The second neural network layer L2 comprises an affine sub-layer 403, a nonlinear layer 404 of function f(x) (for example a coordinate-wise transformation), and a linear layer 405.

The intermediate mean $\mu_1$ and variance $\sigma_1^2$ values are used to determine new intermediate mean $\mu_2$ and variance $\sigma_2^2$ values. These new intermediate mean $\mu_2$ and variance $\sigma_2^2$ values will then be used in a normalization layer 406.
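Under stated assumptions, the computation through the three sub-layers of L2 can be written out. This is a sketch, not the disclosed formulas: it assumes that the normalization performed by layer 402 with $\mu_1$ and $\sigma_1^2$ is exact (so its outputs have mean 0 and variance 1), that the affine sub-layer 403 computes $a_j x_j + b_j$, that the input of $f$ is treated as Gaussian, and that the inputs of the linear sub-layer 405 are independent. The symbols $a_j$, $b_j$, $m_j$, $v_j$ and $w_{2kj}$ are introduced here for illustration only.

$$
\begin{aligned}
&\text{after 403:} && \text{mean } b_j,\quad \text{variance } a_j^2,\\
&\text{after 404:} && m_j = \mathbb{E}[f(z_j)],\quad v_j = \operatorname{Var}[f(z_j)],\quad z_j \sim \mathcal{N}(b_j, a_j^2),\\
&\text{after 405:} && \mu_{2k} = \sum_j w_{2kj}\, m_j,\quad \sigma_{2k}^2 = \sum_j w_{2kj}^2\, v_j.
\end{aligned}
$$

For a rectified linear unit $f(z) = \max(0, z)$, for instance, $m_j$ and $v_j$ have closed forms in terms of the standard normal density and distribution function.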

In figure 5, the neural network of figure 4 has been partially represented, in a situation which corresponds to a testing phase subsequent to the training phase of figure 4. In other words, the method of figure 5 is only carried out once at least one image has been processed through the neural network during a training phase. Various mean and variance values have been calculated during the training phase, and these values are used in the normalization layers of figure 5.

When an image 500 is inputted to the first layer 501 (or L1), a normalization layer 502 normalizes the output of this first layer using the intermediate mean $\mu_1$ and variance $\sigma_1^2$ values described in reference to figure 4. The output of the normalization layer is then processed in the second layer L2 (comprising a sub-layer 503 and a sub-layer 504).

It should be noted that the same neural network is used in the training and testing phases; the only difference is that the neuron statistics (intermediate mean and variance values) are not computed during testing, as they have already been computed during training.

It should be noted that the present disclosure is not limited to the type of layers included in the neural network.

Also, the normalizations may be performed by additional layers in the neural network, which should not be confused with the actual neural network layers.

The present disclosure also allows decoupling over blocks delimited by normalization layers and is applicable to recurrent neural networks.

The present disclosure is also continuously differentiable (gradients of the normalization statistics (intermediate mean and variance values) change smoothly with the parameters during training).

The present disclosure is also non-stochastic (because the propagation of mean and variance values is performed through analytical calculations).

