

Title:
NEURAL NETWORK FOR VARIABLE BIT RATE COMPRESSION
Document Type and Number:
WIPO Patent Application WO/2021/001594
Kind Code:
A1
Abstract:
Example embodiments provide a neural data compression network which may be configured to output variable bit rate codes and a decompression network capable of decompressing the variable bit rate codes. This is achieved based on training an encoder network to be capable of outputting variable size activations. Output activations may be divided into a plurality of blocks and a subset of the blocks may be selected based on a desired quality level. A decoder network may be trained as part of an auto-encoder comprising the encoder network. Apparatuses, methods, and computer programs are disclosed.

Inventors:
AYTEKIN CAGLAR (FI)
CRICRI FRANCESCO (FI)
Application Number:
PCT/FI2020/050403
Publication Date:
January 07, 2021
Filing Date:
June 10, 2020
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
G06N3/08; G06N20/00; G06T9/00; H04N19/102; H04N19/192
Domestic Patent References:
WO2020165490A1, 2020-08-20
Foreign References:
US10192327B1, 2019-01-29
Other References:
AYTEKIN, C. ET AL.: "Block-optimized Variable Bit Rate Neural Image Compression", ARXIV.ORG, May 2018 (2018-05-01), pages 1 - 4, XP080883107, Retrieved from the Internet [retrieved on 20200129]
ASHOK, A.K. ET AL.: "Autoencoders with Variable Sized Latent Vector for Image Compression", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) WORKSHOPS, 2018, pages 2547 - 2550, XP055784045, Retrieved from the Internet [retrieved on 20200908]
HETTINGER, C. ET AL.: "Forward Thinking: Building and Training Neural Networks One Layer at a Time", ARXIV.ORG, June 2017 (2017-06-01), pages 1 - 9, XP080768397, Retrieved from the Internet [retrieved on 20200129]
See also references of EP 3994625A4
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:
CLAIMS

1. An apparatus, comprising:

at least one processor; and

at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to:

receive an input at a neural network comprising a plurality of blocks;

receive an indication of a quality level;

determine a subset of the blocks based on the indication of the quality level; and

provide an output from the neural network based on the received input and the subset of the blocks.

2. The apparatus according to claim 1, wherein a block comprises a predetermined number of weights or filters.

3. The apparatus according to claim 1 or claim 2, wherein the quality level indicates a number of the blocks for providing the output.

4. The apparatus according to any preceding claim, wherein the neural network comprises N blocks and wherein a quality level Q indicates providing an output from first Q blocks in an order of blocks and not providing an output from last N-Q blocks in the order of blocks.

5. The apparatus according to any preceding claim, wherein the apparatus is further caused to:

train an ith block based on keeping parameters of i-1 previous blocks substantially frozen, setting last N-i blocks to zero, and updating parameters of the ith block, wherein N is a number of blocks at the neural network.

6. The apparatus according to any preceding claim, wherein the plurality of blocks are non-overlapping and/or wherein the plurality of blocks are located at an output layer of the neural network.

7. The apparatus according to claim 6, wherein training the ith block further comprises updating parameters of one or more preceding layers of the neural network.

8. The apparatus according to any preceding claim, wherein the neural network comprises an encoder network of an auto-encoder neural network, and wherein the input comprises image data, video data, audio data, or a representation of another neural network.

9. A method, comprising:

receiving an input at a neural network comprising a plurality of blocks;

receiving an indication of a quality level;

determining a subset of the blocks based on the indication of the quality level; and

providing an output from the neural network based on the received input and the subset of the blocks.

10. The method according to claim 9, wherein a block comprises a predetermined number of weights or filters.

11. The method according to claim 9 or claim 10, wherein the quality level indicates a number of the blocks for providing the output.

12. The method according to any of claims 9 to 11, wherein the neural network comprises N blocks and wherein a quality level Q indicates providing an output from first Q blocks in an order of blocks and not providing an output from last N-Q blocks in the order of blocks.

13. The method according to any of claims 9 to 12, further comprising:

training an ith block based on keeping parameters of i-1 previous blocks substantially frozen, setting last N-i blocks to zero, and updating parameters of the ith block, wherein N is a number of blocks in the neural network.

14. A computer program comprising program code configured to cause an apparatus at least to:

receive an input at a neural network comprising a plurality of blocks;

receive an indication of a quality level;

determine a subset of the blocks based on the indication of the quality level; and

provide an output from the neural network based on the received input and the subset of the blocks.

15. An apparatus comprising:

a decoder neural network obtainable by a training method comprising:

training an ith block of an encoder neural network of an auto-encoder based on keeping parameters of i-1 previous blocks substantially frozen, setting last N-i blocks to zero, and updating parameters of the ith block, wherein N is a number of blocks in the encoder neural network, and wherein training the ith block further comprises updating parameters of the decoder neural network.

Description:
NEURAL NETWORK FOR VARIABLE BIT RATE COMPRESSION

TECHNICAL FIELD

[0001] The present application generally relates to artificial intelligence, machine learning and neural networks. In particular, some embodiments of the present application relate to compression and/or decompression of data, for example image data, video data, audio data, or representations of neural networks.

BACKGROUND

[0002] Neural networks (NN) may be utilized for many different applications in many different types of devices, such as mobile phones. Examples of technologies include image and video analysis and processing, social media data analysis, device usage data analysis, or the like. A neural network is a computation graph comprising several layers of computation. A layer may comprise one or more units, for example nodes. Each unit may be configured to perform an elementary computation. A unit may be connected to one or more other units and this connection may be associated with a weight. The weight may be used for scaling the signal passing through the associated connection. Weights are an example of learnable parameters, which may be updated during a training process to train the neural network for a particular task.

SUMMARY

[0003] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

[0004] Example embodiments provide a neural network that is capable of compressing and/or decompressing data with configurable quality and data rate. This is achieved by the features of the independent claims. Further implementation forms are provided in the dependent claims, the description and the figures.

[0005] According to a first aspect, an apparatus comprises at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive an input at a neural network comprising a plurality of blocks; receive an indication of a quality level; determine a subset of the blocks based on the indication of the quality level; and provide an output from the neural network based on the received input and the subset of the blocks.

[0006] According to a second aspect, a method comprises receiving an input at a neural network comprising a plurality of blocks; receiving an indication of a quality level; determining a subset of the blocks based on the indication of the quality level; and providing an output from the neural network based on the received input and the subset of the blocks.

[0007] According to a third aspect, a computer program is configured, when executed by an apparatus, to cause the apparatus at least to: receive an input at a neural network comprising a plurality of blocks; receive an indication of a quality level; determine a subset of the blocks based on the indication of the quality level; and provide an output from the neural network based on the received input and the subset of the blocks.

[0008] According to a fourth aspect, an apparatus comprises means for receiving an input at a neural network comprising a plurality of blocks; means for receiving an indication of a quality level; means for determining a subset of the blocks based on the indication of the quality level; and means for providing an output from the neural network based on the received input and the subset of the blocks.

[0009] Many of the attendant features will be more readily appreciated as they become better understood by reference to the following detailed description considered in connection with the accompanying drawings.

DESCRIPTION OF THE DRAWINGS

[0010] The accompanying drawings, which are included to provide a further understanding of the example embodiments and constitute a part of this specification, illustrate example embodiments and together with the description help to explain the example embodiments. In the drawings:

[0011] FIG. 1 illustrates an example of a compression and decompression system, according to an embodiment;

[0012] FIG. 2 illustrates an example of an apparatus configured to train and/or execute a neural network, according to an embodiment;

[0013] FIG. 3 illustrates an example of a neural network, according to an embodiment;

[0014] FIG. 4 illustrates an example of an elementary computation unit, according to an embodiment;

[0015] FIG. 5 illustrates an example of a convolutional neural network, according to an embodiment;

[0016] FIG. 6 illustrates an example of an auto-encoder comprising an encoder neural network and a decoder neural network, according to an embodiment;

[0017] FIG. 7 illustrates an example of a method for training an auto-encoder, according to an example embodiment;

[0018] FIG. 8 illustrates an example of a method for compressing data, according to an example embodiment.

[0019] Like references are used to designate like parts in the accompanying drawings.

DETAILED DESCRIPTION

[0020] Reference will now be made in detail to example embodiments, examples of which are illustrated in the accompanying drawings. The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.

[0021] According to an example embodiment, an encoder neural network is configured to output variable bit rate codes. This is achieved based on training the neural network such that the encoder network learns to output variable size activations. For example, an encoder layer, for example the last layer of the encoder neural network, may be divided into N subsets, for example N blocks. Each block may for example comprise M learnable parameters, for example weights or filters, but the number of parameters per block may also be variable. The N blocks may in one example be N different layers. For example, the first block may be the second layer and the second block may be the fourth layer. In another example, the N blocks may be parts of different layers of the encoder. For example, the first block may be part of the first layer, and the second and third blocks may be part of the fourth layer. Moreover, each block may comprise parts of different layers. For example, the first block may comprise part of the first layer and part of the second layer. Any combination of these possibilities may be applied in the example embodiments.
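
The following Python sketch (using PyTorch) illustrates one possible way to realize the block structure described above, with the blocks taken from the last encoder layer only. All names, sizes and the choice of a fully connected output layer are illustrative assumptions, not prescribed by the embodiments.

    # Minimal sketch: an encoder whose output layer is viewed as N
    # non-overlapping blocks of M activations each (assumed sizes).
    import torch
    import torch.nn as nn

    class EncoderWithBlocks(nn.Module):
        def __init__(self, in_dim=784, hidden_dim=256, n_blocks=8, block_size=16):
            super().__init__()
            self.n_blocks = n_blocks
            self.block_size = block_size
            self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            # Output layer providing N blocks of M activations each.
            self.out = nn.Linear(hidden_dim, n_blocks * block_size)

        def forward(self, x):
            code = self.out(self.body(x))
            # View the flat code as (batch, N blocks, M activations per block).
            return code.view(x.shape[0], self.n_blocks, self.block_size)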

[0022] At each training iteration, only one block of the last encoder layer may be trained at a time, while previous blocks may be kept frozen and subsequent blocks may be set to zero and kept frozen. Setting a block to zero may comprise setting weights, activations, and/or other parameters associated with the block to zero. In the next iteration, the subsequent block is trained in a similar fashion. Keeping a set of weights frozen may refer to not training those weights during the considered training iteration. The order of training may be different from the order of locations of the blocks in the neural network. For example, a fourth block in the order of locations may be trained after a second block in the order of locations and before a third block in the order of locations. In this example, during training of the fourth block, the first block, the second block and the third block (in the order of locations) may be considered to be previous blocks and kept frozen. The fifth and all other blocks may be considered to be subsequent blocks and therefore set to zero and frozen.
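
As an illustration of the freeze/zero scheme used within a training iteration, the following sketch shows how parameters of previous blocks can be kept frozen and subsequent blocks set to zero and frozen while block i is trained. Representing the output layer as one small linear layer per block is an assumption made only for clarity.

    # Sketch: prepare the output blocks for training block i (1-based index).
    # The per-block nn.Linear layout is an illustrative assumption.
    import torch
    import torch.nn as nn

    hidden_dim, n_blocks, block_size = 256, 8, 16
    blocks = nn.ModuleList([nn.Linear(hidden_dim, block_size) for _ in range(n_blocks)])

    def configure_blocks_for_iteration(i):
        for idx, blk in enumerate(blocks, start=1):
            if idx < i:                      # previous blocks: kept frozen
                blk.requires_grad_(False)
            elif idx == i:                   # current block: trainable
                blk.requires_grad_(True)
            else:                            # subsequent blocks: set to zero and frozen
                with torch.no_grad():
                    blk.weight.zero_()
                    blk.bias.zero_()
                blk.requires_grad_(False)

    configure_blocks_for_iteration(3)        # e.g. train the third block next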

[0023] After training, at the inference stage, a user may select a certain quality or bit rate level, or directly a number K of blocks to be used for data compression. Then, the activations output by the first K blocks will be used, for example, stored or transmitted to the decoder side. Since the network has been trained to provide good compression for different numbers of blocks, an efficient compression for different data rates can be achieved with the same encoder network without compromising reconstruction quality.
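
A small sketch of the inference-time selection described above is given below: only the activations of the first K blocks are kept for storage or transmission, and the decoder side may zero-pad the code back to N blocks before decoding. The shapes and the value of K are illustrative assumptions.

    # Sketch: keep only the first K blocks of the code (assumed shapes).
    import torch

    n_blocks, block_size = 8, 16
    code = torch.randn(1, n_blocks, block_size)   # stand-in for the encoder output

    K = 3                                         # selected number of blocks
    truncated = code[:, :K, :]                    # stored or transmitted to the decoder

    # Decoder side: missing blocks K+1..N are treated as zeros, matching training.
    padded = torch.zeros(1, n_blocks, block_size)
    padded[:, :K, :] = truncated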

[0024] FIG. 1 illustrates an example of a compression and decompression system 100 comprising a compressor 102 and a decompressor 104. Compressor 102 may be configured to receive input data I and produce an encoded representation C of the input data I, that is, a code. The code C may be generated based on a quality level Q, which may be provided as a parameter to compressor 102. The code C may be delivered to decompressor 104 by various means, for example over a communication network. Alternatively, code C may be stored on a storage medium such as for example a hard drive or an external memory and retrieved from the memory by decompressor 104. According to an example embodiment, compressor 102 and decompressor 104 may be located at different devices, but they may also be located within a single device, for example as dedicated software and/or hardware components. Compressor 102 may be also referred to as an encoder, an encoder device, an encoder apparatus, or encoder software. Decompressor 104 may be also referred to as a decoder, a decoder device, a decoder apparatus, or decoder software. The code output by the encoder may be subject to additional processing steps such as binarization, quantization, and/or lossless coding.

[0025] FIG. 2 illustrates an example of an apparatus 200 according to an embodiment, for example compressor 102 or decompressor 104. Apparatus 200 may comprise at least one processor 202. The at least one processor may comprise, for example, one or more of various processing devices, such as for example a co-processor, a microprocessor, a controller, a digital signal processor (DSP), a processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.

[0026] The apparatus may further comprise at least one memory 204. The memory may be configured to store, for example, computer program code or the like, for example operating system software and application software. The memory may comprise one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination thereof. For example, the memory may be embodied as magnetic storage devices (such as hard disk drives, floppy disks, magnetic tapes, etc.), optical magnetic storage devices, or semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.).

[0027] Apparatus 200 may further comprise communication interface 208 configured to enable apparatus 200 to transmit and/or receive information, for example messages to/from other devices. In one example, apparatus 200 may use communication interface 208 to transmit or receive code C of FIG. 1. The communication interface may be configured to provide at least one wireless radio connection, such as for example a 3GPP mobile broadband connection (e.g. 3G, 4G, 5G); a wireless local area network (WLAN) connection such as for example standardized by IEEE 802.11 series or Wi-Fi alliance; a short range wireless network connection such as for example a Bluetooth, NFC (near-field communication), or RFID connection; a local wired connection such as for example a local area network (LAN) connection or a universal serial bus (USB) connection, or the like; or a wired Internet connection.

[0028] Apparatus 200 may further comprise a user interface 210 comprising an input device and/or an output device. The input device may take various forms such as a keyboard, a touch screen, or one or more embedded control buttons. The output device may for example comprise a display, a speaker, a vibration motor, or the like. User interface 210 may be used for example to provide instructions associated with compression and/or decompression to apparatus 200.

[0029] When the apparatus is configured to implement some functionality, some component and/or components of the apparatus, such as for example the at least one processor and/or the memory, may be configured to implement this functionality. Furthermore, when the at least one processor is configured to implement some functionality, this functionality may be implemented using program code 206 comprised, for example, in the memory 204.

[0030] The functionality described herein may be performed, at least in part, by one or more computer program product components such as software components. According to an embodiment, the apparatus comprises a processor or processor circuitry, such as for example a microcontroller, configured by the program code when executed to execute the embodiments of the operations and functionality described. Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and Graphics Processing Units (GPUs).

[0031] The apparatus comprises means for performing at least one method described herein. In one example, the means comprises the at least one memory including program code, at least one processor, and program code configured to, when executed by the at least one processor, cause the apparatus to perform the method(s).

[0032] Apparatus 200 may comprise for example a computing device such as for example a server, a mobile phone, a tablet computer, a laptop, an internet of things (IoT) device, or the like. Examples of IoT devices include, but are not limited to, consumer electronics, wearables, and smart home appliances. In one example, apparatus 200 may comprise a vehicle such as for example a car. Although apparatus 200 is illustrated as a single device, it is appreciated that, wherever applicable, functions of apparatus 200 may be distributed to a plurality of devices, for example to implement embodiments as a cloud computing service.

[0033] FIG. 3 illustrates an example of a neural network 300, according to an embodiment. Neural network 300 may comprise an input layer, one or more hidden layers, and an output layer. Nodes of the input layer, i_1 to i_n, may be connected to one or more of the m nodes of the first hidden layer, n_11 to n_1m. Nodes of the first hidden layer may be connected to one or more of the k nodes of the second hidden layer, n_21 to n_2k. It is appreciated that even though the example neural network of FIG. 3 illustrates two hidden layers, a neural network may apply any number and any type of hidden layers. Neural network 300 may further comprise an output layer. Nodes of the last hidden layer, in the example of FIG. 3 the nodes of the second hidden layer, may be connected to one or more nodes of the output layer, o_1 to o_j. It is noted that the number of nodes may be different for each layer of the network. A node may be also referred to as a neuron, a computation unit, or an elementary computation unit. The terms network, neural network, neural net, and model may be used interchangeably. Weights of the neural network may be referred to as learnable parameters or simply as parameters. In the example of FIG. 3, one or more of the layers may be fully connected layers, for example layers where each node is connected to every node of a previous layer.

[0034] Two example architectures of neural networks include feed-forward and recurrent architectures. Feed-forward neural networks are such that there is no feedback loop. Each layer takes input from one or more previous layers and provides its output as the input for one or more of the subsequent layers. Also, units inside certain layers may take input from units in one or more preceding layers and provide output to one or more following layers.

[0035] Initial layers, for example layers close to the input data, may extract semantically low-level features. For example, in image data the low-level features may correspond to edges and textures in images. Intermediate and final layers may extract more high-level features. After the feature extraction layers there may be one or more layers performing a certain task, such as classification, semantic segmentation, object detection, denoising, style transfer, super-resolution, or the like.

[0036] In recurrent neural networks there is a feedback loop from one or more nodes of one or more subsequent layers. This causes the network to become stateful. For example, the network may be able to memorize information or a state.

[0037] FIG. 4 illustrates an example of a node 401. A node may be configured to receive one or more inputs, a_1 to a_n, from one or more nodes of one or more previous layers and compute an output based on the input values received. A node may also receive feedback from one or more nodes of one or more subsequent layers. Inputs may be associated with parameters to adjust the influence of a particular input on the output. For example, weights w_1 to w_n associated with the inputs a_1 to a_n may be used to multiply the input values a_1 to a_n. The node may be further configured to combine the inputs to an output, or an activation. For example, the node may be configured to sum the modified input values. A bias or offset b may also be applied to add a constant to the combination of modified inputs. Weights and biases may be learnable parameters. For example, when the neural network is trained for a particular task, the values of the weights and biases associated with different inputs and different nodes may be updated such that an error associated with performing the task is reduced to an acceptable level.

[0038] Furthermore, an activation function f() may be applied to control when and how the neuron provides the output. The activation function may be for example a non-linear function that is substantially linear in the region of zero but limits the output of the node when the input increases or decreases. Examples of activation functions include, but are not limited to, a step function, a sigmoid function, a tanh function, and a ReLU (rectified linear unit) function. The output may be provided to nodes of one or more following layers of the network, and/or to one or more nodes of one or more previous layers of the network.
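
The elementary computation of a single node described in the two preceding paragraphs can be summarized by the small sketch below (a weighted sum of the inputs plus a bias, followed by an activation function). The numbers are arbitrary illustrative values.

    # Sketch of one node: z = w . a + b, output = f(z) with f = ReLU.
    import numpy as np

    a = np.array([0.5, -1.2, 3.0])     # inputs a_1..a_n from the previous layer
    w = np.array([0.8, 0.1, -0.4])     # learnable weights w_1..w_n
    b = 0.2                            # learnable bias

    z = np.dot(w, a) + b               # combination of the modified inputs
    output = np.maximum(z, 0.0)        # ReLU activation f(z) = max(0, z)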

[0039] A forward propagation or a forward pass may refer to feeding a set of input data through the layers of the neural network 300 and producing an output. During this process the weights and biases of the neural network 300 affect the activations of individual nodes and finally the output provided by the output layer.

[0040] One property of neural networks and other machine learning tools is that they are able to learn properties from input data, for example in a supervised way or in an unsupervised way. Learning may be based on teaching the network by a training algorithm or based on a meta-level neural network providing a training signal.

[0041] In general, a training algorithm may include changing some properties of the neural network such that its output becomes as close as possible to a desired output. For example, in the case of classification of objects in images, the output of the neural network may be used to derive a class or category index, which indicates the class or category that the object in the input image belongs to. Training may happen by minimizing or decreasing the output's error, also referred to as the loss.

[0042] During training, the generated or predicted output may be compared to a desired output, for example ground-truth data provided for training purposes, to compute an error value. The error may be calculated based on a loss function. Updating the neural network may then be based on calculating a derivative with respect to the learnable parameters of the network. This may be done for example using a backpropagation algorithm that determines gradients for each layer starting from the final layer of the network until gradients for the learnable parameters have been obtained. Parameters of each layer are updated accordingly such that the loss is iteratively decreased. Examples of losses include mean squared error, cross-entropy, or the like. In deep learning, training comprises an iterative process, where at each iteration the algorithm modifies parameters of the neural network to make a gradual improvement of the network's output, that is, to gradually decrease the loss.
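
As a concrete illustration of the training procedure described above, the sketch below performs one supervised training step: a forward pass, computation of an MSE loss against the desired output, backpropagation of gradients, and a parameter update. The model, data and learning rate are placeholder assumptions.

    # Sketch: one training step with loss computation and backpropagation.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()              # mean squared error loss

    x = torch.randn(64, 10)             # batch of training inputs
    y = torch.randn(64, 1)              # corresponding desired outputs (ground truth)

    pred = model(x)                     # forward pass
    loss = loss_fn(pred, y)             # error between predicted and desired output
    optimizer.zero_grad()
    loss.backward()                     # backpropagation: gradients per layer
    optimizer.step()                    # update parameters to decrease the loss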

[0043] The training phase of the neural network may be ended after reaching an acceptable error level. The inference phase may refer to applying the trained neural network for a particular task, for example to provide a classification of an unseen image to one of a plurality of classes based on content of an input image.

[0044] Training a neural network may be seen as an optimization process, but the final goal may be different from a typical goal of optimization. In optimization, the goal may be to minimize a functional. In machine learning, a goal of the optimization or training process is to make the model learn the properties of the data distribution from a limited training dataset. In other words, the goal is to use a limited training dataset in order to learn to generalize to previously unseen data, that is, data which was not used for training the model. This is usually referred to as generalization.

[0045] In practice, data may be split into at least two sets, a training data set and a validation data set. The training data set may be used for training the network, for example to modify its learnable parameters in order to minimize the loss. The validation data set may be used for checking the performance of the network on data which was not used to minimize the loss, as an indication of the final performance of the model. In particular, the errors on the training data set and on the validation data set may be monitored during the training process to understand the following issues: 1) whether the network is learning at all - in this case, the training data set error should decrease, otherwise the model is in the regime of underfitting; 2) whether the network is learning to generalize - in this case, the validation data set error should also decrease and not be much higher than the training data set error. If the training data set error is low, but the validation data set error is much higher than the training data set error, or it does not decrease, or it even increases, the model is in the regime of overfitting. This means that the model has merely memorized properties of the training data set and performs well on that set, but performs poorly on a set not used for tuning its parameters.
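
The underfitting/overfitting check described above can be expressed as a simple monitoring rule over the two error curves, as sketched below. The loss histories are invented placeholder values used only to illustrate the check itself.

    # Sketch: monitoring training and validation errors during training.
    train_losses = [0.90, 0.55, 0.31, 0.22, 0.18]   # placeholder history
    val_losses   = [0.92, 0.60, 0.45, 0.44, 0.47]   # placeholder history

    if train_losses[-1] >= train_losses[0]:
        print("training error not decreasing: possible underfitting")
    elif val_losses[-1] > 2.0 * train_losses[-1] or val_losses[-1] > val_losses[-2]:
        print("validation error much higher or increasing: possible overfitting")
    else:
        print("network appears to be learning and generalizing")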

[0046] FIG. 5 illustrates an example of a convolutional neural network 500. A convolutional neural network 500 comprises at least one convolutional layer. A convolutional layer performs convolutional operations to extract information from input data, for example image 502, to form a plurality of feature maps 504. A feature map may be generated by applying a filter or a kernel to a subset of input data, for example block 512 in image 502, and sliding the filter through the input data to obtain a value for each element of the feature map. The filter may comprise a matrix or a tensor, which may be for example multiplied with the input data to extract features corresponding to that filter. A plurality of feature maps may be generated based on applying a plurality of filters. A further convolutional layer may take as input the feature maps from a previous layer and apply the same filtering principle on the feature maps 504 to generate another set of feature maps 506. Weights of the filters may be learnable parameters and they may be updated during a training phase, similar to parameters of neural network 300. Similar to node 401, an activation function may be applied to the output of the filter(s). The convolutional neural network may further comprise one or more other types of layers, such as for example fully connected layers 508 after and/or between the convolutional layers. An output may be provided by an output layer 510.
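
The sliding-filter operation that produces a single feature map, as described above, is sketched below for a one-channel input. Sizes and values are illustrative.

    # Sketch: one feature map obtained by sliding a 3x3 filter over an 8x8 input.
    import numpy as np

    image = np.random.rand(8, 8)        # stand-in for input data (e.g. an image)
    kernel = np.random.rand(3, 3)       # one learnable filter

    h = image.shape[0] - kernel.shape[0] + 1
    w = image.shape[1] - kernel.shape[1] + 1
    feature_map = np.empty((h, w))
    for i in range(h):                  # slide the filter through the input
        for j in range(w):
            patch = image[i:i + 3, j:j + 3]
            feature_map[i, j] = np.sum(patch * kernel)   # multiply and sum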

[0047] Neural networks may be used in image and video compression. Hybrid video compression is based on performing intra- and inter-prediction and then transform-coding the residual or prediction error. For example, a video frame may be divided into blocks and each block may be encoded separately (but not necessarily independently). A block may be coded either in intra-mode or in inter-mode. In intra-mode, the block may be predicted from spatially nearby blocks of the same video frame, which were already reconstructed. In inter-mode, the block may be predicted from temporally nearby blocks (other frames). In either case, only the prediction error (residual) may be encoded. The prediction error is the difference between the intra- or inter-predicted block and the real block. The prediction error, or residual, may be transform-coded to reduce redundancy and achieve higher compression gains. Suitable transforms comprise discrete cosine transforms (DCT). The transform-domain block comprises a set of coefficients representing the content of the block (the residual) at different frequencies. Next, a quantization may be applied on the transform coefficients. Quantization may be uniform for all or a plurality of transform coefficients. Alternatively, adaptive quantization matrices may be used, for example where low frequency coefficients are quantized with higher precision or granularity and higher frequency coefficients are quantized with lower precision or granularity. Finally, the quantized transform coefficients may be entropy-coded, for example using an arithmetic encoder.
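
The transform-and-quantize step of the hybrid coding pipeline described above is sketched below for an 8x8 residual block, using an orthonormal 2-D DCT followed by uniform quantization. The residual values and the quantization step size are illustrative assumptions; in practice the quantized coefficients would then be entropy-coded.

    # Sketch: 2-D DCT of a residual block followed by uniform quantization.
    import numpy as np

    def dct_matrix(n=8):
        # Orthonormal DCT-II basis matrix.
        m = np.zeros((n, n))
        for k in range(n):
            for i in range(n):
                m[k, i] = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] *= np.sqrt(1.0 / n)
        m[1:, :] *= np.sqrt(2.0 / n)
        return m

    residual = np.random.randn(8, 8)        # stand-in prediction error block
    D = dct_matrix(8)
    coeffs = D @ residual @ D.T             # coefficients at different frequencies
    step = 4.0                              # assumed uniform quantization step
    quantized = np.round(coeffs / step)     # would be entropy-coded next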

[0048] Neural networks may be used in image and video compression, either to perform the whole compression or decompression process or to perform some steps of the compression or decompression process. The former option may be referred to as end-to-end learned compression. When the encoder neural network is used for some step(s), the encoder neural network could be used, for example, for a step where the input is a decorrelated version of the input data, for example an output of a transform such as a Fourier transform or a discrete cosine transform (DCT). The encoder neural network may be for example configured to perform one of the last steps of the encoding process. The encoder neural network may be followed by one or more post-processing steps such as for example binarization, quantization, and/or an arithmetic encoding step, for example a lossless encoder such as an entropy encoder. The decoder neural network may be located at a corresponding position at the decoder and be configured to perform a corresponding inverse function.

[0049] According to an example embodiment, a neural network may be used in the compression (or encoding) part, for example at compressor 102. The output of the neural network comprises encoded data, or code, which may then be further processed, for example to quantize the encoded data and/or to further compress the encoded data for example by an entropy coder. According to an example embodiment, the neural network may be the encoder part of a neural auto-encoder.

[0050] FIG. 6 illustrates an example of a neural auto-encoder network 600, according to an example embodiment. An auto-encoder is a neural network comprising an encoder part 610, which is configured to compress data or make the input data more compressible at its output (for example having lower entropy), and a neural decoder part 620, which takes the compressed data (e.g., the data output by the encoder 610 or data output by a step performed after the encoder 610) and outputs a reconstruction of the original data, eventually with some loss. It is noted that embodiments may be applied to various other types of neural networks configured to be applied in a compression or decompression process, and the neural auto-encoder is provided only as an example.

[0051] The neural auto-encoder 600 may be trained based on a training dataset. For each training iteration, a subset of data may be sampled from the training dataset and input to the encoder 610. The output of the encoder 610 may be subject to further processing steps, such as for example quantization and/or entropy coding. The output of the encoder 610, and of any additional steps after that, is input to the decoder 620, which reconstructs the original data input to the encoder 610. The reconstructed data may however differ from the original input data. The difference between the input data and the reconstructed data may be referred to as the loss. However, the auto-encoder pipeline may also be designed in a way that there is no loss in the reconstruction. A loss or error value may be computed by comparing the output of the decoder 620 to the input of the encoder 610. The loss value may be computed for example based on a mean-squared error (MSE) loss function. Another loss function may be used for encouraging the output of the encoder to be more compressible, for example to have low entropy. This loss may be used in addition to the loss measuring the quality of data reconstruction. In general, a plurality of losses may be computed and then added together, for example via a linear combination (weighted average), to obtain a final loss. The final loss value may then be differentiated with respect to the weights and/or other parameters of the encoder 610 and decoder 620. Differentiation may be done for example based on backpropagation, as described above. The obtained gradients may then be used to update or change the parameters (e.g. weights), for example based on a stochastic gradient descent algorithm or any other suitable algorithm. This process may be iterated until a stopping criterion is met. As a result, the neural auto-encoder is trained to compress the input data and to reconstruct the data from the compressed version.
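
A possible form of the combined training loss described above is sketched below: a reconstruction term (MSE) plus a term encouraging a compressible code, added via a weighted linear combination. Using an L1 penalty on the code as the compressibility-encouraging term, and the value of the weight, are illustrative assumptions only.

    # Sketch: reconstruction loss plus a weighted compressibility term.
    import torch

    def total_loss(reconstruction, original, code, rate_weight=0.01):
        mse = torch.mean((reconstruction - original) ** 2)   # reconstruction quality
        rate_proxy = torch.mean(torch.abs(code))             # encourages compressibility
        return mse + rate_weight * rate_proxy                # linear combination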

[0052] A challenge with auto-encoders is that it may be difficult to achieve variable bitrate. Variable bitrate may refer to the ability of a system to output a different bitrate depending on a target quality. For example, if a user desires to achieve a high reconstruction quality, it would be desired to output a code with high bitrate, for example a long code. If a user desires to achieve a low bit rate and accepts lower reconstruction quality, it would be desired to output a code with low bitrate, for example a short code. However, if the output layer of a neural encoder has a fixed size, and thus the number of output activations is also fixed, it is difficult to achieve variable bitrate independently of the compressibility of the input data.

[0053] According to example embodiments, an encoder and decoder are trained such that after training a user may request to encode an image at a certain quality level, where a lower quality level means that the reconstructed image will have lower quality than a reconstructed image which was compressed using a specified higher quality level. Quality may be measured for example by a metric which is used also during the training phase.

[0054] Referring back to the example of FIG. 6, an output layer 611 of the encoder neural network 610 may be divided into a plurality of blocks 612. A block 612 may comprise a subset of filters or weights of the output layer 611. The blocks 612 may have different sizes. For example, a block may have a different number of weights or filters compared to another block. However, each block may also have the same size. The blocks may be associated with an order such that it is possible to identify the location of a block at the output layer. For example, a first block may comprise the left-most part of the output layer 611, a second block may comprise the second left-most part of the output layer 611, etc. The blocks 612 may be non-overlapping.

[0055] According to an example embodiment, the output layer 611 of the encoder neural network 610 may comprise a plurality of blocks which are trained all the time during the training phase, in addition to a plurality of blocks which are trained only one at a time as explained in the rest of this document. Such always-trained blocks may be located in any pattern, for example as the first blocks, or every second block. The output layer 611 may therefore comprise a first plurality of blocks and a second plurality of blocks. The first plurality of blocks may be trained as explained in some example embodiments, i.e., one at a time and by keeping the previous blocks in this plurality of blocks frozen and the following blocks in this plurality of blocks with zero output activations. The second plurality of blocks may be trained all the time during training, i.e., at every training iteration all blocks in this second plurality may be trained.

[0056] According to an embodiment, the output layer 611 may comprise a convolutional layer. A convolutional output layer 611 may provide sufficient use of global features with a reasonable number of parameters. A convolutional output layer 611 may be beneficial for example if the number of layers in the encoder neural network 610 is high. The convolutional output layer 611 may for example comprise a 4-dimensional tensor of weights. Two dimensions of the tensor may correspond to the spatial sizes of the filter, one dimension may correspond to the number of input activations from a previous layer (N-1), and one dimension may correspond to the number of output feature maps from the convolutional output layer 611. For a convolutional output layer 611, the division into blocks may be with respect to the dimension or axis which refers to the number of output feature maps. Decoder 620 may comprise corresponding layers 1 to N.
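
For a convolutional output layer, the division into blocks along the output-feature-map axis described above can be sketched as follows. The Conv2d weight tensor has shape (output feature maps, input channels, filter height, filter width); the sizes, the number of blocks and the chosen cut-off are illustrative assumptions.

    # Sketch: channel-wise blocks of a convolutional output layer, zeroing the
    # feature maps of the blocks after a chosen block (assumed sizes).
    import torch
    import torch.nn as nn

    in_channels, out_channels = 128, 64
    conv_out = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    n_blocks = 8
    per_block = out_channels // n_blocks          # 8 output feature maps per block
    keep = 3                                      # keep blocks 1..3, zero blocks 4..8
    with torch.no_grad():
        conv_out.weight[keep * per_block:] = 0.0  # zero the weights of later blocks
        conv_out.bias[keep * per_block:] = 0.0    # their output feature maps become zero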

[0057] According to an example embodiment, the output layer 611 may comprise a fully connected layer. A fully connected output layer 611 may be used for example to further improve utilization of global features, because features learned from a fully connected layer cover the entire space. The fully connected output layer 611 may for example comprise a two-dimensional tensor, a matrix, where the number of rows corresponds to the dimensionality of the output of the previous layer (N-1). For a fully connected output layer 611, the division into blocks may be with respect to the dimension or axis which refers to the number of output activations. Decoder 620 may comprise corresponding layers 1 to N. After training, the decoder 620 is able to reconstruct the data even from a subset of activations, for example when inputting a vector where only the initial activations are non-zero.

[0058] FIG. 6 illustrates an example where blocks 612 are located at the output layer 611. However, according to an example embodiment, blocks may be located also elsewhere in the encoder neural network 610 and the blocks may comprise entire layer(s) of the neural network. For example, a first block may comprise one layer and a second block may comprise another layer. A block may alternatively comprise a part of a layer, for example one or more weights or filters. As further illustrated in FIG. 6, the plurality of blocks in the neural network may comprise one or more blocks 612 at the output layer and one or more blocks 613 at other layer(s). Hence, the plurality of blocks may comprise a plurality of layers, or portions thereof, in any suitable combination. The blocks may be non-overlapping.

[0059] FIG. 7 illustrates an example of a method 700 for training an auto-encoder, according to an example embodiment. The auto-encoder subject to training method 700 may for example comprise auto-encoder 600 with encoder 610 having N blocks at its output layer 611. However, method 700 may also be adapted to encoders comprising one or more blocks at other layer(s) of the network, as described above. Therefore, blocks of the encoder neural network may not be located at the output layer. Training the network may comprise repeating the following operations 701 to 707 for T training iterations, where T is either predefined or determined by satisfying a stopping criterion, such as for example when convergence is achieved. Convergence may be determined to be achieved for example when the training or validation error is below a predefined threshold or when the training or validation error does not decrease for more than a predefined number of training iterations.

[0060] A training iteration may comprise training the auto-encoder with different blocks enabled. A training iteration may be initiated at operation 701, where i may be set to 1.

[0061] At operation 702, previous blocks 1 to i-1 of the output layer may be frozen, or substantially frozen, such that parameters of these blocks may not be updated while training block i. Block i may be kept unfrozen, i.e. trainable, such that parameters of block i may be updated.

[0062] At operation 703, parameters of subsequent blocks i+1 to N may be set to zero and frozen such that parameters of these blocks are kept at zero while training block i.

[0063] At operation 704, the auto-encoder is trained. This may include updating parameters associated with block i of the output layer and parameters of other layers of the encoder part and the decoder part.

[0064] At operation 705, it is determined whether i has reached N, that is, whether the auto-encoder has been trained for the N blocks.

[0065] If i has not reached N, the procedure may move to operation 706 to increase i and to repeat operations 702 to 705 for the next block.

[0066] If i has reached N, a next training iteration may be started at operation 707.

[0067] In other words, training an ith block may comprise updating parameters of the ith block, keeping parameters of i-1 previous blocks substantially frozen, and setting last N-i blocks to zero, or substantially zero, wherein N is the number of blocks at the output layer. Training the ith block may comprise updating parameters of one or more preceding layers of the neural network.
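
The following sketch ties operations 701 to 707 together for a toy auto-encoder. Here the freeze/zero behaviour is realized at the activation level: activations of previous blocks are detached (so their parameters receive no update in that pass) and activations of subsequent blocks are replaced by zeros, while block i, the preceding encoder layers and the decoder are updated. The architecture, sizes, optimizer and number of iterations are illustrative assumptions, and this is only one of several possible ways to implement the described training schedule.

    # Sketch of the training schedule of FIG. 7 for a toy auto-encoder.
    import torch
    import torch.nn as nn

    N, M, D = 4, 8, 64                                  # blocks, block size, data dim
    body = nn.Sequential(nn.Linear(D, 128), nn.ReLU())  # preceding encoder layers
    out_blocks = nn.ModuleList([nn.Linear(128, M) for _ in range(N)])  # output layer
    decoder = nn.Sequential(nn.Linear(N * M, 128), nn.ReLU(), nn.Linear(128, D))
    params = (list(body.parameters()) + list(out_blocks.parameters())
              + list(decoder.parameters()))
    opt = torch.optim.Adam(params, lr=1e-3)

    def encode(x, i):
        h = body(x)
        parts = []
        for idx, blk in enumerate(out_blocks, start=1):
            a = blk(h)
            if idx < i:
                a = a.detach()              # previous blocks: not updated this pass
            elif idx > i:
                a = torch.zeros_like(a)     # subsequent blocks: set to zero
            parts.append(a)
        return torch.cat(parts, dim=1)

    for iteration in range(100):            # T training iterations (operation 707)
        x = torch.randn(32, D)              # stand-in training batch
        for i in range(1, N + 1):           # operations 701 to 706, one pass per block
            recon = decoder(encode(x, i))
            loss = torch.mean((recon - x) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()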

[0068] Training the auto-encoder 600 according to method 700 trains the encoder network 610 to be capable of providing good compression performance for different numbers of blocks 612 at the output layer 611 without modifying other layers 1 to N-1 of the encoder 610. The number of blocks 612 used for providing an output affects the length of the code generated by the encoder 610. Therefore, a benefit associated with the training method is that a configurable and cost-efficient encoder network may be obtained. This network may be used for example to provide variable rate encoding with different quality levels. Another benefit is that the training procedure causes the decoder part 620 of the auto-encoder 600 to be capable of decoding codes produced based on different numbers of blocks 612. Therefore, the same decoder 620 may be used to decode different variable rate codes generated by the encoder 610.

[0069] A further advantage of the training method is that, during training, blocks are trained for a similar number of iterations. Therefore, there may not be blocks which are trained more than other blocks. This makes it possible to obtain smoothly behaving compression quality across different quality levels.

[0070] At inference phase, a user may be enabled to select a desired quality level. The user may for example select the desired quality level as an integer number which corresponds to the number of blocks that the last encoder layer will use to output the code. In one example embodiment, in response to receiving an indication of a certain desired quality level Q, the compressor 102 may configure parameters, for example weights, of blocks 1 to Q to be left untouched (as they were after training), while configuring weights of blocks Q+1 to N to be set to zero such that their output activations or feature maps will be zero too. According to another example embodiment, given a desired quality level Q, the encoder may be configured to leave all blocks untouched (not set to zero), but the output activations or feature maps may be processed in such a way that the activations or feature maps output by the blocks Q+1 to N will be set to zero.

[0071] Even though encoder 610 and decoder 620 may be trained as one entity, it is noted that at inference phase the encoder 610 may be used separately from the decoder 620. For example, a compressor device 102 comprising the encoder 610 may be separate from a decompression device 104 comprising the decoder 620.

[0072] At inference phase, an input may be received at encoder 610. As described above, the encoder 610, or an entity controlling the encoder 610, may further receive an indication of a desired quality level Q. In response to receiving the indication of the quality level, the encoder neural network 610 may be configured to provide an output based on the input and a subset of the blocks.

[0073] The subset of blocks 612 may be determined based on the quality level Q. For example, each quality level Q may be associated with a number of blocks 612, or a particular subset of blocks 612. A mapping between a quality level Q and the number or subset of blocks 612 may be provided for example in a look-up table. In one example embodiment, the quality level Q is an integer value equal to the number of blocks 612.
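
One possible realization of the quality-level mapping described above is a small look-up table, as sketched below. The specific levels and block counts are invented purely for illustration.

    # Sketch: look-up table mapping a quality level to a number of blocks.
    QUALITY_TO_NUM_BLOCKS = {"low": 2, "medium": 4, "high": 8}

    def num_blocks_for(quality_level):
        return QUALITY_TO_NUM_BLOCKS[quality_level]

    K = num_blocks_for("medium")   # e.g. use the first 4 blocks of the output layer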

[0074] According to an example embodiment, the output layer may comprise N blocks 612, and the quality level Q may indicate providing an output from the first Q blocks 612 in the order of blocks and not providing an output from the last N-Q blocks 612 in the order of blocks.

[0075] According to an example embodiment, the output layer may comprise a convolutional layer. The convolutional output layer may be divided into a plurality of blocks with respect to a dimension associated with the number of output feature maps of the convolutional layer. According to an example embodiment, the output layer may comprise a fully connected layer. The fully connected output layer may be divided into a plurality of blocks with respect to a dimension associated with the number of output activations of the fully connected layer.

[0076] According to an embodiment, input to the encoder neural network may comprise video data, for example one or more video frames; audio data, for example a plurality of audio samples corresponding to one or more audio channels; or a representation of another neural network, for example a data format comprising representation of parameters of the other neural network.

[0077] FIG. 8 illustrates an example of a method 800 for variable rate data compression, according to an example embodiment.

[0078] At 801, the method may comprise receiving an input at a neural network comprising a plurality of blocks.

[0079] At 802, the method may comprise receiving an indication of a quality level.

[0080] At 803, the method may comprise determining a subset of the blocks based on the indication of the quality level.

[0081] At 804, the method may comprise providing an output from the neural network based on the received input and the subset of the blocks.

[0082] Further features of the methods directly result from the functionalities and parameters of the compressor 102 and/or decompressor 104 as described in the appended claims and throughout the specification and are therefore not repeated here. It is noted that one or more steps of the method may be performed in a different order.

[0083] An apparatus, for example compressor 102 or decompressor 104, may be configured to perform or cause performance of any aspect of the method(s) described herein. Further, a computer program may comprise instructions for causing, when executed, an apparatus to perform any aspect of the method(s) described herein. Further, an apparatus may comprise means for performing any aspect of the method(s) described herein. According to an example embodiment, the means comprises at least one processor and memory including program code, the program code configured to, when executed by the at least one processor, cause performance of any aspect of the method.

[0084] Any range or device value given herein may be extended or altered without losing the effect sought. Also, any embodiment may be combined with another embodiment unless explicitly disallowed.

[0085] Although the subject matter has been described in language specific to structural features and/or acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as examples of implementing the claims and other equivalent features and acts are intended to be within the scope of the claims.

[0086] It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to 'an' item may refer to one or more of those items.

[0087] The steps or operations of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the scope of the subject matter described herein. Aspects of any of the embodiments described above may be combined with aspects of any of the other embodiments described to form further embodiments without losing the effect sought.

[0088] The term 'comprising' is used herein to mean including the method, blocks, or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.

[0089] As used in this application, the term 'circuitry' may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and (b) combinations of hardware circuits and software, such as (as applicable): (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions) and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation. This definition of circuitry applies to all uses of this term in this application, including in any claims.

[0090] As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.

[0091] It will be understood that the above description is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the scope of this specification.