Title:
METHOD FOR DECODING A SCALABLE VIDEO BIT-STREAM, AND CORRESPONDING DECODING DEVICE
Document Type and Number:
WIPO Patent Application WO/2013/001013
Kind Code:
A1
Abstract:
The invention relates to video compression of scalable video (11). Where the base layer (14) is decoded using temporal prediction and the enhancement layer (20) has no temporal prediction, the decoding of the enhancement layer comprises: decoding (331), from the bit-stream EBS, and dequantizing (332) DCT coefficients formula (I) of the enhancement layer; using the motion information MI of the base layer to predict (405) coefficients of the enhancement layer, and transforming the predicted coefficients into DCT predicted coefficients Y; using parameters α,β in the bit-stream to obtain a probabilistic distribution GGD(α_i,β_i) of the DCT coefficients; obtaining a probabilistic distribution GGD(α_n,β_n) of the differences between the dequantized DCT coefficients formula (I) and the DCT predicted coefficients Y; and merging (38) these coefficients, based on the obtained probabilistic distributions. This improves the rate-distortion ratio of encoded scalable video.

Inventors:
LE LEANNEC FABRICE (FR)
LASSERRE SEBASTIEN (FR)
Application Number:
PCT/EP2012/062586
Publication Date:
January 03, 2013
Filing Date:
June 28, 2012
Assignee:
CANON KK (JP)
LE LEANNEC FABRICE (FR)
LASSERRE SEBASTIEN (FR)
International Classes:
H04N7/50; H04N7/26
Foreign References:
US6700933B1 (2004-03-02)
Other References:
BRUNO MACCHIAVELLO ET AL: "A STATISTICAL MODEL FOR A MIXED RESOLUTION WYNER-ZIV FRAMEWORK", 26. PICTURE CODING SYMPOSIUM; LISBON, 7 November 2007 (2007-11-07), XP030080372
DEBARGHA MUKHERJEE ET AL: "A simple reversed-complexity Wyner-Ziv video coding mode based on a spatial reduction framework", VISUAL COMMUNICATIONS AND IMAGE PROCESSING; SAN JOSE, 30 January 2007 (2007-01-30), XP030081145
LASSERRE S ET AL: "Low Complexity Scalable Extension of HEVC intra pictures based on content statistics", 9. JCT-VC MEETING; GENEVA; (JOINT COLLABORATIVE TEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16), no. JCTVC-I0190, 26 April 2012 (2012-04-26), XP030052774
Attorney, Agent or Firm:
SANTARELLI (14 avenue de la Grande-Armée, Paris Cedex 17, FR)
Claims:
CLAIMS

1. A method for decoding a scalable video bit-stream, comprising decoding a base layer from the bit-stream, decoding an enhancement layer from the bit-stream and adding the enhancement layer to the base layer to obtain a decoded video of high resolution images, wherein decoding the enhancement layer comprises:

- decoding, from the bit-stream, and dequantizing encoded transformed coefficients of the enhancement layer;

- using motion information of the base layer to predict residual blocks of coefficients of the enhancement layer, and transforming the predicted residual blocks of coefficients into transformed residual blocks;

- obtaining at least one first probabilistic distribution of the transformed coefficients;

- obtaining at least one second probabilistic distribution of the differences between the dequantized transformed coefficients and the coefficients of the transformed residual blocks; and

- merging the dequantized transformed coefficients and the coefficients of the transformed residual blocks, based on the obtained probabilistic distributions.

2. A method for decoding a scalable video bit-stream, comprising:

- decoding a low resolution version of the video, the decoding of the low resolution version comprising using motion information to temporally predict blocks of a low resolution image from blocks of a decoded reference low resolution image;

- decoding an enhancement version of the video, each enhancement image of the enhancement version having a high resolution and temporally corresponding to a low resolution image of the low resolution video; and

- adding each decoded enhancement image to an up-sampled version of the corresponding low resolution image, to obtain a decoded video of decoded high resolution images;

wherein decoding a first enhancement image temporally corresponding to a first low resolution image comprises:

- decoding, from the bit-stream, and dequantizing blocks of encoded quantized transformed coefficients of the first enhancement image;

- obtaining at least one first probabilistic distribution of the quantized transformed coefficients;

- using the motion information to obtain residual blocks from a decoded reference high resolution image temporally corresponding to the decoded reference low resolution image; and transforming said residual blocks into transformed residual blocks;

- obtaining at least one second probabilistic distribution of the differences between the coefficients of the transformed residual blocks and the dequantized transformed coefficients; and

- merging the dequantized blocks of dequantized transformed coefficients with the transformed residual blocks, based on the first and second probabilistic distributions.

3. The decoding method of Claim 1 or 2, wherein the step of merging comprises merging a dequantized transformed coefficient with a collocated coefficient in the transformed residual blocks, using first and second probabilistic distributions associated with these collocated coefficients, on a quantization interval associated with the value of the corresponding quantized transformed coefficient.

4. The decoding method of Claim 3, wherein the first and second probabilistic distributions are integrated using Riemann sums over the quantization interval during the merging step.

5. The decoding method of any one of Claims 1 to 4, wherein the step of merging comprises calculating the expectation of a block coefficient, given the quantization interval associated with the value of the corresponding quantized transformed coefficient and given its corresponding value in the transformed residual blocks, based on the first and second probabilistic distributions.

6. The decoding method of Claim 5, wherein calculating the expectation $\hat{x}_i$ of a block coefficient i comprises calculating the following value:

$$\hat{x}_i = \frac{\int_{Q_m} x\,\mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,dx}{\int_{Q_m} \mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,dx}$$

where PDF_i is the first probabilistic distribution associated with the block coefficient i, PDF_N is the second probabilistic distribution, Y_0 is the value of the coefficient collocated with said block coefficient i in the transformed residual blocks, and Q_m is the quantization interval associated with the value of the quantized transformed coefficient collocated with said block coefficient i.

7. The decoding method of any one of Claims 1 to 6, wherein the probabilistic distributions are generalized Gaussian distributions

$$GGD(\alpha_i, \beta_i, x) = \frac{\beta_i}{2\alpha_i\,\Gamma(1/\beta_i)}\,\exp\!\left(-\left|x/\alpha_i\right|^{\beta_i}\right),$$

where α_i and β_i are two parameters.

8. The decoding method of any one of Claims 1 to 7, wherein the obtaining of the second probabilistic distribution comprises fitting a Generalized Gaussian Distribution model onto the differences between the coefficients in the transformed residual blocks and the dequantized transformed coefficients.

9. The decoding method of any one of Claims 1 to 8, wherein the obtaining of the first probabilistic distribution comprises obtaining parameters from the bit-stream and applying these parameters to a probabilistic distribution model.

10. The decoding method of any one of Claims 1 to 9, wherein the low resolution or base image temporally corresponding to a first enhancement image to decode is an image bi-directionally predicted from reference low resolution or base images using motion information in each of the two directions, and the decoding of the first enhancement image comprises obtaining transformed residual blocks for each direction and merging together the transformed residual blocks in both directions with the dequantized blocks of dequantized transformed coefficients.

11. The decoding method of Claim 10, wherein the step of merging comprises calculating the merger value $\hat{x}_i$ of a block coefficient i using the formula:

$$\hat{x}_i = \frac{\int_{Q_m} x\,\mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,\mathrm{PDF}_{N'}(x - Y'_0)\,dx}{\int_{Q_m} \mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,\mathrm{PDF}_{N'}(x - Y'_0)\,dx}$$

where PDF_i is the first probabilistic distribution associated with the block coefficient i, PDF_N and PDF_{N'} are the second probabilistic distributions for respectively each of the two directions, Y_0 and Y'_0 are the values of the coefficients collocated with said block coefficient in the transformed residual blocks in respectively each of the two directions, and Q_m is the quantization interval associated with the value of the quantized transformed coefficient collocated with said block coefficient i.

12. The decoding method of any one of Claims 1 to 11, wherein obtaining residual blocks comprises:

- obtaining, using the motion information, motion predictor blocks from a decoded reference high resolution image;

- up-sampling the low resolution image temporally corresponding to the first enhancement image to decode into high resolution, to obtain up-sampled blocks;

- subtracting each motion predictor block from a corresponding up-sampled block to obtain the residual blocks.

13. The decoding method of Claim 12, wherein, before using the motion information, that motion information is up-sampled into high resolution.

14. The decoding method of Claim 13, wherein the motion information that is up-sampled comprises, for a given block, a motion vector and a temporal residual block; and the obtaining of the motion predictor blocks comprises:

- obtaining blocks of the decoded reference high resolution image using the up-sampled motion information, and

- adding the up-sampled temporal residual block to the obtained blocks.

15. The decoding method of any one of Claims 1 to 14, further comprising filtering, using a deblocking filter, the obtained decoded high resolution images; and wherein parameters of the deblocking filter depend on the first and second probabilistic distributions used during the merger.

16. The decoding method of any one of Claims 1 to 15, wherein the second probabilistic distributions are obtained for blocks collocated with enhancement image blocks of the corresponding low resolution or base image that are encoded with the same coding mode.

17. The decoding method of any one of Claims 1 to 16, wherein first probabilistic distributions are obtained for respectively each of a plurality of channels, wherein a channel is associated with collocated coefficients having the same block coefficient position in their respective blocks.

18. A decoding device for decoding a scalable video bit-stream, comprising a base layer decoder configured to decode a base layer from the bit-stream, an enhancement layer decoder configured to decode an enhancement layer from the bit-stream and a video building unit configured to add the enhancement layer to the base layer to obtain a decoded video, wherein the enhancement layer decoder is further configured to:

- decode, from the bit-stream, and dequantize encoded transformed coefficients of the enhancement layer;

- use motion information of the base layer to predict residual blocks of coefficients of the enhancement layer, and transform the predicted residual blocks of coefficients into transformed residual blocks;

- obtain at least one first probabilistic distribution of the transformed coefficients;

- obtain at least one second probabilistic distribution of the differences between the dequantized transformed coefficients and the coefficients of the transformed residual blocks; and

- merge the dequantized transformed coefficients and the coefficients of the transformed residual blocks, based on the obtained probabilistic distributions.

19. A non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in an apparatus, causes the apparatus to perform the steps of any one of Claims 1 to 17.

20. A method of decoding video data comprising:

- decompressing, by a method according to any of Claims 1 to 17, video data of an enhancement layer to generate residual data having a first resolution;

- decoding video data of a base layer to generate decoded base layer video data having a second resolution, lower than the first resolution, and upsampling the decoded base layer video data to generate upsampled video data having the first resolution;

- forming a sum of the upsampled video data and the residual data to generate enhanced video data.

Description:
METHOD FOR DECODING A SCALABLE VIDEO BIT-STREAM, AND

CORRESPONDING DECODING DEVICE

FIELD OF THE INVENTION

The present invention concerns a method for decoding a scalable video bit-stream, and an associated decoding device.

BACKGROUND OF THE INVENTION

Video compression algorithms, such as those standardized by the standardization organizations ITU, ISO, and SMPTE, exploit the spatial and temporal redundancies of images in order to generate bitstreams of data of smaller size than original video sequences. These powerful video compression tools, known as spatial (or intra) and temporal (or inter) predictions, make the transmission and/or the storage of video sequences more efficient.

Video encoders and/or decoders (codecs) are often embedded in portable devices with limited resources, such as cameras or camcorders. Conventional embedded codecs can process at best high definition (HD) digital videos, i.e. 1080x1920 pixel frames.

Real time encoding and decoding are however limited by the limited resources of the portable devices, especially regarding slow access to the working memory (e.g. random access memory, or RAM) and regarding the central processing unit (CPU).

This is particularly striking for the encoding or decoding of ultra-high definition (UHD) digital videos that are about to be handled by the latest cameras. This is because the amount of pixel data to consider for spatial or temporal prediction is huge.

UHD is typically four times (4k2k pixels) the definition of an HD video, which is the current standard definition video. Furthermore, very ultra high definition, which is sixteen times that definition (i.e. 8k4k pixels), is even being considered in a more long-term future.

SUMMARY OF THE INVENTION

Faced with these constraints in terms of limited power and memory access bandwidth, the inventors provide a UHD codec with low complexity based on scalable encoding.

Basically, the UHD video is encoded into a base layer and one or more enhancement layers.

The base layer results from the encoding of a low resolution version of the UHD images, in particular having an HD resolution, with a standard existing codec (e.g. H.264 or HEVC - High Efficiency Video Coding). As stated above, the compression efficiency of such a codec relies on spatial and temporal predictions.

Further to the encoding of the base layer, an enhancement image is obtained by subtracting an interpolated (or up-scaled or upsampled) decoded image of the base layer from the corresponding original UHD image. The enhancement images, which are residuals or pixel differences with UHD resolution, are then encoded into an enhancement layer.

Figure 1 illustrates such an approach at the encoder 10.

An input raw video 11, in particular a UHD video, is down-sampled 12 to obtain a so-called base layer, for example with HD resolution, which is encoded by a standard base video coder 13, for instance H.264/AVC or HEVC. This results in a base layer bit-stream 14.

To generate the enhancement layer, the encoded base layer is decoded 15 and up-sampled 16 into the initial resolution (UHD in the example) to obtain the up-sampled decoded base layer.

The latter is then subtracted 17, in the pixel domain, from the original raw video to get the residual enhancement layer X.

The information contained in X is the error or pixel difference due to the base layer encoding and the up-sampling. It is also known as a "residual".
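As a minimal illustrative sketch of steps 16-17 (names are hypothetical, and nearest-neighbour interpolation merely stands in for the actual up-sampling filter 16):

```python
import numpy as np

def enhancement_residual(raw_uhd, decoded_base_hd):
    """Form the residual enhancement image X (subtraction 17 of Figure 1)."""
    # 16: up-sample the decoded base layer from HD to UHD (placeholder filter;
    # a real codec would use a proper interpolation filter here)
    up = decoded_base_hd.repeat(2, axis=0).repeat(2, axis=1)
    # 17: pixel-domain difference between the raw video and the up-sampled base
    return raw_uhd.astype(np.int32) - up.astype(np.int32)
```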

A conventional block division is then applied, for instance a homogeneous 8x8 block division (but other divisions with non-constant block size are also possible).

Next, a block-based DCT transform 18 is applied to each block to generate DCT blocks forming the DCT image X_DCT having the initial UHD resolution.

This DCT image X_DCT is encoded into X_DCT,Q by an enhancement video encoding module 19, producing an enhancement layer bit-stream 20.

The encoded bit-stream EBS resulting from the encoding of the raw video 11 is made of:

- the base layer bit-stream 14 produced by the base video encoder 13;

- the enhancement layer bit-stream 20 encoded by the enhancement video encoder 19; and

- parameters 21 determined and used by the enhancement video encoder.

Examples of those parameters are given here below.

Figure 2 illustrates the associated processing at the decoder 30 receiving the encoded bit-stream EBS.

Part of the processing consists in decoding the base layer bit-stream 14 by the standard base video decoder 31 to produce a decoded base layer. In particular, the decoding of the base layer comprises using motion information to temporally predict blocks of a first base image from blocks of a decoded reference base image. The concept of "reference image" is well-known from conventional encoding methods.

The decoded base layer is then up-sampled 32 into the initial resolution, i.e. UHD resolution.

In another part of the processing, a decoding of the enhancement layer is performed, wherein each enhancement image of the enhancement layer has a high resolution (in the example UHD) and temporally corresponds to a base image of the base layer. Due to the scalability of the video, at a given time, there is a temporal correspondence between an image of the video, a base image and at least one enhancement image.

In particular, both the enhancement layer bit-stream 20 and the parameters 21 are used by the enhancement video decoding and dequantization module 33 to generate the dequantized DCT image X̂_DCT. The image X̂_DCT is the result of the quantization and then the inverse quantization applied to the image X_DCT.

An inverse DCT transform 34 is then applied to each block of the dequantized image to obtain the decoded residual (of UHD resolution) in the pixel domain.

Each decoded enhancement image is then added, in the pixel domain, to the corresponding up-sampled decoded base image (block by block), to obtain a decoded video of decoded high resolution images. For example, each decoded residual is added 35 to the corresponding block in the up-sampled decoded base layer to obtain decoded images of the video.

Filter post-processing, for instance with a deblocking filter 36, is finally applied to obtain the decoded video 37 which is output by the decoder 30.
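The flow of Figure 2 can be summarised by the sketch below; each callable stands for one of the modules 31-36, and all names are placeholders rather than the patent's API:

```python
def decode_uhd_frame(base_bits, enh_bits, params,
                     base_decoder, upsample_2x, enh_decoder, inverse_dct, deblock):
    base_hd = base_decoder(base_bits)          # 31: decode base layer (temporal prediction inside)
    base_uhd = upsample_2x(base_hd)            # 32: up-sample the decoded base layer to UHD
    x_hat_dct = enh_decoder(enh_bits, params)  # 33: entropy-decode and dequantize enhancement DCT
    residual = inverse_dct(x_hat_dct)          # 34: back to the pixel domain
    return deblock(base_uhd + residual)        # 35 + 36: add residual, then deblocking filter
```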

Reducing UHD encoding and decoding complexity relies on simplifying the encoding of the enhancement images at the enhancement video encoding module 19 compared to the conventional encoding scheme.

To that end, the inventors dispense with the temporal prediction and possibly the spatial prediction when encoding the UHD enhancement images. This is because the temporal prediction is very expensive in terms of memory bandwidth consumption, since it often requires accessing other enhancement images as reference images. Low-complexity codecs may then be designed, in particular at the encoding side.

While this simplification reduces the slow memory random access bandwidth consumption by 80% during the encoding process, not using those powerful video compression tools may deteriorate the compression efficiency compared to the conventional standards.

In this context, the inventors have developed several additional tools for increasing the efficiency of the encoding of those enhancement images.

For example, the enhancement video encoding module 19 (or "enhancement layer encoder") may model the statistical distribution of the DCT coefficients within the DCT blocks of a current enhancement image X by fitting a parametric probabilistic model.

This fitted model becomes the channel model of DCT coefficients and the fitted parameters are output in the parameter bit-stream 21 coded by the enhancement layer encoder. As will become more clearly apparent below, a channel model may be obtained for each DCT coefficient position within a DCT block based on fitting the parametric probabilistic model onto the corresponding collocated DCT coefficients throughout all the DCT blocks of the image X DCT or of part of it.

Based on the channel models and corresponding probabilistic distributions issued from the modelling, a selection of efficient quantizers and improved entropy coding may be implemented. Conversely, at the decoder, the channel models are reconstructed from the received parameters 21, enabling retrieval of the selected quantizers and the entropy coding used.

In the context of the present invention, conventional quantizers and conventional entropy coding, such as Huffman codes, may also be used.

The present invention particularly focuses on the decoding of a resulting scalable video bit-stream.

In particular, in the absence of temporal prediction in the enhancement layer, decoding a first enhancement image may involve using the motion information of the corresponding first base image in order to obtain residual blocks from a "reference" decoded UHD image (temporally corresponding to the decoded reference base image used for predicting the first base image). Such blocks may then be used to correct the enhancement image data directly obtained from the bit-stream 20.

Using temporal prediction information of the base layer to decode the enhancement layer is known from the standard SVC (standing for "Scalable Video Coding"). One may note that such an approach cannot be applied to blocks which are not temporally predicted in the base layer (i.e. the so-called Intra images or the intra-predicted blocks).

The present invention intends to improve the efficiency of a decoding method based on predicting the enhancement layer. This aims at improving the quality of reconstructed high resolution (e.g. UHD) images, while keeping low complexity at the encoding and decoding sides.

To that end, a first aspect of the invention concerns a method for decoding a scalable video bit-stream, comprising decoding a base layer from the bit-stream, decoding an enhancement layer from the bit-stream and adding the enhancement layer to the base layer to obtain a decoded video of high resolution images, wherein decoding the enhancement layer comprises:

- decoding, from the bit-stream, and dequantizing encoded transformed coefficients of the enhancement layer;

- using motion information of the base layer to predict residual blocks of coefficients of the enhancement layer, and transforming the predicted residual blocks of coefficients into transformed residual blocks;

- obtaining at least one first probabilistic distribution of the transformed coefficients;

- obtaining at least one second probabilistic distribution of the differences between the dequantized transformed coefficients and the coefficients of the transformed residual blocks; and

- merging the dequantized transformed coefficients and the coefficients of the transformed residual blocks, based on the obtained probabilistic distributions.

In more detail, the method for decoding a scalable video bit-stream comprises:

- decoding a low resolution version of the video, the decoding of the low resolution version comprising using motion information to temporally predict blocks of a low resolution image from blocks of a decoded reference low resolution image;

- decoding an enhancement version of the video, each enhancement image of the enhancement version having a high resolution and temporally corresponding to a low resolution image of the low resolution video; and

- adding each decoded enhancement image to an up-sampled version of the corresponding low resolution image, to obtain a decoded video of decoded high resolution images;

wherein decoding a first enhancement image temporally corresponding to a first low resolution image comprises:

- decoding, from the bit-stream, and dequantizing blocks of encoded quantized transformed coefficients of the first enhancement image;

- obtaining at least one first probabilistic distribution of the quantized transformed coefficients;

- using the motion information to obtain residual blocks from a decoded reference high resolution image temporally corresponding to the decoded reference low resolution image; and transforming said residual blocks into transformed residual blocks;

- obtaining at least one second probabilistic distribution of the differences between the coefficients of the transformed residual blocks and the dequantized transformed coefficients; and

- merging the dequantized blocks of dequantized transformed coefficients with the transformed residual blocks, based on the first and second probabilistic distributions.

The blocks of the encoded enhancement layer obtained from the bit-stream and the residual blocks obtained using the motion information of the base layer are merged together to form parts of the decoded enhancement image. As explained above, this decoded enhancement image is then added to an up-sampled decoded base image to obtain a decoded high resolution (e.g. UHD) image.

This approach refines the quality of the decoded transformed (i.e. DCT) coefficients in the decoder.

According to the invention, the quality of the decoded high resolution image is improved compared to known techniques. This is due to the use of two probabilistic distributions that model both the original transformed coefficients and an error of temporal prediction, when merging the transformed coefficients (e.g. DCT coefficients).

The first probabilistic distribution corresponding to the transformed coefficients encoded in the bit-stream may be obtained from the bit-stream itself, for instance from the parameters 21 defined above. These may represent statistical modelling of the original transformed coefficients (i.e. before quantization and encoding).

The second probabilistic distributions, which correspond to the blocks predicted using the motion information of the base layer, provide information about the noise of temporal prediction. In particular, they provide modelled information on the difference between those predicted coefficients and the transformed coefficients. Since the original transformed coefficients are not known by the decoder, the decoded and dequantized transformed coefficients known at the decoding side are used in place of the original transformed coefficients. The inventors have observed that using those coefficients rather than the original ones provides modelling that is quite close to reality.

Since the decoded transformed coefficients and the transformed predicted coefficients (or residual blocks) both bring relevant information about the original transformed coefficients (DCT coefficients before encoding), using the above probabilistic distributions enables the obtained transformed coefficients to be statistically optimized so that they are closer to the original values than in the known techniques.

In particular, statistical estimates, such as the expectation in the example below, provide good results.

For example, for low bitrates (meaning large quantization intervals), the temporally predicted blocks may more often provide relevant information on the original DCT coefficients than the quantization level obtained by the dequantization. For high bitrates, the opposite occurs. The invention allows gains of up to several dBs in rate-distortion performance at almost no cost of additional complexity at the decoder, and at the cost of zero additional rate when the parameters 21 have already been transmitted.

As a further advantage, the approach according to the invention does not necessarily have to be performed at the decoding side. For example, it may be switched off in case of very low complexity decoders. Further, the encoding is independent of the switching decision.

Correlatively, the invention also relates to a decoding device for decoding a scalable video bit-stream, comprising a base layer decoder configured to decode a base layer from the bit-stream, an enhancement layer decoder configured to decode an enhancement layer from the bit-stream and a video building unit configured to add the enhancement layer to the base layer to obtain a decoded video, wherein the enhancement layer decoder is further configured to:

- decode, from the bit-stream, and dequantize encoded transformed coefficients of the enhancement layer;

- use motion information of the base layer to predict residual blocks of coefficients of the enhancement layer, and transform the predicted residual blocks of coefficients into transformed residual blocks;

- obtain at least one first probabilistic distribution of the transformed coefficients;

- obtain at least one second probabilistic distribution of the differences between the dequantized transformed coefficients and the coefficients of the transformed residual blocks; and

- merge the dequantized transformed coefficients and the coefficients of the transformed residual blocks, based on the obtained probabilistic distributions.

In more detail, the decoding device for decoding a scalable video bit-stream comprises:

- a base decoder configured to decode a low resolution version of the video, using motion information to temporally predict blocks of a low resolution image from blocks of a decoded reference low resolution image;

- an enhancement decoder configured to decode an enhancement version of the video, each enhancement image of the enhancement version having a high resolution and temporally corresponding to a low resolution image of the low resolution video; and

- an image building unit configured to add each decoded enhancement image to an up-sampled version of the corresponding low resolution image, to obtain a decoded video of decoded high resolution images;

wherein the enhancement decoder is further configured to:

- decode, from the bit-stream, and dequantize blocks of encoded quantized transformed coefficients of the first enhancement image;

- obtain at least one first probabilistic distribution of the quantized transformed coefficients;

- use the motion information to obtain residual blocks from a decoded reference high resolution image temporally corresponding to the decoded reference low resolution image; and transform said residual blocks into transformed residual blocks;

- obtain at least one second probabilistic distribution of the differences between the coefficients of the transformed residual blocks and the dequantized transformed coefficients; and

- merge the dequantized blocks of dequantized transformed coefficients with the transformed residual blocks, based on the first and second probabilistic distributions.

Another aspect of the invention relates to an information storage means, able to be read by a computer system, comprising instructions for a computer program adapted to implement the decoding method as set out above, when the program is loaded into and executed by the computer system.

Yet another aspect of the invention relates to a computer program product able to be read by a microprocessor, comprising portions of software code adapted to implement the decoding method as set out above, when it is loaded into and executed by the microprocessor.

The decoding device, the computer program and the information storage means may have features and advantages that are analogous to those set out above and below in relation to the decoding method, in particular that of refining decoded transformed DCT coefficients and of improving the quality of decoded high resolution images.

Another aspect of the invention relates to a method for decoding an image substantially as herein described with reference to, and as shown in, Figure 5; Figures 5 and 11; Figures 5, 10 and 11 of the accompanying drawings.

Another aspect of the invention relates to a decoding device for decoding an image substantially as herein described with reference to, and as shown in, Figure 5; Figures 5 and 10 of the accompanying drawings.

Optional features of the invention are further defined in the dependent appended claims.

For example, the step of merging may merge a dequantized transformed coefficient with a collocated coefficient in the transformed residual blocks (meaning collocated blocks and collocated coefficients within those blocks), using first and second probabilistic distributions associated with these collocated coefficients, on a quantization interval associated with the value of the corresponding quantized transformed coefficient (i.e. the value before the quantized transformed coefficient is dequantized).

This ensures that an accurate merged transformed coefficient is provided, given its quantized value that has been transmitted in the encoded bit-stream.

In particular, the first and second probabilistic distributions are integrated using Riemann sums over that quantization interval during the merging step. This provision makes it possible to perform a probabilistic merger of transformed coefficients on low complexity decoders.

According to a particular feature, the step of merging comprises calculating the expectation of a block coefficient, given the quantization interval associated with the value of the corresponding quantized transformed coefficient and given its corresponding value in the transformed residual blocks, based on the first and second probabilistic distributions.

In particular, calculating the expectation $\hat{x}_i$ of a block coefficient i comprises calculating the following value:

$$\hat{x}_i = \frac{\int_{Q_m} x\,\mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,dx}{\int_{Q_m} \mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,dx}$$

where PDF_i is the first probabilistic distribution associated with the block coefficient i, PDF_N is the second probabilistic distribution, Y_0 is the value of the coefficient collocated with said block coefficient i in the transformed residual blocks, and Q_m is the quantization interval associated with the value of the quantized transformed coefficient collocated with said block coefficient i.

These approaches combine the probabilities of occurrence of the considered coefficient in the quantization interval (i.e. the first probabilistic distribution), the predicted value (Y_0) and the noise modelling of the prediction (i.e. the second probabilistic distribution).

The probabilistic best value is thus obtained for the transformed coefficients when reconstructing the decoded high resolution images. These images prove to be statistically improved with respect to quality.

In one embodiment of the invention, the probabilistic distributions are generalized Gaussian distributions

$$GGD(\alpha_i, \beta_i, x) = \frac{\beta_i}{2\alpha_i\,\Gamma(1/\beta_i)}\,\exp\!\left(-\left|x/\alpha_i\right|^{\beta_i}\right),$$

where α_i and β_i are two parameters. This parametric model is well-suited for modelling noise, such as the residuals.

In particular, the obtaining of the second probabilistic distribution comprises fitting a Generalized Gaussian Distribution model onto the differences between the coefficients in the transformed residual blocks and the dequantized transformed coefficients. In that case, the second probabilistic distribution is statistically obtained based on the coefficients that are actually handled by the decoder.
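By way of illustration, the sketch below computes the merged expectation with Riemann sums, assuming zero-mean GGD models for both the channel distribution PDF_i and the prediction-error distribution PDF_N; the helper names and parameter values are hypothetical:

```python
import numpy as np
from math import gamma

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density GGD(alpha, beta, x)."""
    return beta / (2.0 * alpha * gamma(1.0 / beta)) * np.exp(-np.abs(x / alpha) ** beta)

def merge_coefficient(t_lo, t_hi, y0, alpha_i, beta_i, alpha_n, beta_n, n_steps=256):
    """Expectation of a coefficient over the quantization interval Q_m = [t_lo, t_hi],
    given the value y0 predicted from the base-layer motion, via Riemann sums.

    PDF_i = GGD(alpha_i, beta_i): channel model of the DCT coefficient (parameters 21).
    PDF_N = GGD(alpha_n, beta_n): model of the temporal prediction error.
    """
    x = np.linspace(t_lo, t_hi, n_steps)
    w = ggd_pdf(x, alpha_i, beta_i) * ggd_pdf(x - y0, alpha_n, beta_n)
    den = np.sum(w)  # the uniform step dx cancels between numerator and denominator
    return np.sum(x * w) / den if den > 0 else 0.5 * (t_lo + t_hi)

# e.g. a coefficient quantized into the interval [2.0, 6.0], predictor value 2.5:
# merge_coefficient(2.0, 6.0, y0=2.5, alpha_i=3.0, beta_i=1.0, alpha_n=1.5, beta_n=1.2)
```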

According to a particular feature, the obtaining of the first probabilistic distribution comprises obtaining parameters from the bit-stream and applying these parameters to a probabilistic distribution model.

In one particular embodiment of the invention, the low resolution or base image temporally corresponding to a first enhancement image to decode is an image bi-directionally predicted from reference low resolution or base images using motion information in each of the two directions, and the decoding of the first enhancement image comprises obtaining transformed residual blocks for each direction and merging together the transformed residual blocks in both directions with the dequantized blocks of dequantized transformed coefficients.

This applies for example to enhancement images corresponding to B-type images of the base layer.

This approach proves to be more precise than an approach which first determines a single transformed residual block based on prediction in both directions. This is because a motion prediction noise estimation in each direction is separately obtained, improving a probabilistic merger.

Similarly to the case briefly described above, the merging can be based on calculating an expectation. For example, the step of merging may comprise calculating the merger value $\hat{x}_i$ of a block coefficient i using the formula:

$$\hat{x}_i = \frac{\int_{Q_m} x\,\mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,\mathrm{PDF}_{N'}(x - Y'_0)\,dx}{\int_{Q_m} \mathrm{PDF}_i(x)\,\mathrm{PDF}_N(x - Y_0)\,\mathrm{PDF}_{N'}(x - Y'_0)\,dx}$$

where PDF_i is the first probabilistic distribution associated with the block coefficient i, PDF_N and PDF_{N'} are the second probabilistic distributions for respectively each of the two directions, Y_0 and Y'_0 are the values of the coefficients collocated with said block coefficient in the transformed residual blocks in respectively each of the two directions, and Q_m is the quantization interval associated with the value of the quantized transformed coefficient collocated with said block coefficient i.
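Under the same GGD assumption, the bidirectional merger only adds one prediction-error factor to the integrand (a sketch with illustrative names):

```python
import numpy as np
from math import gamma

def ggd_pdf(x, alpha, beta):  # same helper as before, repeated so the snippet stands alone
    return beta / (2.0 * alpha * gamma(1.0 / beta)) * np.exp(-np.abs(x / alpha) ** beta)

def merge_bidirectional(t_lo, t_hi, y0, y0_prime, pdf_i, pdf_n, pdf_n_prime, n_steps=256):
    """Expectation over Q_m = [t_lo, t_hi]; pdf_i, pdf_n, pdf_n_prime are
    (alpha, beta) pairs for PDF_i, PDF_N and PDF_N' respectively."""
    x = np.linspace(t_lo, t_hi, n_steps)
    w = (ggd_pdf(x, *pdf_i)
         * ggd_pdf(x - y0, *pdf_n)                # prediction error, first direction
         * ggd_pdf(x - y0_prime, *pdf_n_prime))   # prediction error, second direction
    den = np.sum(w)
    return np.sum(x * w) / den if den > 0 else 0.5 * (t_lo + t_hi)
```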

In one embodiment of the invention, obtaining residual blocks comprises:

- obtaining, using the motion information, motion predictor blocks from a decoded reference high resolution image;

- up-sampling the low resolution image temporally corresponding to the first enhancement image to decode into high resolution, to obtain up-sampled blocks;

- subtracting each motion predictor block from a corresponding (i.e. collocated) up-sampled block to obtain the residual blocks.

These steps define the temporal prediction of the enhancement layer based on the images already reconstructed. They produce another enhancement layer (since each obtained block is the difference with the base layer) from which a modelling of the temporal prediction noise can be performed.
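A sketch of these steps for one 8x8 block, under simplifying assumptions (full-pel motion vector, no temporal residual added, the up-sampled base block of the second step taken as an input, scipy's orthonormal DCT):

```python
import numpy as np
from scipy.fftpack import dct

def predicted_dct_residual(ref_uhd, mv, pos, upsampled_base_block, size=8):
    """One transformed residual block Y, following the three steps above."""
    r, c = pos
    dy, dx = mv  # up-sampled motion vector, assumed full-pel here
    # obtain the motion predictor block from the decoded reference UHD image
    predictor = ref_uhd[r + dy : r + dy + size, c + dx : c + dx + size]
    # subtract the motion predictor block from the up-sampled block
    residual = upsampled_base_block.astype(np.float64) - predictor
    # transform the residual block into the DCT domain (2-D orthonormal DCT)
    return dct(dct(residual, axis=0, norm='ortho'), axis=1, norm='ortho')
```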

In one embodiment of the invention, before using the motion information, that motion information is up-sampled (or interpolated) into high resolution. This is because the reference image on which that information is about to be used is of high resolution.

According to a particular feature, the motion information that is up-sampled comprises, for a given block, a motion vector and a temporal residual block; and the obtaining of the motion predictor blocks comprises:

- obtaining blocks of the decoded reference high resolution image using the up-sampled motion information, and

- adding the up-sampled temporal residual block to the obtained blocks.

It may further comprise a reference image index identifying said reference low resolution image, when the encoding with temporal prediction uses multiple reference images.

In another embodiment of the invention, the decoding method may further comprise filtering, using a deblocking filter, the obtained decoded high resolution images; wherein parameters (e.g. the filter strength parameter or the quantization-dependent parameter) of the deblocking filter depend on the first and second probabilistic distributions used during the merger.

This makes it possible to locally adjust the post-processing for filtering discontinuities resulting from the modelling according to the invention.

In yet another embodiment, the second probabilistic distributions are obtained for blocks collocated with enhancement image blocks of the corresponding low resolution or base image that are encoded with the same coding mode. The coding mode of the base (low resolution) layer is for example the INTER mode, which may be further subdivided into an INTER P-prediction mode and an INTER B-prediction mode, or the SKIP mode (as defined in H.264).

In another embodiment, first probabilistic distributions are obtained for respectively each of a plurality of channels, wherein a channel is associated with collocated coefficients having the same block coefficient position in their respective blocks. Furthermore, a channel may be restricted to the blocks collocated with base layer blocks having the same coding mode.

According to another aspect of the present invention, there is provided a method of decoding video data comprising:

- decompressing video data of an enhancement layer to generate residual data having a first resolution;

- decoding video data of a base layer to generate decoded base layer video data having a second resolution, lower than the first resolution, and upsampling the decoded base layer video data to generate upsampled video data having the first resolution;

- forming a sum of the upsampled video data and the residual data to generate enhanced video data.

Preferably, the decompression of the video data of the enhancement layer employs a method embodying the aforesaid first aspect of the present invention.

In one embodiment, the decoding of the base layer video data is in conformity with HEVC.

In one embodiment, the first resolution is UHD and the second resolution is HD.

As already noted, it is proposed that the compression of the residual data does not involve temporal prediction and/or that the compression of the residual data also does not involve spatial prediction.

BRIEF DESCRIPTION OF THE DRAWINGS

Other particularities and advantages of the invention will also emerge from the following description, illustrated by the accompanying drawings, in which:

- Figure 1 schematically shows an encoder for a scalable codec;

- Figure 2 schematically shows the corresponding decoder;

- Figure 3 schematically illustrates the enhancement video encoding module of the encoder of Figure 1;

- Figure 4 schematically illustrates the enhancement video decoding module of the decoder of Figure 2;

- Figure 5 is a more detailed schematic illustration of the decoder of Figure 2 according to the invention;

- Figure 6 illustrates a structure of a 4:2:0 macroblock;

- Figure 7 illustrates an example of a quantizer based on Voronoi cells;

- Figure 8 illustrates the spatial random access property;

- Figure 9 illustrates an implementation of entry points to allow spatial random access as illustrated in Figure 8;

- Figure 10 illustrates the prediction of the enhancement layer according to the invention;

- Figure 11 illustrates the probabilistic merging according to an embodiment of the invention;

- Figure 12 illustrates the performance of the present invention; and

- Figure 13 shows a particular hardware configuration of a device able to implement methods according to the invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

The detailed description below focuses on the encoding and the decoding of a UHD video as introduced above with reference to Figures 1 and 2. It is however to be recalled that the invention applies to the decoding of an encoded scalable video bit-stream in which the base layer has been encoded using temporal prediction and the enhancement layer has been encoded without temporal prediction and possibly without spatial prediction.

Referring to Figure 3, which illustrates an embodiment of an enhancement video encoding module 19 (or "enhancement layer encoder"), a low resolution version of the initial image has been encoded into an encoded low resolution image, referred to above as the base layer; and a residual enhancement image has been obtained by subtracting an interpolated high resolution (or up-sampled) decoded version of the encoded low resolution image from said initial image.

Conventionally, that residual enhancement image is then transformed from the spatial domain (i.e. pixels) into the (spatial) frequency domain, using for example a block-based DCT transform, to obtain an image of transformed block coefficients. In the Figure, that image is referenced X_DCT, and it comprises a plurality of DCT blocks, each comprising DCT coefficients.

As an example, the residual enhancement image has been divided into blocks B_k, for instance 8x8 blocks, but other divisions may be considered, on which the DCT transform is applied. Within a block, the DCT coefficients are associated with a block coefficient position or "index" i (e.g. i = 1 to 64), along a zigzag scan for successive handling when encoding, for example.
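For illustration, an 8x8 block can be transformed and read out in a JPEG-style zigzag order (one plausible variant; the text does not pin down the exact scan):

```python
import numpy as np
from scipy.fftpack import dct

# JPEG-style zigzag: walk the anti-diagonals, alternating direction
ZIGZAG_8x8 = sorted(((r, c) for r in range(8) for c in range(8)),
                    key=lambda rc: (rc[0] + rc[1],
                                    rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def dct_channels(block):
    """2-D DCT of one 8x8 block, returned so that output position i
    corresponds to the channel index i = 1..64 used in the text."""
    d = dct(dct(block.astype(np.float64), axis=0, norm='ortho'),
            axis=1, norm='ortho')
    return np.array([d[r, c] for r, c in ZIGZAG_8x8])
```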

Blocks are grouped into macroblocks MB_k. A very common case for so-called 4:2:0 YUV video streams is a macroblock made of 4 blocks of luminance Y, 1 block of chrominance U and 1 block of chrominance V, as illustrated in Figure 6. Here too, other configurations may be considered.

In the example developed below, a macroblock MB_k is made of 16x16 pixels of luminance Y, and the chrominance has been down-sampled by a factor of two both horizontally and vertically to obtain 8x8 pixels of chrominance U and 8x8 pixels of chrominance V. The four luminance blocks within a macroblock MB_k are individually referenced.

To simplify the explanations, only the coding of the luminance component is described below. However, the same approach can be used for coding the chrominance components.

Starting from the image X DCT , a probabilistic distribution P of each DCT coefficient is determined using a parametric probabilistic model. This step is referenced 190 in the Figure.

Since, in the present example, the image X_DCT is a residual image, i.e. the information is about a noise residual, it is efficiently modelled by Generalized Gaussian Distributions (GGD) having a zero mean:

$$X_{DCT} \sim GGD(\alpha, \beta),$$

where α, β are two parameters to be determined and the GGD follows the two-parameter distribution:

$$GGD(\alpha, \beta, x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\,\exp\!\left(-\left|x/\alpha\right|^{\beta}\right),$$

and where Γ is the well-known Gamma function: $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt$.

The DCT coefficients cannot all be modelled by the same parameters and, practically, the two parameters α,β may depend on:

- video content. This means that the parameters must be computed for each image, or every n images, for instance;

- the index i of the DCT coefficient within a DCT block B_k. Indeed, each DCT coefficient has its own behaviour. A DCT channel is thus defined as the DCT coefficients collocated (i.e. having the same index) within a plurality of DCT blocks (possibly all the blocks of the image). A DCT channel can therefore be identified by the corresponding index i; and/or

- the encoding mode used for the collocated block of the base layer, referred to in the present document as the "base coding mode". Typically, Intra blocks of the base layer do not behave the same way as Inter blocks. Blocks with a coded residual in the base layer do not behave the same way as blocks without such a residual (i.e. Skipped blocks). And blocks coded with non-nil texture data according to the coded-block-pattern syntax element as defined in H.264/AVC do not behave the same way as those blocks without non-nil texture data.

It is to be noted that, due to the down-sampling of the base layer, the collocation of blocks should take that down-sampling into account. For example, the four blocks of the n-th macroblock in the residual enhancement layer with UHD resolution are collocated with the n-th block of the base layer having an HD resolution. That is why, generally, all the blocks of a macroblock in an enhancement image have the same base coding mode.

For illustrative purposes, if the residual enhancement image X DCT is divided into 8x8 pixel blocks, the modelling 190 has to determine the parameters of 64 DCT channels for each base coding mode.

In addition, since the luminance component Y and the chrominance components U and V have dramatically different source contents, they must be encoded in different DCT channels. For example, if it is decided to encode the luminance component Y on one channel and the chrominance components UV on another channel, 128 channels are needed for each base coding mode.

At least 64 pairs of parameters for each base coding mode may appear as a substantial amount of data to transmit to the decoder (see parameters 21). However, experience proves that this is quite negligible compared to the volume of data needed to encode the residuals of Ultra High Definition (4k2k or more) videos. As a consequence, one may understand that such a technique is preferably implemented on large videos, rather than on very small videos, because the parametric data would be too costly.

For the sake of simplicity of explanation, a set of DCT blocks corresponding to the same base coding mode is now considered. The process below may then be applied to each set corresponding to each base coding mode. Furthermore, as suggested above, it may also be directly applied to the entire image, regardless of the base coding modes.

To obtain the two parameters α_i, β_i defining the probabilistic distribution P_i for a DCT channel i, the Generalized Gaussian Distribution model is fitted onto the DCT block coefficients of the DCT channel, i.e. the DCT coefficients collocated within the DCT blocks with the same base coding mode. Since this fitting is based on the values of the DCT coefficients before quantization (of the DCT blocks having the same base coding mode in the example), the probabilistic distribution is a statistical distribution of the DCT coefficients within a considered channel i.

For example, the fitting may be simply and robustly obtained using the moment of order k of the absolute value of a GGD:

$$M_k = E\!\left[|X|^k\right] = \alpha^k\,\frac{\Gamma((k+1)/\beta)}{\Gamma(1/\beta)}.$$

Determining the moments of order 1 and of order 2 from the DCT coefficients of channel i makes it possible to directly obtain the value of parameter β_i:

$$\frac{M_2}{(M_1)^2} = \frac{\Gamma(1/\beta_i)\,\Gamma(3/\beta_i)}{\Gamma(2/\beta_i)^2}.$$

The value of the parameter β_i can thus be estimated by computing the above ratio of the first and second moments, and then the inverse of the above function of β_i.

Practically, this inverse function may be tabulated in a memory of the encoder instead of computing Gamma functions in real time, which is costly. The second parameter α_i may be determined from the first parameter β_i and the second moment, using the equation $M_2 = \sigma^2 = \alpha_i^2\,\Gamma(3/\beta_i)/\Gamma(1/\beta_i)$.
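A minimal sketch of this moment-based fit, with the inverse of the ratio tabulated as suggested (function and variable names are illustrative):

```python
import numpy as np
from math import gamma

def moment_ratio(beta):
    """M2 / (M1)^2 for a zero-mean GGD, as a function of beta."""
    return gamma(1.0 / beta) * gamma(3.0 / beta) / gamma(2.0 / beta) ** 2

# tabulate the inverse once, instead of evaluating Gamma functions in real time
_BETAS = np.linspace(0.2, 4.0, 2000)
_RATIOS = np.array([moment_ratio(b) for b in _BETAS])

def fit_ggd_channel(coeffs):
    """Fit (alpha_i, beta_i) to the DCT coefficients of one channel by moments."""
    m1 = np.mean(np.abs(coeffs))
    m2 = np.mean(np.square(coeffs))
    beta = _BETAS[np.argmin(np.abs(_RATIOS - m2 / (m1 * m1)))]   # table lookup
    alpha = np.sqrt(m2 * gamma(1.0 / beta) / gamma(3.0 / beta))  # from M2 = sigma^2
    return alpha, beta
```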

The two parameters α_i, β_i being determined for the DCT channel i, the probabilistic distribution P_i of each DCT coefficient i in a considered block is defined by:

$$P_i(x) = GGD(\alpha_i, \beta_i, x) = \frac{\beta_i}{2\alpha_i\,\Gamma(1/\beta_i)}\,\exp\!\left(-\left|x/\alpha_i\right|^{\beta_i}\right).$$

Still referring to Figure 3, a quantization 192 of the DCT coefficients of X_DCT is then performed, to obtain quantized DCT coefficients X_DCT,Q (i.e. symbols or values).

As shown in the Figure, the quantization of those coefficients may involve optimal quantizers chosen (step 191) for each DCT channel i based on the corresponding probabilistic distribution P_i(x) of the DCT coefficients.

In a variant, the quantizers may be predefined prior to the encoding. Since the quantization is not the core of the present invention, it is here assumed that a quantizer is selected for each DCT channel and each base coding mode as defined above, meaning that various quantizers are generally used for quantizing various DCT coefficients.

Figure 7 illustrates an exemplary Voronoi cell based quantizer.

A quantizer is made of M Voronoi cells distributed over the values of the DCT coefficients. Each cell corresponds to an interval [t_m, t_{m+1}], called quantum Q_m. Each cell has a centroid c_m, as shown in the Figure.

The intervals are used for quantization: a DCT coefficient comprised in the interval [t_m, t_{m+1}] is quantized by a symbol a_m associated with that interval.

The centroids are used for de-quantization: a symbol a_m associated with an interval is de-quantized into the centroid value c_m of that interval.

For each possible symbol a_m of a current quantizer i (i.e. associated with a DCT channel i), a probability of occurrence p_{i,m} may be calculated based on the probabilistic distribution P_i(x):

$$p_{i,m} = \int_{Q_m} P_i(x)\,dx.$$

The probabilities {p_{i,m}} for an alphabet A_i thus define the probabilistic distribution of the possible symbols or values defined therein.

Given the considered DCT channel i for a base coding mode, the probabilistic distribution is the same for the alphabets associated with DCT coefficients collocated within a plurality of the blocks of the image.

Such probabilities may be computed off-line and stored in memory of the encoder, in order to decrease the complexity of real time encoding. This is for example possible when the parameters α,β for modelling the distribution of the DCT coefficients are chosen from a limited number of possible parameters, and when the possible quantizers are known in advance.
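For instance, the per-symbol probabilities could be precomputed as follows (a sketch; the interval boundaries come from whichever quantizer was selected for the channel):

```python
import numpy as np
from math import gamma

def ggd_pdf(x, alpha, beta):  # the channel model P_i, repeated so the snippet stands alone
    return beta / (2.0 * alpha * gamma(1.0 / beta)) * np.exp(-np.abs(x / alpha) ** beta)

def symbol_probabilities(boundaries, alpha_i, beta_i, n_steps=512):
    """p_{i,m} = integral of P_i over each quantum Q_m = [t_m, t_{m+1}];
    `boundaries` holds the M+1 interval edges, integration is by Riemann sums."""
    probs = []
    for t_lo, t_hi in zip(boundaries[:-1], boundaries[1:]):
        x = np.linspace(t_lo, t_hi, n_steps)
        dx = (t_hi - t_lo) / (n_steps - 1)
        probs.append(np.sum(ggd_pdf(x, alpha_i, beta_i)) * dx)
    probs = np.array(probs)
    return probs / probs.sum()  # renormalize the tail truncation error away
```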

The probabilities {p_{i,m}} and the quantized symbols or values a_m obtained from each DCT coefficient in the DCT image X_DCT are then provided for entropy coding 193 as shown in Figure 3. For this, the quantized DCT coefficients are generally processed according to a zigzag scan.

The entropy coding may then take into account these probabilities to provide improved encoding. However, in the context of the invention, conventional Huffman entropy coding may be implemented in step 193.

The entropy coding 193 compresses the quantized DCT image X_DCT,Q and generates the encoded DCT images which constitute the enhancement layer bit-stream 20.

This encoding scheme of the enhancement layer has spatial random access properties due to the absence of inter frame (temporal) and possibly intra block (spatial) predictions.

Indeed, the absence of prediction in the enhancement layer ensures that no dependence between macroblocks exists. In particular, if an entry point of the generated bit-stream 20 and the index of the associated macroblock are given, it is possible to perform the decoding from that point, without decoding other parts of the encoded video.

It is said that the bit-stream has the random spatial access property because it is possible to decode only a part of the image (a region of interest) once the associated entry points are given.

Figure 8 illustrates how the residual enhancement image may be subdivided into spatial zones made of macroblocks, with entry points in order to allow efficient random access compliant coding.

The position of the entry points may be encoded in the header of the bit-stream 20 in order to facilitate easy extraction from the server side and allow the reconstruction of a valid stream on the decoder side.

Figure 9 shows the meta-organization of a bit-stream header. For example, the slice header shown in the Figure re-uses the slice header of the H.264/AVC video compression standard. However, to provide the entry points, there is added, at the beginning of each slice header, a new field ("coded slice length") which indicates the length in bytes of the coded slice. The entry points can therefore be easily computed from the "coded slice length" fields.
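As an illustration only, assuming a hypothetical flat layout in which coded slices are stored back to back after the stream header, the entry points follow directly from the accumulated "coded slice length" fields:

```python
def entry_points(coded_slice_lengths, header_size=0):
    """Byte offsets of the slice entry points, computed by accumulating the
    'coded slice length' fields (hypothetical layout, not the actual syntax)."""
    offsets, pos = [], header_size
    for length in coded_slice_lengths:
        offsets.append(pos)
        pos += length
    return offsets

# three coded slices of 1510, 980 and 2047 bytes after a 64-byte header:
# entry_points([1510, 980, 2047], header_size=64) -> [64, 1574, 2554]
```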

Another advantage of this independence between macroblocks of the residual enhancement image is the possibility to perform parallel entropy decoding on the decoder side. Each decoding thread starts decoding from one of the entry points as defined above.

Figure 4 illustrates the associated enhancement video decoder 33.

From the received parameters 21, the channel models are reconstructed, meaning that a probabilistic distribution GGD(α_i, β_i) is known for each encoded DCT coefficient of a channel i.

Quantizers are chosen 330 from the pool of quantizers, possibly based on these probabilistic distributions.

Next, the decoding, from the bit-stream, and the dequantizing of blocks of encoded quantized DCT coefficients of the first enhancement image are performed.

This comprises the following operations:

- an entropy decoder 331 is applied to the received enhancement layer bit-stream 20 to obtain the quantized DCT image X_DCT,Q. As suggested above, conventional Huffman codes can be used, possibly taking into account the probabilistic distributions; and

- a dequantization (or inverse quantization) 332 is then performed by using the chosen quantizers for each coefficient, to obtain a dequantized version of the DCT image. The dequantized version is referenced X̂_DCT, since it is different from the original version X_DCT due to the lossy quantization.

The present invention particularly focuses on the rest of the decoding process, from the dequantized DCT coefficients X̂_DCT of that dequantized image, as described now with reference to Figure 5.

As shown in this Figure, the decoding method according to the invention comprises a step of merging 38 the dequantized DCT blocks X̂_DCT of dequantized DCT coefficients with DCT residual blocks (or "predictor blocks") Y.

The DCT residual blocks Y are generated by an enhancement prediction module 40 which is further detailed below. The DCT blocks X̂_DCT form a first version of the residual enhancement image currently decoded, while the DCT residual blocks Y form, at least partly, a second version of the same residual enhancement image, that is temporally predicted based on base layer motion information and an already decoded UHD image of the video, as explained below.

The merger of the blocks X^DEC_DCT with the blocks Y may be a probabilistic merging process based on the parameters 21 (i.e. the probabilistic distributions of the DCT coefficients as determined by the encoder) and on a second probabilistic distribution that characterizes the temporal prediction of the enhancement layer by the module 40. In particular, the second probabilistic distribution is a probabilistic distribution of the differences between the coefficients of the DCT residual blocks Y and the dequantized DCT coefficients of the dequantized DCT blocks X^DEC_DCT.

Figure 10 illustrates the generation of the DCT residual blocks Y, i.e. of transformed residual blocks of the enhancement image associated with a current image I to decode.

This prediction consists in successively: temporally predicting the current enhancement image in the pixel domain (thanks to the up-sampled motion information); computing the pixel difference between the temporally predicted image and the up-sampled reconstructed base image; and then applying a DCT transform to the difference image.

It is first assumed that the image I_B of the base layer corresponding to the current image I to decode has already been decoded by the base layer decoder 31 using temporal prediction based on an already-decoded reference image I^R_B of the base layer. For each block B or macroblock MB of the image I_B (depending on the granularity of the prediction), motion information MI is known and temporarily stored when decoding the base layer, for the needs of the present invention.

This motion information comprises, for each block or macroblock, a base motion field BMF (including a motion vector and a reference image index) and a base residual BR, as well-known by one skilled in the art of video coding.

Each of the base blocks or base macroblocks that have been temporally predicted in the base layer is now considered in turn.

It may be noted that the present invention also applies when not all the images I_B of the base layer are encoded in the bit-stream EBS. For example, an image I_B of the base layer may be obtained by interpolating other decoded base images. In that case, the available motion information for those other decoded base images may also be interpolated to provide motion information specific to blocks or macroblocks of the interpolated base image I_B. The following explanation also applies to such an interpolated base image I_B.

First, the corresponding motion information is up-sampled 400 to the high resolution of the enhancement layer (e.g. UHD). This is shown in the Figure by the references UMF (up-sampled motion field) and UR (up-sampled residual).

In particular, this up-sampling comprises for each base macroblock:

- the up-sampling of the macroblock partitioning (into blocks) to the resolution level of the enhancement layer. In the above example of dyadic spatial scalability (UHD vs. HD), up-sampling the partition consists in multiplying the width and height of the macroblock partitions by a factor of 2;

- the up-sampling, by a factor of 2 (in width and height), of the base residual associated with the base macroblock. This texture up-sampling process may use an interpolation filter that is identical to that used in inter-layer residual prediction mechanisms of the SVC scalable video compression standard; and

- the up-sampling, by a factor of 2 (in both x and y coordinates), of the motion vector associated with the base macroblock. A simplified sketch of these three operations is given below.
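The sketch below (Python, illustrative only) mirrors these three operations for the dyadic case; the nearest-neighbour residual interpolation is a simplification, since the actual SVC-style interpolation filter is not reproduced here:

```python
import numpy as np

def upsample_motion_info(partitions, motion_vectors, base_residual):
    """Dyadic (x2) up-sampling of the motion information of one base
    macroblock.  partitions: (x, y, w, h) tuples; motion_vectors:
    (dx, dy) tuples, one per partition; base_residual: 2-D array."""
    # 1) partitioning: double positions, widths and heights
    up_partitions = [(2 * x, 2 * y, 2 * w, 2 * h)
                     for x, y, w, h in partitions]
    # 2) base residual: x2 in width and height (nearest-neighbour here,
    #    where SVC would use its inter-layer interpolation filter)
    up_residual = np.repeat(np.repeat(base_residual, 2, axis=0), 2, axis=1)
    # 3) motion vectors: double both coordinates
    up_vectors = [(2 * dx, 2 * dy) for dx, dy in motion_vectors]
    return up_partitions, up_vectors, up_residual
```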

Once the up-sampling 400 has been performed, the generation of a DCT residual macroblock Y comprises a motion compensated prediction step 405 from the decoded UHD image that temporally corresponds to the reference base image I^R_B used for decoding the base layer, based on the up-sampled motion information UMF and UR. That decoded UHD image is considered, for the temporal prediction, as the reference decoded image I^R_UHD.

It may for example be the reconstructed UHD image that temporally precedes the current image to decode, as shown in the Figure.

This motion compensation 405, in the pixel domain, leads to obtaining, using the motion information, motion predictor blocks from the decoded reference high resolution image I^R_UHD. In particular, the up-sampled prediction information is applied to the reference decoded image I^R_UHD to determine predicted macroblocks.

One may note that the motion compensation results in a partially-reconstructed image. This is because the macroblocks reconstructed by prediction are obtained only at spatial positions corresponding to INTER macroblocks in the base image I_B (because there is no motion information for the other macroblocks). In other words, no predicted block is generated for the macroblocks collocated with INTRA macroblocks in the base layer.

Next, residual blocks are obtained by subtracting 410 each motion predictor block from a corresponding (i.e. collocated) up-sampled block in the up-sampled decoded base image (which is obtained by the up-sampling 32 of Figure 2). This step calculates the difference image (or residual) between the temporally predicted image and the up-sampled reconstructed base layer image. This difference image has the same nature as the residual enhancement image.

The module 40 ends by applying 415 a block-based transform, e.g. DCT on 8x8 blocks, on the obtained residual blocks to obtain transformed residual blocks that are the DCT residuals Y discussed above.

Therefore, a plurality of DCT residual macroblocks Y is obtained for the current image to decode, which generally represents a partial predicted enhancement image.
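As a rough illustration of steps 405, 410 and 415, the following sketch motion-compensates each up-sampled partition from the reference UHD image, subtracts the collocated up-sampled base block, and applies an 8x8 DCT to the difference (all names are hypothetical; boundary handling and the exact sign convention are simplified):

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """2-D type-II DCT with orthonormal scaling."""
    return dct(dct(block, norm='ortho', axis=0), norm='ortho', axis=1)

def predict_dct_residuals(ref_uhd, upsampled_base, up_partitions, up_vectors,
                          block_size=8):
    """Sketch of steps 405/410/415.  Assumes the motion vectors keep
    every block inside the image."""
    residual_blocks = {}
    for (x, y, w, h), (dx, dy) in zip(up_partitions, up_vectors):
        # step 405: motion-compensated predictor from the reference UHD image
        pred = ref_uhd[y + dy:y + dy + h, x + dx:x + dx + w]
        # step 410: difference between the temporally predicted block and
        # the collocated block of the up-sampled decoded base image
        diff = pred - upsampled_base[y:y + h, x:x + w]
        # step 415: block-based DCT on 8x8 sub-blocks of the difference
        for by in range(0, h, block_size):
            for bx in range(0, w, block_size):
                residual_blocks[(x + bx, y + by)] = dct2(
                    diff[by:by + block_size, bx:bx + block_size])
    return residual_blocks
```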

The next steps of the decoding method according to the invention may be applied to the entirety of that plurality of macroblocks Y, or to a part of it, depending for example on the base coding mode (P image Inter prediction, B image Inter prediction, Skip mode), in which case only the DCT predictor macroblocks Y and the dequantized DCT macroblocks X^DEC_DCT collocated with base macroblocks having the same coding mode are handled together.

For the rest of the description, macroblocks Y collocated with P, B and SKIP base macroblocks are considered separately, as was done at the encoder when determining the probabilistic distribution of each DCT channel.

Based on the considered DCT residual macroblocks Y, a probabilistic distribution of the differences between the coefficients in the transformed residual blocks Y and the dequantized transformed coefficients X^DEC_DCT is calculated. This aims to model the noise associated with the motion prediction 405.

A probabilistic distribution may be obtained for the entire set of coefficients of the considered DCT residual macroblocks, or for each DCT channel i, in which case the explanation below should be applied to the DCT coefficients of the same channel.

Each DCT residual macroblock Y made of DCT coefficients for the current image to decode is considered as a version of the original DCT coefficients that would have been altered through a communication channel. It has been observed that the quantity Y − X_DCT (i.e. the noise of the residual Y compared to the DCT coefficients before encoding) can be well modelled by a generalized Gaussian distribution as introduced above: (Y − X_DCT) ≈ GGD(α_N, β_N)

By knowing the statistical distribution of the predictor noise (Y − X_DCT), it is therefore possible to retrieve a good approximation of the original DCT coefficients of the blocks X_DCT.

Since the exact coefficients of the DCT image X_DCT are not known by the decoder (because of the quantization 192 at the encoding side), the exact prediction noise cannot be modelled. However, the inventors have observed that using the dequantized DCT macroblocks X^DEC_DCT instead of the original DCT macroblocks X_DCT provides a GGD modelling that is close to the theoretical modelling with X_DCT.

The modelling of the predictor noise thus comprises fitting a Generalized Gaussian Distribution model onto the differences between the coefficients in the transformed residual blocks Y and the dequantized transformed coefficients X^DEC_DCT. The same mechanisms based on the first and second moments as described above can be applied to obtain the two parameters α_N, β_N (either for all the considered macroblocks Y and X^DEC_DCT, or for each DCT channel of coefficients in those macroblocks).
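As an illustration of such a moment-based fit, the following sketch estimates (α_N, β_N) from the observed differences, assuming the GGD density convention p(x) ∝ exp(−(|x|/α)^β); the bracketing interval for the shape parameter is an assumption:

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def fit_ggd_moments(samples):
    """Moment-based fit of GGD(alpha, beta), with density assumed
    proportional to exp(-(|x|/alpha)**beta).  For this convention:
        E|X|  = alpha * gamma(2/beta) / gamma(1/beta)
        E X^2 = alpha**2 * gamma(3/beta) / gamma(1/beta)
    so the ratio E|X|^2 / E[X^2] depends on beta only."""
    samples = np.asarray(samples, dtype=float)
    m1 = np.mean(np.abs(samples))
    m2 = np.mean(samples ** 2)
    ratio = m1 * m1 / m2

    def f(b):
        return gamma(2.0 / b) ** 2 / (gamma(1.0 / b) * gamma(3.0 / b)) - ratio

    # the bracket [0.1, 10] covers shapes from very peaky to near-uniform
    beta = brentq(f, 0.1, 10.0)
    alpha = m1 * gamma(1.0 / beta) / gamma(2.0 / beta)
    return alpha, beta

# e.g. fit the per-channel noise samples (Y - X^DEC_DCT) flattened to 1-D:
# alpha_n, beta_n = fit_ggd_moments(noise_samples)
```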

Next, the merging 38 of the considered macroblocks Y and X^DEC_DCT is performed in the DCT domain.

As introduced above, it is based on the probabilistic distribution of the DCT coefficients of X_DCT that is obtained through the parameters α_i, β_i 21 of each DCT channel: P(X=x) = GGD(α_i, β_i, x); and on the probabilistic distribution of the predictor noise as determined above (parameters α_N, β_N, possibly for the considered DCT channel): P(Z=z) = GGD(α_N, β_N, z), where Z is the estimated noise Y − X^DEC_DCT.

For each coefficient i of the considered macroblocks, the merged value can take the form of a probabilistic estimation of the original DCT coefficient value, given the known quantization interval of this DCT coefficient and a separate approximation of the coefficient resulting from its motion-compensated temporal prediction (blocks Y). For example, a merged value according to the invention, denoted x_i, may be the expectation (or "expected value") of the considered coefficient given the quantization interval Q_m associated with the value of the corresponding quantized transformed coefficient in X_DEC, and given its corresponding value Y_0 in the residual blocks Y. The quantization interval Q_m is directly retrieved from the quantized DCT coefficient obtained from the bit-stream 20, since its value a_m is the index of the quantization interval Q_m for the quantizer used.

Such expectation is calculated based on the probabilistic distributions mentioned previously, for example:

x_i = [ ∫_{Q_m} x · GGD(α_i, β_i, x) · GGD(α_N, β_N, Y_0 − x) dx ] / [ ∫_{Q_m} GGD(α_i, β_i, x) · GGD(α_N, β_N, Y_0 − x) dx ]

The probabilistic calculation of x_i is illustrated in Figure 11.

In this Figure, the GGD distribution of X_DCT as well as the statistical distribution of the prediction noise X_DCT − Y_0 have been drawn. The quantization interval Q_m associated with the considered DCT coefficient is also indicated.

The two distributions are multiplied over the interval Q_m to calculate the desired conditional expected value of X_DCT. The integrals of those distributions can be computed using Riemann sums over the quantization interval.
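A minimal sketch of this computation (Python, using unnormalised densities, since the normalisation constants cancel in the ratio):

```python
import numpy as np

def ggd_pdf(x, alpha, beta):
    """Unnormalised GGD density exp(-(|x|/alpha)**beta); the missing
    normalisation constant cancels in the expectation ratio below."""
    return np.exp(-np.power(np.abs(x) / alpha, beta))

def merged_coefficient(q_low, q_high, y0, alpha_i, beta_i, alpha_n, beta_n,
                       n_steps=256):
    """Conditional expectation of the original DCT coefficient over the
    quantization interval Q_m = [q_low, q_high]: the channel model
    GGD(alpha_i, beta_i) is multiplied by the prediction-noise model
    GGD(alpha_n, beta_n) evaluated at (y0 - x), and the integrals are
    approximated by Riemann sums."""
    x = np.linspace(q_low, q_high, n_steps)
    weights = ggd_pdf(x, alpha_i, beta_i) * ggd_pdf(y0 - x, alpha_n, beta_n)
    return float(np.sum(x * weights) / np.sum(weights))

# y0 may lie outside [q_low, q_high]; it then pulls the estimate towards
# the interval boundary nearest to the temporal predictor value:
print(merged_coefficient(4.0, 8.0, y0=9.5,
                         alpha_i=6.0, beta_i=1.2, alpha_n=2.0, beta_n=1.5))
```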

Thanks to the invention, there is no need to force the obtained value Y_0 to be within the current quantization interval Q_m. On the contrary, the fact that it may lie outside that interval (as shown in the Figure) is thus taken into account to refine the decoded DCT coefficients.

The values x_i calculated for the DCT coefficients of all the considered macroblocks are stored in memory to form, at least partially, the merged enhancement image corresponding to the current image to decode.

Since no fitting and merging step is applied to INTRA macroblocks (because there is no motion information), their values obtained in the dequantized DCT macroblocks X^DEC_DCT may be used to form the remaining part of the merged enhancement image. Finally, an entire merged enhancement image is obtained, which is input to the inverse transform IDCT 34 as shown in Figures 2 and 5.

The present invention has been illustrated and provides significant rate-distortion improvements, of several dB. As explained in the detailed embodiment above, the invention may rely on:

- statistically modelling, at the encoder, the DCT coefficients of the residual enhancement image and then transmitting the obtained model parameters to the decoder;

- statistically modelling, at the decoder, the DCT coefficients of a predicted residual enhancement image (resulting from temporal prediction using the motion information of the base layer); and

- probabilistically estimating (through a conditional expectation calculation) the combined DCT coefficients of these two residual enhancement images, based on their statistical modelling.

Figure 12 illustrates the performance of the present invention, in which the rate-distortion curves are plotted when the merging according to the invention is respectively not implemented and implemented.

The Figure shows that an improvement in the codec's rate-distortion performance is obtained, especially at low bitrates. This may be understood intuitively: since the quantization intervals get larger as the bitrate decreases, the relevant information brought by the temporal DCT residuals Y increases compared to the quantization level obtained by the dequantization step 332.

One may then note that the invention also works at zero bitrate (meaning that no enhancement layer bit-stream 20 is encoded or received by the decoder). In that case, the parameters 21 (α_i, β_i for each DCT channel) are received and used together with the parameters α_N, β_N calculated according to the present invention to improve the decoding quality of the base layer by several dB.

Furthermore, the above performance is obtained with no complexity cost at the encoding side and with no additional bitrate when the parameters 21 are already needed and transmitted (e.g. for selecting the quantizers and/or entropy decoding). The complexity increase due to the merging step remains reasonable at the decoding side.

With reference now to Figure 13, a particular hardware configuration of a device for decoding a video bit-stream, able to implement methods according to the invention, is described by way of example.

A device implementing the invention is for example a microcomputer 50, a workstation, a personal digital assistant, or a mobile telephone connected to various peripherals. According to yet another embodiment of the invention, the device is in the form of a photographic apparatus provided with a communication interface for allowing connection to a network.

The peripherals connected to the device comprise for example a digital camera 64, or a scanner or any other image acquisition or storage means, connected to an input/output card (not shown) and supplying image data to the device.

The device 50 comprises a communication bus 51 to which there are connected:

- a central processing unit CPU 52 taking for example the form of a microprocessor;

- a read only memory 53 in which may be contained the programs whose execution enables the methods according to the invention. It may be a flash memory or EEPROM;

- a random access memory 54, which, after powering up of the device 50, contains the executable code of the programs of the invention necessary for the implementation of the invention. As this memory 54 is of random access type (RAM), it provides fast access compared to the read only memory 53. This RAM memory 54 stores in particular the various images and the various blocks of pixels as the processing is carried out (transform, quantization, storage of the reference images) on the video sequences;

- a screen 55 for displaying data, in particular video and/or serving as a graphical interface with the user, who may thus interact with the programs according to the invention, using a keyboard 56 or any other means such as a pointing device, for example a mouse 57 or an optical stylus;

- a hard disk 58 or a storage memory, such as a memory of compact flash type, able to contain the programs of the invention as well as data used or produced on implementation of the invention;

- an optional diskette drive 59, or another reader for a removable data carrier, adapted to receive a diskette 63 and to read/write thereon data processed or to process in accordance with the invention; and

- a communication interface 60 connected to the telecommunications network 61, the interface 60 being adapted to transmit and receive data.

In the case of audio data, the device 50 is preferably equipped with an input/output card (not shown) which is connected to a microphone 62. The communication bus 51 permits communication and interoperability between the different elements included in the device 50 or connected to it. The representation of the bus 51 is non-limiting and, in particular, the central processing unit 52 may communicate instructions to any element of the device 50 directly or by means of another element of the device 50.

The diskettes 63 can be replaced by any information carrier such as a rewritable or non-rewritable compact disc (CD-ROM), a ZIP disk or a memory card. Generally, an information storage means, which can be read by a micro-computer or microprocessor, integrated or not into the device for processing a video sequence, and which may possibly be removable, is adapted to store one or more programs whose execution permits the implementation of the method according to the invention.

The executable code enabling the coding device to implement the invention may equally well be stored in read only memory 53, on the hard disk 58 or on a removable digital medium such as a diskette 63 as described earlier. According to a variant, the executable code of the programs is received via the telecommunications network 61, through the interface 60, to be stored in one of the storage means of the device 50 (such as the hard disk 58) before being executed.

The central processing unit 52 controls and directs the execution of the instructions or portions of software code of the program or programs of the invention, the instructions or portions of software code being stored in one of the aforementioned storage means. On powering up of the device 50, the program or programs which are stored in a non-volatile memory, for example the hard disk 58 or the read only memory 53, are transferred into the random-access memory 54, which then contains the executable code of the program or programs of the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention.

It will also be noted that the device implementing the invention or incorporating it may be implemented in the form of a programmed apparatus. For example, such a device may then contain the code of the computer program(s) in a fixed form in an application specific integrated circuit (ASIC).

The device described here and, particularly, the central processing unit 52, may implement all or part of the processing operations described in relation with Figures 1 to 19, to implement methods according to the present invention and constitute devices according to the present invention.

The above examples are merely embodiments of the invention, which is not limited thereby. In particular, when the reference base image for the inter encoding of the base layer may be selected from among multiple reference images, the calculation of the DCT residual macroblocks Y is adapted accordingly, meaning that they must be predicted from the correct decoded UHD image temporally corresponding to the reference image selected for the base layer.

In the case of B images (i.e. when the base image is bi-directionally predicted from decoded reference base images using motion information in each of the two directions), it may be decided to calculate separate transformed residual blocks Y, Y' for each direction, in which case the merging consists in merging these two transformed residual blocks together with the dequantized DCT blocks X^DEC_DCT.

For example, this may result in the following calculation of the expectation:

x_i = [ ∫_{Q_m} x · GGD(α_i, β_i, x) · GGD(α_N, β_N, Y_0 − x) · GGD(α'_N, β'_N, Y'_0 − x) dx ] / [ ∫_{Q_m} GGD(α_i, β_i, x) · GGD(α_N, β_N, Y_0 − x) · GGD(α'_N, β'_N, Y'_0 − x) dx ]

where α'_N, β'_N and Y'_0 are similar to α_N, β_N and Y_0, but for the second transformed residual block Y'.

In a particular embodiment of the invention, the parameters (e.g. the filter strength parameter or the quantization-dependent parameter) of the deblocking filter 36 depend on the first and second probabilistic distributions GGD(α_i, β_i) and GGD(α_N, β_N) used during the merging.

This is because the merging according to the invention may increase or modify the conventional blocking artefacts due to the quantization. In this respect, adjusting the deblocking filter based on the probabilistic distributions that drive the merging makes it possible to optimise this filtering.

As an example, if the DCT coefficients issued from the merging step differ significantly between neighbouring blocks, this results in a visual blocking artifact between those blocks. In that case, if the difference between the two blocks is high at the DCT level, it may be decided to strengthen the deblocking filter process between the two blocks. This may be done by specifying a higher filter strength parameter, according to the deblocking filtering process of the H.264/AVC standard.
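As a purely illustrative sketch of such a rule (the threshold and strength values are arbitrary, not those of H.264/AVC):

```python
import numpy as np

def boundary_filter_strength(merged_dct_a, merged_dct_b,
                             base_strength=2, threshold=50.0):
    """Toy rule: raise the deblocking filter strength for the boundary
    between two neighbouring blocks when their merged DCT coefficients
    differ strongly (threshold and strengths are arbitrary)."""
    diff = float(np.abs(np.asarray(merged_dct_a) -
                        np.asarray(merged_dct_b)).sum())
    return base_strength + 1 if diff > threshold else base_strength
```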