

Title:
VIDEO DECODING
Document Type and Number:
WIPO Patent Application WO/2008/029346
Kind Code:
A2
Abstract:
A video decoder comprises a motion composition module, a reference frame buffer, an IQ/IDCT module, a quantification matrix store, and a summer. The motion composition module and the IQ/IDCT module are arranged to receive and perform operations on an encoded video stream comprising residual data and motion vectors. The summer is arranged to receive and perform operations on an output of the motion composition module and the IQ/IDCT module and to output a decoded video stream. The IQ/IDCT module is arranged to subsample the residual data using scaled quantification matrices stored by the quantification matrix store. The reference frame buffer is arranged to store a corresponding subsampled reference frame, and the motion composition module includes a ¼ pixel interpolation filter.

Inventors:
MUTZ STEPHANE (FR)
DURIEUX PHILIPPE (FR)
Application Number:
PCT/IB2007/053556
Publication Date:
March 13, 2008
Filing Date:
September 04, 2007
Assignee:
NXP BV (NL)
MUTZ STEPHANE (FR)
DURIEUX PHILIPPE (FR)
International Classes:
H04N7/50
Foreign References:
US 5262854 A, 1993-11-16
US 2002/0154696 A1, 2002-10-24
Attorney, Agent or Firm:
WHITE, Andrew, G. et al. (IP Department, Cross Oak Lane, Redhill, Surrey RH1 5HA, GB)
Claims:

CLAIMS

1. A video decoder comprising a motion composition module (16), a reference frame buffer (18), an IQ/IDCT module (20), a quantification matrix store (22), and a summer (24), the motion composition module (16) and the IQ/IDCT module (20) arranged to receive and perform operations on an encoded video stream (10) comprising residual data (28) and motion vectors (30), the summer (24) arranged to receive and perform operations on an output from the motion composition module (16) and the IQ/IDCT module (20) and to output a decoded video stream (32), wherein the IQ/IDCT module (20) is arranged to subsample the residual data (28) using scaled quantification matrices stored by the quantification matrix store (22), the reference frame buffer (18) is arranged to store a corresponding subsampled reference frame, and the motion composition module (16) includes a ¼ pixel interpolation filter.

2. A decoder according to claim 1, wherein the IQ/IDCT module (20), when subsampling the residual data (28) using scaled quantification matrices stored by the quantification matrix store (22), subsamples the residual data (28) by half horizontal downsampling.

3. A decoder according to claim 1 or 2, wherein the IQ/IDCT module (20), when subsampling the residual data (28) using scaled quantification matrices stored by the quantification matrix store (22), subsamples the residual data (28) by half vertical downsampling.

4. A decoder according to claim 1, 2 or 3, and further comprising a scaling function, wherein the quantification matrix store (22) is arranged to generate the scaled quantification matrices by scaling the quantification matrices with a vector.

5. A method of operating a video decoder, the video decoder comprising a motion composition module (16), a reference frame buffer (18), an IQ/IDCT module (20), a quantification matrix store (22), and a summer (24), the motion composition module (16) including a ¼ pixel interpolation filter, the method comprising the steps of receiving an encoded video stream (10) comprising residual data (28) and motion vectors (30), subsampling the residual data (28) at the IQ/IDCT module (20) using scaled quantification matrices stored by the quantification matrix store (22), storing a corresponding subsampled reference frame in the reference frame buffer (18) and executing motion composition using the ¼ pixel interpolation filter.

6. A method according to claim 5, wherein the step of subsampling the residual data (28) using scaled quantification matrices stored by the quantification matrix store (22) comprises subsampling the residual data (28) by half horizontal downsampling.

7. A method according to claim 5 or 6, wherein the step of subsampling the residual data (28) using scaled quantification matrices stored by the quantification matrix store (22) comprises subsampling the residual data (28) by half vertical downsampling.

8. A method according to claim 5, 6 or 7, and further comprising generating the scaled quantification matrices by scaling the quantification matrices with a vector.

Description:

DESCRIPTION

VIDEO DECODING

This invention relates to a video decoder and to a method of operating such a video decoder.

In the field of digital video transmission, for example in a television broadcast environment, the original moving pictures (the video) and the accompanying audio are encoded according to an agreed standard. One very widely used standard is MPEG2, which is a video and audio coding and compression algorithm that enables a data rate of typically from 2 Mb/s up to 10 Mb/s for standard resolution. In relation to the transmission of the video portion of a broadcast, the MPEG2 standard defines three different types of frame data: an I-frame, a P-frame and a B-frame (respectively, an intra frame, a forward predicted frame and a bi-directionally predicted frame).

The MPEG2 video compression scheme exploits the temporal correlation of successive images in a video sequence (meaning that two consecutive frames are often very similar). It is therefore possible to predict the content of a frame in the sequence based on other frames from the sequence (called reference frames), thereby reducing the amount of information to be transferred. As part of the method of achieving this, each frame to be coded is subdivided into a set of 16x16 pixel blocks called macroblocks (MB).

In MPEG2, for each MB, a search for a block of pixels with similar content (called a motion predictor) is made in the reference frames (the I-frame and/or P-frame). Only the location of the motion predictors in the reference frames and the difference between the motion predictors and the content of the MB to be coded are transmitted (this difference is called the residue). In the case of MPEG2, this search is made with ½ pixel accuracy and up to 2 reference frames are used.

To decompress a video sequence, an MPEG2 decoder is forced to keep the content of the reference frames in memory. The decoder reads the blocks of pixels from the reference frames based on the motion vector information stored in the MPEG stream, filters them to restore the ½ pixel accuracy of the motion predictors and combines them with the residue information to get the pixels of the original frame. In typical hardware or software implementations, the reference frames are stored in a memory buffer. Therefore, the cost of implementation of the decoder function depends on the size of the reference frames.

A significant amount of research and innovation exists in the field of improving MPEG2 decoding (and, of course, in other aspects of MPEG2 such as encoding). For example, United States Patent Application Publication US 2002/0154696 discloses systems and methods for MPEG subsample decoding. This document relates to the broadcast of High Definition Television (HDTV), a standard that involves the broadcast of a much larger amount of information, as much as 5 Mbit/s for the input bit rate (90 MByte/s for the uncompressed video stream), than is the case with standard definition MPEG2. With the emergence of MPEG2 High Definition broadcast, there is a need to optimise the system cost of new MPEG2-HD receivers/decoders compared to existing MPEG2-SD receivers/decoders, especially in terms of the memory footprint and memory bandwidth required for the MPEG2 decoding process. A large number of potential solutions to this problem are discussed in this prior art document. For example, it is stated that one class of methods uses algorithms executed on the receiver. These methods attempt to reduce the size of the decompressed video images and the associated reference buffers. These reductions in size have the effect of reducing the memory footprint of the buffers, the memory bandwidth for processing the decompressed video images, and the image resampling computational requirements. Most of these algorithms entail reducing the number of samples in frames in the horizontal and vertical directions by a factor of 2^N, where N is normally 1. One method, disclosed in US 2002/0154696, involves resampling the video frame after the frame has been decompressed using an MPEG decoder and prior to storing the decompressed frame in memory. This method can reduce the memory footprint by a factor of four if the video frame is subsampled by a factor of two in the horizontal and vertical directions. This involves subsampling motion vectors by a factor of two, then upsampling fetched motion reconstruction data by a factor of two in the horizontal and vertical directions. In a parallel operation, frequency coefficients are dequantized and passed through an IDCT (inverse discrete cosine transform) module, which converts the coefficients back into spatial domain data. The spatial domain data and the upsampled fetched motion reconstruction data are then summed by a summer. The output of the summer is then subsampled by a factor of two in each direction.

This method is hindered by the fact that the output subsampling may require some extra buffering to allow vertical filtering. Also, for relatively static scenes or constant pans, the error terms coming from the IDCT are nearly zero, which results in effectively the same image data being upsampled and downsampled over many generations. This generational loss progressively degrades the image quality until an I-frame is decoded, at which point the image is refreshed. This results in a "beating" effect that is most noticeable and irritating to the viewer.

It is therefore an object of the invention to improve upon the known art.

According to a first aspect of the present invention, there is provided a video decoder comprising a motion composition module, a reference frame buffer, an IQ/IDCT module, a quantification matrix store, and a summer, the motion composition module and the IQ/IDCT module arranged to receive and perform operations on an encoded video stream comprising residual data and motion vectors, the summer arranged to receive and perform operations on an output from the motion composition module and the IQ/IDCT module and to output a decoded video stream, wherein the IQ/IDCT module is arranged to subsample the residual data using scaled quantification matrices stored by the quantification matrix store, the reference frame buffer is arranged to store a corresponding subsampled reference frame, and the motion composition module includes a ¼ pixel interpolation filter.

According to a second aspect of the present invention, there is provided a method of operating a video decoder, the video decoder comprising a motion composition module, a reference frame buffer, an IQ/IDCT module, a quantification matrix store, and a summer, the motion composition module including a ¼ pixel interpolation filter, the method comprising the steps of receiving an encoded video stream comprising residual data and motion vectors, subsampling the residual data at the IQ/IDCT module using scaled quantification matrices stored by the quantification matrix store, storing a corresponding subsampled reference frame in the reference frame buffer and executing motion composition using the ¼ pixel interpolation filter.

Owing to the invention, it is possible to provide a low cost MPEG2 decoder. In the context of a multi-standard video decoding engine, this invention aims at enabling MPEG2 decoding with half resolution reference frames by reusing the MPEG4 motion compensation mechanism. The key advantages are a lower system cost due to the downscaled reference frames (lower memory footprint and bandwidth) and very limited extra logic, as the most costly part is a direct reuse of the MPEG4 motion compensation unit.

The memory footprint and bandwidth required to decode high definition video sequences have a significant impact on the cost of consumer equipment. The present invention has a large potential for application because high definition sources are increasingly deployed, many consumer devices already support MPEG4 decoding (e.g. for DivX playback), many displays do not offer full resolution HD rendering, and consumers will be interested in watching content distributed only in high definition without bearing the high cost associated with high definition rendering.

An efficient solution is achieved by subsampling the reference pictures and the residues, for example by a factor of 2 in the horizontal and/or vertical direction, and rebuilding the motion predictors with the ¼ pixel interpolation filter of the MPEG4 motion compensation process, thereby restoring the ½ pixel accuracy of the MPEG2 motion predictors from the subsampled reference frames. The only addition made to an MPEG2/MPEG4 capable decoder is the residue coefficient scaling step.

In the video decoder, motion vectors are divided by a factor of two. This division means that each vector has quarter pixel resolution, and the motion composition module re-uses the MPEG4 quarter pixel interpolation filter. In the prior art system, there is a simple division by two of the motion vectors, which results in a loss of information, and the video decoder is configured to use a simple MPEG2 bi-linear interpolation filter that operates only with ½ pixel accuracy. With the present invention there is no loss of motion accuracy, as the video decoder works with quarter pixel accuracy motion vectors and interpolation filters (as defined in MPEG4). In this way, the loss induced by the reference frame sub-sampling is very greatly reduced.

Advantageously, the IQ/IDCT module, when subsampling the residual data using scaled quantification matrices stored by the quantification matrix store, subsamples the residual data by half horizontal downsampling. Additionally or alternatively, the IQ/IDCT module, when subsampling the residual data using scaled quantification matrices stored by the quantification matrix store, subsamples the residual data by half vertical downsampling. By halving the frame data in either or both directions, the amount of data that must be stored for each reference frame is greatly reduced.

Preferably, the video decoder further comprises a scaling function, wherein the quantification matrix store is arranged to generate the scaled quantification matrices by scaling the quantification matrices with a vector. In the video decoder, a scaling function can be used for subsampling in the frequency domain in the IQ/IDCT module, which results in the quantization tables used being scaled. The high frequency cut-off is performed by downscaling the quantization tables in the higher frequencies. With this process, no subsampling filter is required, and a simple decimation is applied to get a 4x8 block at the output of the IQ/IDCT module. As the decimation is done at the output of the IDCT (there is already a 4x8 residual transform block), there is no requirement for upsampling at the output of the motion compensation module before the reconstruction.

Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:-

Figure 1 is a schematic diagram illustrating the video standard MPEG2,
Figure 2 is a schematic diagram of a prior art MPEG2 video decoder,
Figure 3 is a schematic diagram of an MPEG2 video decoder, according to an embodiment of the invention,
Figure 4 is a further schematic diagram of the MPEG2 video decoder of Figure 3, and
Figure 5 is a flowchart of a method of operating a video decoder.

Figure 1 shows an MPEG2 video stream 10, which consists of a series of successive frames 12. The data that is transmitted for each frame 12 comprises residual data and motion vectors, although for each I-frame only residual data is transmitted; there are no motion vectors. One principal way in which MPEG2 decreases the amount of data to be transmitted is by using the motion vectors to indicate portions of other frames that contain the image data to be used. In the example of Figure 1, which is a "talking head" against a static background, once the first frame (an I-frame) has been transmitted, much of that frame can be recycled for use in the later frames.

Figure 2 illustrates an example of a conventional MPEG2 video decoder 8, which receives the video stream 10. The video decoder 8 comprises a motion composition module 16, a reference frame buffer 18, an IQ/IDCT module 20, a quantification matrix store 22, and a summer 24. The video stream 10 is received by an MPEG2 VLD 26, which separates the residual data 28 and the motion vectors 30 for each frame 12.

The motion composition module 16 and the IQ/IDCT module 20 are arranged to receive and perform operations on the encoded video stream 10, which comprises the residual data 28 and the motion vectors 30. The summer 24 is arranged to receive and perform operations on an output from the motion composition module 16 and the IQ/IDCT module 20 and to output a decoded video stream 32. The summer 24 simply adds together the outputs of the modules 16 and 20 to create an entire frame. That frame is used in the rendering of the video stream 10, but is also transmitted via a feedback loop 34 to the reference frame buffer 18. The reference frame buffer 18 stores complete frames that are then used when P-frames and B-frames are received. These frames 12 refer to data within one or more other frames 12 (via their respective motion vectors 30). For example, in the stream 10 of Figure 1, the fourth frame is a P-frame that refers back to the original (first received) I-frame, which will be stored in the reference frame buffer 18. The motion vectors for the later P-frame will define one or more portions of the I-frame which are to be re-used when the decoder 8 is reconstructing that individual frame.

The motion composition module 16 will be instructed via the vectors 30 which parts of the I-frame to retrieve from the reference frame buffer 18, and these are then passed to the summer 24 to be added to the residual data for that frame to recreate the original frame. As with the first I-frame, that frame is used in the rendering of the video stream 10, but is also transmitted via the feedback loop 34 to the reference frame buffer 18, as other frames 12 may refer to that P-frame. Indeed, in the sequence of eight frames 12 shown forming the video stream 10 in Figure 1, six other frames refer to data within the fourth P-frame 12.

Figure 3 shows a first embodiment of an improved MPEG2 video decoder 14, according to an example of the invention. This decoder 14 has the same functional components as the prior art decoder 8 of Figure 2, but its operation is substantially amended, and some of the functioning of the individual components is changed, to provide an improved but still effective video decoder that has a greatly reduced storage requirement for the reference frame buffer 18.

In the decoder 14 of Figure 3, the IQ/IDCT module 20 is arranged to subsample the residual data 28 using scaled quantification matrices stored by the quantification matrix store 22. These scaled matrices can be generated by reducing the weight of the high frequency coefficients inside the standard MPEG quantification matrices, thus generating a set of matrices that will scale the residual data as it is reconstituted by the IQ/IDCT module 20.

The quantisation matrices are scaled down with a vector:

    int rescale_table[8] = {256, 242, 218, 176, 128, 79, 37, 9}; /* ½ filter */

    /* in the H direction */
    for (j = 0; j < 8; j++)
        for (i = 0; i < 8; i++)
            matrix_coeff[i][j] = (rescale_table[i] * matrix_coeff[i][j]) / 256;

    /* in the V direction */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            matrix_coeff[i][j] = (rescale_table[j] * matrix_coeff[i][j]) / 256;

Scaling is applied in the horizontal and/or vertical directions. In the example of the video decoder 14 shown in Figure 3, the scaled quantification matrices are designed to produce a half horizontal downsampling of the data, producing a 4x8 block of pixel data instead of the usual 8x8 residual data that would be produced by the MPEG2 decoder 8 of Figure 2. Downsampling could occur in the vertical direction at the same time, reducing the data, for example, by a factor of two in both directions (x and y). The extent of the downsampling is a design choice.

This subsampling process can be performed by different means but a straightforward method used here is to apply a vector to the residue coefficients before the IQ/IDCT step, as a scaling function. This processing can be achieved with minimum resources by multiplying the quantification matrices by a constant matrix to scale down the high frequency components.

Figure 3 illustrates the operation of the new decoder 14 when an I-frame is received. As discussed above, an I-frame has no motion vectors in the frame data received in the video stream 10 for that specific frame. An I-frame consists purely of residual data, which defines all of the pixels for that frame. In Figure 3, the motion composition module 16 does not receive any motion vectors 30, and does not execute any processing, when an I-frame is received. The summer 24 produces an 8x16 block of the original picture, which is rendered and also stored. There are four luminance DCT blocks per macroblock, so the result is an 8x16 macroblock built from four 4x8 blocks.

When the frame has passed through the summer 24 to be rendered, the frame is transmitted, via the feedback loop 34, to the frame reference buffer 18, which is arranged to store the corresponding subsampled reference frame. All of the reference frames that are to be stored in the reference frame buffer 18 are reduced in size. The frame data has been subsampled by the IQ/IDCT module 20 during the IQ/IDCT stage of the MPEG2 decode.

This video decoder 14 of Figure 3 is configured to operate in such a way that the decoding of MPEG2 video streams is achieved with a reduced amount of memory and a reduction in the memory bandwidth required by a typical decoder.

Figure 4 shows the embodiment of the improved MPEG2 video decoder 14, according to an example of the invention, when the decoder 14 has received a P-frame. This P-frame will have been received following an I-frame, which has been stored in the reference frame buffer 18 in its scaled format. In the general operation of an MPEG2 video decoder, received frames that are not I-frames cannot be decoded until an I-frame has been received by the decoder. So, for example, if a user turns on a decoder (for example by turning on a television), then no picture will be rendered until an I-frame has been received. The same is true when a user changes channel on a digital television that uses MPEG2.

The two parts of the P-frame that are received by the MPEG VLD 26, namely the residue data 28 and the motion vectors 30, are split at this point, with the residue data 28 passing to the IQ/IDCT module 20 and the motion vectors 30 passing to the motion composition module 16. The residue data 28 is handled in exactly the same way as the residue data of an I-frame, described above with reference to Figure 3. The data of the P-frame is scaled according to the filtered quantification matrices stored in the store 22 and passed to the summer 24.

The motion vectors 30 are passed to the motion composition module 16, where they are scaled by a factor of two to match the spatial resolution of the reference frames stored by the buffer 18. The motion composition module 16 includes a ¼ pixel interpolation filter. The motion predictor reconstruction can be performed by reusing an MPEG4 module of the motion compensation unit or code. Motion vectors with ¼ pixel accuracy are derived from the original MPEG2 motion vectors by a simple division by a factor of 2, to match the reduced resolution of the subsampled reference frames. The motion predictors are computed using the sophisticated 8-tap MPEG4 ¼ pixel filter and then used in the classical MPEG2 decoding process.

The result of this process is a decoded picture downscaled by a factor of two in the horizontal and/or vertical direction. This invention provides an improved implementation of a decoder that uses downscaled reference frames. This is achieved by reusing existing components used for MPEG4 decoding, thereby lowering the implementation cost.

Figure 5 shows a flowchart of the method of operating the video decoder 14. The method comprises the steps of firstly receiving 510 the encoded video stream 10 (which comprises the residual data 28 and the motion vectors 30). The residual data 28 for each frame 12 passes to step 512, where subsampling of the residual data 28 by the IQ/IDCT module 20, using scaled quantification matrices stored by the quantification matrix store 22, takes place. The motion vectors 30 for each frame 12 are passed to step 514, where the motion composition module 16 executes motion composition using the ¼ pixel interpolation filter. This portion of the method only occurs for the P-frames and B-frames of the MPEG2 video decode, as discussed above.

Once the processing of the two parts of the frame has taken place, the next step 516 is the summing of the two portions to create the finished frame for rendering. Following this is stage 518, which comprises the step of storing a corresponding subsampled reference frame in the reference frame buffer 18, for use as required in the motion reconstruction of other frames.

The purpose of the invention is to reduce the cost of implementation by creating and using smaller reference frames in the decoding process. This both lowers the required memory footprint and reduces the amount of memory traffic required to decode the original picture. A cheaper and more efficient implementation of the decoder is achieved at minimal cost by reusing part of the MPEG4 decoding process.