Title:
METHOD AND DEVICE FOR CODING AND DECODING DATA
Document Type and Number:
WIPO Patent Application WO/2007/105951
Kind Code:
A1
Abstract:
Method and encoding unit for coding a video image in which block codes (Bmn) are assigned to unique blocks of the video image, after which two or more block codes are combined to form a pair of block codes, and combination codes (CCij) are assigned to these pairs of block codes. Subsequently, the process is continued recursively by combining two or more combination codes (CCij) to form pairs of combination codes, which in turn receive codes, until only one final code remains. The invention also relates to a decoding method and decoding unit for decoding this one final code.

Inventors:
VAN DEN BOOM IPO PAULUS WILLEM (NL)
VAN SCHAIK FRANCOIS AUGUSTIN (NL)
Application Number:
PCT/NL2007/050102
Publication Date:
September 20, 2007
Filing Date:
March 14, 2007
Assignee:
VAN DEN BOOM IPO PAULUS WILLEM (NL)
VAN SCHAIK FRANCOIS AUGUSTIN (NL)
International Classes:
H04N7/28
Foreign References:
EP 0 633 701 A2 (1995-01-11)
Other References:
NASRABADI, N. M. et al., "Image Coding Using Vector Quantization: A Review", IEEE Transactions on Communications, vol. 36, no. 8, 1 August 1988, pp. 957-971, XP000052119, ISSN 0090-6778
NELSON, M. R., "The zlib Compression Library: A Compression Engine for Everyone", Dr. Dobb's Journal, vol. 22, no. 1, January 1997, pp. 36-41, XP001004599, ISSN 1044-789X
GERSHO, A. et al., "Adaptive Vector Quantization by Progressive Codevector Replacement", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tampa, Florida, 26-29 March 1985, vol. 1, pp. 133-136, XP001176990
GOLDBERG, M. et al., "Image Sequence Coding Using Vector Quantization", IEEE Transactions on Communications, vol. COM-34, no. 7, July 1986, pp. 703-710, XP002280029, ISSN 0090-6778
GOLDBERG, M. et al., "Image Compression Using Adaptive Vector Quantization", IEEE Transactions on Communications, vol. 34, no. 2, February 1986, pp. 180-187, XP008074559, ISSN 0090-6778
BUZO, A. et al., "Speech Coding Based upon Vector Quantization", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-28, no. 5, 1 October 1980, pp. 562-574, XP000670426, ISSN 0096-3518
Attorney, Agent or Firm:
VAN WESTENBRUGGE, Andries (LS Den Haag, NL)
Claims:

CLAIMS

1. Method for coding at least one video image, which video image comprises a number of colour channels (RGB; YUV) and, per colour channel, a set of blocks (Bmn; m = 1, 2, ..., M; n = 1, 2, ..., N) stored in a memory (7), each block (Bmn) representing two or more pixels and having a specific value, comprising a) reading a first block (B11) of the video image out of the memory; b) storing the first block (B11) in a block database (BDB); c) providing a unique block code (C11) to the first block (B11), which unique block code (C11) is stored in the memory and is in a fixed relationship to the first block (B11); d) reading a next block (Bmn, m = 1, 2, ..., M; n = 1, 2, ..., N) out of the memory; e) comparing the read-out next block (Bmn) to the contents of the block database (BDB); if the next block (Bmn) is not known yet, continuing with operation f); if the block (Bmn) is identical to a known block in the block database (BDB), continuing with operation g);

f) storing the next block in the block database (BDB); assigning a new block code (Cmn) to this next block; storing the new block code (Cmn) in the memory; continuing with operation h); g) reading out the value of the block code of the known block and placing the value of the block code of the known block in the set of block codes in the memory; continuing with operation h); h) returning to operation d) if not all blocks (Bmn) have been read yet; i) selecting a pair of block codes (C11, C12); j) storing the pair of block codes (C11, C12) and a combination code (CC11) in a frame database (FDB) in the memory, which combination code (CC11) refers in a unique manner to the values of the pair of block codes (C11, C12); k) selecting a new pair of block codes;

l) comparing this new pair of block codes to all pairs of block codes already stored in the frame database (FDB); m) storing a new combination code (CCij, i = 1, 2, 3, ...; j = 1, 2, 3, ...) with the associated new pair of block codes in the frame database (FDB) if the new pair of block codes does not yet exist in the frame database (FDB); selecting a further new pair of block codes and returning to operation k) if not all pairs of block codes have been selected yet, and otherwise continuing with operation o); n) storing the combination code (CCij) in the frame database (FDB) which is associated with the known block code if the new pair of block codes corresponds to a known block code already stored in the frame database (FDB); selecting a further new pair of block codes and returning to operation k) if not all pairs of block codes have been selected yet, and otherwise continuing with operation o); o) recursively repeating operations i) to n) for all combination codes (CCij) obtained in a previous series of operations i) to n), with the proviso that where operations i) to n) refer to a "block code", this is replaced by "combination code", until the series of operations i) to n) results in one last combination code.

2. Method according to Claim 1, characterized in that a compression algorithm is used when storing the blocks in the block database.

3. Method according to Claim 2, characterized in that the compression algorithm is the deflate algorithm.

4. Method according to one of Claims 1-3, characterized in that the block codes and combination codes are arranged in a matrix and the matrix is traversed alternately horizontally and vertically when operations i) to n) are being recursively repeated.

5. Method according to one of Claims 1-3, characterized in that the block codes and combination codes are arranged in a matrix and in that the matrix is traversed diagonally when operations i) to n) are being recursively repeated one or more times.

6. Method according to one of Claims 1 - 5, characterized in that a compression algorithm is used when data are stored in the frame database (FDB).

7. Method according to Claim 6, characterized in that the compression algorithm is the deflate algorithm.

8. Method according to one of the preceding claims, in which the method is carried out for each of the three colour channels (RGB; YUV) and the resulting data from the frame database are stored linearly in a sequence database (SDB).

9. Method according to one of the preceding claims, in which the image has pixels with RGB values, comprising

• transforming the RGB values into YUV values, the UV values representing a specific resolution;

• scaling the resolution of the UV values.

10. Method according to one of the preceding claims, in which the image has pixels and the method is started by replacing each pixel value by an average value of one or more adjacent pixels.

11. Method according to one of the preceding claims, in which the image has pixels with first pixel values and the first pixel values are compared individually with second pixel values of pixels of a next image and are replaced by the second pixel values if the second pixel values differ by less than a threshold value from the first pixel value.

12. Method according to one of the preceding claims, in which the pixel values are subjected to a quantization operation.

13. Method according to one of the preceding claims, in which successive images of a video are coded and only difference values between successive pictures are coded.

14. Method according to one of the preceding claims, in which a series of images of a video film are subdivided into separately coded subseries, the method starting again for each subseries.

15. Method for decoding a set of combination codes (CCij) with associated pairs of codes in a frame database (FDB) as obtained by the method according to one of Claims 1-14, comprising:

• replacing a last combination code (CCij) with an associated pair of codes;

• treating each obtained code as a combination code and replacing these combination codes with an associated pair of codes;

• recursively repeating the previous operation, until all combination codes (CCij) have been replaced and a set of block codes (Cmn) has been obtained;

• replacing each block code (Cmn) with an associated block (Bmn).

16. Encoding unit comprising a processor (5) and a memory (7) with data and instructions in order to allow the processor to code at least one video image, which video image comprises a number of colour channels (RGB; YUV) and a series of blocks (Bmn; m = 1, 2, ..., M; n = 1, 2, ..., N) stored in the memory (7) for each colour channel, each block (Bmn) representing two or more pixels and having a specific value, the processor being designed for: a) reading out a first block (B11) of the video image from the memory; b) storing the first block (B11) in a block database (BDB); c) providing a unique block code (C11) to the first block (B11), which unique block code (C11) is stored in the memory and has a fixed relationship with the first block (B11); d) reading out a next block (Bmn, m = 1, 2, ..., M; n = 1, 2, ..., N) from the memory; e) comparing the read-out next block (Bmn) to the contents of the block database (BDB); continuing with operation f) if the next block (Bmn) is not known yet; continuing with operation g) if the block (Bmn) is identical to a known block in the block database (BDB); f) storing the next block in the block database (BDB); assigning a new block code (Cmn) to this next block; storing the new block code (Cmn) in the memory; continuing with operation h); g) reading the value of the block code of the known block out and placing the value of the block code of the known block in the set of block codes in the memory; continuing with operation h); h) returning to operation d) if not all blocks (Bmn) have been read yet; i) selecting a pair of block codes (C11, C12); j) storing the pair of block codes (C11, C12) and a combination code (CC11) in a frame database (FDB) in the memory, which combination code (CC11) refers in a unique manner to the values of the pair of block codes (C11, C12);

k) selecting a new pair of block codes;

l) comparing this new pair of block codes to all pairs of block codes already stored in the frame database (FDB); m) storing a new combination code (CCij, i = 1, 2, 3, ...; j = 1, 2, 3, ...) with the associated new pair of block codes in the frame database (FDB) if the new pair of block codes does not yet exist in the frame database (FDB); selecting a further new pair of block codes and returning to operation k) if not all pairs of block codes have been selected yet, and otherwise continuing to operation o); n) storing the combination code (CCij) in the frame database (FDB) which is associated with the known block code if the new pair of block codes corresponds to a known block code which is already stored in the frame database (FDB); selecting a further new pair of block codes and returning to operation k) if not all pairs of block codes have been selected yet, and otherwise continuing to operation o); o) recursively repeating operations i) to n) for all combination codes (CCij) obtained in a previous series of operations i) to n), with the proviso that where operations i) to n) refer to a "block code", this is replaced by "combination code", until the series of operations i) to n) results in one last combination code.

17. Transmitter comprising an encoding unit according to Claim 16.

18. Decoding unit comprising a processor (21) and a memory (23) with a frame database (FDB), in which the processor (21) is designed for decoding a set of combination codes (CCij) with associated pairs of codes from the frame database (FDB), as obtained by the method according to one of Claims 1-14, according to:

• replacing a last combination code (CCij) by an associated pair of codes;

• treating each code obtained as a combination code and replacing these combination codes with an associated pair of codes;

• recursively repeating the previous operation until all combination codes (CCij) have been replaced and a set of block codes (Cmn) has been obtained;

• replacing each block code (Cmn) by an associated block (Bmn).

19. Receiver comprising a decoding unit according to Claim 18.

20. System comprising a transmitter according to Claim 17 and a receiver according to Claim 19.

21. Signal comprising a set of combination codes (CCij) as obtained by the method according to one of Claims 1-14.

22. Computer program product comprising data and instructions which can be read in by a computer in order to allow the computer to perform the coding method according to one of Claims 1-14.

23. Computer program product comprising data and instructions which can be read in by a computer in order to allow the computer to perform the decoding method according to Claim 15.

24. Data carrier comprising a computer program product according to Claim 22 or 23.

*****

Description:

Method and device for coding and decoding data.

Background.

The invention relates to a method and device for coding and decoding data, in which a data reduction operation is carried out during coding.

Prior art.

A digital image is a matrix of B x H pixels (pixel = picture element). The colour of each pixel is determined by a certain value of D bits. A complete image is thus described by B x H x D bits. Usually, successive images are incorporated into "frames", with one frame comprising the audio and video information of one image.

Example:

An image of 720 x 576 pixels (the standard resolution for a DVD film) with 24 bits per pixel (truecolor: 8 bits for each of the three colour channels red, green and blue) is represented by 720 x 576 x 24 = 9,953,280 bits or 1,244,160 bytes or 1.19 MB (8 bits = 1 byte, 1024 bytes = 1 KB (kilobyte), 1024 KB = 1 MB (megabyte)).

The typical playback speed for a video film is 25 frames per second (fps). This means that, using the data from the example above, one second of film contains about 30 MB, i.e. more than 100,000 MB per hour. Although the usual storage capacity of computers is quite high, it will not be possible, in practice, to store 100 GB (1 GB (gigabyte) = 1024 MB) per hour of film. A DVD, for example, has a storage capacity of between 5 and 17 GB. Naive storage, i.e. storage in which all bits of the film are stored, can thus not be used in practice.
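The figures quoted above can be checked with a few lines of arithmetic; the following sketch (purely illustrative) reproduces the numbers from the example:

```python
# Reproducing the storage figures from the example above:
# 720 x 576 pixels, 24 bits per pixel, 25 frames per second.
width, height, bits_per_pixel, fps = 720, 576, 24, 25

bits_per_frame = width * height * bits_per_pixel
bytes_per_frame = bits_per_frame // 8
mb_per_frame = bytes_per_frame / (1024 * 1024)

mb_per_second = mb_per_frame * fps
gb_per_hour = mb_per_second * 3600 / 1024

print(bits_per_frame, bytes_per_frame)  # 9953280 1244160
print(round(mb_per_frame, 2))           # 1.19
print(round(mb_per_second))             # 30
print(round(gb_per_hour))               # 104
```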

There are essentially two different ways of reducing the storage capacity which is required for information:

1) Modifying the input information: in this case, for example, irrelevant parts of the information are removed, for example non-audible frequencies in the case of sound or non-visible parts in the case of image information; or the input information is modified in another manner in such a way that successive compression stages become more efficient. With this kind of data reduction, it is not possible to retrieve the original information in its original form.

2) Loss-free encoding of input information: in this case, the information is converted into another representation, in which data are compressed in such a manner that it is possible to store the same information using less space. The original information can always be retrieved without any loss.
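Option 2) can be illustrated with the deflate algorithm, which the claims name as one possible compression step and which the standard zlib library implements. A minimal round trip (illustrative only):

```python
import zlib

# Loss-free coding with the deflate algorithm (zlib): repetitive data
# shrinks considerably, and decompression restores it exactly.
original = bytes([42, 37, 47] * 1000)   # highly repetitive "pixel" data

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

assert restored == original             # no information was lost
print(len(original), len(compressed))   # 3000 bytes vs. a few dozen
```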

Blockwise coding is known per se from the prior art. The reader is, for example, referred to A. Gersho et al., "Adaptive Vector Quantization by Progressive Codevector Replacement", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Tampa, USA, 26-29 March 1985, pp. 133-136; N. M. Nasrabadi et al., "Image Coding Using Vector Quantization: A Review", IEEE Transactions on Communications, vol. 36, no. 8, August 1988, New York, USA, pp. 957-971; EP-A-0 633 701.

In the case of the block codings disclosed in these publications, vectors are assigned to blocks, which vectors are, for example, an address or refer to an address where a block associated with the respective vector is stored in a memory. None of these publications discloses the concept of grouping vector codes into new blocks, subsequently assigning vectors to the groups of vector codes thus formed, etc.
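The grouping concept referred to here can be sketched as follows. This is a deliberately minimal illustration, not the patented implementation: the function names and the flat pair database are ours, and real images would supply the initial block codes. The sketch keeps one shared code space for all levels, so the number of passes is recorded for decoding.

```python
def combine(codes, pair_db):
    """One pass: replace each pair of codes by a combination code."""
    out = []
    for i in range(0, len(codes), 2):
        pair = tuple(codes[i:i + 2])      # a trailing odd code stands alone
        if pair not in pair_db:
            pair_db[pair] = len(pair_db)  # assign the next free code
        out.append(pair_db[pair])
    return out

def encode(codes):
    """Recursively pair codes until a single final code remains."""
    pair_db, levels = {}, 0
    while len(codes) > 1:
        codes = combine(codes, pair_db)
        levels += 1
    return codes[0], pair_db, levels

def decode(code, pair_db, levels):
    """Expand the final code back into the original sequence of codes."""
    rev = {v: pair for pair, v in pair_db.items()}
    codes = [code]
    for _ in range(levels):
        codes = [c for cc in codes for c in rev[cc]]
    return codes

final_code, pair_db, levels = encode([0, 1, 0, 1, 2, 3, 2, 3])
print(final_code, levels)                   # 1 3
print(decode(final_code, pair_db, levels))  # [0, 1, 0, 1, 2, 3, 2, 3]
```

Note how the repeated pairs (0, 1) and (2, 3) each receive only one combination code, which is where the data reduction comes from.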

Short description of the invention.

It is an object of the invention to provide a method for data reduction in such a manner that the same amount of data can be stored using considerably less storage space.

The invention is characterized by several independent claims which are attached. Embodiments are claimed in dependent claims.

Various tests with video data streams compressed according to the invention have shown that a reduction by a factor of approximately 15-20 can be achieved without a significant loss of quality.

The invention will be explained in more detail below with reference to a number of figures. The figures only show embodiments and are not intended to limit the inventive idea which is defined by the attached claims. Technically equivalent solutions are deemed to fall within the scope of the claims.

It should be noted that the invention in the present patent application is mainly discussed with reference to image data. However, the invention applies equally well to sound data and text data. In the case of text data, operations which result in a loss of information, such as the calculation of mean values and quantization, cannot be used.

In the case of text data, the pages of a book can, in principle, be considered in the same way as frames of a film. In many cases, there are only two colours: white (background) and black (text). Each line of text results in one code line. All 'code lines' on a page taken together result in a page code for a page. All page codes together in turn result in a book code for the book in question.

Brief description of the figures.

Figure 1 shows a diagrammatic arrangement of an encoding unit and a decoding unit;
Figure 2 shows a diagrammatic overview of blocks of data in an image of, for example, a video;
Figure 3 shows a diagrammatic illustration of memory blocks in a memory;
Figure 4 shows a simplified overview of the method according to the invention.

Description of embodiments.

1. Introduction.

Figure 1 shows a general arrangement of a system in which the present invention may be used. The system comprises an encoding unit 1, which may also act as a transmitter. The encoding unit 1 comprises a processor 5 which is connected to a memory 7. The processor 5 is also connected to an input/output device (I/O) 9 and an I/O 11. Via the I/O 9, the processor 5 is able to receive data via a connection 13. The connection 13 may be a wire, but may also be implemented in wireless fashion. For the invention, it does not matter which wavelength of electromagnetic radiation is used for a possible wireless connection. Via the I/O 11, the processor 5 is able to output data via a connection 15, which may also be a wire or a wireless connection.

In the memory, one or more programs and data are stored which enable the processor 5 to provide a certain functionality, such as the coding of data according to the invention and the subsequent transmission thereof.

Figure 1 shows various ways in which the coded data produced by the processor 5 may be transmitted. Transmission may be effected, for example, via a wire. The double arrow 31 indicates that the data produced can alternatively be transmitted to a receiver in wireless fashion. This may be in the context of radio, television, or from one terminal to another terminal. A terminal is, for example, a mobile phone, a PDA (Personal Digital Assistant), etc. Alternatively, the data produced may be stored on a data carrier, such as an optical disk (for example a CD, a DVD), a magnetic tape 35, or a memory card 37, such as a memory stick. Obviously, other methods of storage are likewise possible.

The signal produced by the encoding unit 1 is received by a receiver or decoding unit 3. The decoding unit 3 comprises a processor 21 which is connected to a memory 23. The processor 21 is also connected to an input/output device (I/O) 19 and an I/O 25. Via the I/O 19, the processor 21 can receive the data via a connection 17. Via the I/O 25, the processor 21 can output data, for example audio data to a loudspeaker or headphone and/or video data to a monitor or display 29.

The decoding unit 3 can have any desired implementation, such as a CD or DVD player, a tape recorder, a player based on data stored on a memory chip (such as currently an MP3 player), a mobile phone, a PDA, a Blackberry, etc.

The memories illustrated may comprise one or more of the following: a hard disk, Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM) and Random Access Memory (RAM). Not all these types of memory necessarily have to be present. In addition, they do not have to be arranged physically close to the respective processor. They may also be arranged at a distance therefrom.

In one embodiment, the respective processors are connected to means for inputting instructions, data, etc. by a user, such as a keyboard and a mouse. Other input means, such as a touch screen, a track ball and/or speech converter, which are known to those skilled in the art, may also be used.

It is also possible to provide a reading unit connected to the respective processors. Such a reading unit is designed for reading data on a data carrier, such as a floppy disk, a memory stick, a magnetic tape, a CD-R, a DVD-R or a CD-ROM.

The respective processors may be connected to a printer for printing output data on paper, as well as a display unit, for example a monitor or LCD (Liquid Crystal Display) screen, or any other type of display unit which is known to the person skilled in the art.

The processors may be connected to a communications network, for example the PSTN (Public Switched Telephone Network), a local network (LAN = Local Area Network), a wide area network (WAN = Wide Area Network), the internet, etc. by means of suitable input/output means for receiving and sending data.

The processors may be implemented as a self-contained system or as a number of processors operating in parallel, each of which is designed to perform subtasks of a larger program, or as one or more main processors having various subprocessors. Parts of the functionality of the invention may even, if desired, be carried out by remote processors which communicate with the respective processor via a network.

Below, the invention will be explained in more detail with reference to video data, but the invention is not limited thereto. Thus, the invention can also be used, for example, for sound data and text data. In the case of text data, the preparatory stage is omitted if it causes loss of information.

2. Encoding unit

2.1 Introduction

The encoding unit 1 converts digital image information, for example associated with a film, into a compressed format, which makes it possible to store the information in a space-saving manner. In this case, the coding of an image takes place in two stages: the preparation of the image and the coding of the information.

2.2 Preparation of the image

The aim of the first stage is to convert the image information of the original image into a representation which makes the second stage (the coding) more efficient. In this first stage, use is made in particular of the first option of achieving data reduction, which was discussed in the introduction: modification of information by omitting non-relevant information. The human eye has certain limitations which make it possible to omit information from pictures without leading to a (highly) visible loss in quality. The first stage of the encoding unit 1 uses this fact in order to simplify the information in the images. Several techniques may be used for the simplification, which will be discussed below.

2.2.1 Conversion of RGB into YUV

Screens display a variety of colours by mixing the basic colours red, green and blue in different intensities. Each basic colour is associated with one channel. A pixel of an image can therefore simply be described by indicating the intensity of the three basic colours red, green and blue. One possible RGB value for a pixel is thus (255, 128, 0).

Starting from a scale from 0 to 255, this means: mix red with an intensity of 255 (fully on), green with an intensity of 128 (half on) and blue with an intensity of 0 (fully off). This scale from 0 to 255 is used in many cases, as it corresponds to 256 states, and 256 states can be represented by exactly one byte of 8 bits (2^8 = 256). As there are 3 bytes per pixel, there are often 24 bits per pixel.

The YUV format also uses three channels for image information: one channel for brightness (Y, luminance) and two channels for the colour (U and V, chrominance). The human eye is more sensitive to changes in brightness than to changes in colour. On

the basis thereof, it is possible to omit parts of the colour information (chrominance) without visible loss of quality.

A pixel in RGB can easily be converted to YUV [1]. By scaling the resolution of the two chrominance channels U and V to a quarter, for example from 720 x 576 pixels to 360 x 288 pixels, 50% of the information can be removed without visible loss of quality: the Y channel remains at full resolution, while U and V each contribute only a quarter, so that (1 + 0.25 + 0.25) / 3 = 50% of the original amount of data remains.
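A sketch of the conversion and of the saving. The conversion coefficients below are one common BT.601-style variant and are an assumption on our part; the text only references the conversion itself:

```python
def rgb_to_yuv(r, g, b):
    """RGB -> YUV conversion using BT.601-style coefficients (one common
    variant; the exact coefficients are an assumption, since the text
    only references the conversion)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

# The 50% figure: Y stays at full resolution, U and V are scaled to a
# quarter of the pixels each (e.g. 720 x 576 -> 360 x 288).
full = 3.0                       # three full-resolution channels (RGB)
subsampled = 1.0 + 0.25 + 0.25   # Y full, U and V at a quarter
print(subsampled / full)         # 0.5, i.e. 50% of the original data
```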

2.2.2 Average

In one embodiment, an average value is determined in the following manner. Each pixel is replaced by the average of the pixel itself, the pixel to its right, the pixel below it and the pixel to its bottom right, if this average does not deviate by more than a predetermined threshold value, for example 8, from the original pixel. If the deviation is greater than the threshold value, an attempt is made to use only the average of the original pixel and the neighbouring right-hand pixel, or of the original pixel and the pixel below it, provided the deviation of this average is not greater than the threshold value 8 either. In this manner, the image is not subjected to global blur: only regions with a similar colour are blurred, which increases the likelihood of two successive frames having the same value in these regions. The averaging is not reversed during decoding; a possible approximate inversion for this step is the use of a sharpening filter.
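One plausible reading of this conditional averaging, sketched for a single pixel (the function name and the fallback order are ours):

```python
def soften(px, right, below, diag, threshold=8):
    """Conditional averaging: use the 4-pixel average when it stays
    within the threshold of the original pixel; otherwise try the
    smaller neighbourhoods; otherwise leave the pixel unchanged.
    (A sketch of one plausible reading of the description above.)"""
    avg4 = (px + right + below + diag) // 4
    if abs(avg4 - px) <= threshold:
        return avg4
    avg_r = (px + right) // 2        # fallback: right-hand neighbour only
    if abs(avg_r - px) <= threshold:
        return avg_r
    avg_b = (px + below) // 2        # fallback: neighbour below only
    if abs(avg_b - px) <= threshold:
        return avg_b
    return px                        # region too different: no blur

print(soften(100, 104, 102, 110))  # 104: the 4-pixel average is close enough
print(soften(100, 200, 210, 220))  # 100: all averages deviate too much
```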

The average of pixels can also be calculated by taking, for example, the average values of the eight surrounding pixels.

In addition, it is also possible to use so-called PNG filters (PNG = Portable Network Graphics).

2.2.3 Comparison of successive frames.

One of the possible operations is a comparison of successive frames to one another, that is to say comparing frame n+1 to frame n. The aim of this operation is to determine, after the average values have been calculated, how great the deviation of pixel values between two successive frames is. After all, if these differences are within a certain threshold value, it may be advisable to make the pixel values in successive images equal to one another. The threshold value serves to ensure that the loss of quality in the frames remains limited. If the differences are greater than the threshold value, no action is taken.

The result is that quantization produces more identical pixel values, the codes obtained after quantization are more often identical, and the differences are thus more often equal to 0 (no difference).
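A minimal sketch of this frame equalization, assuming flat lists of pixel values; the threshold, and the choice of which frame is overwritten, are illustrative assumptions:

```python
def equalize(previous, current, threshold=4):
    """Copy a pixel value from the previous frame into the current frame
    when the two differ by less than the threshold, so that the later
    difference coding yields more zeros. The threshold value and the
    direction of the replacement are illustrative assumptions."""
    return [p if abs(c - p) < threshold else c
            for p, c in zip(previous, current)]

previous = [42, 37, 47, 47, 37, 47]
current = [44, 37, 45, 60, 36, 47]
print(equalize(previous, current))  # [42, 37, 47, 60, 37, 47]
```

Only the fourth pixel (60 versus 47) exceeds the threshold and keeps its own value; the others become identical to the previous frame.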

2.2.4 Quantization

The human eye is not able to detect very small differences in the intensity of colours. A possible reduction in information comprises reducing the number of possible different intensities, for example to only 64 stages instead of 256 stages, i.e. a reduction from 8 bits per pixel per channel to 6 bits. This means that several input values will be mapped to the same output value.

The encoding unit 1 may carry out this quantization as follows, according to the above example:

• the input values 0, 1, 2 and 3 produce the value 2;

• the input values 4, 5, 6 and 7 produce the value 5;

• etc.

Subsequently, the values are coded as follows after quantization:

• the value 2 becomes code 0;

• the value 5 becomes code 1;

• etc.

Alternatively, the quantization step can be set to 5. The result then is that:

• the input values 0, 1, 2, 3 and 4 become 2;

• the input values 5, 6, 7, 8 and 9 become 7;

• etc.

Subsequently, the values are coded as follows after quantization:

• the value 2 becomes code 0;

• the value 7 becomes code 1;

• the value 12 becomes code 2;

• etc.

If this operation is carried out, "pixel values" should be read as "coding values" below.
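The second example (quantization step 5) can be sketched as follows; the choice of the representative value (the lower middle of each interval) is inferred from the values listed above:

```python
STEP = 5  # the quantization step from the second example above

def quantize(value):
    """Map an input value to its quantized representative
    (0-4 -> 2, 5-9 -> 7, 10-14 -> 12, ...); the representative is the
    lower middle of each interval, inferred from the listed values."""
    return (value // STEP) * STEP + 2

def code_of(quantized):
    """Code the quantized values consecutively: 2 -> 0, 7 -> 1, 12 -> 2, ..."""
    return quantized // STEP

print([quantize(v) for v in range(10)])  # [2, 2, 2, 2, 2, 7, 7, 7, 7, 7]
print(code_of(quantize(12)))             # 2
```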

2.2.5 Storing differences between frames

Since the difference between two successive images ("frames") of a film is not usually that great, it may be advantageous to store only the differences between successive frames instead of the "hard" values of each frame. The following table gives an example thereof for 9 pixels in frame #1 and the same 9 pixels in a next frame #2. For the sake of simplicity, only 1 colour value is given for each pixel.

Example:

  Frame #1      Frame #2      Differences

This example relates to one of the RGB values, before quantization is carried out. Assuming quantization steps of 5, the example then becomes as follows:

Quantization values:

  Frame #1      Frame #2      Differences
  42 37 47      47 37 47      -5  0  0
  47 37 47      47 37 47       0  0  0
  37 32 22      37 32 22       0  0  0

Coding values:

  Frame #1      Frame #2      Differences
   8  7  9       9  7  9      -1  0  0
   9  7  9       9  7  9       0  0  0
   7  6  4       7  6  4       0  0  0

In this way, relatively few different and relatively small values are produced, which can be coded more advantageously in the second stage because they can be stored using a relatively small number of bits. After all, in this case the differences of the codes obtained after quantization are used, and not those of the actual RGB values. Consequently, the difference values are even smaller and, in many cases, equal to zero.
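The difference rows of the tables above can be reproduced as follows (a sketch; the tables store frame #1 minus frame #2):

```python
def differences(frame1, frame2):
    """Per-pixel differences between the coding values of two successive
    frames, as in the 'Coding values' columns of the example above
    (stored as frame #1 minus frame #2)."""
    return [a - b for a, b in zip(frame1, frame2)]

frame1 = [8, 7, 9,  9, 7, 9,  7, 6, 4]
frame2 = [9, 7, 9,  9, 7, 9,  7, 6, 4]
print(differences(frame1, frame2))  # [-1, 0, 0, 0, 0, 0, 0, 0, 0]
```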

2.2.6 Discrete cosine transformation

Using transformations such as the discrete cosine transformation (DCT) [2], it is possible to convert image information into the frequency domain. The image is then no longer represented by the colour values, but rather by coefficients of basic frequencies. As the human eye is less sensitive to the relatively high frequencies in an image, the coefficients for the high frequencies can often be completely removed, or their values can at least be considerably reduced, without great loss of quality.
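A naive 1-D version of this idea, as a sketch only (real codecs apply fast 2-D transforms to small pixel blocks):

```python
import math

def dct(signal):
    """Naive 1-D DCT-II, for illustration only."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
            for k in range(n)]

def idct(coeffs):
    """Matching inverse transform (scaled DCT-III)."""
    n = len(coeffs)
    return [(coeffs[0] / 2
             + sum(c * math.cos(math.pi * (i + 0.5) * k / n)
                   for k, c in enumerate(coeffs[1:], start=1))) * 2 / n
            for i in range(n)]

# A smooth row of pixel values: most of its energy sits in the low
# frequencies, so the high-frequency coefficients can be dropped.
samples = [40.0, 42.0, 45.0, 47.0, 46.0, 44.0, 41.0, 40.0]
coeffs = dct(samples)
coeffs[4:] = [0.0] * 4   # remove the four highest-frequency coefficients
approx = idct(coeffs)    # still close to the original samples
```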

2.2.7 Other

Useful techniques are those which make the contents of the frames "as identical as possible" without compromising the quality, for example noise filters. Other techniques which may be used are Huffman coding, Zip, etc.

2.3 Coding image information

After the information has been simplified in the first stage, the data are coded in a space-saving manner in the second stage without loss of information. The input for the second stage, that is to say the output of the first stage, can be retrieved exactly by decoding.

The second stage operates on the basis of several databases at different levels. The information successively passes through three different databases. This will be explained below with reference to an example. The example relates to one "image", that is to say one frame. For the sake of simplicity, it is always assumed that each pixel is only associated with one colour value. The principle of the invention is explained in relation to this one value per pixel. Obviously, in practice, the principle of the invention is applied to each of the three values (RGB or YUV).

In order to carry out the invention, three databases are defined: a block database BDB, a frame database FDB and a sequence database SDB. As illustrated in Figure 3, these three databases may be situated in the memory 7. However, it is also possible for separate databases to be situated in memories which are located elsewhere. The databases may physically be defined in the same memory chip or on the same hard disk, etc. The algorithm applied is used per channel. Nevertheless, per database BDB, FDB, SDB, only 1 set of data has to be used for all three channels.

2.3.1 Pixel level: the block database BDB.

A set of images which, for example, are part of a film is stored in memory 7. Each image, which for example comprises 720x576 pixels, is subdivided into blocks of a certain dimension, for example blocks of 16x16 pixels. Figure 2 illustrates this diagrammatically. The image in Figure 2 is subdivided into M x N blocks Bmn. Each block is converted into a block code. The block code is shorter than the contents of the block itself.

Figure 3 also shows that original images OB are stored in the memory 7 and that the memory 7 has a storage space CB for storing the coded images.

Converting a block into a block code takes place as follows:

a) First, the first block B11 of the image is read out of the storage space OB;

b) The first block B11 is stored in the block database BDB;

c) The first block B11 receives a unique block code C11, which is stored in the storage space CB; the block code has a fixed relationship with the block;

d) A next block Bmn (m = 1, 2, ..., M; n = 1, 2, ..., N) is read out of the storage space OB;

e) The block Bmn is compared to the contents of the block database BDB;

f) If the block Bmn is not known yet, the block is stored in the block database BDB. A new block code Cmn is assigned to the block. This block code Cmn is stored in the storage space CB. Each block code Cmn has a fixed relationship with an associated block Bmn. If not all blocks Bmn have been read yet, the program returns to operation d). Otherwise, the program continues with operation h);

g) If the block Bmn is identical with a known block in the block database BDB, the value of the unique block code of the known block is read out and the value of this read-out block code is added to the set of block codes in storage space CB. If not all blocks Bmn have been read yet, the program returns to operation d). Otherwise, the program continues with operation h);

h) In this operation, the coding for the respective level is completed.

In this way, each different block is stored only once. The first stage has ensured that:

1) a particular block Bmn will probably occur relatively often in the images;

2) the contents of each unique block Bmn are "simple", that is to say that there are generally fewer different blocks than the maximum number which is theoretically possible.

As the contents of each unique block Bmn are relatively simple, the blocks are preferably not simply stored in, for example, memory 7; rather, a compression algorithm is preferably applied prior to storage. A suitable compression algorithm for this purpose is "deflate" [3].

In this block database BDB, data reduction is then thus achieved by means of two techniques:

1) storing only different blocks

2) storing the unique blocks in a space-saving manner.
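The pixel-level coding described above can be sketched in Python as follows. This is a minimal illustration, not the patent's C++ reference implementation; all names are chosen for this sketch, and the deflate compression of the stored blocks is omitted.

```python
# Hypothetical sketch of the block-database (BDB) stage: each unique block
# is stored once and assigned a block code; repeated blocks reuse the code.
# The image is modelled as a list of rows of pixel values (one value per
# pixel, as in the example in the text).

def code_blocks(image, block_size):
    """Split the image into blocks and replace each block by a block code.

    Returns the block database (code -> block contents) and the matrix
    of block codes (the storage space CB)."""
    rows, cols = len(image), len(image[0])
    lookup = {}        # block contents -> block code (for fast comparison)
    code_matrix = []
    for r in range(0, rows, block_size):
        code_row = []
        for c in range(0, cols, block_size):
            # Extract one block as an immutable tuple of rows.
            block = tuple(
                tuple(image[r + i][c + j] for j in range(block_size))
                for i in range(block_size)
            )
            if block not in lookup:        # operation f): unknown block
                lookup[block] = len(lookup)  # assign a new unique block code
            code_row.append(lookup[block])   # operation g): reuse known code
        code_matrix.append(code_row)
    # Invert the lookup table to obtain the database: code -> block.
    return {code: block for block, code in lookup.items()}, code_matrix

# A 4x4 image of 2x2 blocks; the two identical blocks share one code.
img = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
]
db, codes = code_blocks(img, 2)
print(codes)   # [[0, 1], [0, 2]] -- the repeated block is stored only once
```

The dictionary keyed on block contents plays the role of the hash tables mentioned in section 4.3: it makes the "is this block already known?" check of operation e) a constant-time lookup.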

2.3.2 Block level: the frame database FDB.

After the previous operation, the image has been converted into a matrix of block codes. It is possible now that a certain combination of block codes occurs relatively often, in other words that areas which are larger than one block occur relatively often. The frame database exploits this possibility. The matrix of block codes is encoded recursively until only one code per image (the frame code) remains. To this end, adjacent codes are replaced by so-called "combination codes", while in subsequent operations adjacent combination codes may be replaced by new combination codes. This works in a similar way to the coding of blocks:

i) First, a pair of adjacent block codes C11 and C12, also referred to as a "code pair", is selected;

j) Store a combination code CC11 in the frame database FDB in the memory 7 which refers to the values of the code pair C11 and C12 in a unique manner;

k) Select a pair of new adjacent codes;

l) Compare the combination of this new pair of adjacent codes to all code pairs in the frame database FDB;

m) If this combination does not yet exist, then store a new combination code CCij (i = 1, 2, 3, ...; j = 1, 2, 3, ...) with the associated code pair in the frame database FDB. That is to say, the code pair is replaced by the new combination code CCij, as it were. If not all codes of the image have been selected yet, then select a pair of new adjacent codes and return to operation l);

n) If the combination is already known, then read out the combination code CCij of the respective known pair and replace the code pair with it. If not all codes of the image have been selected yet, then select a pair of new adjacent codes and return to operation l);

o) Since each pair of codes has been replaced by one combination code, the matrix now contains half the number of codes compared to the beginning. Rename, for the purpose of the program: "combination code CCij" → "block code Cij";

p) Repeat, recursively, operations i) to o) for as long as the matrix is greater than 1x1.

When carrying out operations i) to o), the matrix is preferably traversed horizontally one time and vertically the next time, etc.

It should be noted that the above plan was devised for block codes/combination codes which adjoin one another horizontally. Of course, this is not intended to be limiting. Any arbitrary pair may be the starting point, for example two vertically adjacent block/combination codes.

It should furthermore be noted that the system works directly if the resolutions are powers of 2, for example 512x512. With other resolutions, "dummy codes" (for example 0) will occasionally have to be added to the end of a row or column.

Traversing the matrix alternately horizontally and vertically may in some cases result in a further appreciable data reduction if, for example, there are no repeating pairs of codes or hardly any in the horizontal direction, but in the vertical direction there are. The invention is not limited to this variant. It is also possible that traversing the matrix according to a different pattern results in an appreciable data reduction. Such a different pattern may be a diagonal or a zig-zag pattern, such as is known per se from MPEG technology.

Example:

Block codes:

0 1 2 3
0 3 2 1
3 2 0 1
1 2 0 3

1st cycle: matrix traversed horizontally.

(0, 1) Not known; store with new combination code 0

(2, 3) Not known; store with new combination code 1

(0, 3) Not known; store with new combination code 2

(2, 1) Not known; store with new combination code 3

(3, 2) Not known; store with new combination code 4

(0, 1) Known; combination code 0

(1, 2) Not known; store with new combination code 5

(0, 3) Known; combination code 2

After 1st cycle:

0 1
2 3
4 0
5 2

2nd cycle: assessing vertical neighbours.

(0, 2) Not known; store with new combination code 6

(4, 5) Not known; store with new combination code 7

(1, 3) Not known; store with new combination code 8

(0, 2) Known; combination code 6

After 2nd cycle:

6 8
7 6

3rd cycle: assessing horizontal neighbours:

(6, 8) Not known; store with new combination code 9

(7, 6) Not known; store with new combination code 10

After 3rd cycle:

9
10

4th cycle: assessing vertical neighbours:

(9, 10) Not known; store with new combination code 11.

This last combination code is referred to here as the "frame code" for this image and thus equals 11. The contents of the frame database FDB for this image are therefore:

0: (0, 1)
1: (2, 3)
2: (0, 3)
3: (2, 1)
4: (3, 2)
5: (1, 2)
6: (0, 2)
7: (4, 5)
8: (1, 3)
9: (6, 8)
10: (7, 6)
11: (9, 10)

If the original size of the matrix is known, the decoding unit can repair the original codes again later by recursively replacing each code by the two values from the database. This database ensures that even relatively large image parts are only stored once. This database may also use a compression algorithm, such as deflate, in order to store the combination codes in a space-saving manner.
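The recursive pairing that produces the frame code can be sketched in Python as follows. This is a hedged illustration reproducing the worked example above, not the reference implementation; the vertical passes traverse column by column so that the combination codes come out in the same order as in the example, and the padding with dummy codes for non-power-of-2 resolutions is omitted.

```python
# Hypothetical sketch of the frame-database (FDB) stage: the matrix of
# block codes is reduced recursively, alternating horizontal and vertical
# passes, until a single frame code remains.

def combine(pairs, pair):
    """Operations m)/n): assign a new combination code or reuse a known one."""
    if pair not in pairs:
        pairs[pair] = len(pairs)
    return pairs[pair]

def code_frame(matrix):
    """Reduce a matrix of codes to one frame code; returns (frame_code, fdb)."""
    pairs = {}           # code pair -> combination code (for fast lookup)
    horizontal = True
    while len(matrix) > 1 or len(matrix[0]) > 1:
        if horizontal:
            # Replace each horizontally adjacent code pair, row by row.
            matrix = [
                [combine(pairs, (row[c], row[c + 1]))
                 for c in range(0, len(row), 2)]
                for row in matrix
            ]
        else:
            # Replace each vertically adjacent code pair, column by column.
            new = [[0] * len(matrix[0]) for _ in range(len(matrix) // 2)]
            for c in range(len(matrix[0])):
                for r in range(0, len(matrix), 2):
                    new[r // 2][c] = combine(
                        pairs, (matrix[r][c], matrix[r + 1][c]))
            matrix = new
        horizontal = not horizontal
    # Invert the lookup table: combination code -> code pair (the FDB).
    return matrix[0][0], {code: pair for pair, code in pairs.items()}

# The 4x4 matrix of block codes from the example in the text.
example = [
    [0, 1, 2, 3],
    [0, 3, 2, 1],
    [3, 2, 0, 1],
    [1, 2, 0, 3],
]
frame_code, fdb = code_frame(example)
print(frame_code)   # 11, as in the example; the FDB holds 12 code pairs
```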

2.3.3 Frame level: sequence database SDB. Since each video image has three (colour) channels, the encoding unit 1 thus receives three frame codes for each frame (image); one frame code per channel. It is reasonable to assume that the combination of these three frame codes is relatively unique. The chance that exactly the same video image will occur again in a film is very slim. The three frame codes per frame are therefore stored linearly in the sequence database SDB. However, a compression algorithm, such as deflate, may be used in this case as well in order to store the frame codes in a space-saving manner.

2.4 Diagram encoding unit 1.

Figure 4 shows a diagrammatic overview of the successive operations that have been explained above.

3 Decoding unit

The decoding of the information in the decoding unit 3 again takes place in two stages. Each stage corresponds to a stage in the encoding unit 1:

1) Decoding using the block database BDB, frame database FDB and sequence database SDB;

2) Backtransforming to image information.

3.1 Decoding frames

Stage 1 of the decoding unit 3 is the opposite of stage 2 in the encoding unit 1. In this stage, the decoding unit 3 has to have received the three databases BDB, FDB, SDB from the encoding unit 1. The decoding unit 3 reads the three frame codes - one frame code per channel - for each frame from the sequence database SDB. Then, using the frame database FDB, the decoding unit 3 starts to convert the frame codes into block codes. This takes place as follows: the decoding unit 3 starts with a matrix of 1x1 and fills the only field in the matrix with the frame code. The decoding unit 3 then uses this code in order to find the associated pair of codes in the frame database FDB. The original frame code is replaced by this code pair, resulting in an enlargement of the matrix. These steps are repeated until the matrix has reached its original size: one code is always replaced by the associated code pair from the frame database FDB. The extension of the matrix again takes place alternately horizontally and vertically.

Example

The frame database FDB comprises, for example, the following information:

0: (0, 1)
1: (2, 3)
2: (0, 3)
3: (2, 1)
4: (3, 2)
5: (1, 2)
6: (0, 2)
7: (4, 5)
8: (1, 3)
9: (6, 8)
10: (7, 6)
11: (9, 10)

The first frame code from the sequence database is 11. This code is placed at the start in a matrix of 1x1. The decoding unit will now "unpack" this code until the matrix reaches a certain size. It is assumed that the decoding unit knows that the matrix eventually has to have a size of 4x4.

Initial state:

11

The decoding unit checks which code pair is associated with the frame code "11" in the frame database FDB and replaces the frame code by these two codes. The frame code "11" is associated in the frame database FDB with the code pair "9, 10".

After operation 1:

9
10

For each of the two codes "9" and "10", the decoding unit 3 again checks the frame database FDB for the corresponding code pairs for these codes and extends the matrix by these pairs. Code "9" is associated with the code pair "6, 8", and code "10" is associated with code pair "7, 6".

After operation 2:

6 8
7 6

This algorithm is carried out repeatedly, until the matrix has reached the desired size of 4x4:

After operation 3:

0 1
2 3
4 0
5 2

After operation 4:

0 1 2 3
0 3 2 1
3 2 0 1
1 2 0 3

The codes in this last matrix are now the block codes.

Using the block database BDB, every block code is replaced by the corresponding block in the block database BDB. In this way, it is possible to retrieve precisely the values which once served as input for stage 2 of the encoding unit 1.
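Stage 1 of the decoding can be sketched in the same illustrative style. This is a sketch under stated assumptions (a matrix whose dimensions are powers of 2, and the FDB reconstructed from the example above), not the decoding unit's actual implementation.

```python
# Hypothetical sketch of stage 1 of the decoding unit: the frame code is
# expanded recursively via the frame database FDB until the matrix has
# its original size. FDB entries map a combination code to its code pair.

def decode_frame(frame_code, fdb, height, width):
    """Expand a frame code back into the matrix of block codes."""
    # Reconstruct the coding cycles (horizontal first, then alternating),
    # so that they can be undone in reverse order.
    directions = []
    h, w, horizontal = height, width, True
    while h > 1 or w > 1:
        if horizontal and w > 1:
            w //= 2
            directions.append(True)
        elif not horizontal and h > 1:
            h //= 2
            directions.append(False)
        horizontal = not horizontal

    matrix = [[frame_code]]
    for horizontal in reversed(directions):
        if horizontal:
            # Each code becomes two horizontally adjacent codes.
            matrix = [[x for code in row for x in fdb[code]]
                      for row in matrix]
        else:
            # Each code becomes two vertically adjacent codes.
            new = []
            for row in matrix:
                new.append([fdb[code][0] for code in row])
                new.append([fdb[code][1] for code in row])
            matrix = new
    return matrix

# The FDB from the example in the text.
fdb = {0: (0, 1), 1: (2, 3), 2: (0, 3), 3: (2, 1), 4: (3, 2), 5: (1, 2),
       6: (0, 2), 7: (4, 5), 8: (1, 3), 9: (6, 8), 10: (7, 6), 11: (9, 10)}
print(decode_frame(11, fdb, 4, 4))
```

Running the sketch on frame code 11 reproduces the original 4x4 matrix of block codes from the coding example, confirming that the expansion is the exact inverse of the recursive pairing.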

3.2 Backtransforming to image information

Starting from the values which the decoding unit 3 has retrieved in stage 1, the image information now has to be restored. To this end, some of the preparatory steps carried out during the coding have to be undone. It is not possible, or not necessary, to undo the change made by every preparatory algorithm. A noise filter during stage 1 of the encoding unit 1, for example, removes undesired information in order to make coding easier. However, it is unlikely that an algorithm which adds the noise again will be used during decoding. A conversion, for example from RGB to YUV, or a DCT can and must be undone again. In any case, this stage results in an image which, depending on the preparatory steps carried out, does not correspond exactly with the original, but in most cases does resemble the original to a sufficient degree.

4 Reference application

In order to illustrate that and how the method of coding and decoding according to the invention works in practice, a reference application has been devised which implements the main points thereof.

4.1 Input

The program uses MPEG-2 films as input, a common format for both DVDs and digital TV. In order to read the films, the mpeg2lib library [5], which is released under the GPL (GNU General Public Licence) [4], is used. However, the invention is not limited to MPEG-2 films. Other formats, such as raw-AVI material, may also be used as input.

4.2 Preparatory steps

The reference application uses the following preparatory steps in order to simplify the image information to such a degree that the second stage (coding) can be carried out more efficiently:

4.2.1 Conversion of RGB into YUV

Each frame is converted from RGB into YUV. Subsequently, the U- and V-channels are scaled down to a quarter of their size, with each output pixel being assigned the average value of four input pixels (4:2:0 chroma subsampling). During decoding, the U- and V-channels are scaled up again and the image is backtransformed to RGB.
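The chroma subsampling step might look as follows. This is an illustrative sketch only; the use of integer averaging is an assumption, and the RGB/YUV conversion itself is omitted.

```python
# Hypothetical sketch of the 4:2:0 chroma subsampling: the U (or V)
# channel is scaled to a quarter of its size, each output pixel being
# the average of a 2x2 group of input pixels.

def subsample_420(channel):
    """Scale a chroma channel down by 2 in both directions by averaging."""
    return [
        [(channel[r][c] + channel[r][c + 1]
          + channel[r + 1][c] + channel[r + 1][c + 1]) // 4
         for c in range(0, len(channel[0]), 2)]
        for r in range(0, len(channel), 2)
    ]

# A 4x4 chroma channel becomes 2x2; each output value averages 4 inputs.
u = [[10, 12, 20, 22],
     [14, 16, 24, 26],
     [30, 30, 40, 44],
     [30, 30, 44, 40]]
print(subsample_420(u))   # [[13, 23], [30, 42]]
```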

4.2.2 Average

In one embodiment, an average value is determined in the following way. Each pixel is replaced by the average of the pixel itself, the pixel to its right, the pixel below it and the pixel to its bottom right, provided the average does not deviate by more than a predetermined threshold value, for example 8, from the original pixel. If the deviation is greater than the threshold value, an attempt will be made to use only the average of the original pixel and the neighbouring right-hand pixel, or of the original pixel and the pixel below it, provided that average does not deviate by more than the threshold value 8 either.

In this manner, the image is not subject to global blur: only regions with a similar colour are blurred, which increases the likelihood of two successive frames having the same values in these regions. Establishing the average cannot, and need not, be undone during decoding. A possible inversion for this step is the use of a sharpening filter.

The average of pixels can also be calculated by taking, for example, the average values of the eight surrounding pixels.

In addition, it is also possible to use so-called PNG filters (PNG = Portable Network Graphics).
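The thresholded averaging described in section 4.2.2 could be sketched as follows. This is illustrative only; the order in which the fallback averages are tried and the handling of the last row and column are assumptions not specified in the text.

```python
# Hypothetical sketch of the thresholded averaging step: each pixel is
# replaced by the average of itself and up to three neighbours, but only
# if that average stays within a threshold of the original value.

THRESHOLD = 8  # the example threshold value from the text

def threshold_average(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h - 1):
        for c in range(w - 1):
            p = img[r][c]
            candidates = [
                # 2x2 neighbourhood: pixel, right, below, bottom-right.
                (p + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) // 4,
                (p + img[r][c + 1]) // 2,   # fallback: pixel and right
                (p + img[r + 1][c]) // 2,   # fallback: pixel and below
            ]
            for avg in candidates:
                if abs(avg - p) <= THRESHOLD:
                    out[r][c] = avg
                    break                   # keep the first acceptable average
    return out

# Near-uniform regions are smoothed; the pixel next to the bright edge
# keeps an average that stays within the threshold of its own value.
img = [[100, 104, 200],
       [102, 106, 210],
       [100, 100, 205]]
print(threshold_average(img))   # [[103, 105, 200], [102, 103, 210], [100, 100, 205]]
```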

4.2.3 Comparison of successive frames.

One of the possible operations could be a comparison of successive frames to one another, that is to say compare frame n+1 to frame n. The aim of this operation would be to determine, after the average values have been calculated, how great the deviation of pixel values between two successive frames is. After all, if these differences are within a certain threshold value, it may be advisable to make the pixel values in successive images equal to one another. The threshold value serves to ensure that the loss of quality in the frames remains limited. If the differences are greater than the threshold value, no action is taken.

The result is that quantization produces more identical pixel values, the codes obtained after quantization are more often identical, and thus the differences are more often equal to 0 (no difference).

4.2.4 Quantization

The reference application uses a simple quantization method. During the quantization operation, successive groups of values are rounded off to one single value, which is then coded as a single lower value (see 2.2.4).
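A quantization of this kind can be illustrated as follows. The group size of 8 and the midpoint reconstruction are assumptions made for the sketch, not values taken from the reference application.

```python
# Minimal sketch of the quantization step: successive groups of values
# (here groups of 8) are collapsed to one value and coded as a single
# lower value; decoding multiplies the code back up.

STEP = 8  # assumed group size

def quantize(value):
    return value // STEP            # e.g. 0..7 -> 0, 8..15 -> 1, ...

def dequantize(code):
    return code * STEP + STEP // 2  # reconstruct the group's midpoint

print(quantize(100), dequantize(quantize(100)))
```

Because many nearby values collapse to the same code, the subsequent difference coding sees far more zeros, which is exactly the effect section 2.2.5 relies on.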

4.2.5 Differences between frames

The encoding unit 1 codes only the differences between two frames. For this purpose, an offscreen buffer is used which contains the previous frame. Initially, this buffer is completely black. For each new frame, the difference between this offscreen buffer and the new frame is calculated. Only these differences are passed onto the second stage of the coding method as applied by the encoding unit 1. Subsequently, the offscreen buffer is replaced by the current frame.

During decoding, an offscreen buffer is used as well. For each frame, the decoding unit receives the differences with respect to the previous frame. These differences are applied to the offscreen buffer and a copy of the offscreen buffer is passed on to the next step.
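The frame-differencing scheme just described, with its offscreen buffers on both the coding and the decoding side, can be sketched as follows (an illustrative sketch; frames are modelled as small matrices of pixel values):

```python
# Hypothetical sketch of the frame-differencing step: the encoder keeps
# an offscreen buffer of the previous frame (initially completely black),
# passes only the differences on to the coding stage, and the decoder
# applies the received differences to its own offscreen buffer.

def encode_diff(buffer, frame):
    """Return the differences against the buffer and the new buffer."""
    diff = [[f - b for f, b in zip(fr, br)] for fr, br in zip(frame, buffer)]
    return diff, [row[:] for row in frame]   # new buffer = current frame

def decode_diff(buffer, diff):
    """Apply the differences to the buffer and return the restored frame."""
    frame = [[b + d for b, d in zip(br, dr)] for br, dr in zip(buffer, diff)]
    return frame, [row[:] for row in frame]

buf_enc = [[0, 0], [0, 0]]                   # initially completely black
buf_dec = [[0, 0], [0, 0]]
for frame in ([[5, 5], [5, 5]], [[5, 6], [7, 5]]):
    diff, buf_enc = encode_diff(buf_enc, frame)
    restored, buf_dec = decode_diff(buf_dec, diff)
    assert restored == frame                 # lossless round trip
print("round trip ok")
```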

4.3 Databases

Currently, the available storage space for storing the databases BDB, FDB and SDB sometimes causes a problem. Since a complete film often requires databases of several hundred MB, it is not always possible to keep the entire databases in the working memory of a computer, both during coding and decoding. It could be an alternative to keep the databases BDB, FDB, SDB on a hard disk, but reading from and writing to a hard disk takes very long (compared to a working memory), as a result of which the method of coding (and decoding) could be slow.

The reference application solves this problem, for example, by using more than one block database BDB, frame database FDB and sequence database SDB. After an adjustable number of frames (500 by default, although a different number may also be chosen), a new set of databases BDB, FDB, SDB is started. In this way, the databases BDB, FDB, SDB remain small enough for the processing operation to take place entirely in the working memory of the computer. This makes the method according to the invention significantly faster.

In one embodiment, the block database BDB and the frame database FDB use hash tables in order to be able to check very quickly whether a certain block or code pair already exists in the respective database. During the coding, so-called "dump threads" are running in the background which write the contents of the databases to the hard disk. In this case, the database information can be compressed on the fly, for example using the zlib library [6].

In order to play back the film, it is thus necessary to have the correct parts of the databases BDB, FDB, SDB in the memory at the right time. The reference application uses separate loader threads for this purpose. While a current piece of film (sequence) is being decoded, all the parts of the databases BDB, FDB, SDB of the next piece of film (sequence) are loaded. At a sequence length of 500 frames, the loader threads thus have 20 seconds to load the next databases (500 frames are shown in (500 frames) / (25 frames per sec) = 20 sec). This time is generally quite sufficient. If it becomes obvious during loading that a database can be loaded much more quickly than necessary, the loader thread is slowed down on purpose in order not to disrupt the playback of the film in the main thread. This enables smooth playback of the film.

4.4 Technical information

The tested codec is implemented as a Microsoft Windows DLL. The source code is written in C++ and compiled with Microsoft Visual C++ 2005.

The graphic front ends Encodingunit.exe and Decodingunit.exe are written in C# and compiled with Microsoft Visual C# 2005. The programs use the .NET 2.0 runtime [7].

During the test, the following third-party libraries were used:

- mpeg2lib [5]

- zlib [6]

- STLport [8]

5 References

[1] http://en.wikipedia.org/wiki/YUV

[2] http://en.wikipedia.org/wiki/Discrete_cosine_transform

[3] http://en.wikipedia.org/wiki/DEFLATE

[4] http://en.wikipedia.org/wiki/GPL

[5] https://sourceforge.net/projects/mpeg2lib

[6] http://www.zlib.net

[7] http://www.microsoft.com/net

[8] http://www.stlport.org