

Title:
PIXEL INTERPOLATION METHOD AND SYSTEM
Document Type and Number:
WIPO Patent Application WO/2012/000191
Kind Code:
A1
Abstract:
A method of computing a frame to be predicted from a first and a second reference frames, said method comprising for each pixel to be predicted in the frame to be predicted the acts of defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted; computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel, wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted; and computing said pixel to be predicted from said first predicted pixel.

Inventors:
WANG RONGGANG (CN)
ZHANG YONGBING (CN)
YUAN DONG (CN)
Application Number:
PCT/CN2010/074839
Publication Date:
January 05, 2012
Filing Date:
June 30, 2010
Assignee:
FRANCE TELECOM RES & DEV BEIJING COMPANY LTD (CN)
WANG RONGGANG (CN)
ZHANG YONGBING (CN)
YUAN DONG (CN)
International Classes:
H04N7/36
Foreign References:
CN1897706A2007-01-17
US20090022220A12009-01-22
JP2006324888A2006-11-30
Attorney, Agent or Firm:
CHENGDU JIUDINGTIANYUAN INTELLECTUAL PROPERTY AGENCY LTD. (Human Art Wisdom Plaza, 33 Ximianqiao Street, Wuhou District, Chengdu, Sichuan 1, CN)
Claims:
CLAIMS

1. A method of computing a frame to be predicted from a first and a second reference frames, said method comprising for each pixel to be predicted in the frame to be predicted the acts of:

a) defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,

b1) computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,

wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,

c) computing said pixel to be predicted from said first predicted pixel.

2. A method according to claim 1 , wherein the similarity function reflects the probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.

3. A method according to any of the preceding claims, wherein the similarity function further depends on the distance between each pixel located at position (k,l) in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted located at position (m,n) according to:

p[m,n,k,l] = (1/Z(m,n)) · exp( −‖[N(m,n) − N(k,l)] ⊗ K‖² / (2σ²) ),  −R ≤ (m−k, n−l) ≤ R

wherein p[m,n,k,l] represents the contribution probability of the pixel located at (k,l) towards the pixel located at (m,n), ⊗ represents the multiplication of two elements located at the same position in two matrices, and σ² is a constant to control the decay of the exponential function, with:

N(m,n) representing the grey intensity level of the pixel located at position (m,n),

N(k,l) representing the grey intensity level of the pixel located at position (k,l),

K(x,y) = (1/C) · exp( −√((x−m)² + (y−n)²) ),  −W ≤ (x−m, y−n) ≤ W   (10)

and

Z(m,n) = Σ_(k,l) exp( −‖[N(m,n) − N(k,l)] ⊗ K‖² / (2σ²) )   (11)

where C is a normalization constant and K(x,y) is a weight value for each pixel in the similarity window, which decreases with the distance from the center of the similarity window.

4. A method according to any of the preceding claims, wherein the first combination coefficients a may be computed, for a contribution probability applied to each pixel of a neighborhood of size (2R+1)×(2R+1) in a first reference frame (2k), by minimizing the equation:

ε(a) = ‖ P ⊗ ( f_R[X_{2k+2}(m,n)] − f_R[ f_{2L}[X_{2k}(m,n)] · h(a) ] ) ‖²

wherein P is the matrix of contribution probabilities for each pixel of the neighborhood of size (2R+1)×(2R+1), f_R[X_{2k+2}(m,n)] is a linear combination allowing an estimation of the pixel to be predicted using the pixel located at position (m,n) in a second reference frame (2k+2), f_R[f_{2L}[X_{2k}(m,n)] · h(a)] is a linear combination over a neighborhood of size (4L+1)×(4L+1) allowing an estimation of the pixel to be predicted using the pixel located at position (m,n) in the first reference frame (2k), and h(a) represents the coefficients for estimating the pixel located at position (m,n) in the second reference frame (2k+2) using a linear combination of pixels in a neighborhood of the first pixel located at (m,n) in the first reference frame (2k).

5. A method according to any of the preceding claims, said method further comprising an act b2) of computing a second predicted pixel as a combination of pixels in a neighborhood of said second pixel using second combination coefficients, said second combination coefficients allowing computing said first pixel as a combination of pixels of said neighborhood of said second pixel, wherein said second combination coefficients are calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said second pixel,

and wherein the act of computing said pixel to be predicted is performed from said first predicted pixel and said second predicted pixel.

6. A method according to claims 5 and 4, wherein the act of computing said pixel to be predicted X̂_{2k+1}(m,n), located at position (m,n) in the frame to be predicted, is performed from said first predicted pixel and said second predicted pixel according to:

X̂_{2k+1}(m,n) = { f_L[X_{2k}(m,n)] · a + f_L[X_{2k+2}(m,n)] · β } / 2

wherein a and β are the first and second combination coefficients.

7. A method according to any of the preceding claims, wherein the predicted frame is sequentially positioned between the first and the second reference frames.

8. A device for computing a frame to be predicted from a first and a second reference frames, said device comprising, for each pixel to be predicted in the frame to be predicted:

- a defining unit for defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,

- a predicted pixel computing unit for computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,

wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,

- a pixel to be predicted computing unit for computing said pixel to be predicted from said first predicted pixel.

9. A device according to claim 8, wherein the similarity function reflects the probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.

10. A device according to any of the preceding claims 8 to 9, wherein the similarity function further depends on the distance between each pixel located at position (k,l) in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted located at position (m,n) according to:

p[m,n,k,l] = (1/Z(m,n)) · exp( −‖[N(m,n) − N(k,l)] ⊗ K‖² / (2σ²) ),  −R ≤ (m−k, n−l) ≤ R

wherein p[m,n,k,l] represents the contribution probability of the pixel located at (k,l) towards the pixel located at (m,n), ⊗ represents the multiplication of two elements located at the same position in two matrices, and σ² is a constant to control the decay of the exponential function, with:

N(m,n) representing the grey intensity level of the pixel located at position (m,n),

N(k,l) representing the grey intensity level of the pixel located at position (k,l),

K(x,y) = (1/C) · exp( −√((x−m)² + (y−n)²) ),  −W ≤ (x−m, y−n) ≤ W   (10)

and

Z(m,n) = Σ_(k,l) exp( −‖[N(m,n) − N(k,l)] ⊗ K‖² / (2σ²) )   (11)

where C is a normalization constant and K(x,y) is a weight value for each pixel in the similarity window, which decreases with the distance from the center of the similarity window.

11. A device according to any of the preceding claims 8 to 10, wherein the first combination coefficients a may be computed, for a contribution probability applied to each pixel of a neighborhood of size (2R+1)×(2R+1) in a first reference frame (2k), by minimizing the equation:

ε(a) = ‖ P ⊗ ( f_R[X_{2k+2}(m,n)] − f_R[ f_{2L}[X_{2k}(m,n)] · h(a) ] ) ‖²

wherein P is the matrix of contribution probabilities for each pixel of the neighborhood of size (2R+1)×(2R+1), f_R[X_{2k+2}(m,n)] is a linear combination allowing an estimation of the pixel to be predicted using the pixel located at position (m,n) in a second reference frame (2k+2), f_R[f_{2L}[X_{2k}(m,n)] · h(a)] is a linear combination over a neighborhood of size (4L+1)×(4L+1) allowing an estimation of the pixel to be predicted using the pixel located at position (m,n) in the first reference frame (2k), and h(a) represents the coefficients for estimating the pixel located at position (m,n) in the second reference frame (2k+2) using a linear combination of pixels in a neighborhood of the first pixel located at (m,n) in the first reference frame (2k).

12. A device according to any of the preceding claims 8 to 11, wherein the predicted pixel computing unit is further operable to compute a second predicted pixel as a combination of pixels in a neighborhood of said second pixel using second combination coefficients, said second combination coefficients allowing computing said first pixel as a combination of pixels of said neighborhood of said second pixel,

wherein said second combination coefficients are calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said second pixel,

and the pixel to be predicted computing unit is further operable to compute said pixel to be predicted from said first predicted pixel and said second predicted pixel.

13. A device according to claims 12 and 11, wherein the pixel to be predicted computing unit is further operable to compute said pixel to be predicted X̂_{2k+1}(m,n), located at position (m,n) in the frame to be predicted, from said first predicted pixel and said second predicted pixel according to:

X̂_{2k+1}(m,n) = { f_L[X_{2k}(m,n)] · a + f_L[X_{2k+2}(m,n)] · β } / 2

wherein a and β are the first and second combination coefficients.

14. A system for computing a predicted frame from a first and a second reference frames, said system comprising:

- an obtaining unit for obtaining said first and second reference frames, and,

- a device comprising, for each pixel to be predicted in the predicted frame:

- a defining unit for defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,

- a predicted pixel computing unit for computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,

wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,

- a pixel to be predicted computing unit for computing said pixel to be predicted from said first predicted pixel.

15. A computer-readable medium having computer-executable instructions to enable a computer system to perform the method of any one of claims 1 to 7.

Description:
PIXEL INTERPOLATION METHOD AND SYSTEM

Field of the Invention

The present invention relates in general to media content and more specifically to video coding.

Background of the Invention

Prediction is a statistical estimation procedure where one or more random variables are estimated from observations of other random variables. It is called prediction when the variables to be estimated are in some sense associated with the "future" and the observable variables are associated with the "past" (or past and future). One of the simplest and most prevalent prediction techniques is linear prediction. Linear prediction is intuitively satisfying since it is powerful and yet fairly simple to understand and use. Much of linear prediction consists of predicting one vector given another. The most common use of prediction is to estimate a sample of a stationary random process from observations of several prior samples.

A video comprises a plurality of successive images or frames. A frame comprises pixels, which may further be divided into sub-pixels. A frame may be divided into blocks of pixels. An application of prediction is in image/video compression, where a block of pixels is estimated from an observed "prior" block of pixels in a block raster scan of the image or in a forward or backward reference image. In an existing solution, each predicted frame is divided into non-overlapping blocks, and a motion vector is derived for each block by performing the existing technique known as Motion Estimation (ME) in a reference frame. Motion Estimation allows deriving a motion vector between two frames. The motion vector is a vector starting from one point of a first frame and pointing toward the corresponding point in a second frame (e.g. the next frame) along the motion trajectory (i.e. in the direction) of said point from the first frame to the second frame. After that, each block may be predicted using a known technique called Motion Compensation (MC). Motion Compensation allows obtaining the corresponding block in the reference frame pointed to by the derived motion vectors. This helps eliminate redundancy during the compression process; consequently, fewer bits are needed to describe the residual obtained during the compression process. However, linear prediction is actually not an accurate prediction method, as the MC technique is based on the assumption that an object represented by pixels is performing translational motion, which is not always true. Besides, for non-Gaussian processes, linear prediction cannot fully extract all the information about the past or the future that would help predict the current frame.
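The block-based ME/MC pipeline described above can be illustrated with a minimal full-search example. This is a generic sketch, not the patent's method: the block size, the full-pixel search range, and the sum-of-absolute-differences matching criterion are assumptions chosen for illustration.

```python
import numpy as np

def motion_estimation(ref, cur, bx, by, B=8, R=4):
    """Full-pixel full-search motion estimation: find the motion vector
    (dy, dx), |dy|,|dx| <= R, whose block in the reference frame `ref`
    best matches (minimum sum of absolute differences, SAD) the BxB
    block of the current frame `cur` at top-left corner (by, bx)."""
    block = cur[by:by + B, bx:bx + B]
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + B > ref.shape[0] or x + B > ref.shape[1]:
                continue  # candidate block falls outside the frame
            sad = float(np.abs(ref[y:y + B, x:x + B] - block).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

def motion_compensation(ref, mv, bx, by, B=8):
    """Motion compensation: fetch the block of `ref` pointed to by mv."""
    dy, dx = mv
    return ref[by + dy:by + dy + B, bx + dx:bx + dx + B]
```

A block of the current frame is then coded as the residual between the block and `motion_compensation(ref, mv, ...)`, which typically needs fewer bits than the block itself.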

Patent application WO2010049917 discloses a method for predicting pixels in a frame using backward and forward frames. A drawback of this method is that each pixel is interpolated as the weighted summation of a sample space including the pixels within its two temporal neighborhoods from the previous and following original frames as well as the available interpolated pixels within its spatial neighborhood in the current to-be-interpolated frame. This method makes it difficult to capture the local image structure in case of acute motion regions, since the temporal neighborhood is centered at the collocated pixel. The filter training process in patent application WO2010049917 considers only processing pixels by block, using the motion vector of the block and constant filter weights for pixels located in the same block. Consequently, when there is complex motion associated with the pixel to be predicted, the motion of collocated pixels in a temporal neighborhood tends to differ from that of the current pixel to be interpolated (i.e. predicted), thus degrading the performance of the prediction.

Today there is no solution to efficiently predict pixels in a video frame that allows optimizing the quality of the image and thus improving the efficiency of video coding and compression systems.

Today there is a need for a pixel prediction solution that can be easily implemented on the existing communication infrastructures.

Summary of Invention

It is an object of the present system to overcome disadvantages of and/or make improvements over the prior art.

To that end, the invention proposes a method of computing a frame to be predicted from a first and a second reference frames, said method comprising for each pixel to be predicted in the frame to be predicted the acts of:

a) defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,

b1) computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel, wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,

c) computing said pixel to be predicted from said first predicted pixel.

An advantage of the method according to the invention is that, as each coefficient of the first combination coefficients is calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said first pixel, each coefficient may be different and reflects the similarity of each pixel in the neighborhood with the pixel to be predicted, thus optimizing the accuracy of the prediction for each pixel. Another advantage of the method according to the invention is that the prediction is based on the motion trajectory of each pixel, thus achieving more robust performance.

The invention also proposes a method according to claim 2. An advantage is that, as the exact difference between each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted is not known prior to any estimation, using the probability that each such pixel and the pixel to be predicted have the same value avoids relying on pixels whose texture structure differs from that of the pixel to be predicted, thus achieving better results.

The invention also proposes a method according to claim 3. An advantage is that the similarity function considers both the distance and the similarity between pixels, thus allowing an accurate determination of the probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.

The invention also proposes a method according to claim 4. An advantage is that it allows practically deriving the first combination coefficients using a mean square error method.

The invention also proposes a method according to claim 5. An advantage is that a better result may be achieved using second combination coefficients as the results of forward and backward estimations of the pixel to be predicted may be averaged.

The invention also proposes a method according to claim 6. An advantage is that it allows practically predicting the pixel to be predicted by averaging the results of the forward and backward estimations of the pixel to be predicted.

The invention also proposes a method according to claim 7. An advantage is that the estimation of a pixel in a frame sequentially positioned between the first and second reference frames is based on the proximity of the first and second reference frames, thus improving the accuracy of the method.

The invention also proposes a device according to claim 8.

The invention also proposes a system according to claim 14.

The invention also proposes a computer-readable medium according to claim 15.

Brief Description of the Drawings

Embodiments of the present invention will now be described solely by way of example and only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which:

Figure 1A schematically illustrates forward prediction according to an embodiment of the present invention;

Figure 1B schematically illustrates backward prediction according to an embodiment of the present invention;

Figure 2 schematically illustrates forward prediction for an intermediate frame to be predicted according to an embodiment of the present invention;

Figure 3 schematically illustrates a method according to an embodiment of the present invention;

Figure 4 schematically illustrates a method according to an embodiment of the present invention;

Figure 5 schematically illustrates a process of forward prediction according to an embodiment of the present invention;

Figure 6 schematically illustrates a flowchart according to an embodiment of the present invention;

Figure 7 schematically illustrates a flowchart according to an embodiment of the present invention.

Description of Preferred Embodiments

The following are descriptions of exemplary embodiments that, when taken in conjunction with the drawings, will demonstrate the above noted features and advantages, and introduce further ones. In the following description, for purposes of explanation rather than limitation, specific details are set forth such as architecture, interfaces, techniques, devices etc., for illustration. However, it will be apparent to those of ordinary skill in the art that other embodiments that depart from these details would still be understood to be within the scope of the appended claims. Moreover, for the purpose of clarity, detailed descriptions of well-known devices, systems, and methods are omitted so as not to obscure the description of the present system. Furthermore, entities for video coding/decoding or compression/decompression are not detailed as their implementation is beyond the scope of the present system and method. Unless specified otherwise, the exemplary embodiment will be described hereafter in its application to a video coder. In addition, it should be expressly understood that the drawings are included for illustrative purposes and do not represent the scope of the present system.

The method according to an embodiment of the invention proposes in particular a model for predicting an image (i.e. called the predicted or current image/frame) based on observations made in previous and following frames. In the method according to the invention, the prediction may be performed on each pixel. By extension, an image or frame may be treated as a matrix or block of pixels.

The method according to an embodiment of the invention is suitable for predicting frames in a sequence or stream of frames and allows in particular predicting a frame between any two first and second reference frames in the stream of frames, such as for example the forward and backward adjacent frames.

The method according to an embodiment of the invention allows predicting pixels in a frame to be predicted given a first and a second reference frames.

An auto-regressive (AR) model allows predicting an image pixel based on the corresponding forward and/or backward observations in forward and/or backward frames. In an illustrative embodiment of the method according to the invention, each pixel in the frame to be predicted may be predicted as the result generated by a Forward Auto-Regressive (FAR) model (or a Backward Auto-Regressive (BAR) model). In an illustrative embodiment of the method according to the invention, each pixel in the frame to be predicted may be interpolated as the average of the results generated by one FAR model and one BAR model using a first and a second reference frame. In the FAR model, each pixel in the predicted image may first be approximated as a linear combination of pixels within a spatial neighborhood along the forward motion trajectory in the forward reference frame, and then the pixels in the backward reference frame along the backward motion trajectory may be approximated as the linear combination of pixels in the frame to be predicted. Since adjacent frames are of high redundancy, the AR coefficients may be assumed to be the same. Consequently, the pixels in the backward reference frame may be approximated as a combination of pixels within the forward reference frame. The BAR model may be applied likewise, except that it operates in the reverse direction. This invention provides the solution for the optimum FAR and BAR coefficients.

Figure 1A describes an illustrative embodiment of forward interpolation prediction of a pixel 115 in a frame (2k+1) from a pixel 110 in a frame (2k) along the motion trajectory materialized by the corresponding motion vector in a sequence of frames of a video flow.

Figure 1B describes an illustrative embodiment of backward interpolation prediction of a pixel 117 in a frame (2k+1) from a pixel 120 in a frame (2k+2) along the motion trajectory materialized by the corresponding backward motion vector in a sequence of frames of a video flow.

In the illustrative embodiments described here under, the frame to be predicted is sequentially positioned between the first and the second reference frames, which is in no way limiting of the scope of the present invention.

In an illustrative embodiment of the method according to the invention, a frame to be predicted is computed from a first and a second reference frames, and, for each pixel to be predicted in the frame to be predicted:

- an act 210 allows defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,

- an act 220 allows computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,

wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,

- an act 240 allows computing said pixel to be predicted from said first predicted pixel.

In an illustrative embodiment of the method according to the invention, an act 230 further allows computing a second predicted pixel as a combination of pixels in a neighborhood of said second pixel using second combination coefficients, said second combination coefficients allowing computing said first pixel as a combination of pixels of said neighborhood of said second pixel,

wherein said second combination coefficients are calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said second pixel, and wherein the act of computing said pixel to be predicted is performed from said first predicted pixel and said second predicted pixel. This further allows, for example, averaging the results of the forward and backward estimations for computing the pixel to be predicted using the first and second combination coefficients, as described hereunder in reference to Equation (21).

In an illustrative embodiment of the method according to the invention:

- the act 220 may comprise computing a first coefficient vector allowing the estimation of the second pixel from the first pixel,

- the act 240 may comprise computing the pixel to be predicted using said first coefficient vector and a similarity function applied onto pixels in a neighborhood of the first pixel in the first reference frame.

- an act 230 may comprise computing a second coefficient vector allowing the estimation of the first pixel from the second pixel, the act 240 further comprising using said second coefficient vector and pixels in a neighborhood of the second pixel in the second reference frame for estimating the first pixel. This further allows, for example, averaging the pixel to be predicted using the first and second coefficient vectors, as described hereunder in reference to Equation (21).
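Per pixel, the acts above reduce to one or two inner products and an average. A minimal numeric sketch, in which the patch vectors and coefficient vectors are hypothetical stand-ins for the extracted neighborhoods and the first and second coefficient vectors a and β:

```python
import numpy as np

def predict_pixel(patch_fwd, a, patch_bwd=None, b=None):
    """Combine the neighborhood of the first (forward) pixel with the
    first coefficient vector `a`; if a backward neighborhood and a
    second coefficient vector `b` are also given, average the forward
    and backward estimates (cf. the bidirectional combination)."""
    forward = float(patch_fwd @ a)   # first predicted pixel
    if patch_bwd is None:
        return forward
    backward = float(patch_bwd @ b)  # second predicted pixel
    return (forward + backward) / 2.0
```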

In an illustrative embodiment of the method according to the invention, as described in Figure 1A, each pixel within frame (2k+1) may be approximated as a linear combination of the corresponding pixels within a spatial neighborhood in the previous frame (2k) along the forward motion trajectory. X_{2k+1}(m,n) represents the pixel value located at (m,n) for the forward observation in the frame to be predicted (2k+1). Then, using a FAR model, the predicted pixel may be interpolated as:

X̂_{2k+1}(m,n) = Σ_{i=−L..L} Σ_{j=−L..L} a_{i,j} · X_{2k}(m + ⌊v_fx⌋ + i, n + ⌊v_fy⌋ + j) + n_{2k+1}   (1)

where (v_fx, v_fy) represents the corresponding forward motion vector with sub-pixel accuracy from frame (2k) to (2k+1), ⌊·⌋ is the floor function, which maps v_fx and v_fy to the next full-pixel position, a_{i,j} is the Forward AR coefficient, and n_{2k+1} is the additive Gaussian white noise. Here L is defined to be the radius of the AR filter, and thus the size of the AR filter is (2L+1)×(2L+1).

Defining f_L[·] as an operator that extracts a patch of (2L+1)×(2L+1) pixels, the expression f_L[X(m,n)] results in a vector of length (2L+1)², being the extracted patch centered at pixel X(m,n). Thus, Equation (1) may also be expressed as:

X̂_{2k+1}(m,n) = f_L[X_{2k}(m,n)] · a + n_{2k+1}   (2)

where (m,n) represents the forward motion aligned pixel coordinate (i.e. position) and a = (a_{−L,−L}, a_{−L,−L+1}, ..., a_{L,L})ᵀ is the first coefficient vector (here e.g. the FAR weight vector).
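One possible reading of the patch-extraction operator f_L[·] of Equation (2) — an illustrative sketch, assuming row-major flattening of the (2L+1)×(2L+1) window and a pixel at least L samples from the frame border:

```python
import numpy as np

def f_L(X, m, n, L):
    """Extract the (2L+1)x(2L+1) patch of frame X centered at pixel
    (m, n) and flatten it to a vector of length (2L+1)**2, as used in
    Equation (2): the predicted pixel is f_L(X, m, n, L) @ a + noise."""
    return X[m - L:m + L + 1, n - L:n + L + 1].ravel()
```

The predicted pixel is then the inner product of this vector with the (2L+1)²-long FAR weight vector a.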

The Forward AR or first coefficient vector a should be chosen to be the "best" in some sense. The most common measure of performance of a predictor, i.e. the Mean Squared Error (MSE), may be used:

ε_{2k+1}(m,n) = E{ ‖ f_R[X_{2k+1}(m,n)] − f_R[X̂_{2k+1}(m,n)] ‖² }
             = E{ ( f_R[X_{2k+1}(m,n)] − f_R[X̂_{2k+1}(m,n)] )ᵀ · ( f_R[X_{2k+1}(m,n)] − f_R[X̂_{2k+1}(m,n)] ) }   (3)

where ‖·‖ denotes the L2 norm, with R ≥ L. The MSE as a performance criterion may be viewed as a measure of how much the energy of the signal is reduced by removing from it the predictable information based on the observables. Since the goal of a predictor is to remove this predictable information, a better predictor corresponds to a smaller MSE. However, such a use would not work directly, since the actual value of X_{2k+1} is not necessarily available (e.g. at the receiver side in a communication network, if the video comprising the frame is sent compressed by a transmitter and the method according to the invention is performed to predict frames at the receiver). The method according to the invention thus allows continuing the approximation of pixels along the temporal axis, following the motion trajectory, as shown in Figure 4.

Figure 3 describes the motion trajectory of a pixel from one frame (2k) to another frame (2k+2) in a sequence of frames of a video flow. Assuming a first pixel in a frame (2k), the corresponding pixel, in the frame to be predicted (2k+1), along the motion trajectory (defined by its associated motion vector) would be the pixel 210. Similarly, the corresponding pixel, in the following frame (2k+2) of the frame (2k+1), along the same motion trajectory (defined by its associated motion vector) would be the pixel 220.

In reference to Figure 3, assuming the actual value for a pixel to be predicted has been obtained in frame (2k+1), the same FAR coefficients may be used to approximate (i.e. predict) the pixels within frame (2k+2) along the motion trajectory. The assumption that the FAR coefficients between frames (2k) and (2k+1) are the same as those between frames (2k+1) and (2k+2) is quite reasonable, since there is a high redundancy between pixels along the motion trajectory from frame (2k) to (2k+2). Based on such an assumption, pixels in frame (2k+2) may be approximated as:

X_{2k+2}(m,n) = f_L[X̂_{2k+1}(m,n)] · a + n_{2k+2}   (4)

where (m,n) represents the backward motion aligned pixel coordinates within X_{2k+2}, f_L[X̂_{2k+1}(m,n)] represents the patch extracted from the to-be-predicted (i.e. to-be-interpolated) frame (2k+1), whose elements are computed according to Equation (2), and a is the same as in Equation (2). Incorporating (2) into (4) allows obtaining the approximation of X_{2k+2} utilizing the corresponding forward motion aligned pixels as:

X_{2k+2}(m,n) = f_L[X̂_{2k+1}(m,n)] · a + n_{2k+2}

            = f_L[ f_L[X_{2k}(m,n)] · a + n_{2k+1} ] · a + n_{2k+2}   (5)

            = f_{2L}[X_{2k}(m,n)] · h(a) + n_{2k+2}

Where:

* fiL (f"'")] represents extracting a (4L + l)x(4L+l) patch, as shown as the circles in Figure 4, which is centered at pixel 410 X 2k (m,n) ;

* h(a) represents the new weight vector corresponding to the enlarged patch of size (4L+1)×(4L+1).

In reference to Figure 4, each backward motion aligned pixel in the following frame (2k+2) may be estimated as a weighted summation of the pixels within an enlarged patch in the previous frame (2k). It should be noted that each element of h(a) is quadratic in the elements of a. In other words, h(a) may be expressed in a two-dimensional form as:

h_{m,n}(a) = Σ_{i+p=m, j+q=n} a_{i,j} · a_{p,q}   (6)

with −L ≤ i, j ≤ L, −L ≤ p, q ≤ L and −2L ≤ m, n ≤ 2L.
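Equation (6) states that h(a) is the two-dimensional auto-convolution of the weight grid a. A small pure-Python sketch of this computation (the function name is mine; it assumes a is a (2L+1)×(2L+1) nested list):

```python
def h_of_a(a):
    # Two-dimensional auto-convolution of the (2L+1)x(2L+1) weight grid a,
    # producing the (4L+1)x(4L+1) enlarged-patch weights of Equation (6):
    #   h[m][n] = sum over i+p=m, j+q=n of a[i][j] * a[p][q]
    n = len(a)
    h = [[0.0] * (2 * n - 1) for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            for p in range(n):
                for q in range(n):
                    h[i + p][j + q] += a[i][j] * a[p][q]
    return h
```

Each element of h(a) is quadratic in the elements of a, which is why applying a once from frame 2k to 2k+1 and once more from 2k+1 to 2k+2 (Equation (5)) amounts to applying h(a) once to the enlarged patch.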

Using Equations (5) and (6) , the weight vector a may be computed by minimizing the following MSE:

E(a) = || f_R[X_{2k+2}(m,n)] − f_R[ f_{2L}[X_{2k}(m,n)] · h(a) ] ||² = || r(a) ||²   (7)

where r(a) = f_R[X_{2k+2}(m,n)] − f_R[ f_{2L}[X_{2k}(m,n)] · h(a) ] is defined to be the residual vector, with R ≥ L.

The act 220 (in reference to Figure 2) allows computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel, wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted.

In an illustrative embodiment of the method according to the invention, the similarity function reflects the probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.

In an illustrative embodiment of the method according to the invention, the similarity function may further depend, as described below in reference to Figure 5, on the distance between each pixel located at position (k,l) in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted located at position (m,n).

Figure 5 describes an illustrative embodiment of the method according to the invention, wherein the determination of the contribution of each sample (i.e. pixel) within a patch of size (2R+1)×(2R+1) (i.e. the neighborhood of the first or second pixel) allows determining the quality of the pixel to be predicted. In other words, incorporating the contribution probability of each sample within the training window (2R+1)×(2R+1) into the minimization problem of Equation (7) allows down-weighting training samples having a different texture structure compared with the to-be-interpolated pixel, and thus can achieve better results. For example, it may be desirable to give a higher contribution probability to a sample which is very similar to the target sample, while a smaller contribution probability may be set for a sample which is quite different from the target sample. Each pixel or sample in the combination of pixels in a neighborhood of the first pixel may be assigned a weight that reflects the probability that this pixel and the pixel to be interpolated have the same value.

In an illustrative embodiment of the method according to the invention, the similarity function may be based on the geometric proximity or the intensity grey level of the pixel to be predicted and of the pixels in a neighborhood of the first or second pixel. For instance, the similarity between two pixels X(m,n) and X(k,l) may depend on the geometric proximity as well as on the intensity grey levels N(X(m,n)) and N(X(k,l)), where N(X(i,j)) denotes a square neighborhood of fixed size (2W+1)×(2W+1) centered at pixel X(i,j). This similarity may be derived, in an act 510, as a decreasing function of the weighted Euclidean distance between N(X(m,n)) and N(X(k,l)) as well as of the geometric proximity, namely:

p[m,n,k,l] = (1 / Z(m,n)) · exp( −|| [N(m,n) − N(k,l)] ⊗ K ||² / (2σ²) ),  −R ≤ (m−k, n−l) ≤ R   (8)

where p[m,n,k,l] represents the contribution probability of the pixel located at (k,l) towards the pixel located at (m,n), ⊗ represents the multiplication of the elements located at the same position in two matrices, and σ² is a constant to control the decay of the exponential function, with

N(m,n) = ( f_W[X_{2k}(m,n)] + f_W[X_{2k+2}(m,n)] ) / 2   (9)

and

Z(m,n) = Σ_{(k,l)} exp( −|| [N(m,n) − N(k,l)] ⊗ K ||² / (2σ²) )   (11)

C is a normalization constant and K(x,y) is a weight value for each pixel in the similarity window, decreasing with the distance from the center of the similarity window.
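Under those definitions, the contribution probabilities of Equation (8) can be sketched in pure Python as follows (assuming the kernel K and the variance sigma2 are given; the names are illustrative, not the patent's):

```python
import math

def contribution_weights(N_center, neighbourhoods, K, sigma2):
    # Equation (8): each candidate neighbourhood N(k,l) gets the weight
    # exp(-||(N(m,n) - N(k,l)) (x) K||^2 / (2*sigma^2)), then the weights
    # are normalized by Z(m,n) so the contribution probabilities sum to one.
    def dist2(A, B):
        # squared distance between neighbourhoods, weighted element-wise by K
        return sum(((x - y) * k) ** 2
                   for ra, rb, rk in zip(A, B, K)
                   for x, y, k in zip(ra, rb, rk))
    w = [math.exp(-dist2(N_center, Nb) / (2.0 * sigma2)) for Nb in neighbourhoods]
    Z = sum(w)
    return [wi / Z for wi in w]
```

A neighbourhood identical to N(m,n) receives the largest weight, while a sample with a different texture is down-weighted, matching the intent described for Figure 5.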

For the current to-be-interpolated (i.e. to-be-predicted) pixel X̂_{2k+1}(m,n), the contribution probability of each sample within the training window (of size (2R+1)×(2R+1)) may be computed in an act 520, and the minimization of Equation (7) may be modified as:

E(a) = || { f_R[X_{2k+2}(m,n)] − f_R[ f_{2L}[X_{2k}(m,n)] · h(a) ] } ⊗ P ||² = || r(a) ⊗ P ||²   (12)

where r(a) = f_R[X_{2k+2}(m,n)] − f_R[ f_{2L}[X_{2k}(m,n)] · h(a) ] is defined to be the residual vector. It is noted that the elements of P in Equation (12) are computed according to Equation (8). Incorporating the contribution probability of each sample within the training window into the minimization process down-weights training samples having a different texture structure compared with the to-be-predicted pixel, and thus allows achieving better results than existing methods.

The first combination coefficients a may then be computed as follows: assuming ā to be an approximation of a, r(a) may be expanded around ā in a Taylor series as:

r(a) ≈ r(ā) + J(ā)(a − ā)   (13)

where J(ā) is the Jacobian matrix of r(a) at ā.

Setting r(a) = 0, a may then be, in an act 530, computed as:

a = ā − ( J^T(ā) J(ā) )^(−1) J^T(ā) r(ā)   (14)

According to the Newton iteration algorithm, in order to optimize the first combination coefficients, if the weight vector a^i has been obtained after the i-th iteration, then the weight vector in the (i+1)-th iteration may be updated as:

a^(i+1) = a^i − ( J^T(a^i) J(a^i) )^(−1) J^T(a^i) r(a^i)   (15)

The iterations may stop after a predetermined number of iterations or when a^i has converged to a predefined extent, allowing deriving a in an act 540.

In act 240, the pixel to be predicted is computed from said first predicted pixel using a .

Figure 6 describes an illustrative embodiment of the iterative process for deriving and optimizing the first combination coefficients (or auto-regressive filter coefficients) according to the invention. An initial a^0 may be computed first in act 600. To initialize a^0, initial pixels for initiating the iterative process of pixel prediction may be obtained by using the interpolation results of various known methods, such as e.g. Motion Compensation Interpolation (MCI), Overlapped Block Motion Compensation (OBMC), Adaptive OBMC (AOBMC), and Spatio-Temporal Auto-Regressive (STAR) interpolation.

Then the corresponding pixels may be approximated in frames (2k+1) and (2k+2) (Equations (16) and (17)), and the initial a^0 may be computed from these approximations (Equation (18)).

After that, r(a^0) and J(a^0) may be computed in act 605. Then the iteration counter is set to zero, and the iteration process may start. It may first be evaluated whether the iteration has converged or the iteration counter is larger than a preset maximum number of iterations, in act 610. If the answer is yes, the iteration process is over, and the algorithm for predicting a pixel may be carried out on another pixel. Otherwise, in act 615, Δa^i may be computed according to:

Δa^i = − ( J^T(a^i) J(a^i) )^(−1) J^T(a^i) r(a^i)   (19)

And then, in act 620, a^(i+1) may be updated as:

a^(i+1) = a^i + Δa^i   (20)

Then r(a^(i+1)) and J(a^(i+1)) are updated, as shown in module 625.

Then the iteration counter is increased by one, and the process moves to the next iteration.
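The loop of Figure 6 is an ordinary Gauss-Newton iteration. For a single scalar coefficient the matrix solve of Equations (19) and (20) reduces to a division, which the following self-contained sketch illustrates on a toy curve-fitting residual (the names and the example residual are mine, not the patent's):

```python
import math

def gauss_newton_1d(residual, jacobian, a0, max_iter=50, tol=1e-12):
    # Mirror of Figure 6 for one coefficient: compute r(a) and J(a),
    # check convergence, then take the step of Equations (19)-(20):
    #   delta_a = -(J^T J)^(-1) J^T r,   a <- a + delta_a
    a = a0
    for _ in range(max_iter):
        r = residual(a)
        J = jacobian(a)
        jtj = sum(j * j for j in J)          # J^T J (a scalar here)
        delta = -sum(j * ri for j, ri in zip(J, r)) / jtj
        a += delta
        if abs(delta) < tol:                 # convergence test of act 610
            break
    return a
```

With the patent's vector-valued a, the scalar division becomes a linear solve of (J^T J) Δa = −J^T r; the structure of the loop is unchanged.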

Finally, the first combination coefficients a are derived, allowing the prediction or estimation of the pixel to be predicted.

A similar iterative algorithm may be performed to derive the Backward Auto-Regressive (BAR) coefficients (i.e. deriving the BAR coefficients by using, in a symmetric manner, the second pixel in the second reference frame, second combination coefficients and a similarity function to estimate the first pixel in the first reference frame). If we assume the derived Forward Auto-Regressive (FAR) and BAR coefficients are a and β, respectively, then the final predicted pixel value at time instance 2k+1 may be computed as:

X̂_{2k+1}(m,n) = { f_L[X_{2k}(m,n)] · a + f_L[X_{2k+2}(m,n)] · β } / 2   (21)
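The final blend of Equation (21) is a simple average of the forward (FAR) and backward (BAR) predictions; a pure-Python sketch with illustrative names:

```python
def interpolate_pixel(patch_fwd, a, patch_bwd, b):
    # Equation (21): average the FAR prediction from frame 2k and the
    # BAR prediction from frame 2k+2 along the same motion trajectory.
    def weighted_sum(patch, weights):
        return sum(w * x for rw, rp in zip(weights, patch)
                   for w, x in zip(rw, rp))
    return 0.5 * (weighted_sum(patch_fwd, a) + weighted_sum(patch_bwd, b))
```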

The method according to the invention may be used in various video streaming related applications, such as IPTV, Mobile TV and digital storage.

Various standard video sequences, with varying sizes (QCIF, CIF), have been tested. To compare the invented method with other prediction methods, every other frame from the test sequences is skipped and interpolated by the invented method as well as by the other methods. The interpolated frames are then compared with the original frames within the original sequences. The Peak Signal-to-Noise Ratio (PSNR) is used as the criterion to evaluate the performance of all the methods. The anchor methods are motion compensation interpolation (MCI), the overlapped block motion compensation (OBMC) method, adaptive overlapped block motion compensation (AOBMC), and the spatio-temporal auto-regressive (STAR) model. The MCI method interpolates one block directly by averaging the motion compensated blocks pointed to by the forward and backward motion vectors. The OBMC method interpolates one block as the weighted sum of the blocks motion compensated by the motion vectors of the neighboring blocks, the weights being determined by the relative positions. AOBMC is similar to OBMC; the only difference is that the weights are determined by the reliability of the neighboring blocks. The STAR method interpolates one pixel as a linear combination of pixels within the previous frame, the following frame, and the available pixels in the current frame. The performance results are given in Table 1. It is objectively observed that the method according to the invention outperforms the MCI, OBMC, AOBMC and STAR methods in terms of PSNR values.
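The PSNR criterion used in these experiments can be computed as follows (the standard definition for 8-bit video; the function and its flat-list layout are my own, not taken from the patent):

```python
import math

def psnr(original, reconstructed, peak=255.0):
    # Peak Signal-to-Noise Ratio in dB between an original frame and its
    # interpolated reconstruction, both given as flat lists of pixel values.
    err = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if err == 0:
        return float("inf")    # identical frames
    return 10.0 * math.log10(peak * peak / err)
```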

Table 1 Performance of various prediction methods