Title:
IMAGE STABILIZATION METHOD
Document Type and Number:
WIPO Patent Application WO/2007/124360
Kind Code:
A2
Abstract:
In one embodiment, a method for reducing motion artifacts in an output image is provided (Fig. 2). The method comprises capturing multiple frames of a scene (14); determining local motion vectors between each pixel in a current frame and a corresponding pixel in a previous frame (22); and performing a temporal filtering operation based on the local motion vectors wherein pixels from a plurality of frames are integrated to form the output image (24).

Inventors:
HONG CHEN (US)
WONG PING WAH (US)
Application Number:
PCT/US2007/066967
Publication Date:
November 01, 2007
Filing Date:
April 19, 2007
Assignee:
NETHRA IMAGING INC (US)
HONG CHEN (US)
WONG PING WAH (US)
International Classes:
H04N11/02; H04B1/66
Foreign References:
US5473384A
US6665007B1
US5970180A
US6734902B1
Attorney, Agent or Firm:
MOODLEY, Vani (Suite 180, Mountain View, California, US)
Claims:
CLAIMS:

What is claimed is:

1. A method for reducing motion artifacts in an output image, comprising: capturing multiple frames of a scene; determining local motion vectors between each pixel in a current frame and a corresponding pixel in a previous frame; and performing a temporal filtering operation based on the local motion vectors wherein pixels from a plurality of frames are integrated to form the output image.

2. The method of claim 1, wherein an exposure time for each of the multiple frames is between 1/250 to 1/2000 seconds.

3. The method of claim 1, wherein an effective exposure time for the output image is longer than for each of the multiple frames.

4. The method of claim 1, wherein determining the local motion vectors comprises for each pixel location (m,n) in the current frame, defining a block of pixels centered at (m,n) in the current frame and finding a block in the previous frame that is the closest match to the block of pixels centered at (m,n) in the current frame.

5. The method of claim 1, wherein in determining the closest match, luminance values for the pixels in each block are compared.

6. The method of claim 4, further comprising applying a smoothing filter to the current and previous frames prior to calculating the local motion vectors.

7. The method of claim 4, wherein the smoothing filter comprises a spatial low pass filter.

8. The method of claim 1, wherein the temporal filtering operation comprises applying a finite impulse response filter.

9. The method of claim 1, wherein the temporal filtering operation comprises applying an infinite impulse response filter.

10. The method of claim 4, wherein performing the temporal filtering operation comprises selectively adjusting for pixel motion between a current and a previous frame.

11. The method of claim 10, wherein selectively adjusting for pixel motion comprises rejecting a motion vector if a degree of similarity between the closest matching block in the previous frame to the block of pixels in the current frame centered on (m,n) exceeds a predefined block rejection threshold.

12. The method of claim 10, wherein selectively adjusting for pixel motion comprises applying a motion vector of zero if a degree of similarity between the closest matching block in the previous frame to the block of pixels in the current frame centered on (m,n) is below a predefined block noise threshold.

13. An image processor, comprising: an image buffer to store image data for a captured image; and image stabilization logic to reduce motion artifacts in an output image, wherein the image stabilization logic captures multiple frames of a scene; determines local motion vectors between each pixel in a current frame and a corresponding pixel in a previous frame; and performs a temporal filtering operation based on the local motion vectors wherein pixels from a plurality of frames are integrated to form the output image.

14. The image processor of claim 13, wherein the frames are captured with an exposure time of 1/250 to 1/2000 seconds.

15. The image processor of claim 13, wherein determining the local motion vectors comprises for each pixel location (m,n) in the current frame, defining a block of pixels centered at (m,n) in the current frame and finding a block in the previous frame that is the closest match to the block of pixels centered at (m,n) in the current frame.

16. The image processor of claim 13, wherein the temporal filtering comprises applying either a finite impulse response filter or an infinite impulse response filter.

17. A camera system, comprising: camera optics; an image sensor positioned so that light passing through the camera optics impinges on the image sensor; and an image processor coupled to the image sensor to receive image data for a captured image therefrom, wherein the image processor comprises image stabilization logic to perform a method for reducing motion artifacts in an output image, comprising: capturing multiple frames of a scene; determining local motion vectors between each pixel in a current frame and a corresponding pixel in a previous frame; and performing a temporal filtering operation based on the local motion vectors wherein pixels from a plurality of frames are integrated to form the output image.

18. The image processor of claim 17, wherein an exposure time for each of the multiple frames is between 1/250 to 1/2000 seconds.

19. The image processor of claim 17, wherein determining the local motion vectors comprises for each pixel location (m,n) in the current frame, defining a block of pixels centered at (m,n) in the current frame and finding a block in the previous frame that is the closest match to the block of pixels centered at (m,n) in the current frame.

20. The image processor of claim 17, wherein the temporal filtering operation comprises either a finite impulse response filter or an infinite impulse response filter.

Description:

IMAGE STABILIZATION METHOD

FIELD

Embodiments of the invention relate to a method for stabilizing a captured image.

BACKGROUND

A common problem of real time image capturing systems (hereinafter referred to as "imaging systems") is that images captured by such systems may contain motion artifacts due to movement of the image capturing device, or to movement of objects in a scene that is being captured. Both types of movements generally result in blurring of captured images. In order to produce high quality crisp still images, motion artifacts must be minimized.

Consider an imaging system as shown in Figure 1 that supports both real time video and still image capture, i.e., a system that can process and send out either a single frame or multiple frames of images in real time. In such a system, image processing algorithms can be designed to process either a single frame or multiple frames. An advantage of such a system over a still image processing system is that image processing algorithms can take advantage of the correlation in adjacent frames so that better output quality can be produced.

US Patent 5,629,988 describes a method in video stabilization. Specifically, the method estimates a global motion vector between a captured image and a reference image, determines a transformation parameter based on the global motion vector and applies the transformation to the captured image. US Patent 6,654,049 suggests using color values as a means to determine motion vector. US Patent 6,809,758 improves on the accuracy of global motion vector determination using a global motion vector histogram that is constructed from information in multiple frames. Since multiple frames are considered, the motion vector from frame to frame can follow a smooth trajectory and the result is improved.

The above methods rely on a global motion vector to correct for an image frame and would work well in video capture by reducing camera shake artifacts. However, these methods do not reduce motion artifacts due to object motion in a scene, as object motion in a scene typically affects only a portion of the scene, and hence applying a global motion vector to the entire frame is generally not effective in such a case.

SUMMARY

In one embodiment, the invention provides a method for reducing motion artifacts in an output image. The method comprises capturing multiple frames of a scene; determining local motion vectors between each pixel in a current frame and a corresponding pixel in a previous frame; and performing a temporal filtering operation based on the local motion vectors wherein pixels from the multiple frames are integrated to form the output image.

Other aspects of the invention will be apparent from the detailed description below.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described, by way of example, with reference to the accompanying diagrammatic drawings, in which:

Figure 1 illustrates a real-time imaging system;

Figures 2 and 4 show a high-level block diagram of an imaging system, in accordance with embodiments of the invention; and

Figure 3 shows a flowchart for a method to stabilize an image, in accordance with one embodiment of the invention.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art, that the invention may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form only in order to avoid obscuring the invention.

Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.

Broadly, embodiments of the present invention disclose a still image capturing method that minimizes motion artifacts caused by camera motion and by object motion in a scene. The method uses multiple frames captured by a real time imaging system, wherein the exposure time of each frame is set to a short duration to avoid blurring within each frame. Since a short exposure time is used, each individual frame can be noisy. The method constructs a final still image using multiple frames so that the effective exposure of the constructed image is substantially longer than that of each frame. The construction step determines local motion vectors at every pixel location, and then uses a filter to perform integration while taking the local motion vectors into account. As a result, the method can minimize motion artifacts caused by either camera motion or object motion in the scene.

Embodiments of the present invention also cover an image processor which includes logic to perform the image capturing method. An imaging system which includes such an image processor is also covered.

Turning now to Figure 2 of the drawings, there is shown a high-level block diagram of an imaging or camera system in the form of a still image stabilization system 10. The system 10 includes camera optics 12 coupled to an image sensor 14. Operation of the image sensor 14 is controlled by an exposure time control circuit 16. The image sensor 14 is coupled to an image processor 18. The image processor 18 includes a smoothing block 20, a local motion detection block 22, a filtering block 24, a frame buffer 26 for data from previous frames, and a line buffer 28 for a current frame. The system 10 captures multiple frames and uses them to construct an output frame.

The image processor 18 performs a method for reducing motion artifacts in an output image. The method is illustrated by the flowchart of Figure 3. Referring to Figure 3 it will be seen that at block 30 multiple (input) frames of a scene are captured. At block 32 local motion vectors between each pixel in a current frame and a corresponding pixel in a previous frame are determined. Finally, at block 34 a temporal filtering operation based on the local motion vectors is performed. The temporal filtering operation includes integrating pixels from the multiple frames to form the output image.

In one embodiment, to minimize motion artifact in each input frame, an exposure time for each input frame is set to a short duration.

Because a relatively short exposure time is used for each input frame, each input frame is relatively noisy. Typical sources of noise can include image sensor noise, processing errors, compression distortions, environment perturbations, etc. Generally the shorter the exposure time, the lower the number of photons that reach the sensor, and hence the noisier the captured image.

In one embodiment the temporal filtering operation includes applying a temporal low pass filter to remove noise in the images. Specifically, the output pixel at each location (m, n) is obtained by averaging pixels of successive frames at the same location (m, n). In other words, the output image $y_{m,n}^{(k)}$ at time $k$ can be written as

$$y_{m,n}^{(k)} = \sum_{i=0}^{N-1} h_i \, x_{m,n}^{(k-i)} \tag{1}$$

where $x_{m,n}^{(i)}$ is the pixel at the (m, n) location of the $i$th frame, and $h_i$ is a sequence of weights satisfying

$$\sum_{i=0}^{N-1} h_i = 1. \tag{2}$$

A temporal low pass filtering algorithm is superior to spatial low pass filtering within each frame because temporal filtering can avoid blurring of images caused by spatial averaging. One of ordinary skill in the art will appreciate that the application of temporal filtering effectively increases the equivalent exposure time of the output image.
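The finite impulse response form of temporal averaging described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `fir_temporal_filter`, the list-of-rows frame layout and the ordering of frames (most recent first) are assumptions made for the example.

```python
def fir_temporal_filter(frames, weights):
    """Weighted average of co-located pixels across frames (FIR form).

    frames  : list of N frames, most recent first (assumed layout);
              each frame is a list of rows of pixel values.
    weights : list of N weights, one per frame, that sum to 1.
    """
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per frame")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    rows, cols = len(frames[0]), len(frames[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for h, frame in zip(weights, frames):
        for m in range(rows):
            for n in range(cols):
                # Accumulate the weighted pixel at the same (m, n) location.
                out[m][n] += h * frame[m][n]
    return out
```

Note that all N frames must be held in memory at once, which is the buffering cost that the recursive alternative avoids.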

The filter in (1) is known as a finite impulse response (FIR) filter in digital signal processing. It is also possible to perform temporal filtering using an infinite impulse response (IIR) filter given by

$$y_{m,n}^{(k)} = \alpha \, y_{m,n}^{(k-1)} + (1 - \alpha) \, x_{m,n}^{(k)} \tag{3}$$

where α is a constant between 0 and 1. An advantage of (3) compared to (1) is that only one previous output frame needs to be stored in the case of (3), whereas buffering of N-1 previous input frames is necessary in (1). To implement (3), an embodiment as shown in Figure 4 may be used, as will be described later.
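A minimal Python sketch of the recursive (IIR) update follows. It assumes that α weights the previous output frame and (1 - α) weights the current input frame, which matches the later discussion of giving higher weight to the previous output; the function name `iir_temporal_update` and the list-of-rows frame layout are hypothetical.

```python
def iir_temporal_update(prev_output, current_input, alpha):
    """One recursive update: blend the previous output with the current input.

    alpha is the weight on the previous output frame (assumed convention);
    (1 - alpha) is the weight on the current input frame.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return [
        [alpha * y + (1.0 - alpha) * x for y, x in zip(y_row, x_row)]
        for y_row, x_row in zip(prev_output, current_input)
    ]
```

Only `prev_output` has to be kept between calls, so a single frame buffer is enough regardless of how many frames have been integrated.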

Although motion artifacts within each frame are minimized, motion of either the camera or the object in a scene can result in substantial differences from frame to frame. This means both (1) and (3) will cause the output image to be blurred because pixels representing different regions of the image or different objects in the image are blended together. As a result, in one embodiment, a local motion detection step is performed by the block 22 in Figure 2.

In one embodiment, local motion detection is performed by comparing the current input frame with a stored previous input frame. For the embodiment shown in Figure 2 of the drawings, motion detection is performed using the current input frame and a previous input frame.

Figure 4 of the drawings shows another embodiment of an imaging system 40 in which motion detection is performed using the current input frame and a previous output frame from the temporal filter given by (3). The imaging system 40 is very similar to the imaging system 10, except that the former stores only one previous output frame in its frame buffer whereas the latter stores multiple previous input frames. Accordingly the same reference numerals are used to indicate like components between the two systems. The imaging system 40 includes a frame buffer 46 and a line buffer 48. The frame buffer 46 has a capacity of one frame of image data, and the line buffer 48 has a capacity to store a small number of lines (e.g., less than 8) of image data.

Figure 5 of the drawings shows a flowchart of an image stabilization technique performed by the imaging system 40. Referring to Figure 5, it will be seen that at block 50, an input image frame is captured. At block 52, local motion vectors between each pixel in the current input frame and the previous output frame are determined. Finally, at block 54, a temporal filtering operation based on the local motion vectors is performed. The temporal filtering operation includes integrating pixels from the previous output frame and the current input frame to form the output image.

In the systems 10 and 40, motion detection is performed for every pixel location in the current input frame. This information is used in the temporal filtering procedure. An efficient method to perform motion detection is to use the luminance component of the image data, and ignore the chrominance values.

Image data coming from image sensors generally contain noise, and noise can significantly affect the accuracy of motion detection. As a result, in one embodiment, a local smoothing procedure S_y() is applied by the block 20 to the input image data in the current frame before motion detection is performed. The local smoothing procedure S_y() is designed for reducing the noise level in the current input frame so that accurate motion detection can be achieved. The image data used at the input of the filtering block I_yuv() is un-smoothed. As a result, the overall temporal filtering method can reduce noise using image data from multiple frames and at the same time prevent blurring in the output images.

It is noted that many parameters including the algorithm steps, the block size parameters, the criterion in determining acceptance of local motion vectors, and the integration method will impact the quality of the output image. The specific procedure of each step and the selection of parameters are described in the following sections.

Exposure time

Generally a short exposure time is preferred so that each captured frame contains crisp image data with little motion artifact. As described earlier, short exposure time also means that each individual frame is noisy, and hence there is a need to incorporate the motion compensated temporal filtering. In one embodiment, it has been found that exposure time in the range of 1/250 to 1/2000 seconds is appropriate.

Smoothing S_y()

As described earlier, luminance values between image pixels in consecutive frames are processed to produce the local motion vectors at every pixel location. In order to accurately determine the motion vectors, noise in the input data is removed, in one embodiment, before the comparisons are done. The smoothing block S_y() applies spatial low pass filtering to reduce noise in the current input frame before the pixel data are used in the motion detection block. Referring to Figure 3, the smoothed value for a pixel can be calculated from a window around the pixel as

$$q_{i,j} = \sum_{u=-M}^{M} \sum_{v=-N}^{N} w_{u,v} \, x_{i+u,\,j+v} \tag{4}$$

where M and N define the support size of the smoothing filter, $w_{u,v}$ is the weight or point spread function of the smoothing filter, $x_{i,j}$ is the luminance value of the current frame, and $q_{i,j}$ is the output of the smoothing block. The point spread function is often normalized so that

$$\sum_{u=-M}^{M} \sum_{v=-N}^{N} w_{u,v} = 1. \tag{5}$$

In one embodiment, the values M and N are both set to 1, and the weights are uniform with values equal to 1/9 for each coefficient.
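The uniform smoothing with M = N = 1 amounts to a 3x3 box average of the luminance plane. The Python sketch below illustrates this under one added assumption: the text does not say how image borders are handled, so the example simply replicates edge pixels by clamping coordinates. The function name `smooth_luma` is hypothetical.

```python
def smooth_luma(luma, M=1, N=1):
    """Uniform spatial low pass filter over a (2M+1) x (2N+1) window.

    luma : list of rows of luminance values.
    Border handling (clamping to the nearest valid pixel) is an
    assumption of this sketch; the source does not specify it.
    """
    rows, cols = len(luma), len(luma[0])
    weight = 1.0 / ((2 * M + 1) * (2 * N + 1))  # 1/9 when M = N = 1
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(-M, M + 1):
                for v in range(-N, N + 1):
                    ii = min(max(i + u, 0), rows - 1)  # clamp row index
                    jj = min(max(j + v, 0), cols - 1)  # clamp column index
                    acc += luma[ii][jj]
            out[i][j] = acc * weight
    return out
```

Only the copy of the frame used for motion detection is smoothed in this way; the pixel values fed to the temporal filter stay un-smoothed, as the description states.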

Motion Detection

The motion detection block MD_y() calculates the local motion vector for each pixel in the current frame. For an efficient implementation, it is sufficient to compare the luminance values between two frames. Consider a block of pixels $q_{i,j}$ of size 2K+1 by 2L+1 centered at pixel location (m, n) in the current smoothed frame. The variable $q_{i,j}$ represents the filtered or smoothed result of the luminance values in the current input frame. In one embodiment, the sum of absolute differences $D_{m,n,r,s}$ between this block and a block of luminance values centered at (m+r, n+s) in the previous frame is calculated as follows:

$$D_{m,n,r,s} = \sum_{i=-K}^{K} \sum_{j=-L}^{L} \left| q_{m+i,\,n+j} - p_{m+r+i,\,n+s+j} \right| \tag{6}$$

where $p_{i,j}$ denotes the luminance value at location (i, j) in the previous frame, and the parameters r and s are restricted to a search range. The best local motion vector at location (m, n) of the current input frame is defined as

$$(r^*, s^*)_{m,n} = \arg\min_{(r,s)} D_{m,n,r,s}. \tag{7}$$

In other words, determination of the motion vector at the pixel location (m, n) in the current input block is to find a block in the previous frame that has the closest match to the block around the location (m, n) in the current input block.

Generally, the complexity of the algorithm increases with the size of the search range. In one embodiment, a search range of 31x31 is selected, and K=L=1.

In order to find an optimally matched location in the previous output frame for a pixel at location (m, n) in the current input frame, in one embodiment a block of pixels centered at (m, n) is considered and $(r^*, s^*)_{m,n}$ according to Equation (7) is found. In this case, $(r^*, s^*)_{m,n}$ is the center pixel of a block in the previous output frame that is the closest match to the block centered at (m, n) in the current input frame. For the purpose of temporal filtering, in one embodiment the pixel at location $(r^*, s^*)_{m,n}$ in the previous output block is considered to be the best match of the pixel at location (m, n) in the current input frame.
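The block matching search described in this section can be illustrated with a short exhaustive search in Python. This is a sketch under stated assumptions, not the patented implementation: a 31x31 search range is interpreted here as offsets from -15 to +15 in each direction, candidate blocks that would fall outside the previous frame are skipped, ties are resolved by scan order, and the names `sad` and `best_motion_vector` are hypothetical.

```python
def sad(cur, prev, m, n, r, s, K, L):
    """Sum of absolute differences between the block centered at (m, n)
    in `cur` and the block centered at (m + r, n + s) in `prev`."""
    total = 0
    for i in range(-K, K + 1):
        for j in range(-L, L + 1):
            total += abs(cur[m + i][n + j] - prev[m + r + i][n + s + j])
    return total


def best_motion_vector(cur, prev, m, n, K=1, L=1, search=15):
    """Exhaustive search for the offset (r, s) that minimizes the SAD.

    (m, n) must be at least K rows and L columns away from the border.
    Returns (D, r, s) for the best match, or None if no candidate fits.
    """
    rows, cols = len(cur), len(cur[0])
    best = None
    for r in range(-search, search + 1):
        for s in range(-search, search + 1):
            # Assumption: ignore candidates whose block leaves the frame.
            if not (K <= m + r < rows - K and L <= n + s < cols - L):
                continue
            d = sad(cur, prev, m, n, r, s, K, L)
            if best is None or d < best[0]:
                best = (d, r, s)
    return best
```

In a perfectly flat region every candidate gives the same difference, so the returned offset is arbitrary. That is the kind of spurious non-zero vector that thresholding on the block difference is meant to guard against.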

Block Noise Threshold and Block Rejection Threshold

Even with two frames with relatively low noise, motion vector determination can sometimes be incorrect. For example, the captured noise over a relatively clean background such as a wall can lead to non-zero motion vectors even though the area is not moving. Based on this observation, embodiments of the present invention use a parameter β called block noise threshold. When the difference between the pixel values in a block of the current frame and the block in the same location of the previous output frame is below the block noise threshold, i.e., when $D_{m,n,0,0} < \beta$, then the local motion vector at (m, n) is considered to be zero.

Another consideration in motion detection is that there may not be a matched block within the defined search area to the block in the input image, although the minimization criterion in (6) will always return a motion vector. As a result, in one embodiment of the present invention a parameter γ called block rejection threshold is used. When the difference between the pixel values in a block of the current frame and any block in the previous output frame within the search range is above the block rejection threshold, i.e., when $D_{m,n,r,s} > \gamma$ for all (r, s) within the search range, then the local motion vector at (m, n) is rejected. That is, the "optimum" match (r*, s*) is not used in the filtering procedure.

It turns out that the block noise threshold and block rejection threshold are related. For image sets at a similar quality or noise level, the parameters β and γ are linearly related to each other. In other words, when the block rejection threshold γ increases, so does the block noise threshold β. A reason is that both of them depend on image noise. When the level of noise in the images is increased, both the block noise threshold and block rejection threshold increase. In one embodiment, a single quantity called MaxDiffPerPixel is used, which is independent of block size because the parameter is normalized to a per pixel basis. The block rejection threshold γ is set to MaxDiffPerPixel times the block size. That is

γ = (2K + 1) * (2L + 1) * MaxDiffPerPixel.

At the same time, the block noise threshold β is set to β = γ / 6.

In one embodiment the value MaxDiffPerPixel may be set as a linear function of the sensor gain, and the proportional factor can be determined by calibration.
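The relationship between the two thresholds is simple enough to express directly. The Python sketch below follows the formulas in this section; the function names are hypothetical, and treating the gain dependence as a pure proportionality (no constant offset) is an assumption, since the calibration procedure is not described.

```python
def max_diff_from_gain(sensor_gain, factor):
    """Per-pixel difference limit as a multiple of the sensor gain.

    `factor` stands in for the calibrated proportional factor; a pure
    proportionality with no offset is an assumption of this sketch.
    """
    return factor * sensor_gain


def block_thresholds(K, L, max_diff_per_pixel):
    """Return (beta, gamma): block noise and block rejection thresholds."""
    block_size = (2 * K + 1) * (2 * L + 1)   # number of pixels per block
    gamma = block_size * max_diff_per_pixel  # block rejection threshold
    beta = gamma / 6                         # block noise threshold
    return beta, gamma
```

For example, with K = L = 1 and a per-pixel limit of 4, the block holds 9 pixels, so the rejection threshold is 36 and the noise threshold is 6.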

Temporal Filtering

The final step in the method is to perform temporal filtering based on the results of the motion vector determination and thresholding steps. The procedure can be summarized as

$$y_{m,n}^{(k)} = \begin{cases} x_{m,n}^{(k)} & \text{if the motion vector is rejected} \\ \alpha_0 \, y_{m,n}^{(k-1)} + (1 - \alpha_0) \, x_{m,n}^{(k)} & \text{if the motion vector is considered zero} \\ \alpha_1 \, y_{m+r^*,\,n+s^*}^{(k-1)} + (1 - \alpha_1) \, x_{m,n}^{(k)} & \text{otherwise} \end{cases} \tag{8}$$

where $\alpha_0$ and $\alpha_1$ are the filtering parameters where the motion vector at the location (m, n) was considered to have zero and non-zero values, respectively.

In other words, if $D_{m,n,r,s}$ for all (r, s) within the search range exceeds the block rejection threshold, the motion vector is rejected and no filtering is performed. On the other hand, if $D_{m,n,r,s}$ for all (r, s) within the search range is lower than the block noise threshold, the local motion vector is considered to be zero, and filtering of the form (3) is performed. Otherwise, the local motion vector is accepted and the motion compensated pixel in the previous output frame is used in the filtering procedure.
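The three-way decision for a single pixel can be sketched as follows. This is an illustrative reading of the procedure rather than the definitive implementation: the zero-motion test uses the block difference at the same location, as in the definition of the block noise threshold; the rejection test uses the smallest difference in the search range, because if the best match exceeds the threshold then every candidate does; and the function name `filter_pixel` and its argument names are hypothetical.

```python
def filter_pixel(x_cur, y_prev_same, y_prev_matched,
                 d_zero, d_best, beta, gamma, alpha0, alpha1):
    """Temporal filtering decision for one pixel.

    x_cur          : current input pixel at (m, n)
    y_prev_same    : previous output pixel at the same location (m, n)
    y_prev_matched : previous output pixel at the best matching location
    d_zero         : block difference at zero offset
    d_best         : smallest block difference within the search range
    """
    if d_best > gamma:
        # Even the best candidate is too dissimilar: reject the motion
        # vector and pass the input pixel through without filtering.
        return x_cur
    if d_zero < beta:
        # Difference is within noise: treat the motion vector as zero.
        return alpha0 * y_prev_same + (1.0 - alpha0) * x_cur
    # Motion vector accepted: blend with the motion compensated pixel.
    return alpha1 * y_prev_matched + (1.0 - alpha1) * x_cur
```

Because the noise threshold is smaller than the rejection threshold, a block that fails the rejection test can never pass the zero-motion test, so the order of the two checks does not change the outcome.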

In one embodiment where only two frames with similar noise levels are considered, α is selected to be 0.5, i.e. equal weights are given to each of the two frames. However, because the techniques described herein may be used recursively and consecutively on a sequence of frames, the noise level in the "previous output frame" will gradually decrease because of the accumulative effect of the filtering procedure. Thus in one embodiment α is selected to be greater than 0.5, i.e., higher weight is given to the previous output frame which has a lower noise level. In one embodiment α is selected to be less than 1 to prevent the image sequence becoming stagnant, i.e. the output frames are not changing although the input frames are.

As an example, consider the case where the techniques described herein are applied to ten consecutive input frames. Assuming that the initial condition is zero, and applying (8) recursively 9 times for the case that the motion vector was considered to have zero value, the (m, n)th pixel for the 10th output frame will be

$$y_{m,n}^{(10)} = (1 - \alpha_0) \sum_{i=0}^{9} \alpha_0^{\,i} \, x_{m,n}^{(10-i)}. \tag{9}$$

For example, if $\alpha_0 = 0.75$, then (9) becomes

$$y_{m,n}^{(10)} = 0.25\,x_{m,n}^{(10)} + 0.1875\,x_{m,n}^{(9)} + 0.1406\,x_{m,n}^{(8)} + \cdots$$

One of ordinary skill in the art will appreciate that the procedure is equivalent to a weighted average of the pixels in the past input frames. For the case where the motion vectors are accepted and using a small value such as $\alpha_1 = 0.1$, applying (8) recursively will give the result

$$y^{(10)} = 0.9\,x^{(10)} + 0.09\,x^{(9)} + 0.009\,x^{(8)} + \cdots$$

where each earlier input pixel is taken at its motion compensated location. In this case, the most recent frame dominates the result, as it should.
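The effective weight that recursive filtering places on each past input frame can be checked numerically. The sketch below assumes a zero initial output and the same filtering parameter at every step, so that the input frame i steps in the past receives the weight (1 - α)·α^i; the function name `effective_weights` is hypothetical.

```python
def effective_weights(alpha, num_frames):
    """Weights on past input frames after recursive filtering.

    Index 0 is the most recent input frame. Assumes the recursion
    starts from a zero output and uses the same alpha at every step.
    """
    return [(1.0 - alpha) * alpha ** i for i in range(num_frames)]


if __name__ == "__main__":
    # A large alpha spreads the weight over many frames (noise averaging);
    # a small alpha concentrates it on the most recent frame.
    for alpha in (0.75, 0.1):
        weights = effective_weights(alpha, 10)
        print(alpha, [round(w, 4) for w in weights])
```

With a zero initial condition the ten weights sum to 1 - α^10 rather than exactly 1, so the result is a weighted average only up to that small missing term.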

For the above case, the values $\alpha_0$ and $\alpha_1$ are chosen so that a relatively large value of $\alpha_0$ enables multi-frame averaging to reduce noise in the case where there is little motion. When there is motion, light condition on a particular object in the scene may be different in consecutive frames and it may affect the precision of the motion detection. As a result, a relatively small value of $\alpha_1$ is used so that blurring of the local neighborhood is minimized.