Title:
METHOD AND IMAGE PROCESSOR UNIT FOR PROCESSING IMAGE DATA
Document Type and Number:
WIPO Patent Application WO/2023/174546
Kind Code:
A1
Abstract:
The invention relates to a method for processing image data (IMGRAW) of an image sensor (4), wherein the image data comprises a burst of frames captured by the image sensor (4) per image, each frame comprising a raw matrix of pixels. The method comprises the steps of: A) determining motion vectors (MV1...N), wherein each forward motion vector in forward direction represents the displacement of a block in a selected anchor frame of the burst of frames to the best-matching block in a respective alternate frame of the burst of frames and each backward motion vector in backward direction represents the displacement of a block in a respective alternate frame of the burst of frames to the best-matching block in a selected anchor frame of the burst of frames; B) determining reliability factors for the motion vectors determined in step A) to assign the reliability of the respective motion vector for a given block by use of the difference between the forward motion vector and the related backward motion vector, wherein the reliability increases with decreasing difference; C) aligning the frames of the burst of frames for an image, wherein the motion is compensated by use of weighted motion vectors, wherein the motion vectors are weighted with the respective reliability factor determined in step B).

Inventors:
SCHEWIOR GREGOR (DE)
EL-YAMANY NOHA (DE)
Application Number:
PCT/EP2022/057016
Publication Date:
September 21, 2023
Filing Date:
March 17, 2022
Assignee:
DREAM CHIP TECH GMBH (DE)
International Classes:
G06T5/00; G06T5/50; G06T7/223; H04N5/14; H04N19/51; H04N19/513
Foreign References:
US20100225768A12010-09-09
US20070002058A12007-01-04
US20090147853A12009-06-11
Other References:
ADITHYA POTHAN RAJ V ET AL: "Detection of small moving objects based on motion vector processing using BDD method", ADVANCED COMPUTING (ICOAC), 2011 THIRD INTERNATIONAL CONFERENCE ON, IEEE, 14 December 2011 (2011-12-14), pages 229 - 234, XP032136815, ISBN: 978-1-4673-0670-6, DOI: 10.1109/ICOAC.2011.6165180
WONSANG YOU ET AL: "Moving Object Tracking in H.264/AVC Bitstream", 30 June 2007, MULTIMEDIA CONTENT ANALYSIS AND MINING; [LECTURE NOTES IN COMPUTER SCIENCE;;LNCS], SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 483 - 492, ISBN: 978-3-540-73416-1, XP019064716
SAMUEL W HASINOFF ET AL: "Burst photography for high dynamic range and low-light imaging on mobile cameras", ACM TRANSACTIONS ON GRAPHICS, ACM, NY, US, vol. 35, no. 6, 11 November 2016 (2016-11-11), pages 1 - 12, XP058306351, ISSN: 0730-0301, DOI: 10.1145/2980179.2980254
SING BING KANG ET AL: "High dynamic range video", ACM TRANSACTIONS ON GRAPHICS, ACM, NY, US, vol. 22, no. 3, 1 July 2003 (2003-07-01), pages 319 - 325, XP058249526, ISSN: 0730-0301, DOI: 10.1145/882262.882270
S. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen, M. Levoy: "Burst photography for high dynamic range and low-light imaging on mobile cameras", ACM Transactions on Graphics, vol. 35, no. 6, November 2016
B. Wronski, I. Garcia-Dorado, M. Ernst, D. Kelly, M. Krainin, C. K. Liang, M. Levoy, P. Milanfar: "Handheld Multi-Frame Super-Resolution", ACM Transactions on Graphics, vol. 38, no. 4, July 2019
L. C. Manikandan, R. K. Selvakumar: "A Study on Block Matching Algorithms for Motion Estimation in Video Coding", International Journal of Scientific & Engineering Research, vol. 5, July 2014
R. Yaakob, A. Aryanfar, A. A. Halin, N. Sulaiman: "A Comparison of Different Block Matching Algorithms for Motion Estimation", The 4th International Conference on Electrical Engineering and Informatics (ICEEI 2013)
R. Hartley, A. Zisserman: "Multiple View Geometry in Computer Vision", Cambridge University Press, 2004
G. H. Golub, C. F. Van Loan: "Matrix Computations", second edition, 1989
S. Chan et al.: "Subpixel Motion Estimation Without Interpolation"
M. A. Fischler, R. C. Bolles: "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography", Communications of the ACM, vol. 24, no. 6, 1981, pages 381-395
Attorney, Agent or Firm:
MEISSNER BOLTE PATENTANWÄLTE RECHTSANWÄLTE PARTNERSCHAFT MBB (DE)
Claims

1. Method for processing image data (IMGRAW) of an image sensor (4), wherein the image data comprises a burst of frames captured by the image sensor (4) per image, each frame comprising a raw matrix of pixels, characterised by:

A) determining motion vectors (MV1...N), wherein each forward motion vector in forward direction represents the displacement of a block in a selected anchor frame of the burst of frames to the best-matching block in a respective alternate frame of the burst of frames and each backward motion vector in backward direction represents the displacement of a block in a respective alternate frame of the burst of frames to the best-matching block in a selected anchor frame of the burst of frames;

B) determining reliability factors for the motion vectors determined in step A) to assign the reliability of the respective motion vector for a given block by use of the difference between the forward motion vector and the related backward motion vector, wherein the reliability increases with decreasing difference;

C) aligning the frames of the burst of frames for an image, wherein the motion is compensated by use of weighted motion vectors, wherein the motion vectors are weighted with the respective reliability factor determined in step B).

2. Method according to claim 1, characterised by assigning a reliability factor of Zero (0) to a set of a forward motion vector and the related backward motion vector in case the difference between the forward motion vector and the related backward motion vector exceeds a predefined threshold value, so that the motion vectors are disregarded in the further processing.

3. Method according to claim 1 or 2, characterised by assigning a reliability factor of One (1) to a set of a forward motion vector and the related backward motion vector in case the difference between the forward motion vector and the related backward motion vector is below a predefined motion consistency threshold value (TMC), so that the motion vectors are considered in the further processing.

4. Method according to one of claims 1 to 3, characterised in that the difference between the forward motion vector and the related backward motion vector is the L1-norm sum of absolute differences (SAD) calculated by the least absolute deviations

D(i, j) = |uf(i, j) − ub(i, j)| + |vf(i, j) − vb(i, j)|,

wherein [uf(i, j), vf(i, j)] denote the forward motion vector and [ub(i, j), vb(i, j)] denote the backward motion vector with i = 1, 2, ..., M and j = 1, 2, ..., N, where M and N give the size of the M x N frame.

5. Method according to one of the preceding claims, characterised by assigning a reliability factor of Zero (0) to a motion vector in case the magnitude of at least one of the x-component in x-direction or the y-component in y-direction of the motion vector is equal to the block matching search size (P).

6. Method according to one of the preceding claims, characterised by assigning a reliability factor to a motion vector for a given block by use of the difference between the respective motion vector and the motion vectors of neighbouring blocks.
7. Method according to claim 6, characterised by assigning the reliability factor of Zero (0) to the motion vector for a given block in case at least one of the following holds: the difference between the magnitude of the x-component in x-direction of the motion vector and the average magnitude of the x-components in x-direction of the motion vectors of a set of blocks located in the direct and/or indirect neighbourhood of the selected block exceeds a predefined threshold value (Tu), or the difference between the magnitude of the y-component in y-direction of the motion vector and the average magnitude of the y-components in y-direction of the motion vectors of a set of blocks located in the direct and/or indirect neighbourhood of the selected block exceeds a predefined threshold value (Tv).

8. Method according to one of the preceding claims, characterised by calculating the sum of absolute differences during the block matching in step A) and assigning a reliability factor for the motion vector of a given block by use of the sum of absolute difference (SAD) which corresponds to the motion vector for the best-matching block.

9. Method according to claim 8, characterised by assigning a reliability factor of Zero (0) in case that the sum of absolute difference (SAD), which corresponds to the motion vector for the best-matching block, exceeds a predefined threshold value.

10. Method according to one of the preceding claims, characterised in that a total reliability factor is calculated by combining a set of reliability factors assigned to a motion vector for a given block, for example by multiplying the reliability factors of the set assigned to a related motion vector.

11. Method according to claim 10, characterised in that the motion vectors are weighted with the respective total reliability factor (RMV(i, j)) in step C).

12. Method according to one of the preceding claims, characterised in that the frames considered for steps A) to C) are a selected region of interest (ROI) in a larger frame captured or capable of being captured by the image sensor (4).

13. Method according to one of the preceding claims, characterised in that the best-matching block is determined by calculating the sum of absolute differences (SAD) between the selected block in a first frame and the block under search in a second frame for a number of blocks under search having different relative positions in the second frame from each other, and determining the best-matching block as the block having the smallest sum of absolute differences (SAD).

14. Method according to one of the preceding claims, characterised by determining a global motion vector for a frame by calculating an average motion vector from the set of weighted motion vectors for the frame, which are each weighted with the respective total reliability factor or at least one reliability factor determined for the motion vectors of given blocks of the frame.

15. Method according to claim 14, characterised by

- aligning an alternate frame to a selected anchor frame by use of the respective global motion vector for the alternate frame, and

- determining subpixel-level forward motion vectors respectively for each pair of blocks in the alternate frame and anchor frame,

- determining a global subpixel level motion vector by calculating an average motion vector from the set of subpixel-level forward motion vectors determined for the blocks of the frame, and

- aligning the alternate frame to the anchor frame by use of the global subpixel level motion vector determined for the alternate frame.

16. Method according to one of the preceding claims, characterised by

- determining a respective local subpixel level motion vector for each block of a frame and determining at least one reliability factor for each local subpixel level motion vector, and

- blockwise aligning the blocks in the alternate frame to the anchor frame by use of the local subpixel level motion vectors determined for the respective blocks of the alternate frame.

17. Method according to one of the preceding claims, characterised by estimating the motion of blocks by use of sensor signals of auxiliary sensors, preferably by use of the acceleration of the image sensor (4) measured by an accelerometer, a gyroscope and/or an optical depth sensor.

18. Method according to one of the preceding claims, wherein each frame comprises a plurality of colour channels, characterised by selecting the colour channel having the highest information density compared to the information density of the other channels of the plurality of channels, i.e. the highest-sampling colour channel, aligning the raw matrix of pixels of the frame for the selected colour channel by interpolating missing pixels in the selected colour channel, and proceeding with steps A) to C) on the interpolated pixels of this selected colour channel.

19. Image processor unit (3) for processing raw image data (IMGRAW) provided by an image sensor (4), said image sensor (4) comprising a sensor array providing a burst of frames captured by the image sensor (4) per image, each frame comprising a raw matrix of pixels, characterised in that the image processor unit (3) is arranged to:

A) determine motion vectors (MV1...N), wherein each forward motion vector in forward direction represents the displacement of a block in a selected anchor frame of the burst of frames to the best-matching block in a respective alternate frame of the burst of frames and each backward motion vector in backward direction represents the displacement of a block in a respective alternate frame of the burst of frames to the best-matching block in a selected anchor frame of the burst of frames;

B) determine reliability factors for the motion vectors determined in step A) to assign the reliability of the respective motion vector for a given block by use of the difference between the forward motion vector and the related backward motion vector, wherein the reliability increases with decreasing difference; and

C) align the frames of the burst of frames for an image, wherein the motion is compensated by use of weighted motion vectors, wherein the motion vectors are weighted with the respective reliability factor determined in step B).

20. Image processor unit (3) according to claim 19, characterized in that the image processor unit (3) comprises or is connected to at least one auxiliary sensor, preferably an accelerometer, a gyroscope and/or optical depth sensor, wherein the image processor unit (3) is adapted to estimate motion of blocks by use of signals of the at least one auxiliary sensor.

21. Image processor unit (3) according to claim 19 or 20, characterised in that the image processor unit (3) is arranged for processing image data by performing the steps of one of claims 1 to 18.

22. Computer program comprising instructions which, when the program is executed by a processing unit, cause the processing unit to carry out the steps of the method of one of claims 1 to 18.

Description:
Method and image processor unit for processing image data

The invention relates to a method for processing image data of an image sensor, wherein the image data comprises a raw matrix of pixels per image, i.e. raw image data.

The invention further relates to an image processor unit for processing raw image data provided by an image sensor, said image sensor comprising a sensor array providing a raw matrix of pixels per image.

Further, the invention relates to a computer program arranged to carry out the steps of the aforementioned method.

Digital imagers are widely used in everyday consumer products, such as smartphones, tablets, notebooks, cameras, cars and wearables. The use of small imaging sensors is becoming a trend to maintain a small, lightweight product form factor as well as to reduce the production cost. Even when imaging sensors with a high number of megapixels are used, colour filter arrays (CFAs), such as the common Bayer colour filter array, are typically used to reduce the cost. The use of colour filter arrays limits or reduces the spatial resolution, as the full-colour image is produced via interpolation (demosaicing) of undersampled colour channels.

These resolution, dynamic range and noise limitations drove engineers to explore multi-frame computational-photography image processing pipelines, which are also referred to as burst image signal processors (BISPs). In a BISP, a burst of frames is captured, preferably with predefined (e.g. programmable) settings, and the frames are fused together to achieve a variety of goals.

S. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen and M. Levoy, "Burst photography for high dynamic range and low-light imaging on mobile cameras," ACM Transactions on Graphics, Vol. 35, No. 6, November 2016, SIGGRAPH Asia 2016, describes a multi-frame technology designed to reduce noise and to increase the dynamic range. A burst of underexposed frames is captured, aligned and merged to produce a single intermediate image of high bit depth, which is then tone-mapped to produce a high-resolution photograph. The merge method operates on image tiles in the spatial frequency domain, said tiles overlapping by half in each spatial dimension. By smoothly blending between overlapped tiles, visually objectionable discontinuities at tile boundaries are avoided. Additionally, a window function must be applied to the tiles to avoid edge artifacts when operating in the DFT domain.

In the raw image alignment solution presented in this reference, each 2x2 block (quad) in the RGGB Bayer raw data is averaged to create a down-sized grayscale image (1/4 the resolution of the raw CFA data). Multi-scale (pyramidal) motion estimation is performed on the down-sized grayscale image. Block matching is pursued for motion estimation at each of the scales, with an L2 cost function being minimized at all levels of the image pyramid, except at the finest scale (the down-sized grayscale image), where an L1 cost function is minimized. Subpixel-level accuracy is sought during the motion estimation at all scales of the pyramid, except at the finest scale, where pixel-level accuracy is sought. This strategy effectively limits the pixel displacements between the raw frames to multiples of 2. This constraint is considered sufficient for the purposes of multi-frame denoising and high-dynamic-range (HDR) fusion.

B. Wronski, I. Garcia-Dorado, M. Ernst, D. Kelly, M. Krainin, C. K. Liang, M. Levoy and P. Milanfar, "Handheld Multi-Frame Super-Resolution," ACM Transactions on Graphics, Vol. 38, No. 4, Article 28, July 2019, SIGGRAPH 2019 discloses multi-frame super-resolution (MFSR), wherein a burst of raw frames, shifted relative to each other in the subpixel range by normal hand shaking or hand tremor, is fused to produce a higher-resolution frame. A captured burst of raw (Bayer CFA) images is input to the algorithm. Every frame is aligned locally to a single base frame. Each frame's contribution at every pixel is estimated through kernel regression, and these contributions are accumulated separately per colour channel. The kernel shapes are adjusted based on the estimated local gradients, and the sample contributions are weighted based on a robustness model, which computes a per-pixel weight for every frame using the alignment field and local statistics gathered from the neighborhood around each pixel. The final merged RGB image is obtained by normalizing the accumulated results per channel. Merging a burst of frames in this way boosts the perceived resolution; the burst also enables choosing the best shot or supports zero-shutter-lag use cases. An additional step of three iterations of Lucas-Kanade optical-flow image warping is proposed, which achieves subpixel-level accuracy, since pixel-level accuracy is not sufficient for the purpose of multi-frame super-resolution algorithms.

Most modern digital single-lens reflex (SLR) cameras support continuous or burst shooting. The burst mode is supported to enable selecting the best shot or to perform sophisticated multi-frame processing, such as a multi-frame super-resolution feature. The burst rate (i.e. how many frames are taken in fast succession) varies but is increasing with advancing camera technologies.

An essential step in the development of many multi-frame (burst) processing solutions is image alignment. In this step the motion between the captured frames, or the motion of selected frames with respect to an anchor (reference) frame, is estimated, and the frames are subsequently aligned or registered to compensate for the global motion and/or local motions. The aligned frames can then be fused in a variety of ways, depending on the intended feature. Image alignment can take place in the raw colour-filter-array (CFA) domain or in the full-colour (e.g. RGB) domain, can be designed to achieve pixel-level accuracy or subpixel-level accuracy, and can be tailored to fit a variety of motion models, such as the translational, similarity, affine and projective motion models.

Since the goal of multi-frame processing is typically to overcome the imaging sensor limitations and boost the final image quality, alignment (and fusion) of the frames in the raw CFA domain is preferred. Image alignment in the raw domain, however, poses a challenge, primarily due to the typical colour under-sampling in the CFA, e.g., in the standard RGGB Bayer sensor, 50% green, 25% blue and 25% red. In addition, it is necessary that the alignment algorithm achieves arbitrary precision/accuracy to support solutions that either require pixel-level accuracy or subpixel-level accuracy. Furthermore, in real-time, resource-constrained environments (as in the case of many consumer camera products), it is desirable that image alignment would be computationally inexpensive, due to the speed, power, and memory constraints.

Image alignment includes the estimation of motion in the burst of frames. Techniques for motion estimation ranging from pixel-based to feature-based solutions, supporting global and local motion estimation as well as a variety of motion models, are well known in the art.

L. C. Manikandan and R. K. Selvakumar, "A Study on Block Matching Algorithms for Motion Estimation in Video Coding," International Journal of Scientific & Engineering Research, Volume 5, Issue 7, July 2014 describes different block matching algorithms used for motion estimation. Block matching via exhaustive search (brute-force search) is computationally quite expensive. A significant speedup of the search can be achieved, for example, via diamond search or other fast search algorithms.

In R. Yaakob, A. Aryanfar, A. A. Halin and N. Sulaiman, "A Comparison of Different Block Matching Algorithms for Motion Estimation," The 4th International Conference on Electrical Engineering and Informatics (ICEEI 2013), four different block matching algorithms for motion estimation are evaluated, namely Exhaustive Search (ES), New Three-Step Search (NTSS), Simple and Efficient Search (SES) and Adaptive Rood Pattern Search (ARPS).

R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision," Cambridge University Press, 2004, and G. H. Golub and C. F. Van Loan, "Matrix Computations," second edition, 1989, each describe algorithms for the processing of image sensor data in detail.

The object of the present invention is to provide an improved method and an image processor unit for processing image data of an image sensor.

The object is achieved by the method comprising the features of claim 1, the image processor unit comprising the features of claim 19 and the computer program comprising the features of claim 22. Preferred embodiments are described in the dependent claims.

In order to achieve improved alignment of frames or parts thereof (e.g. blocks in frames), the method comprises the steps of:

A) Determining motion vectors, wherein each forward motion vector in forward direction represents the displacement of a block in a selected anchor frame of the burst of frames to the best-matching block in a respective alternate frame of the burst of frames and each backward motion vector in backward direction represents the displacement of a block in a respective alternate frame of the burst of frames to the best-matching block in a selected anchor frame of the burst of frames;

B) Determining reliability factors for the motion vectors determined in step A) to assign the reliability of the respective motion vector for a given block by use of the difference between the forward motion vector and the related backward motion vector, wherein the reliability increases with decreasing difference;

C) Aligning the frames of the burst of frames for an image, wherein the motion is compensated by use of weighted motion vectors, wherein the motion vectors are weighted with the respective reliability factor determined in step B).

The alignment can be used to register frames or parts thereof to respective image positions in related frames or parts thereof. A frame refers to a full image captured by the image sensor or a part of a full image, i.e. a frame with a particular full or reduced size. The alignment of frames can be performed, for example, by alignment of a complete frame or full image by use of a global motion vector for the complete frame or image, or by alignment of a plurality of blocks of a frame by use of respective local motion vectors for the related blocks. For example, the frames considered for steps A) to C) can be a selected region of interest (ROI) in a larger frame captured or capable of being captured by the image sensor.

The use of weighted motion vectors improves the quality of the motion compensation by overweighting the more reliable motion vectors and underweighting the less reliable ones. Weighting in this sense also includes the use of only two weighting factors, Zero and One, to consider the motion vectors having a reliability above a given threshold and to fully disregard the other motion vectors having a reliability below the given threshold.

Depending on the definition of the threshold value, “above” or “exceeds” can be understood also as “equal and above” or alternatively “below” can be understood as “equal or below”.

The method can be performed by estimating the motion directly from the raw colour filter array (CFA) image data of the image sensor. The bi-directional motion estimation allows adaptive selection of anchor (reference) frames. The motion estimation via explicit identification of unreliable motion estimates leads to improved robustness.

There are no iterative operations/optimizations, nor multi-scale/pyramidal representations, required, so that the method can be implemented in a digital signal processor or in image data processing software with reduced computational effort and hardware expense. The method allows implementation with low computational complexity. It is possible to implement the method with complexity that is linear in the number of pixels.

The motion can be estimated with arbitrary accuracy, e.g. pixel-level and/or subpixel- level accuracy. Subpixel-level motion estimation with arbitrary accuracy is achievable, without the use of any iterative or multi-scale procedures. The estimated subpixel-level motion vectors can be used to derive the parameters of a variety of motion models, supporting global and/or local motions. The method supports global motion as well as local motion, and different motion models.

The method can be implemented to perform real and direct image registration (i.e. alignment) in the raw domain for a chosen colour channel that possesses the highest sampling/strongest response. Since the offsets of the other colour channels in the CFA with respect to the chosen one are known and fixed, registration of those colour channels is done implicitly without any additional computational effort.
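As a concrete illustration of working on the highest-sampling channel: in an RGGB Bayer mosaic, green occupies two of the four sites of each 2x2 quad, so the missing green sites can be filled by simple neighbour averaging before block matching. The following sketch (the function name and the exact RGGB site layout are assumptions for illustration, not taken from the application) shows one minimal way to do this with NumPy:

```python
import numpy as np

def extract_green(raw: np.ndarray) -> np.ndarray:
    """Interpolate the green channel of an RGGB Bayer mosaic to a full grid.

    Green is sampled at 50% of the positions (the two diagonal sites of each
    2x2 quad); the missing sites are filled by averaging the available green
    4-neighbours, which is sufficient for single-channel block matching.
    """
    h, w = raw.shape
    green_mask = np.zeros((h, w), dtype=bool)
    green_mask[0::2, 1::2] = True  # G sites in R rows (RGGB layout assumed)
    green_mask[1::2, 0::2] = True  # G sites in B rows
    green = np.where(green_mask, raw, 0.0).astype(np.float64)

    # Sum the up/down/left/right neighbours and count how many are green,
    # then divide to get the average at the non-green sites.
    padded = np.pad(green, 1)
    counts = np.pad(green_mask.astype(np.float64), 1)
    num = (padded[:-2, 1:-1] + padded[2:, 1:-1]
           + padded[1:-1, :-2] + padded[1:-1, 2:])
    den = (counts[:-2, 1:-1] + counts[2:, 1:-1]
           + counts[1:-1, :-2] + counts[1:-1, 2:])
    filled = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return np.where(green_mask, green, filled)
```

Because the red and blue offsets relative to green are fixed within each quad, a displacement estimated on this interpolated green plane applies to the other channels without further computation.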

The method explicitly detects unreliable motion vectors by means of the reliability criteria. Thanks to the availability of forward and backward motion vectors, there is the possibility of adaptive/variable selection of the reference frame.

A number of different motion vector reliability rules can be applied to obtain at least one reliability factor for a respective frame or part thereof, e.g. a set of reliability factors for a block, with each reliability factor of the set being obtained by a related reliability rule, so that the set includes reliability factors obtained by different predefined reliability rules.

By use of a first reliability rule, a reliability factor of Zero (0) can be assigned to a set of a forward motion vector and the related backward motion vector, e.g. to a motion vector determined for the respective block or frame and used for further processing, in case the difference between the forward motion vector and the related backward motion vector exceeds a predefined threshold value, so that the motion vectors are disregarded in the further processing. The motion vector determined for the respective block or frame and used for further processing can be, for example, the forward motion vector only.

By use of the first reliability rule, a reliability factor of One (1) can be assigned to a set of a forward motion vector and the related backward motion vector, e.g. to a motion vector determined for the respective block or frame and used for further processing, in case the difference between the forward motion vector and the related backward motion vector is below a predefined motion consistency threshold value, so that the motion vectors are fully considered in the further processing. The motion vector determined for the respective block or frame and used for further processing can be, for example, the forward motion vector only.

The difference between the forward motion vector and the related backward motion vector can be determined as the L1-norm sum of absolute differences (SAD) calculated by the least absolute deviations with the exemplary formula

D(i, j) = |uf(i, j) − ub(i, j)| + |vf(i, j) − vb(i, j)|,

wherein [uf(i, j), vf(i, j)] denote the forward motion vector and [ub(i, j), vb(i, j)] denote the backward motion vector with i = 1, 2, ..., M and j = 1, 2, ..., N, where M and N give the size of the M x N frame.
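This first rule can be sketched as follows, assuming the per-block forward and backward vectors are stored in (M, N, 2) arrays and the backward vectors have already been brought into the same sign convention as the forward ones (the function name, array layout and threshold handling are illustrative assumptions):

```python
import numpy as np

def motion_consistency_reliability(mv_fwd, mv_bwd, t_mc):
    """First-rule reliability: per-block L1 difference between the forward
    and the related backward motion vector.

    mv_fwd, mv_bwd: arrays of shape (M, N, 2) holding (u, v) per block.
    Returns a binary (M, N) map: 1 where the difference is below t_mc
    (motion-consistent), 0 otherwise.
    """
    # |uf - ub| + |vf - vb| per block, i.e. the L1 norm of the difference.
    diff = np.abs(mv_fwd - mv_bwd).sum(axis=-1)
    return (diff < t_mc).astype(np.uint8)
```

The resulting binary map is exactly the Zero/One weighting described above: consistent vectors are kept with full weight, inconsistent ones are dropped.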

By use of a second reliability rule, a reliability factor of Zero (0) can be assigned to a motion vector in case the magnitude of at least one of the x-component in x-direction or the y-component in y-direction of the motion vector is equal to the block matching search size. Thus, if the magnitude of the x- and/or y-component of the motion vector for a given block is trimmed to the block matching search size, the motion vector is most likely unreliable. Otherwise, the reliability factor related to this second rule can be set to One (1).
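A minimal sketch of this second rule, with hypothetical names (a component whose magnitude has been trimmed to the search size P was most likely clipped, so the true displacement lies outside the search window):

```python
def search_clip_reliability(mv, search_size):
    """Second-rule reliability: 0 if the x- or y-component of the motion
    vector was trimmed to the block-matching search size P, else 1."""
    u, v = mv
    if abs(u) == search_size or abs(v) == search_size:
        return 0  # clipped to the search boundary: likely unreliable
    return 1
```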

By use of a third reliability rule, a reliability factor can be assigned to a motion vector for a given block by use of the difference between the respective motion vector and the motion vectors of neighbouring blocks. This implicitly imposes a smoothness constraint on the motion vector field of a frame.

This can be achieved by assigning a third-rule reliability factor of Zero (0) to the motion vector for a given block in case at least one of the following holds: the difference between the magnitude of the x-component in x-direction of the motion vector and the average magnitude of the x-components in x-direction of the motion vectors of a set of blocks located in the direct and/or indirect neighbourhood of the selected block exceeds a predefined threshold value, or the difference between the magnitude of the y-component in y-direction of the motion vector and the average magnitude of the y-components in y-direction of the motion vectors of a set of blocks located in the direct and/or indirect neighbourhood of the selected block exceeds a predefined threshold value.

During the block matching in step A) the sum of absolute differences can be calculated. A fourth rule can be implemented with the purpose of identifying only high-quality motion vectors as reliable. A fourth-rule reliability factor can be assigned to the motion vector of a given block by use of the sum of absolute differences (SAD) which corresponds to the motion vector for the best-matching block. Thus, after estimating the SADs during block matching for a given block, the SAD value that corresponds to the best-match motion vector is compared to a predefined threshold; if it is above that threshold, the corresponding motion vector is considered unreliable, i.e. it receives a reliability factor of Zero. Otherwise, the motion vector is considered reliable, so that the related reliability factor can be set to One.
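The third (smoothness) rule can be sketched as follows, assuming a direct 8-neighbourhood and per-block vectors in an (M, N, 2) array; this is a straightforward, unoptimised illustration with assumed names:

```python
import numpy as np

def neighbourhood_reliability(mv_field, t_u, t_v):
    """Third-rule reliability: compare each block's motion vector magnitudes
    with the average magnitudes over its direct neighbours; a deviation above
    t_u (x-component) or t_v (y-component) marks the vector unreliable (0).

    mv_field: (M, N, 2) array of per-block (u, v) vectors.
    """
    M, N, _ = mv_field.shape
    rel = np.ones((M, N), dtype=np.uint8)
    for i in range(M):
        for j in range(N):
            # Gather the direct (8-connected) neighbours of block (i, j).
            neigh = np.array([mv_field[a, b]
                              for a in range(max(0, i - 1), min(M, i + 2))
                              for b in range(max(0, j - 1), min(N, j + 2))
                              if (a, b) != (i, j)])
            avg_mag = np.abs(neigh).mean(axis=0)
            if (abs(abs(mv_field[i, j, 0]) - avg_mag[0]) > t_u or
                    abs(abs(mv_field[i, j, 1]) - avg_mag[1]) > t_v):
                rel[i, j] = 0
    return rel
```

An "indirect" neighbourhood as mentioned above would simply widen the ranges from +/- 1 block to a larger radius.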

A reliability factor of Zero (0) can be assigned in case that the sum of absolute difference (SAD) which corresponds to the motion vector for the best-matching block exceeds a predefined threshold value. In that case, the motion vector is fully disregarded for the further processing of the image.

A total reliability factor can be calculated by combining a set of reliability factors assigned to a motion vector for a given block. This can simply be calculated by multiplying the set of reliability factors, which are determined by use of different rules for a related motion vector.
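For illustration only (not part of the disclosure), combining a set of binary rule factors into a total reliability factor by multiplication could be sketched as follows; the function name is illustrative:

```python
def total_reliability(r1, r2, r3, r4):
    """Total binary reliability factor: the product of the individual rule
    factors. For 0/1-valued factors this is equivalent to a logical AND."""
    return r1 * r2 * r3 * r4
```

With binary factors, a single Zero from any rule forces the total factor to Zero, so the corresponding motion vector is fully disregarded.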

The motion vectors can be weighted e.g. with the respective total reliability factor in step C).

The method can be performed by use of any applicable strategy for evaluating matching of parts of images, in particular block matching.

The best-matching block can be, for example, determined by calculating the sum of absolute differences (SAD) between the selected block in a first frame and the block under search in a second frame for a number of blocks under search having different relative positions in the second frame from each other, and determining the best-matching block as the block having the smallest sum of absolute differences (SAD).

A global motion vector can be determined for a frame by calculating an average motion vector from the set of weighted motion vectors for the frame, which are each weighted with the respective total reliability factor or at least one of the reliability factors determined for the motion vectors of given blocks of the frame.
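For illustration only, an exhaustive SAD-based best-match search over a ±search window could be sketched as follows, assuming greyscale NumPy frames; the function names and parameters are illustrative, not part of the claims:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def best_match(anchor, alternate, top, left, block, search):
    """Exhaustive search for the best-matching block within +/-search pixels.

    Returns the motion vector (dy, dx) minimising the SAD, plus that SAD.
    """
    ref = anchor[top:top + block, left:left + block]
    best, best_sad = (0, 0), None
    h, w = alternate.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block would leave the frame
            s = sad(ref, alternate[y:y + block, x:x + block])
            if best_sad is None or s < best_sad:
                best_sad, best = s, (dy, dx)
    return best, best_sad
```

In practice the exhaustive scan would be replaced by a fast search strategy such as diamond search, as discussed later in the description.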

Estimation of motion with subpixel-level accuracy can be achieved by the steps of:

- Aligning an alternate frame to a selected anchor frame by use of the respective global motion vector for the alternate frame;

- Determining subpixel-level forward motion vectors respectively for each pair of blocks in the alternate frame and anchor frame;

- Determining a global subpixel level motion vector by calculating an average motion vector from the set of subpixel-level forward motion vectors determined for the blocks of the frame; and

- Aligning the alternate frame to the anchor frame by use of the global subpixel level motion vector determined for the alternate frame.
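For illustration only, deriving a global motion vector as a robust average (here the median) over the motion vectors whose reliability factor is One could be sketched as follows; names are illustrative:

```python
import numpy as np

def global_motion_vector(u, v, reliability):
    """Robust global motion vector: the per-component median over the
    motion vectors whose total reliability factor equals One."""
    mask = reliability == 1
    return float(np.median(u[mask])), float(np.median(v[mask]))
```

Unreliable vectors (factor Zero) are thereby fully excluded from the global estimate, matching the weighting described above.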

For each block of a frame, a respective local subpixel-level motion vector can be determined. At least one reliability factor can be determined for each local subpixel-level motion vector, and the blocks in the alternate frame can be aligned blockwise (i.e. block by block, separately for each block of a frame) to the anchor frame by use of the local subpixel-level motion vectors determined for the respective blocks of the alternate frame.

If the information of some motion-related auxiliary sensors, such as an accelerometer/a gyroscope and of depth sensors, is available, the motion estimation process can be supported by such information, and the motion vector quality could be further improved. Preferably, sensor signals could be used for each of the anchor and alternate frames. This can be used to address, e.g., the issue that the motion model is valid for global motions and might not be fully accurate for objects that are not at the same distance to the imaging system.

The image sensor may capture image data such that a plurality of colour channels is provided. This might also include a white or grey channel. In case that each frame (i.e. image) comprises a plurality of colour channels, estimation of motion vectors is performed on a selected channel which has the highest information density, i.e. the highest-sampling colour channel. In the case of the Bayer colour filter array with the pattern R-G-G-B, this is the green channel. The method therefore comprises the steps of selecting the colour channel having the highest information density compared to the information density of the other channels of the plurality of channels, aligning the raw matrix of pixels of the frame for the selected colour channel by interpolating missing pixels in the selected colour channel, and proceeding with the steps A) to C) on the interpolated pixels of this selected colour channel.

The object is also achieved by an image processor unit for processing raw image data provided by an image sensor. Said image sensor comprises a sensor array providing a burst of frames captured by the image sensor per image, each frame comprising a raw matrix of pixels.

According to the present invention, the image processor unit is arranged to:

A) Determine motion vectors, wherein each forward motion vector in forward direction represents the displacement of a block in a selected anchor frame of the burst of frames to the best-matching block in a respective alternate frame of the burst of frames, and each backward motion vector in backward direction represents the displacement of a block in a respective alternate frame of the burst of frames to the best-matching block in a selected anchor frame of the burst of frames;

B) Determine reliability factors for the motion vectors determined in step A) to assign the reliability of the respective motion vector for a given block by use of the difference between the forward motion vector and the related backward motion vector, wherein the reliability increases with decreasing difference; and

C) Align the frames of the burst of frames for an image, wherein the motion is compensated by use of weighted motion vectors, wherein the motion vectors are weighted with the respective reliability factor determined in step B).

The image processor unit is arranged for processing image data by performing the aforementioned method steps. Thus, the object is further solved by the image processor unit comprising the features of one of the claims 18, 19 or 20. The object is further solved by a computer program comprising instructions which, when the program is executed by a processing unit, cause the processing unit to carry out the steps of the aforementioned method.

In the following, the invention is explained by way of exemplary embodiments with the following figures. The figures show:

Figure 1 - Block diagram of an electronic device comprising a camera, an image processor unit and a mechanical actuator;

Figure 2 - Flow diagram of the method for processing image data of an image sensor;

Figure 3 - Schematic diagram of block matching related to macroblock displacement in the reference (anchor) and alternate frame within a search window in the alternate frame;

Figure 4 - Exemplary estimated motion vector field for a frame in pixel-level accuracy;

Figure 5 - Exemplary estimated motion vector field for a frame according to figure 4 with weighted motion vectors disregarding the unreliable motion vectors;

Figure 6 - Exemplary estimated motion vector field for a frame according to figure 5 with weighted motion vectors in sub-pixel-level accuracy;

Figure 7 - Schematic diagram of the method adapted to higher-order motion models.

Figure 1 is an exemplary block diagram of an electronic device 1 comprising a camera 2 and an image processor unit 3 for processing raw image data IMGRAW provided by an image sensor 4 of the camera 2.

The image sensor 4 comprises an array of pixels so that the raw image IMGRAW is a data set in a raw matrix of pixels per image. In order to capture colours in the image, a colour filter array CFA is provided in the optical path in front of the image sensor 4. The camera comprises an opto-mechanical lens system 5, e.g. a fixed uncontrolled lens. The electronic device 1 further comprises a mechanical actuator 6. The mechanical actuator 6 is provided in the electronic device primarily for other purposes, e.g. for signalling to a user. This is a well-known feature of smartphones for signalling incoming new messages or calls.

In this regard, the electronic device 1 can be a handheld device like a smartphone, a tablet, wearables or a camera or the like.

The image processor unit 3 is arranged for processing image data IMGRAW from the image sensor 4 capturing a burst of frames F1...N for one image. The image processor unit 3 is arranged to process estimated motion vectors MV, to align the matrices of pixels of bursts of captured frames to one specific alignment, and to combine the burst of frames to achieve a resulting image IMGFIN by use of the plurality of pixels of the raw matrices available for each pixel position and the matrix of the resulting image.

Figure 2 shows a flow diagram of the method for processing raw image data IMGRAW of an image sensor 4. The raw image data IMGRAW comprises a burst of images, or a burst of frames FREF, FALT_1, FALT_2, ..., FALT_N captured for one image. The raw image data IMGRAW are processed by automatically performing at least the steps a) to h), e.g. on a respectively arranged digital signal processor or by software running on an image processor. The steps a) to h) include the option to divide a step in order to perform a similar routine of a step each on a respective frame (i.e. step a) = {a0), a1), ..., aN)}; step b) = {b0), b1), ..., bN)}).

The underlying assumption in the flow of Figure 2 is that the goal is to align the frames FREF, FALT_1, FALT_2, ..., FALT_N, and hence global motion estimation is pursued. However, the proposed approach can be generalized to local motion estimation, e.g. by incorporating a motion segmentation step, which employs the estimated local motions, or by performing motion estimation on pre-defined regions in the frame.

a) Colour Channel Interpolation (CCI)

The first step a), or set of steps a0), a1), a2), ..., aN) for each frame of a burst of frames for an image, is designed for Colour Channel Interpolation CCI. In the raw CFA data, the colours are under-sampled, e.g. 50% green, 25% blue and 25% red in the standard RGGB Bayer sensor. It is proposed to perform alignment on the colour channel that has the highest sampling in the CFA and/or the channel that offers more details/higher response. An example of such a channel is the green channel in the standard RGGB Bayer CFA or the white channel in the RGBW Bayer CFA. However, as mentioned, the colour channels in the CFA are typically under-sampled. Hence, the first step in the image alignment pipeline is to interpolate the selected colour channel to fill in the gaps due to the missing samples. Interpolation is performed only at the positions of the missing samples, in order to generate full-resolution colour channel data. Detail-preserving interpolation, such as bi-cubic/bilateral interpolation, could be pursued. However, if computational complexity is of concern, simpler interpolation, such as bilinear interpolation, could be done.
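For illustration only, filling the missing green samples of an RGGB Bayer frame by bilinear interpolation could be sketched as follows. The sketch assumes green samples sit where (row + col) is odd and uses edge padding at the borders; names and layout are illustrative:

```python
import numpy as np

def interpolate_green(raw):
    """Bilinearly fill the missing green samples of an RGGB Bayer raw frame.

    Assumes green pixels sit where (row + col) is odd; missing sites are
    replaced by the mean of their four (green) neighbours, with edge
    padding at the frame borders.
    """
    h, w = raw.shape
    green = raw.astype(np.float64).copy()
    padded = np.pad(green, 1, mode="edge")  # snapshot of original values
    for r in range(h):
        for c in range(w):
            if (r + c) % 2 == 0:  # missing green sample
                neigh = [padded[r, c + 1], padded[r + 2, c + 1],
                         padded[r + 1, c], padded[r + 1, c + 2]]
                green[r, c] = float(np.mean(neigh))
    return green
```

The result is a full-resolution green channel on which motion estimation can then be performed.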

Motion estimation will then be performed on the interpolated, full-resolution colour channel. Since the positions of the other colour channels in the CFA with respect to the selected channel are known, motion estimation is implicitly performed for those channels as well, without additional computational efforts. For simplicity, the selected interpolated, full-resolution colour channel is referred to as C. The anchor (reference) is denoted by C_anchor and the alternate frame/frame region of interest (ROI) is denoted by C_alternate.

b) Colour Channel Smoothing (CCS)

Next in step b), or the division of the routine into steps b0), b1), b2), ..., bN), each for a respective frame, Colour Channel Smoothing CCS is performed.

This step b) is designed to lowpass-filter both selected colour channels C_anchor and C_alternate. This is performed in order to robustify the motion estimation operations against noise in the raw data. Lowpass filtering is achieved by convolving C_anchor and C_alternate with a smoothing filter, such as e.g. a two-dimensional Gaussian filter.
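For illustration only, such Gaussian lowpass filtering could be sketched as follows, assuming a NumPy channel and edge padding at the borders; kernel size and sigma are illustrative parameters:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Separable 2-D Gaussian kernel, normalised to unit sum."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def smooth(channel, size=5, sigma=1.0):
    """Lowpass-filter a colour channel by direct 2-D convolution with a
    Gaussian kernel, keeping the frame size via edge padding."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(channel, pad, mode="edge")
    out = np.zeros(channel.shape, dtype=np.float64)
    h, w = channel.shape
    for r in range(h):
        for c in range(w):
            out[r, c] = (padded[r:r + size, c:c + size] * k).sum()
    return out
```

A production implementation would typically use a separable or library-provided convolution for speed; the direct form is shown for clarity.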

The smoothed images are denoted by C̃_anchor and C̃_alternate, where C̃_anchor = C_anchor ⊗ F and C̃_alternate = C_alternate ⊗ F, F is the 2-dimensional smoothing filter, and ⊗ denotes the 2-dimensional convolution operation.

c) Fast Forward and Backward Block Matching (F-B-ME)

The following step c) (reflected by step A) in the claims) is designed for estimating motion vectors in forward and backward direction for the alternate frames (including the option of alternate ROIs of a frame) with respect to the anchor (reference) frame (or ROI of an anchor frame).

Block matching (BM) is well known and by far the most popular motion estimation method used in various image/video processing applications. The image can be divided e.g. into M x N rectangular blocks, each of size L_x x L_y, and for each block in the current (alternate) frame, a search is performed in a window in the anchor (reference) frame to find the best matching block, which minimizes some predefined block distortion metric (e.g. the so-called Displaced Frame Difference (DFD)).

The matching search can be performed within ±P pixels, i.e. supporting up to P pixels in the ±x and ±y directions, as depicted in Figure 3.

Block matching BM via exhaustive search (brute-force search) is quite expensive computationally. Significant speedup of the search can be achieved, for example via diamond search or other fast search algorithms known in the prior art.

The motion vector MV is computed by finding the best match, which achieves the minimum block distortion metric in the search window (±P pixels); the motion vector MV for a given block is calculated as the displacement of the reference frame block to the alternate frame best-matching block. Of course, the implicit underlying assumption in block matching BM is that within a small block, the motion can be modeled as translational. In addition, the brightness is assumed to be constant. While this latter assumption may not hold, for example in frames taken with different exposure times, a pre-processing photometric alignment can be performed prior to block matching BM, in order to find the motion vectors MVs under the brightness constancy assumption. The third step c) in the motion estimation pipeline is designed to perform block matching BM to estimate the motion between the smoothed colour channels of the anchor and alternate frames. Block matching is performed in both the forward direction (matching the anchor channel towards the alternate channel) and the backward direction (matching the alternate channel towards the anchor channel).

The bi-directional block matching BM for motion reliability estimation allows to estimate a reliability factor for the respective motion vectors as explained later in respect to step d). It is also worth mentioning that having both the forward and backward motion estimation results available could facilitate adaptive selection of the anchor (reference) frame in subsequent multi-frame fusion operations.

Fast block matching search can be achieved by employing the diamond search strategy. At this stage, only pixel-level accuracy is sought. The block matching/distortion criterion used is the sum of absolute differences (SAD) between the block in the anchor frame and the block under search in the alternate frame (in the forward mode, and vice versa in the backward mode), because of its low computational complexity. To speed up the determination of the motion vector MV for each block, only selected pixels of the blocks to be matched can be used to calculate the Displaced Frame Difference DFD, e.g. by subsampling the blocks.

For example, skipping every other pixel in the x and y directions would reduce the computations by a factor of four. The larger the block size, the more accurate the comparison, but at the same time the resolution of the vector field is lower, which leads to inaccurate vectors especially at moving object edges. To achieve high accuracy and resolution, a larger area is often used to calculate the Displaced Frame Difference DFD than the actual block size.
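For illustration only, the subsampled SAD described above (skipping every other pixel in x and y) could be sketched as follows; the function name and step parameter are illustrative:

```python
import numpy as np

def sad_subsampled(block_a, block_b, step=2):
    """SAD over a subsampled pixel grid: step=2 skips every other pixel in
    the x and y directions, cutting the computation roughly by a factor of
    four compared to the full-grid SAD."""
    a = block_a[::step, ::step].astype(np.int32)
    b = block_b[::step, ::step].astype(np.int32)
    return int(np.abs(a - b).sum())
```

The subsampled SAD is a cheaper stand-in for the full block distortion metric during the search; the final match quality can still be checked on the full grid if desired.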

For this purpose, additional pixels can be included around the borders of the blocks to form an enlarged area with the size WL_x x WL_y, called a wide block, where WL_x ≥ L_x and WL_y ≥ L_y.

If the image is divided into M x N rectangular blocks, let the motion vector MV for block (i,j), where i = 1, 2, ..., M and j = 1, 2, ..., N, be denoted by [u(i,j), v(i,j)]. The collection of MVs, {u(i,j), v(i,j)}, i = 1, 2, ..., M and j = 1, 2, ..., N, represents the motion vector field for the alternate frame (in the forward mode).

d) Motion Vectors Reliability Calculation (MV-R)

Many factors could lead to unreliable estimation of the motion vectors MVs, such as occlusion, motion blur, reflections, just to name a few. In addition, if the x and/or y absolute component of the real motion vector MV exceeds the block matching BM search size P, the estimated motion vector MV will not be correct, and the magnitude of the exceeding component will be trimmed to the search size P. Therefore, in the fourth step d) (reflected by step B) in the claims) of the proposed motion estimation pipeline, it is necessary to be able to detect unreliable motion vectors MVs, so that their values would be underweighted, e.g. fully excluded, in later analysis for finding the frame global motion parameters (or local motions, if needed).

Preferably a number of different motion reliability rules are applied to estimate respective reliability factors assigned to a motion vector for a given frame or part thereof (e.g. block). The reliability factors of the set of reliability factors assigned to one common motion vector for a given frame or part thereof can be combined to achieve a total reliability factor assigned to the motion vector for the given frame, i.e. full frame, block or the like.

Motion Reliability Rule #1:

As mentioned earlier, in the third step c), forward and backward motion estimation are performed. Ideally, after block matching BM, the forward motion vector MV should be the opposite of the backward motion vector MV; i.e. the sum of the respective x and y components of the forward and backward motion vectors should be zero. Given that fact, detection of unreliable motion estimation (due to occlusion, saturated pixels, motion blur, shadows, reflections and other local variations in the scene) can be performed by calculating the difference between the forward motion vector MV and the negated backward motion vector MV, and if the difference is neither zero nor a very small value, then the motion vector MV for that given block is declared as unreliable. Mathematically, this can be formulated as follows.

- For a given block (i,j), let [u_forward(i,j), v_forward(i,j)] denote the forward MV and let [u_backward(i,j), v_backward(i,j)] denote the backward MV.

- The L1 distance between the forward MV and the negated backward MV is calculated and compared to a predefined threshold, T_MV; if it exceeds the threshold, then the binary reliability factor for that block is set to Zero (0), otherwise it is set to One (1). In the case of pixel-level accuracy and to ensure high-quality motion vectors, T_MV could be set to Zero (0).
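For illustration only, this forward/backward consistency check could be sketched as follows. Since the forward MV should be the opposite of the backward MV, the sketch takes the L1 norm of their sum (equivalent to the distance to the negated backward MV); names are illustrative:

```python
def rule1_reliability(fwd, bwd, t_mv=0):
    """Binary reliability factor R1: the forward MV (dx, dy) should be the
    opposite of the backward MV, so the L1 norm of their sum should be at
    most T_MV for a reliable block."""
    l1 = abs(fwd[0] + bwd[0]) + abs(fwd[1] + bwd[1])
    return 1 if l1 <= t_mv else 0
```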

The first rule reliability factor for block (i,j) is denoted by R1(i,j).

Motion Reliability Rule #2:

As alluded to earlier, for a given block, if the magnitude of the x and/or y component of the motion vector MV is equal (trimmed) to the block matching BM search size P, then most likely it is unreliable. Therefore, an additional motion vector MV binary reliability factor R2(i,j) is calculated, which is set to Zero (0) in that case and to One (1) otherwise.

The second rule reliability factor for block (i,j) is denoted by R2(i,j).

Motion Reliability Rule #3:

The two reliability metrics presented above are further complemented by a third reliability rule, which implicitly imposes a smoothness constraint on the motion vector field of the frame. For a given block (i,j), if the corresponding motion vector MV is significantly different from the motion vectors MVs of the neighboring blocks in a small window, then that motion vector MV is considered as unreliable. Mathematically, this can be formulated as setting the third rule reliability factor R3(i,j) to Zero (0) if |u(i,j) − u_avg| > T_u or |v(i,j) − v_avg| > T_v, and to One (1) otherwise, where u_avg and v_avg are the average x and y components of the motion vectors MVs of the neighboring blocks, e.g. in the 3x3 block neighborhood around the block under consideration, and T_u and T_v are tunable parameters that control the strength of the motion vector MV smoothness constraint.
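For illustration only, the smoothness check against the 3x3 neighbourhood could be sketched as follows, assuming NumPy arrays u and v holding the per-block motion vector components; names and thresholds are illustrative:

```python
import numpy as np

def rule3_reliability(u, v, i, j, t_u=2.0, t_v=2.0):
    """Binary reliability factor R3: compare the MV of block (i, j) against
    the average MV of its 3x3 neighbourhood (excluding the block itself);
    a deviation above T_u in x or T_v in y flags the block as unreliable."""
    i0, i1 = max(i - 1, 0), min(i + 2, u.shape[0])
    j0, j1 = max(j - 1, 0), min(j + 2, u.shape[1])
    n = (i1 - i0) * (j1 - j0) - 1  # neighbours in the clipped window
    u_avg = (u[i0:i1, j0:j1].sum() - u[i, j]) / n
    v_avg = (v[i0:i1, j0:j1].sum() - v[i, j]) / n
    ok = abs(u[i, j] - u_avg) <= t_u and abs(v[i, j] - v_avg) <= t_v
    return 1 if ok else 0
```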

Motion Reliability Rule #4:

The aforementioned three reliability metrics are further complemented by a fourth one, whose purpose is to identify only high-quality motion vector MV estimates as reliable. To achieve this, after estimating the sums of absolute differences SADs during block matching, for a given block, the SAD value that corresponds to the best-match motion vector MV is compared to a predefined threshold, and if it is above that threshold, then the corresponding motion vector MV is considered to be unreliable, i.e. the fourth rule reliability factor R4(i,j) is set to Zero (0).

Otherwise, the corresponding motion vector MV is considered to be reliable, i.e. the fourth rule reliability factor R4(i,j) is set to One (1).

Combination to Total Reliability Factor:

For a given block (i,j), the total motion vector MV binary reliability factor, denoted R_MV(i,j), is then calculated as the product of the four rule reliability factors:

R_MV(i,j) = R1(i,j) · R2(i,j) · R3(i,j) · R4(i,j)

In case that the reliability factors R1(i,j), R2(i,j), R3(i,j), R4(i,j) are set to On/Off states by digital Zero or One values, the total reliability factor R_MV(i,j) is Zero once at least one of the reliability factors R1(i,j), R2(i,j), R3(i,j), R4(i,j) is Zero. This can be implemented simply by use of a logical AND gate.

e) Pixel-Level Global Motion Parameters

In the following, options for the calculation of Pixel-Level Global Motion Parameters are explained. In particular, two important motion estimation related choices are emphasized.

1) The Motion Model

While translational motion vectors MVs are calculated by the block matching BM strategy, this does not mean that the global motion of the frame needs to be modeled as translational. For example, an affine motion model can be assumed for the whole frame, and its parameters could be calculated from the reliable translational motion vectors. So, depending on the chosen motion model, the derivation of the global motion parameters will follow, as described shortly.

2) The Motion Accuracy

Motion estimation can be performed with pixel-level accuracy or subpixel-level accuracy in mind, depending on the target application. Block matching BM would be more computationally complex if it were performed with subpixel accuracy. Therefore, block matching BM with pixel-level accuracy is preferably performed, followed by a non-iterative, non-interpolative step of motion estimation with arbitrary subpixel accuracy. This way, motion vectors MVs with arbitrary subpixel accuracy are calculated, while still keeping the overall computational complexity low.

Now that block matching BM has been performed to calculate the pixel-level motion vectors MVs for the anchor and alternate frames, and also the corresponding reliability has been calculated, in the fifth step e) of the proposed motion estimation pipeline shown in Figure 2, the global motion parameters for the alternate frame with respect to the anchor one (or vice versa, if needed) are calculated.

Let the collection of reliable motion vectors be denoted by Ω_MV, i.e. the set of motion vectors whose total reliability factor is One. A global, robust translation/displacement between the anchor frame and the alternate frame is calculated from the reliable motion vectors MVs in Ω_MV. One possible robust estimate is a robust average, e.g. the median, of all those reliable motion vectors MVs. The result of this operation is a global motion vector MV, denoted by [u_global, v_global].

f) Alternate Frame Alignment including Calculation of Subpixel-Level Global Motion Parameters

Subpixel accuracy is key to the operation of various multi-frame processing features, such as multi-frame super-resolution. Arbitrary accuracy is essential to achieving high-quality image registration, and thereby high-quality subsequent multi-frame processing. To achieve this goal, in the sixth step f) (reflected by step C) in the claims) of the proposed motion estimation pipeline, a non-iterative, interpolation-free approach is employed, similar to the one described in S. Chan et al.: "Subpixel Motion Estimation Without Interpolation" referenced in the above description of the prior art. The alignment is based on Taylor approximations.

First, the alternate frame is aligned to the reference frame based on the estimated global pixel-level motion vector MV. Let B_alternate(i,j) and B_anchor(i,j) denote the (i,j)-th block in the alternate and anchor frames, respectively, after global alignment, with (i,j) ∈ Ω_MV. Subpixel-level motion estimation of the aligned alternate frames can then be calculated as follows:

1. For each B_alternate(i,j) and B_anchor(i,j) block pair, a subpixel-level forward motion vector, [u_subpixel(i,j), v_subpixel(i,j)], can be calculated as the solution of a linear system of equations derived from the Taylor approximation.

2. To derive a global subpixel-level motion estimate for the alternate frame, a global, robust subpixel-level motion vector MV can be calculated from the subpixel-level motion vectors MVs in Ω_MV. One possible robust estimate is the median of all the subpixel-level motion vectors MVs. The result of this operation is a global subpixel-level motion vector MV.

The total global subpixel-level motion vector can then be calculated by adding the global subpixel-level motion vector to the global pixel-level motion vector [u_global, v_global].

After the global subpixel-level motion vector MV has been estimated, the alternate frame FALT_K can be aligned to the anchor frame FREF. The alignment error signal is expected to be high in areas where the motion vectors MVs were not reliable, and that is very useful information for subsequent multi-frame processing.
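For illustration only, the per-block subpixel refinement can be approximated by a first-order Taylor, least-squares sketch: the 2x2 normal equations built from the spatial gradients of the anchor block. This is a simplified stand-in, not the exact closed form of the referenced Chan et al. method; names are illustrative:

```python
import numpy as np

def subpixel_mv(block_anchor, block_alt):
    """First-order Taylor least-squares subpixel refinement sketch.

    Models block_alt ≈ block_anchor + dx*Gx + dy*Gy and solves the 2x2
    normal equations G^T G d = G^T e, where e is the inter-block difference.
    """
    a = block_anchor.astype(np.float64)
    b = block_alt.astype(np.float64)
    gy, gx = np.gradient(a)  # spatial gradients of the anchor block
    e = b - a
    A = np.array([[(gx * gx).sum(), (gx * gy).sum()],
                  [(gx * gy).sum(), (gy * gy).sum()]])
    rhs = np.array([(gx * e).sum(), (gy * e).sum()])
    dx, dy = np.linalg.solve(A, rhs)
    return float(dx), float(dy)
```

The refinement is valid for small residual displacements, which is why the frames are first aligned with the pixel-level global motion vector.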

At this point, for illustration purposes of the steps of the motion estimation pipeline, one example of estimated motion vector fields is presented in the Figures 4 to 6 for an alternate frame having a ground-truth global displacement with respect to the anchor frame of ΔX = -0.25 and ΔY = 4.

Local variations are present in the scene because of an object, e.g. as in the captured image a car, moving on the right, along with its shadow, as well as tree leaves motion and other reflections and small variations in the scene.

Figure 4 presents the estimated motion vector field with pixel-level accuracy in horizontal X- and vertical Y-direction of the frame without any reliability check. This is the result of the block motion estimation with L_x = L_y = WL_x = WL_y = 32 and the search size P = 6 pixels. There is a high irregularity of the motion vectors in some areas, e.g. in the X,Y-section of about (40, 5) to (55, 15).

Figure 5 presents the estimated weighted motion vector field with pixel-level accuracy component in horizontal X- and vertical Y-direction of the frame. The unreliable motion vectors are disregarded. The reliable local motion vectors MVs are shown after excluding the motion vectors MVs whose total reliability factor is zero. The calculation of the global pixel-level motion vector MV for the whole frame results in ΔX = 0 and ΔY = 4.

Figure 6 presents the estimated weighted motion vector field with subpixel-level accuracy in horizontal X- and vertical Y-direction of the frame. Again, the unreliable motion vectors are disregarded. Subpixel-level motion vectors MVs are added to the pixel-level motion vectors MVs in Figure 5 to get the total subpixel-level motion vectors MVs. The calculation of the global subpixel-level motion vector MV for the frame results in ΔX = -0.25 and ΔY = 4.

The aforementioned exemplary embodiment is based on translational motion models. However, the method is not limited to the translational motion model, and can be adapted to other higher-order models, in particular non-translational motion models.

To elaborate this aspect, let's assume an affine model is chosen for the global motion. The affine model could then be represented as

x' = a_1 · x + a_2 · y + t_x
y' = a_3 · x + a_4 · y + t_y

The parameters t_x and t_y capture the two-dimensional translation in the x and y directions, respectively. The rotation is captured by the parameters a_1, a_2, a_3 and a_4, wherein the zooming is captured by the zooming parameters a_1 and a_4, and wherein shear is captured by the shear parameters a_2 and a_3. There are various ways one could derive the six parameters of the affine motion model from the already calculated translational motion vectors. One exemplary approach is explained in the following.
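For illustration only, applying the six-parameter affine model to a set of coordinates could be sketched as follows; the function name is illustrative:

```python
import numpy as np

def apply_affine(points, a1, a2, a3, a4, t_x, t_y):
    """Apply the six-parameter affine motion model
        x' = a1*x + a2*y + t_x,  y' = a3*x + a4*y + t_y
    to an (N, 2) array of [x, y] coordinates."""
    A = np.array([[a1, a2], [a3, a4]])
    t = np.array([t_x, t_y])
    return points @ A.T + t
```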

1. Instead of aligning the alternate frame globally based on the calculated global pixel-level motion vector MV, block-wise alignment is pursued based on the local pixel-level motion vectors MVs, followed by block-wise subpixel-level estimation of the motion vectors MVs. This results in a new subpixel-level motion vector field.

2. The reliability of the calculated motion vectors is calculated in a similar manner (employing proper thresholds for the rules #1 to #3) in order to identify the reliable motion vectors MVs in the motion vector field achieved in the above-mentioned step 1.

3. Let the number of reliable motion vectors MVs be Q, wherein one motion vector is assigned to a respective block and, vice versa, one block is related to exactly one motion vector. For those Q blocks, the system of equations A X = b could be formed,

where A is a stack of a number Q of 2x6 matrices, each with entries derived from the coordinates of the respective block, b is a stack of Q 2x1 vectors, each with entries derived from the correspondingly displaced block coordinates, and X is the vector of the six affine motion model parameters.

Solving for X by calculating the pseudo-inverse of A, we get the global affine motion model parameters as X = (A^T A)^-1 A^T b.
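For illustration only, such a least-squares estimate of the six affine parameters from point correspondences could be sketched as follows, with a library least-squares solver standing in for the explicit pseudo-inverse; names and the parameter ordering are illustrative:

```python
import numpy as np

def fit_affine(src, dst):
    """Estimate [a1, a2, t_x, a3, a4, t_y] from (x, y) -> (x', y')
    correspondences by linear least squares, i.e. the pseudo-inverse
    solution X = (A^T A)^-1 A^T b."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0])  # equation for x'
        rows.append([0.0, 0.0, 0.0, x, y, 1.0])  # equation for y'
        rhs.extend([xp, yp])
    A = np.asarray(rows, dtype=np.float64)
    b = np.asarray(rhs, dtype=np.float64)
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x_hat  # a1, a2, t_x, a3, a4, t_y
```

At least three non-collinear correspondences are required; with Q reliable motion vectors the system is over-determined and the least-squares fit averages out residual noise.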

It is worth mentioning at this point that the motion model is valid for global motions and is not fully accurate for objects that are not at the same distance to the imaging system. If the information of some motion-related auxiliary sensors, such as an accelerometer/a gyroscope and of depth sensors, is available for each of the anchor and alternate frames, the motion estimation process could be supported by such information, and the motion vector quality could be further improved.

Figure 7 shows a schematic diagram of the method adapted to higher-order motion models supporting non-translational transformations, such as the similarity, affine and projective (2D homography) transformations described in R. Hartley and A. Zisserman, “Multiple View Geometry in Computer Vision”. University Press, Cambridge, 2004 referenced in the first part of the description related to the prior art. It should be noted that the method shown in Figure 7 also supports the translational motion model, as described later.

As depicted in the Figure 7, after the estimation of the pixel-level motion vector field and the motion vectors MVs reliability map, subpixel refinement is performed on a block basis between the reference and anchor frames/frame ROIs, without any explicit intermediate alignment. The result is a subpixel-level motion vector MV field which is then transformed into a list of 2D correspondences. The correspondences are then used to estimate the global motion model parameters. Once the motion parameters are calculated, alignment of the frames can be performed.

Steps a) to d) correspond to the exemplary method described above with reference to Figure 2. Former step e) is modified, denoted step i), and is designed for Block-Based Subpixel-Level Refinement BB-ME by use of the reliability factors derived in step d) and the interpolated and smoothed low-pass filtered raw CFA data of the alternate frame or region of interest ROI of an alternate frame derived in steps b1), b2), ..., bN).

This is followed by step j), designed for Two-Dimensional Correspondences, and by step k), designed for Transformation Matrix Calculation. The results of step k) are the estimated motion parameters.

As described earlier, the output of step a), i.e. the divided steps a0), a1), a2), ..., aN), is C_r and C_a for each frame or region of interest ROI of a frame. The output of step b), i.e. the divided steps b0), b1), b2), ..., bN), is the smoothed channel data for each related C_r and C_a. The output of step c) is the forward motion vector field, i = 1, 2, ..., M and j = 1, 2, ..., N, and the backward motion vector field, i = 1, 2, ..., M and j = 1, 2, ..., N. The output from step d) is the M x N binary motion vector reliability map.

In this step i), block-based subpixel refinement is pursued only for the reliable motion vectors. No explicit global intermediate alignment is performed. Hence, for each block B_r in the reference frame, the matching block B_a in the alternate frame is identified based on the corresponding motion vector. Then, for each block pair, the subpixel motion vector is calculated by solving the system of equations in section (3.7). The result is a subpixel-level forward motion vector field. Adding the pixel-level and the subpixel-level motion vector fields, the total subpixel-level motion vector field is obtained.

As mentioned earlier, Ω_R denotes the collection of reliable motion vectors. Let the number of reliable motion vectors in Ω_R be denoted by Q. For the reliable motion vectors, Q pairs of correspondences Ψ_k, k = 1, 2, ..., Q, are constructed as follows: each source point [S_x, S_y] is the location of a block in the reference frame/frame ROI, and the corresponding destination point [D_x, D_y] is the source point displaced by the total subpixel-level motion vector of that block.
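As an illustration only (not part of the claimed method), the construction of the correspondence pairs from the reliable motion vectors can be sketched in NumPy; the function name and the flat array layout are assumptions made for this example:

```python
import numpy as np

def build_correspondences(block_centers, motion_vectors, reliability):
    """Construct 2D source/destination point pairs from reliable motion vectors.

    block_centers : (M*N, 2) array of block locations [S_x, S_y] in the reference frame
    motion_vectors: (M*N, 2) array of total subpixel-level motion vectors
    reliability   : (M*N,) boolean reliability map (True = reliable)
    """
    src = block_centers[reliability]          # Q source points in the reference frame
    dst = src + motion_vectors[reliability]   # destination = source displaced by the MV
    return src, dst
```

Only the Q reliable blocks contribute a pair, which matches the definition of Ω_R above.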

The implicit assumption here is that the reference frame/frame ROI is matched towards the alternate (current) frame/frame ROI.

The set of 2D correspondences, {Ψ}, is then used for estimating the parameters of the underlying global motion model as described in what follows.

Given the constructed pairs of correspondences for the reference and alternate frames/frame ROIs, {Ψ}, the motion model parameters are estimated via the direct linear transform (DLT). The direct linear transform is described in more detail in G. H. Golub and C. F. Van Loan, Matrix Computations, second edition, 1989, and in R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press, 2004. The RANdom SAmple Consensus (RANSAC) algorithm described in M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, 24(6):381-395, 1981, is pursued: in each RANSAC iteration, a DLT algebra system is formed from eight (8) randomly selected pairs of correspondences, and the transformation capturing the motion model parameters is estimated by singular value decomposition (SVD), as described in more detail in G. H. Golub and C. F. Van Loan, Matrix Computations, second edition, 1989.
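The RANSAC loop described above can be sketched as follows. This is a minimal, generic illustration: the model estimator and residual function are passed in as callables, and the function name, iteration count and inlier threshold are assumptions for this example, not values taken from the patent:

```python
import numpy as np

def ransac(src, dst, estimate, residual, n_samples=8, n_iters=200, thresh=1.5, rng=None):
    """Generic RANSAC loop: repeatedly fit a model to a random minimal sample
    of correspondence pairs and keep the model with the largest inlier consensus.

    estimate(src_sample, dst_sample) -> model   (e.g. a DLT + SVD fit)
    residual(model, src, dst)        -> per-correspondence error array
    """
    rng = np.random.default_rng(rng)
    best_model, best_inliers = None, np.zeros(len(src), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), n_samples, replace=False)  # random sample of pairs
        model = estimate(src[idx], dst[idx])
        inliers = residual(model, src, dst) < thresh          # consensus set
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Any of the motion models below can be plugged in as the `estimate` callable; the eight-pair sample size mirrors the eight randomly selected correspondence pairs mentioned above.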

The 2D transformation of homogeneous image point coordinates from the source to the destination can be defined as

    [D_x', D_y', D_w']^T = T_motion · [S_x, S_y, 1]^T

where the 3x3 transformation matrix T_motion captures the motion model parameters and the inhomogeneous destination coordinates are obtained as [D_x, D_y] = [D_x'/D_w', D_y'/D_w'].
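As a small illustration of this homogeneous-coordinate transformation, the following sketch applies an arbitrary 3x3 matrix T_motion to a set of source points and dehomogenizes the result (the function name and array shapes are assumptions for this example):

```python
import numpy as np

def apply_motion(T, pts):
    """Apply a 3x3 motion matrix T to Nx2 source points via homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # [S_x, S_y, 1] per row
    out = pts_h @ T.T                                  # homogeneous destination points
    return out[:, :2] / out[:, 2:3]                    # divide by D_w' to dehomogenize
```

For the affine and simpler models the last row of T is [0, 0, 1], so the division is a no-op; for the projective model it performs the perspective division.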

In the following, different motion models are described.

1. Pure Translational Motion Model:

For a given pair of correspondences, the translational motion model can be represented by

    D_x = S_x + t_x
    D_y = S_y + t_y

where t_x and t_y represent the translation in the x and y directions, respectively. The translational model parameters (t_x, t_y) can be estimated from the set of Q correspondence pairs as the average displacement:

    t_x = (1/Q) · Σ_k (D_x,k - S_x,k),   t_y = (1/Q) · Σ_k (D_y,k - S_y,k),   k = 1, 2, ..., Q
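A minimal sketch of this translational estimate, using the arithmetic-mean variant of the averaging (the function name is an assumption for this example):

```python
import numpy as np

def estimate_translation(src, dst):
    """Estimate (t_x, t_y) as the average displacement over the Q reliable pairs.

    src, dst : (Q, 2) arrays of source and destination points.
    """
    return (dst - src).mean(axis=0)  # arithmetic mean of per-pair displacements
```

Replacing `mean` with `np.median` along the same axis would give the median variant mentioned below.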

Averaging is pursued here, since only the reliable motion vectors are included in the construction of reliable correspondence pairs. The average in the example is the mean value, in particular the arithmetic mean value. However, the median, square or cubic mean value, logarithmic mean value, trimmed or weighted mean values and the like are applicable.

2. Similarity Transformation Motion Model: (Translation + Rotation + Scaling)

For a given pair of correspondences, the similarity transformation motion model can be represented by

    D_x = a · S_x - b · S_y + t_x
    D_y = b · S_x + a · S_y + t_y,   with a = s · cos(θ) and b = s · sin(θ),

where θ is the angle of rotation and s is the scaling factor of the x/y coordinates. The corresponding DLT system of equations can be defined as

    A_2Qx4 · X_4x1 = b_2Qx1

which in terms of the pairs of correspondences and motion model parameters is written as

    [S_x,k  -S_y,k  1  0] · X_4x1 = D_x,k
    [S_y,k   S_x,k  0  1] · X_4x1 = D_y,k,   k = 1, 2, ..., Q

with X_4x1 = [a  b  t_x  t_y]^T.

The unknown motion model parameters vector, X_4x1, can be estimated by solving the DLT system with SVD: with A_2Qx4 = U · Σ · V^T, the least-squares solution is X_4x1 = V · Σ^+ · U^T · b_2Qx1.
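A sketch of this similarity fit, assuming the standard parameterization a = s·cos(θ), b = s·sin(θ) and using `np.linalg.lstsq`, which solves the overdetermined system via SVD (the function name is an assumption for this example):

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares fit of X = [a, b, t_x, t_y] with a = s*cos(theta), b = s*sin(theta).

    Per correspondence pair the two DLT rows are
        [S_x, -S_y, 1, 0] . X = D_x
        [S_y,  S_x, 0, 1] . X = D_y
    """
    Q = len(src)
    A = np.zeros((2 * Q, 4))
    b = np.zeros(2 * Q)
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(Q), np.zeros(Q)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(Q), np.ones(Q)])
    b[0::2], b[1::2] = dst[:, 0], dst[:, 1]
    X, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based least-squares solve
    a, b_, tx, ty = X
    return np.hypot(a, b_), np.arctan2(b_, a), tx, ty  # (s, theta, t_x, t_y)
```

The scale and rotation are recovered from (a, b) at the end; for the rigid model one would additionally renormalize (a, b) to unit length.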

3. Rigid Transformation Motion Model: (Translation + Rotation)

The estimation is almost the same as the process in the similarity transformation; the only difference is an additional constraint that removes the scaling degree of freedom, i.e. the scale factor is fixed to one:

    a^2 + b^2 = 1   (i.e. s = 1)

4. Affine Transformation Motion Model: (Translation + Rotation + Scaling + Shear)

For a given pair of correspondences, the affine transformation motion model can be represented by

    D_x = a · S_x + b · S_y + t_x
    D_y = c · S_x + d · S_y + t_y

where t_x and t_y represent the translation in the x and y directions, respectively, and the parameters a, b, c and d are the combination of the scaling, rotation and shear. The corresponding DLT system of equations can be defined as

    b_2Qx1 = A_2Qx6 · X_6x1

which in terms of the pairs of correspondences and motion model parameters is written as

    [S_x,k  S_y,k  1  0  0  0] · X_6x1 = D_x,k
    [0  0  0  S_x,k  S_y,k  1] · X_6x1 = D_y,k,   k = 1, 2, ..., Q

with X_6x1 = [a  b  t_x  c  d  t_y]^T. The unknown motion model parameters vector X_6x1 can be estimated by solving the DLT system with SVD: with A_2Qx6 = U · Σ · V^T, the least-squares solution is X_6x1 = V · Σ^+ · U^T · b_2Qx1.
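A sketch of the affine fit following the same pattern, again relying on `np.linalg.lstsq` for the SVD-based least-squares solve (the function name is an assumption for this example):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares fit of X = [a, b, t_x, c, d, t_y] for the affine model
        D_x = a*S_x + b*S_y + t_x
        D_y = c*S_x + d*S_y + t_y
    Returns the parameters reshaped as [[a, b, t_x], [c, d, t_y]].
    """
    Q = len(src)
    ones, zeros = np.ones(Q), np.zeros(Q)
    A = np.zeros((2 * Q, 6))
    A[0::2] = np.column_stack([src[:, 0], src[:, 1], ones, zeros, zeros, zeros])
    A[1::2] = np.column_stack([zeros, zeros, zeros, src[:, 0], src[:, 1], ones])
    b = np.empty(2 * Q)
    b[0::2], b[1::2] = dst[:, 0], dst[:, 1]
    X, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based least-squares solve
    return X.reshape(2, 3)
```

The returned 2x3 matrix is the top two rows of the 3x3 matrix T_motion defined earlier; the bottom row is [0, 0, 1].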

5. Projective (2D Homography) Transformation Motion Model:

For a given pair of correspondences, the projective (2D homography) transformation motion model can be represented by

    D_x = (h_1 · S_x + h_2 · S_y + h_3) / (h_7 · S_x + h_8 · S_y + h_9)
    D_y = (h_4 · S_x + h_5 · S_y + h_6) / (h_7 · S_x + h_8 · S_y + h_9)

The DLT system of equations can be defined as

    A_2Qx9 · X_9x1 = 0_2Qx1

which in terms of the pairs of correspondences and motion model parameters is written as

    [S_x,k  S_y,k  1  0  0  0  -D_x,k·S_x,k  -D_x,k·S_y,k  -D_x,k] · X_9x1 = 0
    [0  0  0  S_x,k  S_y,k  1  -D_y,k·S_x,k  -D_y,k·S_y,k  -D_y,k] · X_9x1 = 0,   k = 1, 2, ..., Q

with X_9x1 = [h_1  h_2  h_3  h_4  h_5  h_6  h_7  h_8  h_9]^T.

The unknown motion model parameters vector X_9x1 can be estimated by solving the DLT system with SVD, where, with A_2Qx9 = U · Σ · V^T, the last column of V represents X_9x1 as shown below.
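A sketch of this homogeneous DLT estimate: the right singular vector associated with the smallest singular value (i.e. the last column of V, which is the last row of V^T returned by NumPy) is taken as X_9x1 and reshaped into the 3x3 homography (the function name and the h_9 = 1 normalization are assumptions for this example):

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of the 3x3 homography: build A_(2Qx9) and take the right
    singular vector of the smallest singular value as the null-space solution."""
    rows = []
    for (sx, sy), (dx, dy) in zip(src, dst):
        rows.append([sx, sy, 1, 0, 0, 0, -dx * sx, -dx * sy, -dx])
        rows.append([0, 0, 0, sx, sy, 1, -dy * sx, -dy * sy, -dy])
    A = np.asarray(rows, float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)   # last row of V^T == last column of V
    return H / H[2, 2]         # fix the homogeneous scale so that h_9 = 1
```

Since the homography is only defined up to scale, the final division merely picks a canonical representative.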