

Title:
METHOD OF IDENTIFYING LINEAR FEATURES WITHIN AN IMAGE
Document Type and Number:
WIPO Patent Application WO/2008/003985
Kind Code:
A2
Abstract:
A method of identifying linear features within an image uses a multi-scale transform such as a dual-tree complex wavelet transform. Inter-level and same-level products are calculated, and the existence of a feature within the image is identified at a location of locally high and coherent inter-level product magnitude. The feature is then extended from that location in a direction determined by the local same-level product phase, to generate a line segment.

Inventors:
KINGSBURY NICHOLAS GEOFFREY (GB)
ANDERSON RYAN ALEXANDER (CA)
Application Number:
PCT/GB2007/002549
Publication Date:
January 10, 2008
Filing Date:
July 09, 2007
Assignee:
CAMBRIDGE ENTPR LTD (GB)
KINGSBURY NICHOLAS GEOFFREY (GB)
ANDERSON RYAN ALEXANDER (CA)
International Classes:
G06T7/00
Other References:
RYAN ANDERSON, NICK KINGSBURY: "Robust image features from complex wavelet phases", 10 February 2006 (2006-02-10), BRITISH MACHINE VISION ASSOCIATION, XP002482913. Retrieved from the Internet: URL: http://www.bmva.ac.uk/meetings/06/01mar/AndersonCambridge.pdf [retrieved on 2008-06-04]. The whole document.
Attorney, Agent or Firm:
KILBURN & STRODE (London WC1R 4PJ, GB)
Claims:

CLAIMS

1. A method of identifying linear features within an image, comprising: a) applying a multi-scale, multi-directional transform to the image to generate a plurality of scale-related transform levels each having a plurality of transform coefficients defining magnitude and phase; b) calculating an interlevel product (ILP) of the coefficients on a first level and a second level; c) calculating a same-level product (SLP) of the coefficients on the first level and respective adjacent coefficients on the same level; d) identifying the existence of an elongate feature within the image at a location of locally high and coherent ILP magnitude; and e) extending the linear feature from the said location in a direction determined by the local SLP phase, to generate a line segment.

2. A method as claimed in claim 1 in which a directional filter is applied to the ILP before the said identifying step.

3. A method as claimed in claim 1 or claim 2 in which the line segment is stored as a Gaussian cluster representation.

4. A method as claimed in claim 1 or claim 2 in which the line segment is stored as a spline representation.

5. A method as claimed in claim 1 or claim 2 in which the line segment is stored in an interpolated ILP coefficient representation.

6. A method as claimed in any one of the preceding claims in which the transform is a dual-tree complex wavelet transform (DT CWT).

7. A method as claimed in claim 6 in which the SLP phase estimate of the location of a new point on the extending linear feature is corrected or replaced by an estimate from the DT CWT phase or magnitude at the said new point.

8. A method as claimed in any one of the preceding claims including generating first and second line segments and, where these are aligned, combining them into a common segment.

9. A method as claimed in any of the preceding claims including setting one or more thresholds to exclude short or weak features.

10. A method as claimed in any one of the preceding claims including adjusting the ILPs prior to the identifying step to bias the identification in favour of long features over high-contrast features.

11. A method as claimed in any one of the preceding claims including characterising the line segment by a context descriptor which includes characteristics of neighbouring areas within the image.

12. A method as claimed in claim 11 in which the context descriptor includes characteristics of neighbouring line segments within the image.

13. A method as claimed in claim 12 in which the characteristics of the neighbouring line segments include relative direction and ILP phase.

14. A method of matching linear features in target and search images, the method comprising:

1) generating line segments for the target and search images according to the method of claim 1;

2) characterising said line segments by context descriptors which include characteristics of respective neighbouring areas within the images; and

3) comparing the context descriptor to determine a match or non- match between a line segment in the target image and a line segment in the search image.

15. A method of matching linear features in target and search images, the method comprising:

1) generating line segments for the target and search images according to the method of claim 1;

2) representing the ILP coefficients for the target and search images as respective Gaussian cluster distributions;

3) calculating a similarity measure between said distributions, said measure including orientation and ILP phase; and 4) determining a match or non-match between a line segment in the target image and a line segment in the search image in dependence upon the similarity measure.

16. A method of matching linear features in target and search images, the method comprising: 1) generating line segments for the target and search images according to the method of claim 1;

2) hashing a representation of the line segments of the target image to create a hash table in which ILP phase is used as a hashing parameter; and

3) generating potential matches by comparing transformed search image line segments with entries in the hash table.

17. A method as claimed in claim 16 in which the hashing includes the step of orthonormalizing the representations so that the hash table entries are affine-invariant.

Description:

METHOD OF IDENTIFYING LINEAR FEATURES WITHIN AN

IMAGE

The present invention relates to methods of identifying features within an image, and particularly although not exclusively to a method of representing a feature such as an edge with a small number of parameters.

The invention has particular though not exclusive application in the field of computer vision, where the need often arises to determine whether or not a particular defined object is present within a given image or sequence of images. A typical example might be to determine how many cars are shown in a particular photograph, or to decide whether a video clip shows a particular named individual.

While the outputs of these questions are very simple, both the input information and background knowledge required to make such decisions are large and complex. Thus, to make fast decisions from input image data, it is desirable to reduce the quantity of the data as much and as early as possible. The complex decision rules are applied to a minimal set of information that adequately represents all of the interesting features in the original image. These interesting features are not usually at the pixel level; they are much coarser. Therefore, if we process low-resolution versions of an image -say, a 64x48 version of a 640x480 image - we can first quickly identify the coarse areas of the image that contain interesting content. Then, we can "drill down" to finer levels of resolution wherever necessary to find finer image details. For instance, we can quickly find a car in an image with very coarse information, localize where we expect the licence plate to be, and then use fine, pixel-level information in this region only to read the letters and numbers on the licence plate.

Prior work by the present inventors in this field includes published PCT Patent Application No. WO-A-2006/064239, the teaching of which is to be considered as embedded within the present application. The entire teaching of that earlier application is hereby incorporated by reference.

WO-A-2006/064239 describes a method of identifying features within a data set, such as an image, by making use of the properties of a multi-scale transform such as the Dual-Tree Complex Wavelet Transform (DT CWT), as described in Kingsbury, NG: "Image Processing With Complex Wavelets", Phil. Trans. Royal Society London, 1999. The patent application describes the InterLevel Product (ILP), a concept also used in the present application. The earlier application also describes the Inter Coefficient Product (ICP), which will be referred to below by its more recent synonymous name of Same Level Product (SLP).

Briefly, both the ILP and the SLP comprise the steps of: a) applying a multi-scale transform to an image to generate a plurality of scale related transform levels, each having a plurality of transform coefficients defining magnitude and phase; and

b) multiplying a first coefficient associated with a first location with the complex conjugates of a second coefficient associated with a second location.

For the ILP, the first and second coefficients are on different levels of the transform. For the SLP, the first and second coefficients will typically be adjacent coefficients on the same level.

It is to be understood that, as described in WO-A-2006/064239, some preprocessing may be necessary in order to allow an ILP to be taken across all coefficients on two separate levels. For example, to ensure there is only a 1:1 equivalence, the coefficients on one level may need to be upsampled and interpolated. The phase angles may also need to be adjusted (for example doubled, flipped, and/or conjugated) to complete the correspondence. Typically, the procedure will be to upsample the coarser level coefficients to the sampling grid of the finer level, by interpolating complex phase and magnitude separately. Then, the complex conjugate of the result is multiplied by the corresponding coefficient of the finer level to create the ILP.
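By way of illustration only, the following sketch (Python/NumPy, with hypothetical function and array names) shows one way the ILP and SLP products might be formed from two adjacent levels of complex subband coefficients. The 2x nearest-neighbour upsampling and the omission of the phase-doubling adjustments are simplifications; this is not a definitive implementation of the preprocessing described above.

```python
import numpy as np

def interlevel_product(fine, coarse):
    """Sketch of an ILP: 'fine' and 'coarse' are 2-D complex coefficient arrays
    for one directional subband on adjacent levels (coarse is half the size of
    fine). Magnitude and phase are handled separately, as suggested above;
    phase-doubling/flipping adjustments are omitted for brevity."""
    # Upsample the coarser level to the finer grid (nearest-neighbour here;
    # a smoother interpolation of magnitude and phase could be substituted).
    mag = np.kron(np.abs(coarse), np.ones((2, 2)))
    phase = np.kron(np.angle(coarse), np.ones((2, 2)))
    coarse_up = (mag * np.exp(1j * phase))[:fine.shape[0], :fine.shape[1]]
    # ILP: finer coefficient times conjugate of the interpolated coarser one.
    return fine * np.conj(coarse_up)

def same_level_product(coeffs, axis=1):
    """Sketch of an SLP: product of each coefficient with the conjugate of its
    neighbour along one spatial direction on the same level."""
    shifted = np.roll(coeffs, -1, axis=axis)
    return coeffs * np.conj(shifted)
```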

Although it is mathematically convenient, it is not essential for coefficients to be represented in complex form. It would equally be possible, and of course would be mathematically equivalent, to perform calculations on suitable pairs of real numbers instead. For example, instead of determining the phase of the ILP by multiplying coefficients with complex conjugates of coefficients on a different level, one could instead simply subtract the corresponding (real) phase angles.

Instead of using complex wavelet transforms, one could use suitable Gabor transforms (or indeed other suitable transforms) in phase quadrature here.

Further details of ILP may be found in R. Anderson, N. Kingsbury, and J. Fauqueur, "Coarse level object recognition using interlevel products of complex wavelets," in International Conference on Image Processing (ICIP), September 2005: http://www-sigproc.eng.cam.ac.uk/~ngk/anderson ILP.pdf

Further details of the SLP may be found in R. Anderson, N. Kingsbury, and J. Fauqueur, "Determining multiscale image feature angles from complex wavelet phases", in International Conference on Image Analysis and Recognition (ICIAR), September 2005: http://www.sigproc.eng.cam.ac.uk/~ngk/anderson ICP.pdf

Work by others in the general field of feature identification includes the following:

A Geometric Hidden Markov Tree Wavelet Model. J. Romberg, M. Wakin, H. Choi, R. Baraniuk, A Geometric Hidden Markov Tree Wavelet Model, International Symposium on Optical Science and Technology, San Diego, CA, August 2003. This paper maps a relationship between complex wavelet magnitudes/phases and edge feature angles/offsets. However, it assumes that every feature is a simple step edge.

Kovesi Feature Extraction. P. Kovesi, "Image Features from Phase Congruency", Videre: Journal of Computer Vision Research, vol. 1, no. 3, 1999. Kovesi's work focuses upon identifying areas of maximum phase congruence and then linking these together. An edge feature based on Kovesi's maximum phase congruence is pursued and compared in US Patent Application No. US-A-2004/0047498, described below.

Gaussian Cluster Registration. B. Jian, B. Vemuri, J. Marroquin. Robust Nonrigid Multimodal Image Registration Using Local Frequency Maps. Information Processing in Medical Imaging 2005, Glenwood Springs, USA, July 2005. Bing Jian uses a mixture-of-Gaussian model to represent the registration error of a local phase matching algorithm.

Bing Jian and Baba C. Vemuri. A robust algorithm for point set registration using mixture of Gaussians. In Tenth IEEE International Conference on Computer Vision (ICCV'05), volume 2, pages 1246-1251, 2005.

US Patent Application No. US-A-2004/0047498 discloses a further method for detecting features in images.

According to the present invention there is provided a method of identifying linear features within an image, comprising: a) applying a multi-scale transform to the image to generate a plurality of scale-related transform levels each having a plurality of transform coefficients defining magnitude and phase;

b) calculating an interlevel product (ILP) of the coefficients on a first level and a second level;

c) calculating a same-level product (SLP) of the coefficients on the first level and respective adjacent coefficients on the same level;

d) identifying the existence of an elongate (e.g. a linear or approximately linear) feature within the image at a location of locally high and coherent ILP magnitude; and

e) extending the linear feature from the said location in a direction determined by the local SLP phase, to generate a line segment.

According to a further aspect of the invention there is provided a method of matching linear features in target and search images, the method comprising: 1. generating line segments for the target and search images according to the method of claim 1;

2. characterising said line segments by context descriptors which include characteristics of respective neighbouring areas within the images; and

3. comparing the context descriptor to determine a match or non-match between a line segment in the target image and a line segment in the search image.

Preferably, the multi-scale transform is a Dual-Tree Complex Wavelet Transform. The ILP and SLP calculations may be carried out by any convenient means, including representing the coefficients in complex form, and multiplying each coefficient by the complex conjugate of its corresponding coefficient. Alternatively, the necessary operations may be carried out on pairs of real numbers. It is specifically envisaged that other mathematically equivalent techniques may be used, and accordingly the expressions ILP and SLP are to be understood as encompassing both the procedures described and all such mathematical equivalents.

The invention may be carried into practice in a number of ways and one specific embodiment will now be described, by way of example, with reference to the accompanying drawings, in which:

Figure 1 shows an exemplary image, with ILP coefficients, both before and after filtering;

Figure 2 shows details of an individual feature (cluster), indicating the interpolated locations and their relation to a nearby ILP coefficient;

Figure 3 illustrates the creation of an edge profile cluster by line growing;

Figure 4 shows a sample arrangement of bins around an edge profile cluster that contributes to its cluster context descriptor;

Figure 5 is a schematic illustration of the main steps involved in creating cluster context descriptors; and

Figure 6 illustrates the concept of transforming EPC parameters to an orthonormal basis.

0 Introduction

The preferred embodiment of the present invention, described in detail below, introduces the concept of Edge-Profile Clusters (EPC). Briefly, EPCs are a way of robustly representing edge-like entities within an image, using a small number of parameters. EPCs provide a convenient method for matching two images, for example where one is an affine-transformed version of the other.

1 Motivation

The dominant visual features of an image are typically edges and lines that are high-contrast, spatially broad, and span multiple scales. If one were to (skillfully) sketch an artistic drawing of an image, one would typically focus upon representing these features. Edge Detection has been one of the cornerstones of computer vision and image processing, and is typically performed by seeking image areas where pixel gradients are high, and then linking these areas together. This process is somewhat subjective and unstable, as one needs to set arbitrary thresholds at several stages in the process. In particular, textured regions can cause several phantom edges to appear.

With the Interlevel Product (ILP), we have a new, more perceptual method for finding an edge: we consider an edge to be a region with a consistent symmetry profile. In particular, such a definition would suppress textured regions, which may have high pixel gradients, but do not have perceptually coherent edges.

We can find regions with consistent symmetry profiles by seeking coherent patterns in the ILP and SLP values. We can then represent these regions with clusters, defined by a minimal number of parameters to code their position, shape, overall ILP/SLP profiles, and "weight". In this case, weight refers to a cluster's perceptual importance, and is determined by the length and contrast of the edge. We call these clusters "Edge Profile Clusters" (EPC).

By retaining a top proportion of EPC Clusters only, we are able to extract the key edge features in an image with a minimal number of parameters. We can then use these parameters for various image processing activities, such as object matching.

We describe methods for creating and for matching EPC Clusters below.

2 Creating Edge-Profile Clusters

At edge locations, ILP phases are not only informative; they are also stable to several linear and non-linear transformations. Edge and ridge objects of interest occur where ILP coefficients, within the same subband, possess the following two qualities:

• Large magnitude, indicating that activity is present; and

• Spatial adjacency to a number of coefficients with approximately the same complex phase (coherent ILP coefficients), implying that the same, dominant feature is influencing all coefficients.

2.1 Complex blur and line-growing

It will be assumed in the following discussion that we have already transformed a target image T into two pyramids of coefficients, ILP (denoted as χ(T)) and SLP (denoted as ψ(T)). The following factors are the critical parameters we want to generate for each cluster:

• Feature Type: θ_c. A feature is either a pure ridge, a pure edge, or a combination of the two ("combination" is a loosely defined term that includes curvy features, half-ridges, noisy edges, etc.). The feature type is represented by the mean complex angle, θ_c, of the ILP coefficients that comprise the entity.

• Feature Location: μ_c. The 2-D location of the feature is defined relative to other features in its level of the multi-scale hierarchy. The location of the midpoint of the cluster may be recorded, and/or the endpoints.

• Feature Shape: Σ_c. The shape parameter summarizes the size, orientation, and spatial distribution of the entity. Σ_c may be the covariance matrix of a Gaussian, or a set of points, or spline knots.

• Feature Saliency: α_c. The saliency of the feature refers to the perceptual importance of the feature, and is determined by both the size and contrast of the feature. It is typically calculated from the sums of the magnitudes of the comprising ILP coefficients (see the illustrative sketch below).
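Purely as an illustrative sketch, and with hypothetical field names that form no part of the claimed method, the four per-cluster parameters listed above might be held in a simple container such as the following:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class EdgeProfileCluster:
    """Minimal container for the per-cluster parameters described above."""
    theta_c: float                 # feature type: mean complex ILP phase (radians)
    mu_c: np.ndarray               # feature location: 2-D midpoint (and/or endpoints)
    sigma_c: np.ndarray            # feature shape: 2x2 covariance, points, or spline knots
    alpha_c: float                 # feature saliency: sum of contributing ILP magnitudes
    points: list = field(default_factory=list)  # raw interpolated ILP sample locations
```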

In the preferred method, we build EPC clusters around ILP maxima, and "region grow" in the direction of a line, assuming that the phase coherence is being caused by a linear feature.

The main idea of this method is to start at an ILP coefficient that is a local spatial maximum in magnitude, and build a representation of the edge in the direction of the SLP phases (which are expected to be tangential to the edge being represented).

For this method to work reliably, we will preferably wish to precondition the ILP data to smooth out spurious responses. To do so, we apply a long, thin smoothing filter whose major axis is oriented in the direction of the subband, with a major axis approximately five times the length of the minor axis. This filter has two purposes: it reduces ILP phase fluctuations along an edge (possibly causing early termination of the growing process); and, perhaps more importantly, it attenuates the magnitudes of the ILP coefficients that have contradictory phase with their neighbours, while augmenting the magnitudes of coherent ILP regions. In the original image, this amounts to a suppression of textured regions and an emphasis upon strong edges. One can see the before-and-after of ILP smoothing in Figure 1.
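A minimal sketch of such a directional pre-filter is given below, assuming an anisotropic Gaussian with a 5:1 aspect ratio rotated to the subband orientation; the kernel size, sigma values and the use of scipy.ndimage are illustrative choices only.

```python
import numpy as np
from scipy.ndimage import convolve

def oriented_smoothing_kernel(angle_deg, sigma_major=5.0, aspect=5.0, size=21):
    """Long, thin Gaussian whose major axis lies along the subband direction."""
    sigma_minor = sigma_major / aspect
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    theta = np.deg2rad(angle_deg)
    u = xx * np.cos(theta) + yy * np.sin(theta)     # along the edge direction
    v = -xx * np.sin(theta) + yy * np.cos(theta)    # across the edge direction
    k = np.exp(-0.5 * ((u / sigma_major) ** 2 + (v / sigma_minor) ** 2))
    return k / k.sum()

def smooth_ilp_subband(ilp, angle_deg):
    """Complex 'blur': regions of coherent ILP phase reinforce one another,
    while coefficients with contradictory phase partially cancel."""
    k = oriented_smoothing_kernel(angle_deg)
    return convolve(ilp.real, k) + 1j * convolve(ilp.imag, k)
```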

We build the edge in both directions by adding more coefficients to the original maximum coefficient, for each cluster. Typically, since an SLP direction will point at some intermediate point between the next two ILP values on the grid, we interpolate the ILP and SLP values of this point from these two nearest ILP values. This is shown in Figure 2, which illustrates an individual cluster, showing the interpolated locations and their relation to nearby ILP coefficients.

We keep adding points to the cluster in this manner until one of the following termination criteria is satisfied:

• The magnitude of the new interpolated ILP coefficient has dropped below some fixed percentage of the maximum (say, 20%), or the entire ensemble of points has passed some variance-of-magnitude threshold (edge has become weak, or stopped).

• The phase of the new interpolated ILP coefficient has passed a variance-of-phase threshold (say, π/2) for the ensemble (edge has changed profile).

• The SLP direction has deviated some fixed angle out of the directional subband -say, for the 15° subband, the SLP has indicated a direction of 40° to the next interpolated coefficient (edge has left this direction of analysis).

• The new interpolated ILP coefficient has come too close to an existing EPC cluster that has already been calculated in this directional subband (edge is already identified).

Note that a hysteresis window may be applied to any of these thresholds to reduce unnecessary splitting of a coherent edge.

The end product of this clustering process is a series of points, with ILP/SLP values, that form a line segment at a given level. Figure 3 shows examples of a few clusters at Level 2, in the 15° subband. Within a given region, a maximum ILP coefficient is selected, as indicated by the open circles 10, 20, 30. Lines 12, 22, 32 are then grown from those points, in the directions indicated by the local SLP phase, until one of the above termination criteria is met.
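By way of illustration only, the sketch below implements a highly simplified version of the line-growing loop and termination tests described above. It grows in one direction only, omits the proximity-to-existing-cluster test, uses a naive (unwrapped) phase variance, assumes the subband's reference direction is 0°, and all numeric thresholds are placeholder assumptions.

```python
import numpy as np

def bilinear(arr, y, x):
    """Bilinear interpolation of a complex coefficient array (sketch)."""
    y = float(np.clip(y, 0, arr.shape[0] - 1.001))
    x = float(np.clip(x, 0, arr.shape[1] - 1.001))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * arr[y0, x0] + (1 - dy) * dx * arr[y0, x0 + 1]
            + dy * (1 - dx) * arr[y0 + 1, x0] + dy * dx * arr[y0 + 1, x0 + 1])

def grow_line(ilp, slp, start, min_mag_frac=0.2, max_phase_var=np.pi / 2,
              max_dir_dev_deg=25.0, step=1.0):
    """Grow a cluster from a local ILP magnitude maximum 'start' (row, col),
    stepping in the direction given by the local SLP phase."""
    points = [np.asarray(start, dtype=float)]
    phases = [np.angle(bilinear(ilp, *start))]
    max_mag = np.abs(bilinear(ilp, *start))
    while True:
        y, x = points[-1]
        direction = np.angle(bilinear(slp, y, x))          # tangent to the edge
        nxt = points[-1] + step * np.array([np.sin(direction), np.cos(direction)])
        c = bilinear(ilp, *nxt)                            # interpolated ILP value
        if np.abs(c) < min_mag_frac * max_mag:             # edge weak or stopped
            break
        if np.var(phases + [np.angle(c)]) > max_phase_var: # edge profile changed
            break
        if abs(np.degrees(direction)) > max_dir_dev_deg:   # left this subband's direction
            break
        points.append(nxt)
        phases.append(np.angle(c))
    return points
```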

If desired, parent or child ILP coefficients may also be clustered at these locations, provided that these coefficients (or some subset of them) can also collectively satisfy the termination criteria above. While we could analyze coarser/finer (parent/child) levels of the ILP pyramid independently, we can improve localization and create tools to quickly interpolate ILP across scale by linking these parent/child coefficients to those at the main level of analysis at this stage.

We have multiple options to represent this line segment, listed in order of decreasing complexity:

1. Interpolated ILP Coefficients. The raw outputs of the clustering process, this representation is the most informative of the data formats, but affine transformations and comparisons of clusters in this format could still prove data intensive.

2. Spline Representation. A spline could be fit to the line segment. This can reduce the amount of data required for line segment representation while preserving rudimentary contour representation.

3. Gaussian Cluster Representation. A Gaussian representation loses more contour detail, but has the benefit of minimal parameter representation and access to a wide suite of tools that can be used for fast transformation and registration.
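As an illustrative sketch of the third (Gaussian) option, the grown points, weighted by their ILP magnitudes, can be summarised by a mean, a covariance and the aggregate parameters of section 2.1; the function name and return order are hypothetical.

```python
import numpy as np

def fit_gaussian_cluster(points, ilp_values):
    """Summarise a grown line segment as a Gaussian cluster (mu, Sigma),
    together with its mean ILP phase (feature type) and saliency weight."""
    pts = np.asarray(points, dtype=float)            # N x 2 interpolated locations
    vals = np.asarray(ilp_values)                    # N complex ILP samples
    mags = np.maximum(np.abs(vals), 1e-12)
    w = mags / mags.sum()
    mu = (w[:, None] * pts).sum(axis=0)              # weighted centroid
    d = pts - mu
    sigma = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(axis=0)
    theta = np.angle((vals / mags).mean())           # mean complex ILP phase
    alpha = mags.sum()                               # saliency: length x contrast
    return mu, sigma, theta, alpha
```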

2.2 Additional Clustering Considerations

The following activities may accompany the clustering process to generate simpler, more accurate, or more intuitive results.

2.2.1 Refining EPC locations with DT CWT Phase

In the Line Growing method, the SLP information guiding the direction of our line growth is a differential operator that relies upon the accuracy of the previous coefficient location; thus, errors can tend to accrue over long clusters, and the edge position estimate may tend to drift in a "dead reckoning" manner.

The phase of an SLP coefficient indicates orientation. However, the phase of the original DT CWT coefficient is determined by a combination of both feature type and feature offset; thus, its value is ambiguous. However, since we can infer the feature type from the ILP phase, we can therefore disambiguate the DT CWT phase response and determine a better estimate of the offset of the original feature, that does not rely upon the accuracy of previous coefficient estimates. This local estimate of the offset can be used to correct our SLP estimates of each new interpolated ILP coefficient, in a linear combination or Kalman filter implementation.

2.2.2 Linking Adjacent Edges

One of the advantages of EPC clusters over traditional edge detection is their robust representation of an edge, which does not require edge linking of pixels. However, EPC clusters may still arbitrarily break up a coherent edge because it changes direction or momentarily changes ILP phase. In these situations, one may still employ edge-linking algorithms to join up two collinear EPCs into a single entity. Such linking can reduce the amount of data used for object representation and make individual clusters more reliable across image transformations.

2.2.3 Cluster Ranking

EPC clusters can become arbitrarily small - one can continue to find smaller and smaller clusters until one finally gets to two- and one-coefficient clusters, which would normally be useless. If one uses and analyzes clusters this small, one may lose any significant computational advantage.

Thus, in the interests of processing efficiency, an arbitrary threshold may be established either to avoid calculation of small clusters (by setting a minimum ILP magnitude for the maxima to investigate) or to remove the least important clusters after calculation (by eliminating a bottom proportion of α_c-weighted clusters). Such thresholding may speed up processing.

2.2.4 ILP Magnitude Saturation

Our ranking system for clusters assigns a value to each cluster that is proportional to a) the number of coefficients in the ensemble (i.e. the length of the cluster), and b) the magnitudes of the coefficients in the cluster (determined by the contrast of the edge).

However, the relationship between these two contributors need not be fixed. Consider the following example: the two strongest edges in an image have equivalent weights, because one has length 2x and contrast y, but the other has length x and contrast 2y. Should they be considered the same weight?

From empirical observation, human perception of images does not weight the difference in contrast as much as the difference in length. Furthermore, a long string of coherent ILP coefficients is more distinctive than a short string of strong values. Thus, we have an interest in biasing our weighting system to favor long edges over high-contrast edges.

We do so by saturating the magnitudes of ILP coefficients. If we histogram the magnitudes of all the ILP coefficients in an image, and then saturate the top 20%, we limit their ability to increase their weighting versus longer edges with less contrast. Other non-linear laws, such as a logarithmic law, would also be suitable for this purpose.
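A minimal sketch of the saturation step follows, assuming the "top 20%" is implemented as clipping at the 80th-percentile magnitude; the logarithmic alternative is indicated in a comment.

```python
import numpy as np

def saturate_ilp_magnitudes(ilp, top_fraction=0.2):
    """Clip the largest ILP magnitudes so that long, moderate-contrast edges
    are not out-weighted by short, very-high-contrast ones."""
    mags = np.abs(ilp)
    limit = np.percentile(mags, 100 * (1 - top_fraction))
    scale = np.minimum(mags, limit) / np.maximum(mags, 1e-12)
    return ilp * scale
    # A logarithmic law would be an alternative, e.g.:
    # return ilp / np.maximum(mags, 1e-12) * np.log1p(mags)
```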

3 Edge-Profile Cluster Matching

Edge-Profile Clusters, if considered individually, have one attribute that is relatively invariant to most transformations and reasonable degradations: the ILP phase, θ_c. This invariance stems from the fact that a line or an edge, no matter how it is affine transformed, remains a line or an edge respectively. However, this identity and corresponding scalar phase value is far from unique. We must include the spatial context of the edge clusters to assist us with our match. Algorithm 1, below, describes a process that this section explores in further detail.

If we treat the EPCs as Gaussian clusters in both the target and search images, we can estimate a homography (rotation, translation, scaling, skew) that transforms the constellation of clusters in the former to the constellation in the latter. There exists a closed-form expression to calculate the L2 distance, or error, between the search image clusters and the transformed target clusters.

Algorithm 1 An affine-invariant matching algorithm using Edge-Profile Clusters.

1. ILP/SLP transform the target and search images.

2. Find the major EPCs for any or all scale levels of interest, for both images.

3. Build Cluster Context (CC) Descriptors for both images, from major EPCs.

4. Look for potential matches in the search image for at least four of the highest-weight CC Descriptors in the target image. One may look across multiple scales if large scale changes are expected.

5. Calculate the homography that would make this match. If the homography is infeasible, return to step 4.

6. Use the homography to transform all of the EPCs (in Gaussian representation) and determine if they align correctly. If they do not align beyond some threshold, return to step 4 and continue pursuing other potential matches.

7. If a good match is found, iteratively improve the alignment by gradient descent.

3.1 Finding correspondences: Cluster Context descriptor

To find the correspondences described in Step 4 of the above algorithm, we develop a shape descriptor that defines each EPC in terms of its spatial (and potentially scalar) neighbours. This shape descriptor is partially inspired by SIFT [David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91-110, 2004], which calculates gradients in a grayscale patch around the object of interest; it is also partly inspired by the shape descriptor of Belongie et al. [Serge J. Belongie, Jitendra Malik, and Jan Puzicha. Shape matching and object recognition using shape contexts. Technical Report UCB/CSD-01-1128, EECS Department, University of California, Berkeley, 2001], which defines each object solely by its relative position to the other objects in the image. Our shape descriptor for a particular cluster (herein called the "primary cluster") creates histogram bins as shown in Figure 4 to accumulate nearby clusters. There are three spatial bins 102: one for the primary cluster 100 itself and any collinear ILP coefficients (within +/-15°, say) and one on either side of the cluster. Within each side bin, we also split the coefficients into a 2-D grid of sub-bins by a) their relative direction as compared to the primary cluster (three directional sub-bins 104 are shown in Figure 4), and b) their ILP phase (four sub-bins 106 shown).

After accumulating nearby clusters into these bins, we apply a smoothing function across the directional/phase subbins to make them more robust, and normalize them. We name the resulting descriptor a Cluster Context (CC) descriptor.
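The following sketch illustrates the binning scheme of Figure 4 in code. The 3 spatial x 3 relative-direction x 4 ILP-phase bin counts are taken from the figure, while the cluster attributes (mu, psi, theta, alpha), the side-assignment rule and the smoothing are simplified assumptions, not the claimed descriptor.

```python
import numpy as np

def cluster_context_descriptor(primary, neighbours, collinear_tol_deg=15.0):
    """Accumulate neighbouring EPCs into spatial / direction / ILP-phase bins
    around a primary cluster, then smooth and normalise the histogram."""
    hist = np.zeros((3, 3, 4))            # spatial x relative-direction x ILP-phase
    for n in neighbours:
        offset = n.mu - primary.mu
        ang = np.degrees(np.arctan2(offset[1], offset[0]) - primary.psi) % 180.0
        if min(ang, 180.0 - ang) < collinear_tol_deg:
            s = 0                                              # collinear with primary
        else:
            side = np.cos(primary.psi) * offset[1] - np.sin(primary.psi) * offset[0]
            s = 1 if side > 0 else 2                           # left / right side bin
        rel_dir = (n.psi - primary.psi) % np.pi
        d = min(int(rel_dir / (np.pi / 3)), 2)                 # 3 direction sub-bins
        p = min(int(((n.theta + np.pi) / (2 * np.pi)) * 4), 3) # 4 ILP-phase sub-bins
        hist[s, d, p] += n.alpha
    # light circular smoothing across the ILP-phase sub-bins, then normalisation
    hist = (hist + np.roll(hist, 1, axis=2) + np.roll(hist, -1, axis=2)) / 3.0
    return hist / max(hist.sum(), 1e-12)
```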

CC descriptors are expected to be strongly invariant to rotation, translation, and brightness/contrast changes; we also expect moderate invariance to scale (within a single level of scale, which is all we need), affine skew, and noise.

We can use a typical L2 distance, Earth Mover's distance, relative entropy, or other basic measure to compare histograms. EPC clusters with CC descriptors that are similar by one of these metrics are candidate matches with which one can estimate a homography. If one expects to match objects in conditions of a) heavy skew or perspective distortion, or b) changes in 3-D illumination (i.e. changes in shadows), one can sum marginals across directional bins or ILP phase bins respectively to reduce sensitivity to these parameters.
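As a brief illustrative sketch of such comparisons (metric names and the smoothing constant are assumptions), two normalised CC histograms might be compared as follows:

```python
import numpy as np

def descriptor_distance(h1, h2, metric="l2"):
    """Compare two normalised Cluster Context histograms (smaller = more similar)."""
    h1, h2 = h1.ravel(), h2.ravel()
    if metric == "l2":
        return float(np.linalg.norm(h1 - h2))
    if metric == "kl":                        # relative entropy (KL divergence)
        p, q = h1 + 1e-12, h2 + 1e-12
        return float(np.sum(p * np.log(p / q)))
    raise ValueError(metric)
```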

The main purpose of the CC descriptor is to find four possibly matching points, estimate a homography, verify if it is feasible (i.e. the homography matrix is non-singular), and quickly verify or reject it by the Gaussian registration methods discussed in the previous section. Several false matches may be introduced at the CC comparison stage and quickly rejected.

4 Summary of Procedure

As an aid to understanding, it may be helpful to provide a final brief summary of the various steps, set out above, in the preferred embodiment. Figure 5 shows these steps in simplified schematic form.

• Step 1 (reference numeral 50)

We start initially with an input image to be analysed.

• Step 2 (reference numeral 52)

A multi-scale transform, preferably a DT CWT, is applied to the image. The output of this step will typically be a decimated complex wavelet transform.

• Step 3 (reference numerals 54 and 56)

Both ILP and SLP transforms are applied to the output of the previous step.

• Step 4 (reference numeral 58)

A filter is applied to the ILP transform of the preceding step to effect an asymmetric blurring.

• Step 5 (reference numeral 60) An appropriate scale of analysis is selected, in a variety of orientations. In the preferred embodiment, six separate orientations are used, each spaced apart by approximately 30°.

• Step 6 (reference numeral 62)

Local maxima are identified within the blurred ILP coefficients, as shown in figure 3.

• Step 7 (reference numeral 64) The SLP phase directions (also shown in figure 3) are used to grow lines in one or both directions from each of the identified maximum ILP coefficients.

• Step 8 (reference numeral 66)

The clusters found in the previous step are analysed, and cluster context descriptors, defining the clusters using a small number of parameters, are generated. Other procedures may also be carried out at this point, including linking adjacent clusters, ranking clusters, and making allowances for ILP magnitude saturation.

Finally (not shown in Figure 5) the cluster context descriptors may be used to match corresponding features of two separate images. Alternatively, a cluster context descriptor may be used to determine whether or not a particular target feature occurs or does not occur in a sample image.

5 Feature-to-feature matching using geometric hashing

As an alternative to the Edge-Profile Clusters (EPC) described above, geometric hashing may instead be used to carry out matching. Geometric hashing is described in general terms in Haim J. Wolfson and Isidore Rigoutsos. Geometric hashing: An overview. IEEE Computational Science & Engineering, 4(4):10-21, October 1997. One application is disclosed in JP-A-2005174062.

In the present embodiment, we make use of the rapid extraction and comparison of sparse features to represent key elements of image content. We extract Edge-Profile Clusters from both the target and search images and compare their parameters directly.

Our match method is split into two stages: a) estimation of a homography that transforms the target features onto the image features, and b) generation of a metric that rates and/or "fine-tunes" the final set of potential matches.

Feature-to-Feature comparison allows us to evaluate a wider range of transformations in this first stage - while earlier we only matched through translation and rotation transformations, here we allow affine transformations between the target and search images.

The recent interest in local features has resulted from the ability to compare them without "context"; that is, without needing to consider the relative positions of features for the main stage of feature comparison.

Features such as the SIFT descriptor have a very rich descriptor that represents local gradients around a 1-D interest point. By contrast, our features are 2-D, with relatively unstable endpoints, and possess a very simple descriptor - an ILP phase θ_c = ∠χ_c, a shape/orientation Σ_c, and a relative weight α_c. These parameters are not sufficient to compare context-free, as several edge objects will be similar. Thus, we shall consider each EPC in the context of its neighbours, with methods discussed in W. E. L. Grimson and T. Lozano-Perez. Recognition and localization of overlapping parts from sparse data in two and three dimensions, Proc. IEEE International Conf. on Robotics and Automation, pages 61-66, 1985.

To do so, we look at the geometric hashing algorithm to quickly encode and compare these neighbour relationships.

5.1 Geometric Hashing of EPCs

Geometric Hashing is a method typically used to do fast matching of point sets. Here we describe an implementation of this concept specific to the particular qualities of EPC features.

There are several reasons why geometric hashing is particularly appropriate for EPC matching:

• Our invariant/covariant EPC parameters, ILP phase and feature orientation respectively, are simple enough to include in a geometric hash along with relative position.

• Orientation information for each EPC can be used to reduce the number of EPC features required to make an affine-invariant bin set.

• Our ranking system allows us to restrict hashing to a top proportion of EPC coefficients, with a high probability of overlap between the target and search feature sets.

We will therefore design a variant of the traditional hashing algorithm that will exploit these qualities of the EPC feature set.

J. Beis and D. Lowe, Shape indexing using approximate nearest-neighbor search in high dimensional spaces. Proc. IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, pages 1000-1006, 1997, present arguments to show that k-d trees are superior to geometric hashing tables for moderate-to-high dimension retrieval tasks. It is these methods that are incorporated into the SIFT comparison methods. We do not pursue these methods, partly because the dimensionality does not require it, but also because the number of influential neighbouring EPC features will be variable, with the structure proposed below. This variability in feature vector length of each EPC is easily accommodated by the voting scheme in the hashing structure, but would complicate a k-d tree search.

5.1.1 EPC Hashing Algorithm Step 1: Hashing the Target

We assume that we will hash the top n_h EPC clusters of the target at a given level.

The first step of a hashing algorithm is to create the hash table for the target image. This procedure involves selecting as many subsets of the target feature set as possible, creating an affine transform that orthonormalizes this feature set, and then applying this transform to project the entire target feature set onto this new basis. The resulting transformed features are then binned into a quantized table as "votes" for the feature set that created them. While point hashing requires three points to define an affine-invariant basis, we can build such a basis with just two EPCs. For each EPC pair a and b, we create a new basis by defining a third point at the intersection between the lines extended from the major axes of the two EPCs. The coordinates of this third point ab are

x_ab = (x_b tan ψ_b - x_a tan ψ_a + (y_a - y_b)) / (tan ψ_b - tan ψ_a), and

y_ab = (y_b cot ψ_b - y_a cot ψ_a + (x_a - x_b)) / (cot ψ_b - cot ψ_a),

where (x_a, y_a) and (x_b, y_b) are the midpoints of EPCs a and b, and ψ_a and ψ_b are their orientations.

We will set ab to be the origin of our new basis, and a and b will be at (0,1) and (1,0) of this basis. The interior angle of the vectors from ab to a and b will be in the positive quadrant; this implies that the vector clockwise of the interior angle will be on the horizontal axis and the counterclockwise vector will be on the vertical axis (without loss of generality, we will assume hereafter that a is clockwise and b is counterclockwise). EPC pairs where the interior angle is close to 180° or 0° will be rejected, as they form unreliable, degenerate bases.

At this point, it must be noted that the EPC intersection point will be quite positionally stable, but the (1,0) and (0,1) points, defined by the midpoints of the contributing EPCs, will be less stable - their positions will be dictated by the relatively arbitrary endpoints of the EPCs. As a result, there are scale instabilities in the orthonormalized space in both directions. To reduce this instability, we divide long EPCs, in the target only, into short segments four sampling intervals long prior to hashing. This increases the number of entries in the table; however, this increase would be akin to increasing the number of entries in a database; if properly indexed, the increase should have a negligible impact on performance. The midpoint of a search candidate, therefore, would be no further than two sampling intervals from any of the segmented candidates in the hash table.

Note that this definition establishes a canonical ordering for our EPC pair selection, eliminating the need to check both orderings of a search image pair and halving our computational load.

The affine transformation that creates this new basis (using homogeneous coordinates) is simply the transform A_ab that maps ab to the origin, b to (1,0) and a to (0,1):

A_ab = [[x_b - x_ab, x_a - x_ab, x_ab], [y_b - y_ab, y_a - y_ab, y_ab], [0, 0, 1]]^(-1)

For each valid (a,b) basis pair, we apply A_ab to all c = 1...n_h EPCs, as shown in Figure 6.

ψ_c^(ab) = tan^-1(s_yc / s_xc),    (5)

where (s_xc, s_yc) is the major-axis direction vector of EPC c after transformation by the linear part of A_ab.    (6)

As described previously, we do not expect the affine transform to change the ILP phase of the feature significantly.
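As an illustrative sketch only (the cluster attributes mu and psi are hypothetical names, numerical robustness for near-vertical axes is omitted, and the degeneracy threshold is an assumption), the intersection point and the change-of-basis transform for an EPC pair might be computed as follows:

```python
import numpy as np

def basis_transform(a, b, min_angle=np.radians(10)):
    """Build the affine transform A_ab (homogeneous coordinates) that maps the
    intersection point ab to the origin, b to (1, 0) and a to (0, 1).
    Returns None for near-degenerate pairs (interior angle near 0 or 180 deg)."""
    d = (a.psi - b.psi) % np.pi
    if min(d, np.pi - d) < min_angle:                   # nearly parallel major axes
        return None
    xa, ya, xb, yb = a.mu[0], a.mu[1], b.mu[0], b.mu[1]
    ta, tb = np.tan(a.psi), np.tan(b.psi)
    x_ab = (xb * tb - xa * ta + (ya - yb)) / (tb - ta)  # intersection of major axes
    y_ab = ya + ta * (x_ab - xa)
    origin = np.array([x_ab, y_ab])
    # Columns of M map basis coordinates (1,0) and (0,1) to b and a respectively.
    M = np.column_stack([b.mu - origin, a.mu - origin])
    if abs(np.linalg.det(M)) < 1e-9:
        return None
    Minv = np.linalg.inv(M)
    A = np.eye(3)
    A[:2, :2] = Minv
    A[:2, 2] = -Minv @ origin
    return A   # apply to homogeneous points [x, y, 1]
```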

Having transformed the features to a canonical basis, we now store the resulting n_h values into an index table. Instead of the 2-D spatial table typical of a geometric hash, we use a 4-D table that also quantizes the orientation and ILP phase of each feature. These distinctions will help us to reduce false matches.

Quantization values are a tuning parameter that may need to be adjusted. We find that the bin sizes in Table 1 give satisfactory results.

Table 1: Table of suggested bin sizes for quantizing and indexing the parameters of transformed cluster sets. There are 82 944 bins in total.

For each cluster, an entry (or 'vote') is placed in the appropriate bin indicating which (a,b) basis pair produced it (and potentially which target model T, if one wishes to search simultaneously for multiple targets). Values that fall outside of the spatial bin limits are discarded. By removing these values, we are able to remove degenerate basis pairs where the EPCs overlap, which produce unstable bases.

More importantly, if the two images have occlusions, or distortions that are not globally affine - perspective, for instance, or a non-linear radial lens distortion - then we can set limits restricting a match region to a size where affine models may be applied locally. Let us say, for instance, that we will restrict basis pairs to be within 15 sampling intervals of one another, and set spatial bins that are no more than 2 units from the origin of the orthorectified plane. This combination will restrict the EPCs of influence to be no larger than 60x60 coefficients, with many significantly smaller. Thus, we implicitly create "local descriptors" in the hash table.

Any matching algorithm must also deal with the instability of having ambiguous orientation polarity for symmetric EPCs. For our implementation, we find entries in the two extreme ILP bins containing EPCs with χ ≈ π/2 and χ ≈ -π/2 and merge the bin-pairs of opposite orientation polarity. For example, two EPCs at the same location, having the same ILP phase χ = π/2 but opposite polarities ψ_1 = π/6, ψ_2 = 7π/6, would be located in the same bin.

To increase robustness, one can also submit multiple votes along the length of the edge, if it spans multiple spatial bins. Increasing these votes may increase calculation time, both when creating and accessing the hash table. In our tests, we submit up to three votes - one for the center location of each EPC, and one for each endpoint location. If any of these three locations lie within the same spatial bin, only one vote is tallied, thus giving moderate preference to longer edges.

As a matter of convenience, we store the affine transformations A_ab as well, indexed by the same identifier (a,b). They can be used for verification as described in section 5.2.

5.1.2 EPC Hashing Algorithm Step 2: Search image query

Having populated the hash table, we now wish to access it using the top m_h EPCs of a given search image. We follow the standard hash table algorithm at this point: we select a pair of EPCs (c,d) from the search image set as our orthonormal basis, use it to transform all m_h EPCs, and determine the bins where the transformed features would reside. We then make a list of all of the entries/votes that exist in these bins. If some acceptable proportion of n_h votes occurs for any given (a,b) target basis - say, 20% of n_h - then we have a potential match pair in the target for the EPC features (c,d) from the search image. We can then quickly determine the affine transformation between (a,b) and (c,d) by calculating A_ab→cd = A_cd^-1 A_ab.

The search should be repeated multiple times, using several different (c,d) combinations, to generate several different affine hypotheses, which will then be verified using the method in section 5.2.
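To illustrate the voting scheme, the sketch below builds a quantized 4-D index (spatial x, spatial y, orientation, ILP phase) from the transformed target features and counts votes for a candidate search basis (A_cd could be produced, for example, by the basis_transform sketch above). The bin counts and vote threshold are placeholders and are not the values of Table 1; cluster attributes mu, psi and theta are hypothetical names.

```python
import numpy as np
from collections import defaultdict

def bin_key(pt, psi, chi, n_space=12, n_orient=6, n_phase=8, extent=2.0):
    """Quantize a transformed EPC (position, orientation, ILP phase) into a 4-D bin.
    Returns None if the point falls outside the spatial limits."""
    if np.any(np.abs(pt) > extent):
        return None
    ix, iy = ((pt + extent) / (2 * extent) * n_space).astype(int)
    io = int((psi % np.pi) / np.pi * n_orient)
    ip = int(((chi + np.pi) % (2 * np.pi)) / (2 * np.pi) * n_phase)
    return (min(ix, n_space - 1), min(iy, n_space - 1),
            min(io, n_orient - 1), min(ip, n_phase - 1))

def build_hash_table(target_epcs, bases):
    """For every valid target basis pair (a, b, A_ab), transform all target EPCs
    and record a vote for (a, b) in the bin of each transformed feature."""
    table = defaultdict(list)
    for (a, b, A) in bases:
        for c in target_epcs:
            p = (A @ np.append(c.mu, 1.0))[:2]
            key = bin_key(p, c.psi, c.theta)
            if key is not None:
                table[key].append((a, b))
    return table

def query(table, search_epcs, A_cd, min_votes=10):
    """Transform the search EPCs with a candidate basis (c, d) and tally votes
    per target basis pair; pairs with enough votes become match hypotheses."""
    votes = defaultdict(int)
    for e in search_epcs:
        p = (A_cd @ np.append(e.mu, 1.0))[:2]
        key = bin_key(p, e.psi, e.theta)
        for ab in table.get(key, []):
            votes[ab] += 1
    return [(ab, v) for ab, v in votes.items() if v >= min_votes]
```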

The affine features c, d chosen from the search image should certainly not be randomly selected. We can improve the chance of a successful match by not only searching the highest α-weighted features, but also features within a given proximity of one another. Assuming that the sampling rate is the same for both images - if not, the difference should be explicitly modelled - then we know that a scaling of the image by more than √2 or less than 1/√2 will move the feature of interest into another level of scale (although, as the main features are generally multiscale, it does not mean that this feature will disappear from the current scale). This means that we can restrict the search EPC pairs to those that are less than twice the distance of the furthest features forming a basis in the target. One can progress through the larger search image with a sliding-box scheme similar to a template match, seeking reasonable basis pairs with which we query the hash table. For the scenario where a small target is sought in a much larger search image, this restriction is quite helpful.

Other restrictions are also employed to reduce spurious or unreliable cluster pairs, both for the target indexing and the search querying. We will discard cluster pairs whose orientations are within a small angular threshold of one another, as their intersection point will be too unstable. Cluster pairs where the aspect ratio is more than 8:1 are also discarded; that is, pairs where the distance from one cluster to the intersection point is at least eight times the distance of the other. Such bases are also relatively degenerate and unreliable.

The result of the hash table algorithm, with the above restrictions imposed, will be a series of basis pair correspondences and accompanying affine transformations that have a reasonable likelihood of being the correct transformation. These correspondences can be sorted by total number of votes; however, this ordering often ranks all true and false matches of an "edge-busy" area over a simpler area. Instead, we retain some top number of match results for each query pair, and discard any query responses with less than a given minimum of votes. For our implementation, we retain the six best matches for each query, discarding any matches with less than ten votes.

We must now test these remaining hypotheses more thoroughly, using the methods described below.

5.2 Matching Verification: Complex Mixture Overlap

Geometric hashing provides a fast measure to reduce a large number of affine transformation hypotheses down to a reasonable number. We now wish to apply a metric to the remaining candidates to find the best matches.

The position stability of our method does not compare to that of the affine descriptors in Krystian Mikolajczyk, Tinne Tuytelaars, Cordelia Schmid, Andrew Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, A comparison of affine region detectors, International Journal of Computer Vision, 65(1/2):43-72, 2005. However, these descriptors require precision so that intricate SIFT-like descriptors can be applied to each affine region and compared across images. By contrast, our invariant descriptors are much simpler (and less distinctive) - a region where ILP phases are congruent and of large magnitude. Different sizes and shapes of region, etc., can be tolerated, provided that there is a moderate amount of overlap between an EPC in the search image and the expected location of the affine-transformed target EPC that corresponds to it.

As previously noted, the normalized cross-correlation works well with ILP coefficients. In this section, we have nothing but parametric representations of clusters for both transformed-target and search images. For each region comparison, we could reproduce a pair of patches of transformed Gaussian-distributed ILP coefficients and compare them with an NCC; however, it would be computationally preferable to compare the EPC parameters directly.

We have been representing ILP regions throughout this description with Gaussians (clusters or constellations) G(μ_c, Σ_c); that is, we could consider a region x ∈ X, y ∈ Y of ILP coefficients to be

χ(x, y, d) = Σ_{c: ψ_c ∈ d} α_c e^(jθ_c) G((x, y) | μ_c, Σ_c),    (8)

where each subband d will be influenced by the EPCs whose directions ψ_c are in its angular support.

Other work in this area, using Gaussian Mixture Models to cluster interest points, includes Bing Jian and Baba C. Vemuri, A robust algorithm for point set registration using mixture of Gaussians, in Tenth IEEE International Conference on Computer Vision (ICCV'05), volume 2, pages 1246-1251, 2005. This uses a very helpful identity to efficiently produce analytic measures for Gaussian overlaps. To measure the overlap of two multivariate Gaussians, one can calculate

∫ G(x | μ_a, Σ_a) G(x | μ_b, Σ_b) dx = G(0 | μ_a - μ_b, Σ_a + Σ_b).    (9)

If applied to two Gaussian Mixtures, this formula would be

∫ Σ_a Σ_b α_a α_b G(x | μ_a, Σ_a) G(x | μ_b, Σ_b) dx = Σ_a Σ_b α_a α_b G(0 | μ_a - μ_b, Σ_a + Σ_b),    (10)

and can be adapted to support mixture models of the form in equation 8 as follows:

= Σ_a Σ_b α_a α_b cos(χ_a - χ_b) cos(ψ_a - ψ_b) G(0 | μ_a - μ_b, Σ_a + Σ_b),    (11)

where the discretization of the d subbands has been discarded in favour of a cosine-based orientation similarity measure; the similarity of ILP phase is also in this form. One can apply a power to either of these cosine similarity measures to increase their "strictness" on a match score.

Once again, we need to address the orientation polarity ambiguity that results from symmetry. We can simply establish an ILP symmetry threshold, above which the absolute value of the cosine similarity measure is used; that is, instead of using cos(χ_a - χ_b) cos(ψ_a - ψ_b) in equation 11 above, we can use cos(χ_a - χ_b) |cos(ψ_a - ψ_b)| for such coefficients (equation 12).

This correction also prevents a double-negative pair of cosine measures (positive vs. negative ridge, with opposite orientation polarities) from incorrectly giving a positive similarity measure.

Equation 11 is a simple cross-correlation. To get our desired normalized cross-correlation, we modify it accordingly, dividing the cross term of equation 11 by the square root of the product of the corresponding self-overlap terms for the two images (equation 13), which is in the well-known form of the NCC equation. Note that the summations of these terms are limited to a locality around the region of interest. Also, the α products and cosine similarity terms can be pre-calculated, and low-magnitude terms truncated to reduce the number of Gaussian terms calculated, which reduces computation.

We call this method a Complex Mixture Overlap measurement, which yields a number between -1 and 1 indicating the correlation at a given scale. The contributing clusters may be from the same level as the level used to generate the affine hypotheses, but not necessarily.
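By way of illustration only, the Complex Mixture Overlap described above might be sketched as follows. The Gaussian overlap identity and the cosine similarity weights follow the description of equations 9 to 11; the NCC-style normalization follows the modification just described; and the cluster attributes mu, sigma, theta (ILP phase), psi (orientation) and alpha are hypothetical names.

```python
import numpy as np

def gauss_overlap(mu_a, sig_a, mu_b, sig_b):
    """G(0 | mu_a - mu_b, Sigma_a + Sigma_b): closed-form overlap of two 2-D Gaussians."""
    d = np.asarray(mu_a) - np.asarray(mu_b)
    S = np.asarray(sig_a) + np.asarray(sig_b)
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(S)))
    return norm * np.exp(-0.5 * d @ np.linalg.solve(S, d))

def cross_term(A, B):
    """Weighted sum of pairwise overlaps with ILP-phase and orientation similarity."""
    total = 0.0
    for a in A:
        for b in B:
            sim = np.cos(a.theta - b.theta) * np.cos(a.psi - b.psi)
            total += a.alpha * b.alpha * sim * gauss_overlap(a.mu, a.sigma, b.mu, b.sigma)
    return total

def complex_mixture_overlap(A, B):
    """Normalized correlation between two EPC constellations (roughly -1 to 1)."""
    return cross_term(A, B) / np.sqrt(cross_term(A, A) * cross_term(B, B))
```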

The result will be a set of correspondences and homographies that are locally centered at the intersections of the two basis vectors in each image. As with other local descriptor outputs, these correspondences can then be combined with RANSAC or Generalized Hough to calculate an overall, smoothly changing homography that can accommodate perspective changes.

If the ultimate goal of the task is registration, another interesting set of tools is available to efficiently continue the alignment process. Equation 11 is differentiable with respect to the affine transform A_ab→cd, and if we aim to minimize the L2 distance between the search and target patches, we can use a gradient descent to fine-tune the affine parameters for this purpose. One can even fit a nonlinear thin-plate spline (TPS) (F. L. Bookstein, Principal warps: thin-plate splines and the decomposition of deformations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):567-585, 1989) to improve cluster alignment rapidly, before resorting to a pixel- or coefficient-based method to complete the alignment (a complex-wavelet-based alignment method would be appropriate, given that the transform is already calculated). Bing Jian and Baba C. Vemuri, A robust algorithm for point set registration using mixture of Gaussians, in Tenth IEEE International Conference on Computer Vision (ICCV'05), volume 2, pages 1246-1251, 2005, and M. Mellor and M. Brady, Non-rigid Multimodal Image Registration Using Local Phase, Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 789-796, 2004, give a good overview of the cluster-based registration process.

Note that one of the biggest advantages of this cluster comparison metric is that the number of times an edge region is segmented is irrelevant, provided that each segment is above the processing threshold for EPC clusters.




 