


Title:
TRACKING METHOD AND APPARATUS
Document Type and Number:
WIPO Patent Application WO/2004/003849
Kind Code:
A1
Abstract:
A method of tracking an object, the method including the steps of: (a) capturing a video sequence of the object comprising a plurality of image frames; (b) detecting a plurality of features within at least an initial image frame of the video sequence; (c) generating one or more hypotheses relating to whether two or more detected features are interconnected to one another by comparing the relative positioning of the two or more features in at least the initial image frame; (d) determining the position of a plurality of features located in subsequent image frames and testing the strength of the hypotheses for the subsequent image frames utilising the determined location of the features.

Inventors:
HEINZMANN JOCHEN (AU)
THOMSEN COLIN (AU)
Application Number:
PCT/AU2003/000794
Publication Date:
January 08, 2004
Filing Date:
June 25, 2003
Assignee:
SEEING MACHINES PTY LTD (AU)
HEINZMANN JOCHEN (AU)
THOMSEN COLIN (AU)
International Classes:
G06K9/00; G06K9/62; G06T7/20; G06T7/246; (IPC1-7): G06T7/20
Foreign References:
EP0984386A2 (2000-03-08)
US5802220A (1998-09-01)
Other References:
"Detecting faces in images: a survey", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (PAMI), vol. 24, no. 1, January 2002 (2002-01-01), pages 34 - 58
Attorney, Agent or Firm:
Shelston IP. (Sydney NSW 2000, AU)
Claims:
THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:-
1. A method of adaptively creating a tracking model from a series of visual images, the method comprising iteratively performing the steps of: (a) locating a series of new tracked objects within a current image and adding them to the set of previously tracked objects to form a current set of tracked objects; (b) determining a new series of relationships between objects in the current set of tracked objects, and adding them to the set of previous series of relationships to form a current series of relationships; and (c) assessing the members of the current series of relationships between successive visual images to derive assessed merit values, and deleting a predetermined number of relationships from said current series of relationships having low assessed merit values.
2. A method as claimed in claim 1 wherein step (b) further includes utilising the distance between objects in the determination of the relationship value.
3. A method as claimed in claim 1 wherein step (c) further includes as part of said assessment, determining for each member of said current series of relationships, a fitness match with the current image.
4. A method as claimed in claim 1 wherein said assessment includes a measure of the distance between objects in a relationship with the greater the distance, the greater the value of assessment of the relationship.
5. A method as claimed in any previous claim wherein said step (b) further includes modifying existing relationships by adding further tracked objects to the relationship.
6. A method as claimed in any previous claim wherein said step (b) further includes modifying existing relationships by altering the expected distance between tracked objects in the relationship based on the distance between corresponding objects in the visual image.
7. A method as claimed in claim 1 wherein said step (b) further includes the step of allowing members of a relationship to be occluded for a predetermined number of frames.
8. A method of adaptively creating a tracking model from a series of visual images, the method comprising the steps of: tracking objects within the series of visual images to form a series of tracked objects; determining relationships between the tracked objects to form a series of hypotheses; assessing the validity of said hypotheses over said series of visual images; and applying a selective pressure to cull said objects and said hypotheses between objects across members of said series.
9. A method of tracking an object, said method including the steps of: (a) capturing a video sequence of the object comprising a plurality of image frames; (b) detecting a plurality of features within at least an initial image frame of the video sequence; (c) generating one or more hypotheses relating to whether two or more detected features are interconnected to one another by comparing the relative positioning of the two or more features in at least the initial image frame; (d) determining the position of a plurality of features located in subsequent image frames and testing the strength of the hypotheses for said subsequent image frames utilising the determined location of said features.
10. A method as claimed in claim 9 further comprising the step of: (e) displaying said features with a current frame when said hypotheses satisfy a first predetermined condition.
11. A method as claimed in claim 10 further comprising the step of: (f) determining if a second predetermined condition is satisfied in relation to said hypotheses and not displaying said features when said second predetermined condition is satisfied, said first and second conditions having an interrelationship such that a hysteresis display condition is set up for the display of said features.
12. A method as claimed in claim 9 wherein said hypotheses include that the features are rigidly connected to one another.
13. A method as claimed in any previous claim wherein said features include areas of the image having a high contrast texture.
14. A method as claimed in claim 13 wherein said high contrast texture is derived by forming a covariance matrix from derived images which are derived from a current frame.
15. A method as claimed in claim 14 wherein said derived images are formed from orthogonal calculations carried out on said current frame.
16. A method as claimed in any previous claim further comprising the step of discarding features which exhibit only a small amount of motion over an extended sequence of images.
17. A method as claimed in any previous claim further comprising formulating hypotheses that include the disappearance of features for a predetermined number of image frames.
18. A method as claimed in any previous claim further comprising the step of incorporating further features into a current hypothesis in subsequent frames of an image sequence.
19. A method as claimed in any previous claim wherein the expected relative positioning of features within a hypothesis is adapted to change over time from one sequence to a next sequence.
20. A method as claimed in any previous claim wherein said hypotheses are assigned a quality value depending on the features in said hypotheses.
21. A method as claimed in claim 20 wherein said features are also assigned a feature quality value.
22. A method as claimed in claim 21 wherein the feature quality value is varied in accordance with the feature's proximity to other features in a hypothesis.
23. A method as claimed in claim 21 wherein the feature quality value is varied in accordance with the amount of strain a feature produces on a hypothesis.
24. A method as claimed in claim 9 wherein said hypotheses evolve over time.
25. A method as claimed in claim 9 wherein step (d) includes the sub-steps of: assigning a quality weighting to each of said two or more detected features of a hypothesis; and calculating a quality value for each hypothesis based on the quality of its respective detected features.
26. A method as claimed in claim 25 wherein step (d) includes the sub-step of selecting one of said hypotheses based on a calculated quality value of the hypothesis.
27. A method as claimed in any one of the preceding claims in which the method includes the additional step of: adapting said one or more hypotheses based on a comparison of a predicted position of each detected feature using the respective hypothesis and the measured position of each detected feature.
28. A method as claimed in any one of the preceding claims in which said one or more hypotheses can have a predetermined maximum number of detected features.
29. A method of adaptively creating a tracking model substantially as hereinbefore described with reference to the accompanying drawings.
Description:
Tracking Method and Apparatus

Field of the invention
The present invention relates to a method and apparatus for tracking a rigid body within a sequence of video images. It should be understood that the present invention is applicable to the tracking of any rigid body; however, the present invention will be described in the context of tracking the face of a person in a sequence of video images.

Background of the invention
One goal of research into automatic monitoring and detection of rigid bodies within a video image is to allow any object to be placed in front of a camera and for a computer to be able to reliably track the position and orientation of the object. In the context of face and gaze tracking the goal of research is to allow any person to sit in front of a camera and for a computer to be able to reliably monitor the orientation of the person's head (the head pose), and the gaze direction of their eyes.

In this context, the existing face tracking methods generally operate by testing whether facial features detected in a sequence of images fit those of a predetermined face model based on an "average" face. However, the existing methods generally experience two shortcomings. The reliance on some form of "average face" in the generation of a face model results in people with significantly different faces from the "average face" being unable to use the system. Furthermore, the need for calibration of such systems before use makes using these systems slow and cumbersome.

Furthermore, often many such systems are unable to work in real time, and must work with saved image sequences.

Summary of the invention
It is an object of the present invention to provide an improved tracking method and system.

In accordance with a first aspect of the present invention, there is provided a method of adaptively creating a tracking model from a series of visual images, the method comprising iteratively performing the steps of: (a) locating a series of new tracked objects within a current image and adding them to the set of previously tracked objects to form a current set of tracked objects; (b) determining a new series of relationships between objects in the current set of tracked objects, and adding them to the set of previous series of relationships to form a current series of relationships; and (c) assessing the members of the current series of relationships between successive visual images and deleting a predetermined number of relationships from the current series of relationships having low assessed merit values.

Preferably, the step (b) further includes utilising the distance between objects in the determination of a relationship value and the step (c) further includes as part of the assessment, determining for each member of the current series of relationships, a fitness match with the current image. The method can also allow the assessment to include a measure of the distance between objects in a relationship with the greater the distance, the greater the value of assessment of the relationship. The method can also include modifying existing relationships by adding further tracked objects to the relationship or modifying existing relationships by altering the expected distance between tracked objects in the relationship based on the distance between corresponding objects in the visual image. Ideally the method allows members of a relationship to be occluded for a predetermined number of frames.

In accordance with another aspect of the present invention there is provided a method of adaptively creating a tracking model from a series of visual images, the method comprising the steps of: tracking objects within the series of visual images to form a series of tracked objects; determining relationships between the tracked objects to form a series of hypotheses; assessing the validity of the hypotheses over the series of visual images; and applying a selective pressure to cull the objects and the hypotheses between objects across members of the series.

In accordance with a further aspect of the present invention, there is provided a method of tracking an object, the method including the steps of: (a) capturing a video sequence of the object comprising a plurality of image frames; (b) detecting a plurality of features within at least an initial image frame of the video sequence; (c) generating one or more hypotheses relating to whether two or more detected features are interconnected to one another by comparing the relative positioning of the two or more features in at least the initial image frame; (d) determining the position of a plurality of features located in subsequent image frames and testing the strength of the hypotheses for the subsequent image frames utilising the determined location of the features.

Preferably, the method also comprises the step of: (e) displaying the features with a current frame when the hypotheses satisfy a first predetermined condition; and the step of: (f) determining if a second predetermined condition can be satisfied in relation to the hypotheses and not displaying the features when the second predetermined condition can be satisfied, the first and second conditions having an interrelationship such that a hysteresis display condition can be set up for the display of the features.

The hypotheses can include that the features are preferably rigidly connected to one another. The features can include areas of the image having a high contrast texture.

The high contrast texture can be derived by forming a covariance matrix from derived images which are preferably derived from a current frame. The derived images are preferably formed from orthogonal calculations carried out on the current frame. The method can also include the step of discarding features which exhibit only a small amount of motion over an extended sequence of images. The hypothesis can include accounting for the disappearance of features for a predetermined number of image frames.

The expected relative positioning of features within a hypothesis can be adapted to change over time from one sequence to a next sequence. The hypotheses are preferably assigned a quality value depending on the features in the hypotheses. The features are preferably also assigned a feature quality value. The feature quality value can be varied in accordance with the feature's proximity to other features in a hypothesis.

Also, the feature quality value can be varied in accordance with the amount of strain a feature produces on a hypothesis.

The step (d) preferably can include the sub-steps of: assigning a quality weighting to each of the two or more detected features of a hypothesis; and calculating a quality value for each hypothesis based on the quality of its respective detected features. Step (d) preferably can also include the sub-step of selecting one of the hypotheses based on a calculated quality value of the hypothesis.

Brief description of the drawings
Notwithstanding any other forms which may fall within the scope of the present invention, the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Fig. 1 shows a flow chart representing an overview of an embodiment of the tracking method;
Fig. 2 shows a step in the generation of sub images used for feature detection in the method of Fig. 1;
Fig. 3 shows a flow-chart depicting how detected features move between feature sets within the method of Fig. 1;
Fig. 4 shows a series of image frames each having a plurality of detected features within each frame, illustrating the transition of the features between the feature sets of Fig. 3;
Fig. 5 shows a set of factors which affect the quality of a feature within a hypothesis used in the method of Fig. 1;
Fig. 6 illustrates the concept of a "gravity" field which is used in the embodiment of Fig. 1 to scale the quality of detected features in a face model;
Fig. 7 illustrates a graph of gravity strength variation with distance;
Fig. 8 illustrates the concept of strain, which is used to vary a hypothesis used in the method of Fig. 1; and
Fig. 9 illustrates a hysteresis effect, which is used to determine which features of a model are displayed in the method of Fig. 1.

Detailed description of the embodiments
In broad concept, the preferred embodiment of the present invention provides a system and method for automatic generation of 3D models of rigid bodies. A flow chart outlining a first embodiment of the method is shown in Fig. 1.

The method 10 can be broken into four basic steps. An initial step 20 is the acquisition of a series of video images of the object to be modelled. In the next step 30, features of the object being modelled are detected within the series of images. Next the isolated features, which are independently tracked, are turned into a model of a face. To track the features of an object, the notion of a hypothesis 40 is used, rather than the traditional approach of matching the detected features to a fixed template. The hypothesis represents a belief about whether a set of features on the object are rigidly connected to each other. This belief is not something that is definitely true or false, but it can be stronger or weaker.

Thus, in step 40 a hypothesis relating to whether or not two or more of the features detected within the image are maintained in a fixed spatial relationship with each other, i.e. are rigidly connected, is tested against the detected features. If the detected features do not match the hypothesis, the hypothesis is refined and the process is restarted at step 20. Those features in step 40 which fit the hypothesis to within a predetermined confidence threshold are displayed in step 50 to the user to provide a model of the object being tracked.
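By way of a non-authoritative illustration only, the loop of steps 20 to 50 might be organised as in the following Python sketch. Every name here (next_frame, detect_features, the hypothesis attributes and the confidence value) is an assumption made for illustration; the patent does not define a programming interface.

```python
# Illustrative sketch of the four-step loop of Fig. 1: acquire, detect,
# test/refine hypotheses, display. Names and thresholds are assumptions.
def track(next_frame, detect_features, hypotheses, display, confidence=0.8):
    """Run steps 20-50 once per captured frame.

    next_frame:      callable returning the next image, or None when finished
    detect_features: callable returning candidate features for an image
    hypotheses:      objects exposing refine(features), quality and features
    display:         callable taking (frame, features) for step 50
    """
    while True:
        frame = next_frame()                      # step 20: acquisition
        if frame is None:
            break
        features = detect_features(frame)         # step 30: feature detection
        for hyp in hypotheses:
            hyp.refine(features)                  # step 40: test and refine hypotheses
        best = max(hypotheses, key=lambda h: h.quality, default=None)
        if best is not None and best.quality >= confidence:
            display(frame, best.features)         # step 50: show the stable model
```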

Each of the substeps 20-50 of the method 10 will now be described in greater detail, beginning with the process of acquiring images.

Acquisition of images in step 20 is a relatively straightforward procedure and can be performed by positioning the object to be modelled in front of a set of stereo cameras mounted on a tripod. Images captured by the video cameras are transferred to a PC running application software capable of performing steps 30 to 50 of the method. The application software can include suitably encoded code written in C++ or the like. In the current embodiment, the computer system is an IBM compatible PC with a Pentium III processor or above. Preferably the PC is in communication with the video cameras via a Firewire IEEE 1394 video capture card. Typically the video capture process can be performed at a rate of between 60 and 100 frames per second. It will be evident to those skilled in the art of digital image processing that other computer systems and programming languages could be utilised in the construction of the preferred embodiment.

For each image in the sequence of video images captured, the steps 30 to 50 of the method 10 are performed.

In step 30 of the method 10, features of the object being tracked are detected and tracked within the sequence of images from the camera. Features which the system can track are generally characterised by a high contrast texture. Additionally, it is preferable that the texture provides contrast in different directions. In order to identify whether a suitable feature is present within each of the images, the system calculates a covariance matrix of a pair of subimages. Fig. 2 shows an example pair of subimages S1 and S2 extracted from a main image by the system. The main image 200 may comprise a frame or partial frame of the image sequence provided by the cameras of the system. Each image is broken into a horizontal difference subimage 210 and a vertical difference subimage 220. For each subimage region, e.g. region 230, the system calculates the covariance matrix of the subimage.

The eigenvalues of the covariance matrix will correspond to the amplitude of the texture in the horizontal and vertical directions.

It will be appreciated by those skilled in the art of image processing that horizontal and vertical directions have been chosen for convenience only, and any other pair of near orthogonal directions may be used for producing the covariance matrix. The two eigenvalues are multiplied to obtain a single value that corresponds to the suitability of a subimage to be used as a feature for tracking the object.

A covariance matrix C is formed from the products of the subimages s1 and s2. The two eigenvalues (λ1 and λ2) are calculated from this 2x2 matrix, and the suitability S of each feature for tracking is then given by:

S = λ1 x λ2

Once a plurality of features has been identified, the feature locations are used to generate a hypothesis relating to the expected relative positions of features within the object being tracked.
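As a hedged illustration of this suitability measure, the following Python sketch computes a covariance matrix from the horizontal and vertical difference subimages of a candidate region and multiplies its eigenvalues. The exact construction of the matrix (summing products over the region) is an assumption based on the surrounding text, and all names are illustrative.

```python
# Minimal sketch: suitability S of an image region as a trackable feature,
# assuming C is the 2x2 matrix of summed products of the difference subimages.
import numpy as np

def feature_suitability(region):
    """Return S = lambda1 * lambda2 for a 2-D greyscale image region."""
    region = region.astype(float)
    s1 = np.diff(region, axis=1)[:-1, :]    # horizontal difference subimage
    s2 = np.diff(region, axis=0)[:, :-1]    # vertical difference subimage
    c = np.array([[np.sum(s1 * s1), np.sum(s1 * s2)],
                  [np.sum(s1 * s2), np.sum(s2 * s2)]])
    lam = np.linalg.eigvalsh(c)             # eigenvalues of the symmetric 2x2 matrix
    return lam[0] * lam[1]                  # large only if both directions carry texture

# Example: a random textured patch scores far higher than a flat patch
rng = np.random.default_rng(0)
print(feature_suitability(rng.random((16, 16))), feature_suitability(np.ones((16, 16))))
```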

It should be noted, however, that not all features are suitable for tracking or for incorporation into a hypothesis. This may be because such features appear to move independently of all other features, or do not move at all. Other reasons for unsuitability are that the feature may be difficult to track or may have only recently been detected. For this reason, during substep 40 (Fig. 1) of generating and refining a hypothesis, a sub process of feature management is implemented in order to ensure that the most reliable and robustly tracked features are emphasised in generating and maintaining hypotheses. The so-called "feature management" process will now be explained with reference to Figures 3 and 4.

Turning now to Fig. 3, which shows a schematic view of how features move between sets of a hypothesis. It should be noted that there may be more than one hypothesis at any time and each hypothesis has its own background features, uncommitted features, committed features and hidden features. The available features list 300 comprises any features visible within the image sequence and is common to all hypotheses currently used by the system.

In general terms, a number of feature sets are used in order to speed up computation. The feature management allows features to be tracked between frames and allows the model's reliance on each feature to be increased from it being initially identified within an image to it becoming part of a hypothesis. In this regard, the following feature sets are used by the preferred embodiment of the present invention: 1. New features (310) These are features which have not been tracked for a sufficient number of frames of the image sequence, i.e. these features have a short motion history. As would be appreciated by those skilled in the art, during the initial frames of running the system all features identified will be considered as new features. To speed up calculations, new features are not tested to see whether they fit a hypothesis once the system is tracking an object.

2. Background features (330) Generally, these are features which do not display a great deal of motion. The computation speed of the system is increased as these background features, once identified, are no longer tracked. Furthermore, new feature searches are no longer performed within a region defined by the background features.

3. Uncommitted features (323) Uncommitted features are features that have been tracked in enough frames of the image sequence to be part of a hypothesis, but they have not yet been entered into a hypothesis. This may be because the features are either tracking poorly or have been occluded during some of the frames of the image sequence.

4. Committed visible features (321) These features are part of a hypothesis used to model the object being tracked.

5. Hidden features (322) Hidden features are part of a hypothesis, but are temporarily hidden from view.

In Fig. 3 all of the features identified within an image are initially grouped within the available features set 300. If these features are tracked by the system with a predetermined degree of confidence for a set period of time, they may become part of the new feature set 310 of the hypothesis. Once the feature has been classified as a new feature 310, its trackability is used by the system to determine whether the feature fits a current hypothesis 311, is part of the background 312, or is not sufficiently trackable 324 and hence not of use to the system. If the feature fits the hypothesis to within a predetermined quality, the feature moves from the new features set 310 into the hypothesis set 320. Once in the hypothesis set 320, if the feature is visible in a particular frame it falls within the visible features set 321, and if it is temporarily hidden it falls within the hidden feature set 322. Between frames any particular feature within the hypothesis can move between the visible features set 321 and the hidden feature set 322 without being removed from the hypothesis set 320.

If a feature within the new feature set is determined by the system not to be moving, it is transferred 312 from the new feature set 310 into the background feature set 330. As described above, the background features are used to define a background region within which no features are tracked, thereby reducing the computational load on the system.

If the new features 310 are being reliably tracked by the system but neither fall within the background feature set 330 nor accurately fit the current hypothesis 320, they may be transferred to the uncommitted feature set 323. If an uncommitted feature begins to move in accordance with the current hypothesis it may be transferred into the committed features set 321, or alternatively, if it becomes untrackable for any reason, it may be removed from the uncommitted feature set 340 and returned to the available feature set 300.

If at any time a particular feature becomes unsuitable for tracking the feature will be removed from its current set and returned into the pool of available features.
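As a non-authoritative illustration of the feature management just described, the following Python sketch classifies a tracked feature into one of the sets of Fig. 3. The set names follow the text; the Feature fields and the thresholds are assumptions made for illustration only.

```python
# Hedged sketch of the feature-set transitions of Fig. 3. Thresholds and the
# Feature representation are illustrative assumptions, not values from the patent.
from dataclasses import dataclass

@dataclass
class Feature:
    history: int           # number of frames the feature has been tracked
    motion: float          # accumulated motion over that history
    visible: bool          # detected in the current frame
    fits_hypothesis: bool  # moves consistently with the current hypothesis
    trackable: bool        # tracking confidence is acceptable

def classify(f, min_history=10, motion_floor=1.0):
    if not f.trackable:
        return "available"      # returned to the pool of available features (300)
    if f.history < min_history:
        return "new"            # short motion history, not yet tested (310)
    if f.motion < motion_floor:
        return "background"     # static: no longer tracked, masks new searches (330)
    if f.fits_hypothesis:
        return "committed" if f.visible else "hidden"   # sets 321 and 322
    return "uncommitted"        # tracked well, not yet in a hypothesis (323)

# Example: a long-tracked, moving, visible feature consistent with the hypothesis
print(classify(Feature(history=40, motion=12.0, visible=True,
                       fits_hypothesis=True, trackable=True)))   # -> "committed"
```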

In order to illustrate the movement of features between the sets of Fig. 3, a sequence of image frames, each with a plurality of features, is shown in Fig. 4. The frames 1 to 4 of Fig. 4 are shown in sequence but should not be seen as a set of consecutive images from the video sequence. In each of the frames detected features are represented by a circle. Those features which form part of a hypothesis are shaded with diagonal lines, and are linked with bars to denote the fixed physical relationship of each shape to its neighbour within the hypothesis. Those features which are determined to be background features are filled with cross hatching, and new and uncommitted features are shown as open circles.

Turning firstly to frame 1 (401), there is shown a frame having 12 available features within the frame. The hypothesis 410 as depicted in frame 1 includes 6 features, e.g. 415 and 416. Frame 1 additionally includes 6 unfilled circles which represent new or uncommitted features. The uncommitted features, e.g. 421, 422, 423, may either later become part of the background, or be newly detected features with only short motion histories which have therefore not yet found their way into the hypothesis. Turning now to frame 2 (402), which shows the features of the hypothesis 410 having been rotated with respect to the image frame 401. For example, the object being tracked within the series of frames in Fig. 4 may be that of a face, and the features of the hypothesis comprise trackable features within the face, such as eyes, mouth corners or nostrils. The tilting of the features of the hypothesis as shown in frame 2 can, in this case, be caused by the subject tilting the head to one side. It can be seen that image frame 2, in addition to the features of the hypothesis 410, also includes the 5 new or uncommitted features 425-429.

In this regard, with the exception of feature 425, the new or uncommitted features do not appear to have moved in concert with the features of the hypothesis. For this reason these features may be determined to be background features or remain as uncommitted features. On the other hand feature 425 appears to be the same feature as 423 however it has moved up and slightly to the right, as it would be expected to if it was rigidly attached to the features in the hypothesis. If the feature 425 continues to move in accordance with the features of hypothesis it may, once it has a sufficiently long motion history, become part of the hypothesis.

Turning now to frame 3 (403), it can be seen that the features comprising the hypothesis 410 have rotated in an anticlockwise direction. Furthermore, uncommitted feature 425 again appears to have moved in concert with the features of the hypothesis, whereas the other features of the frame have not. The features 430, 431, 432 appear to correspond to uncommitted features 426, 427, 428 of frame 2 respectively. As these features have not moved since frame 2, it appears that they are part of the background, and accordingly features 430, 431, 432 are transferred from the new feature set into the background feature set. On the other hand, feature 433 has no equivalent feature in a previous frame and as such remains as a new feature. In the final frame of the sequence, frame 4 (404), it can be seen that the hypothesis now includes 7 features. Since feature 425 has tracked in accordance with the hypothesis and maintained a fixed physical relationship with the other features of the hypothesis for a predetermined length of time, feature 425 can now be transferred into the hypothesis, and its physical relationship with its neighbour 411 determined. As discussed in relation to frame 3, features 430, 431, 432 remain as background features. Feature 433, on the other hand, also appears not to have moved a significant amount; however, due to its short motion history it has not been transferred into the background features set. It can be seen from Figs. 3 and 4 that the hypotheses can be constantly evolving over time by adding or removing features from them.

As described above, isolated features which are independently tracked are combined into a model of the object using a hypothesis representing a belief about whether a set of features on the object are rigidly connected to each other. This belief is not something that is definitely true or false, but it can be stronger or weaker.

To allow for 'strengthening' and 'weakening' of a hypothesis, each feature in a hypothesis is assigned a quality value. In this way, a hypothesis with high quality features is strong, whereas a hypothesis with lower quality features is weaker. As time passes, the quality of a feature in a hypothesis can be modified based on tracking, strain, visibility and proximity to other features, as shown in Fig. 5.

Fig. 5 shows a series of factors which affect the quality of a feature within a hypothesis. Good correlation of the feature with the hypothesis increases the quality of that feature, and accordingly increases the confidence in the hypothesis. There are also four factors which decrease the quality of a feature within a hypothesis. These are: (1) gravity; (2) the feature being temporarily hidden; (3) poor tracking of the feature between frames of the image sequence; and (4) the strain the feature is exerting on the hypothesis.

It is clear that a feature which becomes hidden from view will be less easily tracked than one which is visible. Accordingly, the quality of such hidden features is decreased, thereby lowering the confidence in the hypothesis. Similarly, any feature which is unable to be reliably tracked between frames of the image sequence will decrease the quality of that feature within the hypothesis. As will be appreciated, if the quality of tracking of the feature decreases, the overall quality of the hypothesis also becomes weaker.

Whilst the effect of tracking quality and of features becoming hidden on the confidence in the hypothesis is relatively self-explanatory, the concepts of gravity and strain are hereinafter further explained. The concept of gravity between features of a hypothesis is introduced in order to favour hypotheses having their features separated by large distances. Thus, "gravity" refers to an arrangement whereby features which lie near each other exert a so-called gravity force on each other, which decreases the quality of the weaker of the two features. Thus, if a feature of high quality is close to a feature of lower quality, the quality of the second feature is decayed by the gravity of the stronger feature.

In Fig. 6 there are shown two features 600 and 610. The quality rating of feature 600 is 35, whereas the quality rating of feature 610 is 3. Thus feature 600 exerts a gravity field, illustrated by graph 620 in Fig. 7, on feature 610. The graph 620 plots the strength of the gravity on the vertical axis and distance from the strong feature on the horizontal axis. In the present instance, the gravity field 620 can be based on a tanh function. The gravity function is used essentially as a multiplier to scale the quality of any features falling within it, thereby reducing the quality of the feature 610. Thus, it can be seen that features which fall within the gravity field of other higher quality features are reduced in their overall contribution to the confidence in the hypothesis. Thus, the hypotheses will tend to be stronger if the features forming them are spaced such that no feature falls within the gravity field of any other feature.
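One way the gravity interaction could be realised is sketched below in Python, assuming a tanh-shaped fall-off as in Fig. 7; the interaction radius and the exact scaling formula are assumptions, since the text only states that the field can be based on a tanh function.

```python
# Sketch: a nearby stronger feature decays the quality of the weaker one, with a
# tanh-shaped fall-off in strength as the separation grows.
import math

def apply_gravity(quality_a, quality_b, distance, radius=20.0):
    """Return the two qualities after the stronger feature decays the weaker one."""
    strength = 1.0 - math.tanh(distance / radius)   # ~1 when touching, ~0 far away
    if quality_a >= quality_b:
        quality_b *= (1.0 - strength)               # only the weaker feature decays
    else:
        quality_a *= (1.0 - strength)
    return quality_a, quality_b

# Example from Fig. 6: a feature of quality 35 close to one of quality 3
print(apply_gravity(35.0, 3.0, distance=5.0))       # the weaker quality is sharply reduced
```

The intended effect is simply that closely spaced features contribute less to the confidence in a hypothesis than well separated ones.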

The concept of strain can be viewed as the difference between the predicted position of a feature using the model generated on the basis of a given hypothesis and the measured feature position. Fig. 8 illustrates an example of the concept of strain. In Fig. 8, there is shown a set of features detected within an image and their correlation with their expected feature positions according to a hypothesis. The measured feature positions are represented by octagons, e.g. 700 and 710, and the predicted positions of the features are represented by open circles, e.g. 720 and 730. The fixed relationships between the predicted feature positions according to the hypothesis are represented by the lines joining the predicted feature positions.

It can be seen in Fig. 8 that each of the measured feature positions, with the exception of feature 700, is centred on its predicted feature position. Thus, by not being in its expected position, feature 700 is exerting a strain on the hypothesis, suggesting that feature 700 is not rigidly connected to the rest of the features.

As well as being used to determine the quality of a feature within a hypothesis, "strain" can also be used to adapt the hypothesis of the system. Instead of picking an original model position for each of the features within the hypothesis and keeping it for the duration of the hypothesis, the model positions can be made to adapt a small amount with each frame. This means that if a feature is placing a strain on the model, i.e. the predicted model position is different from the actual detected feature position, the model adapts slowly towards the measured position. The adaptation can be inversely proportional to the quality of the feature. Thus the following equation can be used to express the new model position for each feature:

new model position = (1 - adaptFactor) x oldM + adaptFactor x predictedM

in which

adaptFactor = adaptspeed / (1 + log10(quality))

where M is the position of a feature and adaptspeed can be found via experimentation depending on tracking requirements.
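A minimal Python sketch of this per-frame adaptation follows; the adapt_speed value and the quality floor used to keep the logarithm non-negative are assumptions made only for illustration.

```python
# Sketch of strain-driven adaptation of a model position towards the
# measured/predicted feature position (predictedM), per the equation above.
import math

def adapt_position(old_model_pos, target_pos, quality, adapt_speed=0.05):
    """Move the model position a small amount towards target_pos."""
    adapt_factor = adapt_speed / (1.0 + math.log10(max(quality, 1.0)))
    return tuple((1.0 - adapt_factor) * o + adapt_factor * t
                 for o, t in zip(old_model_pos, target_pos))

# A high-quality feature distorts the model less than a low-quality one:
print(adapt_position((10.0, 10.0), (12.0, 10.0), quality=35.0))
print(adapt_position((10.0, 10.0), (12.0, 10.0), quality=2.0))
```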

As an alternative, the adaptation speed can be set to be low so that features which cause significant strain do not cause significant distortion of the model. These features can be dropped from the model rather than straining the model to the point where the hypothesis becomes overly weak.

In this regard, features with good quality remain part of a hypothesis, whilst those with lower quality are removed from the hypothesis. At the same time, new and existing features can be merged into a hypothesis if they fit the existing rigid body motion of the model, as illustrated in Fig. 4. Individual hypotheses can also be merged together if they appear to be moving consistently with each other. Generally, a limit on the number of features in a hypothesis is desirably used. This means that the model becomes more robust after an initialisation time, because the features with good quality keep getting better whilst those with poorer quality are eliminated and replaced, until the hypothesis contains only features with high quality factors.
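A hedged sketch of this selective pressure is given below; the cap on the number of features, the minimum quality and the (name, quality) tuple representation are illustrative assumptions only.

```python
# Sketch: cap the number of features in a hypothesis and drop the lowest-quality
# ones, so that over time only high-quality features remain.
def prune_hypothesis(features_with_quality, max_features=20, min_quality=1.0):
    """features_with_quality is a list of (feature_id, quality) pairs."""
    kept = [fq for fq in features_with_quality if fq[1] >= min_quality]
    kept.sort(key=lambda fq: fq[1], reverse=True)   # strongest features first
    return kept[:max_features]

# Example: the weakest feature is eliminated once the cap is reached
features = [("eye_L", 35.0), ("eye_R", 33.0), ("nostril", 20.0), ("chin", 0.4)]
print(prune_hypothesis(features, max_features=3))
```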

The final step 50 in the method 10 of Fig. 1 is that of displaying the generated model to the user of the system. To ensure that the user is viewing a stable model, not all features that are part of the hypothesis are displayed. As shown in Fig. 9, a hysteresis effect can be used to prevent features of the model from apparently jumping into and out of the model being displayed to the user. In Fig. 9, it can be seen from the left hand graph 810 that features already displayed in the model must have a significant degradation in quality before being removed from the displayed model. From the right hand graph it can be seen that features which are not displayed must have a significant increase in quality in order to be included in the displayed model. The requirement to get into the display is higher than the requirement needed to get into a hypothesis, meaning that only the most stable and robust features are displayed to the user.
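This display hysteresis can be sketched as follows; the two threshold values are assumptions chosen only to show that the threshold for entering the display is higher than the threshold for leaving it.

```python
# Sketch of the hysteresis display rule of Fig. 9: a feature must exceed a high
# quality threshold to be shown, and must fall below a lower one to be hidden.
def update_displayed(currently_displayed, quality, show_above=30.0, hide_below=10.0):
    """Return whether the feature should be displayed in the current frame."""
    if currently_displayed:
        return quality >= hide_below   # already shown: only a large drop removes it
    return quality >= show_above       # not shown: needs a large rise to appear

# A feature hovering around quality 20 keeps whatever display state it had:
print(update_displayed(False, 20.0))   # False - never reached the display threshold
print(update_displayed(True, 20.0))    # True  - has not dropped far enough to vanish
```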

In operation, the preferred embodiment allows for the automated construction of a 3D feature model builder (i.e. determining the 3D location of features with respect to one another) and a 3D pose tracker which derives the 3D pose of the object from the measured feature locations. Ideally, there is no separate step to make the model, either by manual selection of features, by using markers, or by searching for predefined features (for example the corners of the eyes). During operation, the model is refined in three ways: new features consistent with the current object hypothesis are added, existing features which do not conform to the current object hypothesis are removed, and the location of each feature with respect to the other features is refined according to the ongoing object pose estimation.

The dual use of model building and tracking has advantages in that the building of the 3D feature model is fully automatic and does not require prior knowledge about the appearance of the object. Further, the tracking is more robust under changing conditions (e.g. illumination) as features are being continually replaced with new ones.

It will be understood that the invention disclosed and defined herein extends to all alternative combinations of two or more of the individual features mentioned or evident from the text or drawings. All of these different combinations constitute various alternative aspects of the invention.

The foregoing describes embodiments of the present invention, and modifications obvious to those skilled in the art can be made thereto without departing from the scope of the present invention.