Title:
OBJECT TRACKING SYSTEM
Document Type and Number:
WIPO Patent Application WO/2006/027627
Kind Code:
A1
Abstract:
An object recognition system is disclosed having at least one video stream receiving means for receiving one or more images in succession and a processing means, wherein the processing means is adapted to identify one or more objects in a first image and one or more objects in a second image and to track the identified one or more objects between the first and second images, thereby producing a movement vector of the one or more objects. The system may use high contrast regions and image manipulation techniques to identify and track one or more objects for use, for example, as part of a computer game, training simulation or computer interface.

Inventors:
REN JINCHANG (GB)
KANJO EIMAN (GB)
LU TIEYING (GB)
ASTHEIMER PETER (GB)
Application Number:
PCT/GB2005/050144
Publication Date:
March 16, 2006
Filing Date:
September 06, 2005
Assignee:
UNIVERSITY OF ABERTAY (GB)
REN JINCHANG (GB)
KANJO EIMAN (GB)
LU TIEYING (GB)
ASTHEIMER PETER (GB)
International Classes:
G06T7/20; G06K9/00
Foreign References:
US 6690156 B1 (2004-02-10)
FR 2710434 A1 (1995-03-31)
Other References:
HALL E L ET AL: "AN EDUCATIONAL COMPUTER VISION AND ROBOTICS SYSTEM", Computer Society Conference on Pattern Recognition and Image Processing, Las Vegas, 14-17 June 1982, New York, IEEE, US, vol. Proc. 1982, 14 June 1982 (1982-06-14), pages 647-649, XP001002401
Attorney, Agent or Firm:
MURGITROYD & COMPANY (165-169 Scotland Street, Glasgow Strathclyde G5 8PL, GB)
Description:
OBJECT TRACKING SYSTEM

[001] The present invention relates to object recognition systems and particularly, but not exclusively, to object recognition in video frames as an interface to computer applications.

[002] Most computer applications have a user interface to enable a person to interact with the application. In most cases, the user interface receives inputs from electronic devices such as a computer mouse, a keyboard or a joystick to enable the application to be controlled.

[003] Prior art systems, such as US 5534917, introduce the use of video as an interface with a computer application. The system of US 5534917 identifies areas of interest and processes a video stream to identify whether a participant has entered the area of interest.

[004] According to a first aspect of the present invention there is provided an object recognition system comprising: at least one video stream receiving means for receiving a plurality of images in succession; and a processing means; wherein the processing means is adapted to identify one or more objects in a first image and one or more objects in a second image and to track the identified one or more objects between the first and second images, thereby producing a movement vector of the one or more objects.

[005] It should be appreciated that a movement vector produced may be zero, thereby identifying no movement for a particular object between images.

[006] In addition, an object may be a physical object, part of an object or an area of recognisable interest within the field of view.

[007] Preferably, the object recognition system further comprises at least one video camera, wherein the at least one video camera produces a video stream for the at least one video stream receiving means.

[008] Preferably, the object recognition system further comprises one or more outputs, wherein the processing means modifies the one or more outputs depending on the identified one or more objects or the movement vector of the one or more objects.

[009] Alternatively or further preferably, the one or more objects may be a part of a person's or animal's anatomy.

[010] Movement of a finger or hand on to an object, or over an area or object in the image, may trigger a particular output. For example, the movement of a finger on to a particular area may be linked to an output that is the equivalent of clicking a mouse button.

[011] Preferably, a high contrast region or large colour difference surrounds the one or more objects, thereby enabling a region of interest to be more readily identified.

[012] Preferably, a board or mat with a high contrast border forms the high contrast region.

[013] Preferably, where the one or more objects are identified in at least two video streams, the processing means is further adapted to identify a three dimensional position of the one or more objects.

[014] According to a second aspect of the present invention there is provided a method of recognising objects comprising the steps of: receiving at least one video stream for receiving a plurality of images in succession; processing the plurality of images; identifying one or more objects in a first image and one or more objects in a second image; tracking the identified one or more objects between the first and second images; and producing a movement vector of each of the one or more objects.
[015] According to a third aspect of the present invention there is provided a computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the method according to the second aspect.

[016] According to a fourth aspect of the present invention there is provided a computer system comprising an execution environment for running an application and an object recognition system according to the first aspect.

[017] The processing means may use one or more of the following image manipulation techniques to identify and track objects:

[018] Detection of change between image frames
A first and second image of the same resolution and colour depth are compared. Only pixels at the same location in both the first and second images which have different RGB values, or RGB values outside a given threshold, are retained in a combined image. The first and second images can originate from two consecutive image frames or from any other image order. For example, the first image may be static and originate from the very start of the image sequence and the second image can be the current image frame.

[019] Correction of colour
A first and second image of the same resolution, colour space and depth are used as input to compute a scaling factor. For each colour channel L, M and S of the first and second image a scaling factor kL, kM and kS is computed as follows:
1. Selection of pixels at the same location in the first or second image. The number of selected pixels can be up to the whole image. The criteria for selection of a pixel can be firstly the Euclidean distance to grey, with zero or small distance preferred over large distance, and secondly the value of luminance, with higher value preferred over lower value, e.g. white preferred over black as a compensation for typical camera characteristics. For example, in the LAB colour space a weight for a pixel i can be calculated as Li divided by the Euclidean distance to A=0 and B=0. The larger the weight of a pixel, the better a candidate it is for selection. A fixed percentage of the total image pixels with the highest weights can be selected. In addition, to ensure an even distribution of selected pixels, the first and second image can be tiled and the process described applied to each tile.
2. For the selected pixels of the first and second image, the mean value of each colour channel L, M and S is computed. The scaling factors kL, kM and kS are then computed for each colour channel L, M and S by dividing the mean value of the first image by the mean value of the second image.

[020] The first and second images can originate from two consecutive image frames or from any other image order. For example, the first image may be static and originate from the very start of the image sequence and the second image can be the current image frame.
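The two techniques in [018] and [019] can be illustrated compactly. The following is a minimal sketch, not part of the application: it assumes 8-bit RGB frames of equal size held as NumPy arrays, collapses the per-channel retention criteria of [018] into a single scalar threshold, and simplifies the tiled, luminance-weighted pixel selection of [019] to a grey-proximity weighting over the whole image (RGB is used in place of the L, M, S channels).

```python
import numpy as np

def change_mask(first, second, threshold=12):
    """Detection of change between image frames ([018]): retain only
    pixels whose RGB values differ beyond a threshold."""
    diff = np.abs(first.astype(np.int16) - second.astype(np.int16))
    mask = (diff > threshold).any(axis=2)      # pixels that changed
    combined = np.zeros_like(second)
    combined[mask] = second[mask]              # keep changed pixels only
    return combined, mask

def colour_scaling_factors(first, second, fraction=0.1):
    """Correction of colour ([019]): per-channel scaling factors from
    near-grey, bright pixels (simplified selection over the whole image)."""
    f = first.astype(np.float64)
    s = second.astype(np.float64)
    # Weight pixels by closeness to grey (small channel spread) and luminance.
    spread = s.max(axis=2) - s.min(axis=2)
    luminance = s.mean(axis=2)
    weight = luminance / (spread + 1.0)
    # Select the fixed fraction of pixels with the highest weights.
    n = int(fraction * weight.size)
    idx = np.unravel_index(np.argsort(weight, axis=None)[-n:], weight.shape)
    # Per-channel factor: mean(first) / mean(second) over the selection.
    return f[idx].mean(axis=0) / (s[idx].mean(axis=0) + 1e-9)
```

Multiplying the second frame's channels by the returned factors (and clipping to the valid range) shifts its colour balance toward the first frame, which is the corrective effect [019] describes.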

[021] Normalisation of contrast
An image is converted to the RGB colour space. For each of the three colour channels R, G and B a colour histogram is computed, then normalised between the maximum (typically 1) and minimum value (typically 0). Finally the three histograms are recombined into an RGB image.

[022] Normalisation of colour
Normalisation of colour can be achieved through one or a combination of these methods:
1. An image is transformed from the RGB colour space into the rgb (normalised Red, Green, Blue) colour space. For each pixel, the r component is computed by dividing R by R+G+B, the g component by dividing G by R+G+B and the b component by dividing B by R+G+B.
2. An image is transformed into the HSV (Hue, Saturation and Value) colour space. The V, or brightness, component is set to 0, the H and S components are normalised to a 0...1 range, and the result is then transformed into the RGB colour space or any other colour space.
3. An image is transformed into the LAB (Luminance, Red-Green range and Blue-Yellow range) colour space. The L, or luminance, component is set to 0, A and B remain unchanged, and the result is then transformed into the RGB colour space or any other colour space.
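As a rough illustration of [021] and method 1 of [022], the sketch below interprets the histogram normalisation of [021] as a per-channel min-max stretch and implements the rgb chromaticity transform; it assumes an 8-bit RGB NumPy array and is an interpretation rather than the application's own code.

```python
import numpy as np

def normalise_contrast(img):
    """Normalisation of contrast ([021]): stretch each of the R, G, B
    channels to span the full 0..1 range."""
    out = img.astype(np.float64) / 255.0
    for c in range(3):
        ch = out[..., c]
        lo, hi = ch.min(), ch.max()
        if hi > lo:                      # avoid division by zero on flat channels
            out[..., c] = (ch - lo) / (hi - lo)
    return out                           # recombined normalised RGB image

def normalise_colour_rgb(img):
    """Normalisation of colour, method 1 ([022]): transform RGB into the
    normalised rgb (chromaticity) space, r = R/(R+G+B), etc."""
    f = img.astype(np.float64)
    total = f.sum(axis=2, keepdims=True)
    total[total == 0] = 1.0              # guard against all-black pixels
    return f / total
```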

[023] Calibration of camera view - method 1, edge detection
One method uses edge detection, i.e. a vectorisation of the captured image. Edges which are likely to correspond to the borders of a board are retained. This step assumes a rectangular board with the borders close to the image limits. A transformation can then be calculated which maps the detected vectorised board to a normalised rectangular area. This transformation can then be applied to each pixel in the image.

[024] Calibration of camera view - method 2, point detection
Regions of high contrast in the captured image are identified. These regions may be caused by, for example, a black border around a board with a light background. The minimum/maximum points of the regions of high contrast indicate the four corners of the board. A transformation can then be calculated which maps the detected four corner points to a normalised rectangular area. This transformation can then be applied to each pixel in the image.

[025] Colour recognition
In a first phase a number of simple colours are defined. A simple colour is a specific RGB value and a positive tolerance value indicating the allowed variation of the R, G and B values. In the later recognition process the colour of physical objects represented in captured images is compared to the pre-defined simple colours. The definition can take the form of a list of RGB values or a colour look-up table (CLUT), which specifies the corresponding object identifier for all possible permutations of RGB values. A CLUT is favourable in terms of short processing time, but at the cost of a larger memory requirement. If a colour pixel in a captured image does not equal a pre-defined simple colour it can be rejected or, alternatively, matched to the object identifier with the shortest Euclidean distance. Hands can be detected via skin colour, which is a special case of colour detection. In addition to colour, the size of an object can be a criterion for recognition.

[026] Post-processing
Once the object(s) have been isolated, a post-processing step eliminates noise effects, which are manifested through scattered pixel areas. Scattered recognised pixel areas (further away from each other than a specified distance and/or smaller than a given size and/or surrounded within a given distance by an insufficient number of pixels) are discarded or merged into a single, larger pixel area.

[027] Centre and orientation of object
The centre of gravity of a pixel area is computed. The orientation of a pixel area is computed through calculation of its moments.
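Paragraphs [024] and [027] map naturally onto a short sketch. The version below is an illustration under stated assumptions rather than the application's method: it presumes the four board corners have already been found (for example, as the extreme points of the high-contrast border regions) and uses OpenCV's perspective-transform routines for the mapping; the orientation formula is the standard principal-axis angle derived from second-order central moments.

```python
import numpy as np
import cv2

def calibrate_view(img, corners, size=(512, 512)):
    """Calibration of camera view, method 2 ([024]): map the four detected
    board corners to a normalised rectangular area."""
    w, h = size
    src = np.float32(corners)    # [top-left, top-right, bottom-right, bottom-left]
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    m = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, m, size)

def centre_and_orientation(mask):
    """Centre and orientation of object ([027]): centre of gravity plus the
    principal-axis angle from second-order central moments."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None                          # no pixel area to measure
    cx, cy = xs.mean(), ys.mean()            # centre of gravity
    mu20 = ((xs - cx) ** 2).mean()           # central second moments
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), angle
```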
[028] Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:
Fig. 1 shows a depiction of one intended use of the invention; and
Fig. 2 shows a flow diagram of the invention.

[029] Referring firstly to Figure 1, a digital video camera 10 is shown connected to a computer system 12. Video images from the camera 10 are processed by the computer system 12.

[030] The camera 10 is pointed approximately towards a game board 14. Several game objects 16 are within the boundaries of the game board 14. A person 18 interacts with the game objects 16. Interactions are detected by the computer system 12 through analysis of the video images from the camera 10.

[031] Interactions of the person 18 with the objects 16 are analysed by the computer system 12. Particular actions, such as the movement of an identified object or the placing of an object in an identified area, result in the computer system 12 generating an output. For example, in an interactive game of chess, the movement of a particular chess piece can be logged to enable the computer system 12 to respond with its own move. The computer system 12 could also output a sound if an illegal chess move has been made, so the person 18 is aware that the move is incorrect.

[032] The invention allows the use of tangible everyday and game objects when interacting with a computer system. In addition, the invention can allow a computer system to be less obtrusive by performing the functions of other input devices such as a mouse, keyboard or joystick.

[033] Referring now to Figure 2, the digital video camera 10 has an output video image stream 32. A frame grabber 34 stores each frame from the image stream 32 in a memory 36. An image processor 38 then analyses each frame from the frame grabber 34 and may store the result in the memory 36. An object recognition step 40 analyses each frame to identify objects such as the game objects 16 shown in Figure 1. An object tracking step 42 analyses movement in the objects identified in the previous step. A controller 44 receives data from the object tracking step 42 indicating the position and movement of any identified objects. Movement of identified objects may trigger outputs of the computer system. If the controller 44 detects a trigger movement, instructions are sent to an output 46. The output 46 may take the form of audio, video or other program commands or peripheral instructions.

[034] The invention may operate in a number of different modes depending on the game type. The following descriptions are examples of different types of modes.

[035] Turn-based mode with recognition of any one object (any shape, any colour/texture/material):
For initialisation, an image of the background without the object is recorded. To recognise the object against the background, the following steps are performed repeatedly (a sketch of the resulting pipeline follows the step list):
1. The detection of any one of the following conditions, performed during initialisation and in any order, will cause the method to abort, i.e. return a negative recognition result:
   1. Detection of change
   2. Detection of hand, e.g. through skin colour and/or size
   3. Failing to calibrate

2. In the absence of the above conditions, the following steps are performed:
   1. Calibration (method 1 or 2)
   2. Subtract the current image from the initial background image, resulting in a combined image as described in "Detection of change between image frames"
   3. Colour and/or contrast normalisation (optional)
   4. Post-processing (optional)
   5. Calculation of centre and orientation of object (orientation optional)
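Strung together, the steps of [035] form a guarded pipeline. The sketch below reuses the helper functions from the earlier sketches; the abort tests detects_change, detects_hand and calibration_failed are hypothetical stand-ins for the condition checks the application describes, and movement_vector shows how the tracking of [004] and [033] could report displacement between turns (a zero vector meaning no movement).

```python
def recognise_turn_based(current, background, corners):
    """Turn-based mode ([035]): abort if the scene is unstable, otherwise
    calibrate, subtract the background and locate the object."""
    # 1. Abort conditions (hypothetical predicates for [035] items 1.1-1.3).
    if detects_change(current) or detects_hand(current) or calibration_failed(current):
        return None                              # negative recognition result
    # 2.1 Calibration (method 1 or 2), applied to both images.
    current = calibrate_view(current, corners)
    background = calibrate_view(background, corners)
    # 2.2 Detection of change against the initial background image.
    combined, mask = change_mask(background, current)
    # 2.3-2.4 Optional colour/contrast normalisation and post-processing go here.
    # 2.5 Centre and orientation of the remaining pixel area.
    return centre_and_orientation(mask)

def movement_vector(previous_centre, centre):
    """Tracking ([004], [033]): displacement of an identified object
    between images; (0, 0) indicates no movement."""
    return (centre[0] - previous_centre[0], centre[1] - previous_centre[1])
```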

[036] This mode of recognition can, for example, be applied to a play set where one and only one character is moved around.

[037] Realtime mode with recognition of any one object (any shape, any colour/texture/material):
For initialisation, an image of the background without the object is recorded. To recognise the object against the background, the following steps are performed repeatedly:
1. Subtract the current image from the initial background image, resulting in a combined image as described in "Detection of change between image frames"
2. Colour and/or contrast normalisation (optional)
3. Post-processing (optional)
4. Calculation of centre and orientation of object (orientation optional)

[038] This mode of recognition can, for example, be applied to a model aeroplane held in the hand to control a flight simulator.

[039] Turn-based mode with recognition of multiple differently coloured objects:
For initialisation, an image of the background without the objects is recorded. To recognise the objects against the background, the following steps are performed repeatedly:
1. The detection of any one of the following conditions, performed in any order, will cause the method to abort, i.e. return a negative recognition result:
   1. Detection of change
   2. Detection of hand, e.g. through skin colour and/or size
   3. Failing to calibrate

2. In the absence of the above conditions, the following steps are performed (the colour-matching step is sketched after this list):
   1. Calibration (method 1 or 2)
   2. Subtract the current image from the initial background image, resulting in a combined image as described in "Detection of change between image frames"
   3. Colour and/or contrast normalisation (optional)
   4. Post-processing (optional)
   5. Colour recognition
   6. Calculation of centre and orientation of objects (orientation optional)
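Step 5 of this mode relies on the colour recognition of [025]. One plausible rendering of its nearest-match variant is sketched below; the simple-colour table is invented for illustration, and a real system would build it (or a full CLUT) during the definition phase the application describes.

```python
import numpy as np

# Hypothetical simple-colour table: object identifier -> (RGB value, tolerance).
SIMPLE_COLOURS = {
    "red_piece":   ((200, 40, 40), 60),
    "blue_piece":  ((40, 40, 200), 60),
    "green_piece": ((40, 200, 40), 60),
}

def match_colour(pixel):
    """Colour recognition ([025]): match a pixel to the pre-defined simple
    colour with the shortest Euclidean distance, rejecting it if no colour
    lies within its tolerance."""
    best_id, best_dist = None, None
    for obj_id, (rgb, tol) in SIMPLE_COLOURS.items():
        dist = np.linalg.norm(np.subtract(pixel, rgb, dtype=np.float64))
        if dist <= tol and (best_dist is None or dist < best_dist):
            best_id, best_dist = obj_id, dist
    return best_id   # None means the pixel is rejected
```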

[040] This mode of recognition can, for example, be applied to the board game Ludo.

[041] The present invention can be utilised in a large number of areas, for example:
• with computer games, providing a novel, tangible interface to the computer and the game itself;
• with strategy games or role-playing games, where it is often very difficult to keep an overview of the game (e.g. which unit or character is where), especially on large maps. It would be possible to print a large copy of the map and move the units on the map itself;
• with real-time computer games, where a mouse, keyboard, joystick or joypad does not really feel like the object that would normally be used. A real or small-scale object such as a tennis racket or a model airplane could be used to control the game;
• for level design, since when creating maps with NPCs (Non-Player Characters) and resources it is hard to keep an overview of the whole world and achieve balanced gameplay due to the limited screen size. The whole world can be printed, put on the table or floor, and items then placed on it;
• with board games, for simulation of the throwing of dice, detection of cheating, complying with rules, rewards and stimulation (e.g. a cheering/booing crowd), simulation of the moves detected, advice and hints, a robot moving opponent pieces, and logging of moves, which is useful in games like chess;
• with books and interactive stories, where children's books or other stories can be transformed into interactive content. This can be appealing to children, to hobbyists (e.g. in DIY or gardening applications) or to professional workers. The physical objects can represent characters, buildings or other objects in a story, which can be controlled by such physical objects, or the story can set tasks which can be followed with the physical objects;
• with toys, which can be used as interaction objects on the table or carpet;
• for training, where simulation on the PC with a tangible interface on the table (similar to a military sandbox) can be used. The physical objects represent relevant objects in the simulation. By moving physical objects, the application will calculate and present the outcome. The simulation can set tasks which can be solved by replacing physical objects. Trainees profit from a better overview and control interface, and group work and discussions are easier;
• as an interface to a computer system. The invention can act as or substitute for a keyboard, mouse or other computer interface. The computer system can be hidden, with just the camera, screen and/or speakers visible. Any physical object can be used to control a cursor, a virtual button, a menu item or any other action which can be performed by a pointing device like a mouse. The invention can also replace joysticks and joypads for computer games and other applications;
• with downloadable board games, where the map or board is downloaded, printed and used with pieces from a games compendium to play; and
• with novel applications designed around the new possibilities of tangible interfaces.

Improvements and modifications may be incorporated without departing from the scope of the present invention.