Title:
METHOD OF DETECTING HUMAN USING ATTIRE
Document Type and Number:
WIPO Patent Application WO/2011/122931
Kind Code:
A1
Abstract:
A method for detecting humans using attire is provided. The method comprises detecting a group motion blob by subtracting image sequences from a background image; detecting human attire in the group motion blob with the help of at least one predefined attire template stored in a database; and validating and extracting individual human images from the image sequences using the detected human attire.

Inventors:
LIANG KIM MENG (MY)
LIM MEI KUAN (MY)
TANG SZE LING (MY)
ZULAIKHA KADIM (MY)
Application Number:
PCT/MY2010/000239
Publication Date:
October 06, 2011
Filing Date:
October 29, 2010
Assignee:
MIMOS BERHAD (MY)
LIANG KIM MENG (MY)
LIM MEI KUAN (MY)
TANG SZE LING (MY)
ZULAIKHA KADIM (MY)
International Classes:
G06T7/00
Foreign References:
US20070237364A1 (2007-10-11)
JP2007272421A (2007-10-18)
US20020076100A1 (2002-06-20)
Other References:
WILLIAM ROBSON SCHWARTZ ET AL.: "Human Detection Using Partial Least Squares Analysis", 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, 29 September 2009 (2009-09-29) - 2 October 2009 (2009-10-02), pages 24 - 31
Attorney, Agent or Firm:
YAP, Kah Hong (Suite 8.02, 8th Floor, Plaza First Nationwide, 16, Jalan Tun H.S. Lee, Kuala Lumpur, MY)
Claims:
Claims

1. A method for detecting humans using attire from image sequences of a live video stream, said method comprising: detecting motion blobs on the image sequences; template matching human attire in each detected motion blob with one or more predefined attire templates stored in a database; and extracting each human individual from the image sequences based on the matched human attire, wherein each of the motion blobs comprises one or more human individuals.

2. The method of claim 1, wherein the one or more predefined templates comprise a set of triple attire templates.

3. The method of claim 2, wherein the triple attire template comprises frontal, left-side and right-side attire views.

4. The method of claim 1, further comprising subtracting the image sequences from a background image to obtain the motion blobs.

5. The method of claim 1, further comprising filtering single human motion blobs from the motion blobs based on a height-to-width ratio of each motion blob, wherein a motion blob whose height-to-width ratio falls within a predefined range is considered to consist of more than one human individual.

6. The method of claim 5, wherein the template matching is not carried out for a motion blob that consists of only one human individual.

7. The method of claim 1, further comprising: dispersing the one or more predefined attire templates on the motion blobs; and fitting each dispersed attire template to a homogeneous color region on the image sequences, wherein each fitted attire template is taken as one human individual.

8. The method of claim 7, wherein the one or more predefined attire templates are dispersed randomly on the motion blobs.

9. The method of claim 7, wherein fitting the dispersed attire template comprises a minimization technique.

10. The method of claim 1, further comprising human estimation for validating and detecting the human individuals from the detected motion blobs.

11. The method of claim 10, wherein the human estimation further comprises marking each detected human individual with a contour, wherein the contour is either a rectangular or an ellipse-shaped boundary.

Description:
Method of Detecting Human Using Attire

Field of the Invention

[0001] The present invention generally relates to a human detection method and in particular to a human detection method using attire.

Background

[0002] In intelligent surveillance applications, the human detection feature is crucial for human tracking, which subsequently enables high-level human behavior analysis in real-time surveillance applications. Prior to robust human tracking in a particular scene, human detection is utilized for initializing the object of interest to be tracked.

[0003] One of the challenges in human detection is to detect individual human figures within a group of people. A big crowd of people is usually detected as a single large motion blob by the motion detection feature of video streams. Detecting each individual human figure within such a motion blob is challenging due to occlusions. This problem is commonly found in conventional motion detection of video streams and may lead to inaccurate behavior analysis of each individual human detected in a particular scene.

[0004] Another challenge in human detection is the ability to detect a human when the human subject is back-facing the camera (back-view). In an unconstrained environment, it is not always possible to capture the human's frontal view or, in particular, the human's face.

[0005] In most of the conventional approaches, human detection is based on head or face detection, with the assumption that the human subject has a head upright to the body. US2002/0076100 discloses a method for detecting human figures in digital color images. The method comprises the steps of segmenting images into non-overlapping homogeneous color or texture regions, detecting human skin color regions, further detecting candidate human face regions and constructing human figures using a semantic network. The nodes of body parts and the directed links encode constraints from anthropological proportions and kinetic motions. This method is designed for detecting cropped human figures, where certain parts of the human body do not appear in the images. Yet, this method still requires detection of the face regions beforehand in order to isolate the human figure from the background and is therefore unable to detect human figures from the back.

[0006] US2008/0123968 discloses a human detection system for detecting a plurality of humans in an image, which comprises a full body detector, a plurality of part detectors and a combined detector. The full body detector is used to detect the full body of each human. Each of the plurality of part detectors is associated with a different body part of each of the humans, for example the head-shoulder, torso and legs body parts. The combined detector is used to combine all the full body and part detections. However, this method is not suitable for human detection in a crowded scene, as the extraction of a fine and clean silhouette from each human subject in the group is very difficult to achieve due to high internal occlusion in the group.

[0007] US2009/0041297 discloses a human detection and tracking system for security applications. The human detection method makes use of the assumption that the human's head is located upright to the body, as well as a few other assumptions extracted from general human object properties, e.g. that the human face might be visible. The human detection method comprises separating foreground and background using motion and change modules, dividing foreground regions into separate blobs using a blob extraction module and detecting human subjects in the scene using a human detection module. In this method, the human profiles extraction module analyses the number of human profiles in each blob by studying the vertical projection of the blob mask and the top profile of the blob. However, the vertical or horizontal projection will only perform well for simple side-by-side human profiles and may fail for blobs containing a group of humans.

Summary

[0008] In accordance with one aspect of the present invention, there is provided a method for detecting humans using attire from image sequences of a live video stream. The method comprises detecting motion blobs on the image sequences; template matching human attire in each detected motion blob with one or more predefined attire templates stored in a database; and extracting each human individual from the image sequences based on the matched human attire. Each of the motion blobs comprises one or more human individuals.

[0009] In one embodiment, the one or more predefined templates may comprise a set of triple attire templates. The triple attire template comprises frontal, left-side and right-side attire views.

[0010] In another embodiment, the method further comprises subtracting the image sequences from a background image to obtain the motion blobs. It may further comprise filtering single human motion blobs from the motion blobs based on a height-to-width ratio of each motion blob. A motion blob whose height-to-width ratio falls within a predefined range is considered to contain more than one human individual. The template matching may not be carried out for a motion blob that consists of only one human individual.

[0011] In yet another embodiment, the method may further comprise dispersing the one or more predefined attire templates on the motion blobs; and fitting each dispersed attire template to a homogeneous color region on the image sequences. Each fitted attire template may be taken as one human individual. The one or more predefined attire templates may also be dispersed randomly on the motion blobs. Further, the step of fitting the dispersed attire template may include a minimization technique.

[0012] In a further embodiment, the method may further comprise human estimation for validating and detecting the human individuals from the detected motion blobs. The human estimation may further comprise marking each detected human individual with a contour, wherein the contour is either a rectangular or an ellipse-shaped boundary.

Brief Description of the Drawings

[0013] Preferred embodiments according to the present invention will now be described with reference to the figures accompanied herein, in which like reference numerals denote like elements;

[0014] FIG. 1 illustrates an overall system for detecting human in image sequences using attire according to one embodiment of the present invention;

[0015] FIG. 2 illustrates an algorithm used in the system of FIG. 1 in accordance with one embodiment of the present invention;

[0016] FIG. 3 illustrates a flowchart of a group motion blob detection component of the system algorithm of FIG. 2 in accordance with one embodiment of the present invention;

[0017] FIG. 4 illustrates a flowchart of an attire detection component of the system algorithm of FIG. 2 in accordance with one embodiment of the present invention; and

[0018] FIG. 5 illustrates a triple attire template used in the attire detection of FIG. 4 in accordance with one embodiment of the present invention.

Detailed Description

[0019] Embodiments of the present invention shall now be described in detail, with reference to the attached drawings. It is to be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated device, and such further applications of the principles of the invention as illustrated therein, being contemplated as would normally occur to one skilled in the art to which the invention relates.

[0020] FIG. 1 shows a process flow of detecting humans using attire in accordance with one embodiment of the present invention. In particular, the process provides human tracking initialization from a crowded scene or in scenarios where the face or head of the human subject is not obvious. The process 1 is carried out by an imaging system (not shown) generally adapted for processing image sequences 11 forming a continuous video stream. The process 1 comprises a human detection 10 and a human tracking 20. During the human detection 10, the imaging system receives the images from an imaging device, such as a camera. On each image, an attire template matching is carried out for identifying individuals that appear within the image. The attire template matching requires a template database 30 that comprises attire/costume templates. Blobs representing each individual human are extracted as each matched attire is found during the attire template matching. Once the individual humans are identified, the imaging system performs the human tracking 20 for each detected blob. The human tracking 20 can be done by any known image tracking system and method and is therefore not described in detail, for simplicity.

[0021] FIG. 2 illustrates a process flow of the human detection 10 of FIG. 1 in accordance with another embodiment of the present invention. The process comprises a group motion blob detection 101, an attire detection 102 and a human estimation 103. At the group motion blob detection 101, the image sequences 11, upon which the individual human detection is going to be performed, are fed into the imaging system. The imaging system subtracts the image sequences against a background image 12 to extract group motion blobs from the background. The background image 12 is a pre-acquired image of the same field of view as the image sequences 11 without any motion object. The imaging system then obtains images with group motion blobs 13 identified. Each of the group motion blobs 13 may consist of one or more human individuals. Then, the image sequences 11 with the group motion blobs 13 detected are processed together with the template database 30 at the attire detection 102 for detecting the attires in each group motion blob 13. As mentioned, any template matching system and method can be used for the attire detection 102. The attire detection 102 aims to detect the outline of each attire on each human individual, whereby details of the attire itself are not of interest for identifying each human individual. Once the attire detection 102 is completed, image sequences 11 with detected attires 15 are obtained. The image sequences 11 with the detected attires 15 are further processed with the human estimation 103 for validating and detecting the human individuals from the image sequences. At the human estimation 103, the detected attires 15 are used for location estimation of the humans. Each identified human individual is marked to provide a fit area of the human, such that more information can be extracted for later use in human tracking. The markings could be in the form of a rectangular box, an ellipse or any suitable shape with a proper ratio to the contour of a human. Such marking is used for individual tracking, as well as to allow the system to extract further information about the human from the marked area when necessary.
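
For illustration, the per-frame flow described in paragraph [0021] could be organized as in the following Python/OpenCV sketch. The function is only a hypothetical orchestration of components 101, 102 and 103; the three stage functions are passed in as callables, and concrete (equally hypothetical) sketches of them follow the paragraphs describing FIGs. 3 and 4 below.

```python
# Hypothetical orchestration of the human detection 10 of FIG. 2.
# The stage callables correspond to group motion blob detection 101,
# attire detection 102 and human estimation 103.
import cv2

def run_human_detection(video_source, background,
                        detect_group_motion_blobs, detect_attires, estimate_humans):
    capture = cv2.VideoCapture(video_source)   # image sequences 11 (live stream or file)
    per_frame_results = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # Component 101: blobs that may contain one or more human individuals.
        single_blobs, group_blobs = detect_group_motion_blobs(frame, background)
        # Component 102: attire template matching inside each group motion blob.
        attires = detect_attires(frame, group_blobs)
        # Component 103: validate and mark each human individual.
        humans = estimate_humans(frame, attires)
        per_frame_results.append((single_blobs, humans))
    capture.release()
    return per_frame_results
```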

[0022] It is understood that the group motion blob detection 101 may adopt any state-of-the-art blob detection technique available. The above embodiment is provided for illustration only, not as a limitation to the scope of the present invention.

[0023] FIG. 3 shows a flowchart of the group motion blob detection 101 of FIG. 2 in accordance with one embodiment of the present invention. The group motion blob detection 101 comprises a background subtraction 104 and a blob analysis 105. At the background subtraction 104, the image sequences 11 are subtracted against a pre-acquired background image 12 to extract the motion blobs from the background. Subsequently, all the motion blobs, which may contain one or more human individuals, are processed in the blob analysis 105 to filter single human motion blobs from the group motion blobs. A single human motion blob is herein referred to as a blob that consists of only one human individual. Typically, a group motion blob is wider than a single human motion blob. Hence, a group motion blob can be identified by the width-to-height ratio of the motion blobs detected from the image sequences 11. When a motion blob's width-to-height ratio is greater than the ratio of a single human (with respect to the camera view), the motion blob is categorized as a group motion blob. For motion blobs with a ratio less than that of a single human, no further processing is required.
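
As a concrete illustration of the background subtraction 104 and blob analysis 105, the sketch below subtracts a pre-acquired background image from a frame, extracts motion blobs as contours, and separates single human blobs from group motion blobs by their width-to-height ratio. This is only one possible realization: the difference threshold, the single-human ratio and the minimum blob area are hypothetical parameters, not values given in the patent.

```python
import cv2

# Hypothetical parameters; the patent does not specify concrete values.
DIFF_THRESHOLD = 30          # intensity difference treated as motion
SINGLE_HUMAN_RATIO = 0.5     # typical width/height of one upright person in this camera view
MIN_BLOB_AREA = 500          # ignore tiny noise blobs

def detect_group_motion_blobs(frame, background):
    """Sketch of background subtraction 104 followed by blob analysis 105 (FIG. 3)."""
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_bg = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_frame, gray_bg)                   # background subtraction 104
    _, mask = cv2.threshold(diff, DIFF_THRESHOLD, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    single_blobs, group_blobs = [], []
    for contour in contours:                                  # blob analysis 105
        x, y, w, h = cv2.boundingRect(contour)
        if w * h < MIN_BLOB_AREA:
            continue
        # A blob noticeably wider than one upright person is taken as a group motion blob.
        if w / h > SINGLE_HUMAN_RATIO:
            group_blobs.append((x, y, w, h))
        else:
            single_blobs.append((x, y, w, h))
    return single_blobs, group_blobs
```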

[0024] FIG. 4 shows a flowchart of the attire detection 102 of FIG. 2 in accordance with one embodiment of the present invention. The attire detection 102 is carried out after the group motion blobs are detected by the previous processes. In the group motion blob detection 101, each group motion blob is marked within a fit area thereof. In one embodiment of the present invention, a rectangular box 18 is used as the marking. Due to the image subtraction, image details of each individual are lost. Accordingly, the rectangular box markings 18 are applied onto the unprocessed image sequences, whereby the corresponding box marking 19 contains the group motion blob of humans with details. Therefore, the attire detection 102 is carried out on each detected rectangular box marking 19 and comprises the following steps: dispersing templates in the group motion blobs at step 106; expanding the templates at step 107; and determining whether any template fits a homogeneous color region at step 108. At the step 106, the imaging system disperses templates randomly within the rectangular box marking 19 of the image sequences. At the step 107, the templates are expanded to the edge of a homogeneous color region. An energy minimization technique, for example, can be used for this purpose. The attire template functions like a highly elastic rubber band that stretches to fit the edge of the homogeneous color regions. After subsequent iteration steps of energy minimization, at the step 108, the imaging system determines whether any attire templates fit the homogeneous color regions. Those templates that are not fitted to any homogeneous color region are discarded. Any fitted template shall be regarded as a detected attire, and hence a detected human individual.
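
The patent does not spell out the energy minimization used in steps 106 to 108, so the sketch below substitutes a much simpler stand-in: each randomly dispersed seed is grown to the edge of its surrounding homogeneous color region with OpenCV's flood fill, and fits that are too small or that largely overlap an earlier fit are discarded. The seed count, color tolerance, minimum attire area and overlap threshold are all hypothetical parameters.

```python
import random

import cv2
import numpy as np

# Hypothetical parameters; the patent does not specify concrete values.
NUM_SEEDS = 20                    # dispersed template seeds per group motion blob (step 106)
COLOR_TOLERANCE = (12, 12, 12)    # how "homogeneous" a color region must be to keep growing
MIN_ATTIRE_AREA = 800             # discard fits too small to be an attire (step 108)
MAX_OVERLAP = 0.5                 # discard fits that mostly repeat an earlier fit

def detect_attires(frame, group_blobs, seed=0):
    """Simplified stand-in for the attire detection 102 (FIG. 4, steps 106-108)."""
    rng = random.Random(seed)
    detected = []
    for (x, y, w, h) in group_blobs:                        # rectangular box marking 19
        claimed = np.zeros(frame.shape[:2], dtype=np.uint8)
        for _ in range(NUM_SEEDS):
            # Step 106: disperse a seed randomly within the box marking.
            sx, sy = rng.randrange(x, x + w), rng.randrange(y, y + h)
            mask = np.zeros((frame.shape[0] + 2, frame.shape[1] + 2), dtype=np.uint8)
            # Step 107: grow the seed to the edge of the homogeneous color region.
            cv2.floodFill(frame, mask, (sx, sy), (0, 0, 0),
                          COLOR_TOLERANCE, COLOR_TOLERANCE,
                          cv2.FLOODFILL_MASK_ONLY | cv2.FLOODFILL_FIXED_RANGE)
            region = mask[1:-1, 1:-1]
            area = int(region.sum())
            # Step 108: keep only fits of a plausible attire size that are not
            # already explained by a previously fitted region.
            if area < MIN_ATTIRE_AREA or int((region & claimed).sum()) / area > MAX_OVERLAP:
                continue
            claimed |= region
            ys, xs = np.nonzero(region)
            detected.append((int(xs.min()), int(ys.min()),
                             int(xs.max() - xs.min() + 1),
                             int(ys.max() - ys.min() + 1)))  # one fitted attire = one human
    return detected
```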

[0025] In another embodiment of the present invention, if only one attire or no attire is detected in the group motion blob, the dispersion process is repeated with the same attire template. In this process, the motion blobs are expected to contain at least two persons, as the previous processes have filtered out motion blobs that contain only one attire. The dispersion process may be repeated within the marking region only, excluding the region with the already detected attire.

[0026] The detected attire templates are superimposed on the marking region 19 and are then used for the further human detection process in the human estimation component 103. During the human estimation 103, the difference in the width-to-height ratio (with respect to the camera view) between an individual human and a group is utilized to estimate the humans in the group. Further, by referring to the detected attire location, a template that considers the radius of the object, such as an ellipse template, can be applied to better fit the human subject. Both the ellipse-shaped template and the individual human ratio are expected to fit and extract an individual human subject from the group in the image. As such, the human estimation component 103 is able to estimate the entire human and extract more information within the region.
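
A simplified sketch of what the human estimation 103 might do with each detected attire region is given below: each attire bounding box is scaled to an assumed full-body proportion and the estimated individual is marked with an ellipse-shaped contour. The head and leg proportions are hypothetical; the patent only requires a rectangular or ellipse-shaped boundary with a proper ratio to a human contour.

```python
import cv2

# Hypothetical body proportions relative to the attire height; the patent only
# requires a contour (rectangle or ellipse) with a proper ratio to a human figure.
HEAD_FRACTION = 0.35   # portion of the body above the attire (head and neck)
LEG_FRACTION = 0.9     # portion of the body below the attire (legs)

def estimate_humans(frame, attires):
    """Simplified stand-in for the human estimation 103: one ellipse per fitted attire."""
    humans = []
    for (x, y, w, h) in attires:                 # bounding box of a fitted attire
        top = int(y - HEAD_FRACTION * h)         # extend upward to cover the head
        bottom = int(y + h + LEG_FRACTION * h)   # extend downward to cover the legs
        center = (x + w // 2, (top + bottom) // 2)
        axes = (w // 2, (bottom - top) // 2)
        humans.append((center, axes))
        # Mark the estimated human individual with an ellipse-shaped contour.
        cv2.ellipse(frame, center, axes, 0, 0, 360, (0, 255, 0), 2)
    return humans
```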

[0027] FIG. 5 illustrates a triple-attire template in accordance with one embodiment of the present invention. The triple-attire template comprises a frontal attire view and two side attire views, left and right. An appropriate attire template is selected from the available template database 30 to suit different environments of the scene. For example, a casual shirt template is used for informal scenes, while a formal shirt template is used for formal scenes.
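
For illustration, the triple-attire template and the scene-based selection from the template database 30 described in paragraph [0027] could be represented by a simple data structure such as the following; the field names, the placeholder template images and the scene-type keys are hypothetical.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class TripleAttireTemplate:
    """One predefined attire template with frontal, left-side and right-side views."""
    name: str
    frontal: np.ndarray      # frontal attire view (image patch or binary outline)
    left_side: np.ndarray    # left-side attire view
    right_side: np.ndarray   # right-side attire view

# Hypothetical template database 30, keyed by the type of scene it suits.
# Placeholder arrays stand in for the actual attire template images.
template_db = {
    "informal": TripleAttireTemplate(
        "casual_shirt", np.zeros((64, 48)), np.zeros((64, 32)), np.zeros((64, 32))),
    "formal": TripleAttireTemplate(
        "formal_shirt", np.zeros((64, 48)), np.zeros((64, 32)), np.zeros((64, 32))),
}

def select_template(scene_type):
    """Pick the attire template that suits the environment of the scene."""
    return template_db.get(scene_type, template_db["informal"])
```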

[0028] The human detection method of the present invention is able to detect an individual human subject in a group or a crowded scene, without the need for face or skin detection to initiate the detection process. The method enables individual human detection even under occlusion, where the skin and face of the humans may not be available to initiate the detection process. It also enables human detection from the human's back view, where the head or face of the human may not be obvious.

[0029] The feature of detecting a human using his/her attire is beneficial, as the attire worn by the human subject is the most salient property in most surveillance images, rather than skin color or other human properties. The algorithm of the method of the present invention is designed to detect human attires in the images without relying on skin color or any other human properties. Hence, the dependencies upon face or head properties of the human, which might not always be available, are eliminated.

[0030] While specific embodiments have been described and illustrated, it is understood that many changes, modifications, variations and combinations thereof could be made to the present invention without departing from the scope of the invention.