

Title:
METHOD AND SYSTEM FOR FACE RECOGNITION
Document Type and Number:
WIPO Patent Application WO/2012/071677
Kind Code:
A1
Abstract:
A method of face recognition for a plurality of users, comprising: obtaining face samples of the plurality of users sequentially, and classifying the obtained face samples of at least one user to a face sample set of each user by a dynamic similarity measure.

Inventors:
ZHOU JIE (CN)
CHENG PU (CN)
GU QUANQUAN (CN)
CHEN ZHIBO (CN)
Application Number:
PCT/CN2010/001911
Publication Date:
June 07, 2012
Filing Date:
November 29, 2010
Assignee:
TECHNICOLOR CHINA TECHNOLOGY (CN)
ZHOU JIE (CN)
CHENG PU (CN)
GU QUANQUAN (CN)
CHEN ZHIBO (CN)
International Classes:
G06K9/62
Domestic Patent References:
WO2010062268A12010-06-03
Foreign References:
CN1912890A2007-02-14
CN101221620A2008-07-16
CN101464950A2009-06-24
Attorney, Agent or Firm:
KANGXIN PARTNERS, P.C. (Floor 16 Tower A, InDo Building,A48 Zhichun Road, Haidian District, Beijing 8, CN)
Claims:
CLAIMS

1. A method of face recognition for a plurality of users, comprising:

obtaining face samples of the plurality of users, and

classifying the obtained face samples of at least one user to a face sample set of each user by a dynamic similarity measure.

2. The method according to claim 1, further comprising:

adjusting dynamically the face sample sets obtained previously based on the dynamic similarity measure.

3. The method according to claim 1 or 2, wherein the dynamic similarity measure is updated according to difference features of the face samples obtained previously.

4. The method according to any one of the preceding claims, wherein the dynamic similarity measure is denoted as a ratio of the distribution of intra-personal difference features and the distribution of extra-personal difference features, and the distribution of intra-personal difference features and the distribution of extra-personal difference features are updated by the mean of difference features obtained previously.

5. The method according to any one of claims 2-4, wherein the step of adjusting comprises deleting a face sample from a set or merging two sets.

6. A face recognition system comprising: a face sample set classifying unit for obtaining face samples of a plurality of users, and classifying the obtained face samples of at least one user to a face sample set of each user by a dynamic similarity measure.

7. The system according to claim 6, further comprising a face image database for storing the face image samples, and an updating unit for adjusting dynamically the face sample sets obtained previously based on the dynamic similarity measure.

8. The system according to claim 7, wherein the updating unit is adapted to update the similarity measure according to difference features of the face samples obtained previously.

9. The system according to claim 8, wherein a face sample is deleted from a set or two sets are merged.

10. The system according to any one of claims 6-9, wherein the dynamic similarity measure is denoted as a ratio of the distribution of intra-personal difference features and the distribution of extra-personal difference features, and the distribution of intra-personal difference features and the distribution of extra-personal difference features are updated by the mean of difference features obtained previously.

Description:
METHOD AND SYSTEM FOR FACE RECOGNITION

FIELD OF THE INVENTION

The present invention relates in general to face recognition in a smart environment, and more particularly, to a method and system for face recognition based on face images.

BACKGROUND OF THE INVENTION

A smart environment should be able to identify users so that personalized services can be delivered to them. For example, a smart TV can recommend programs based on the identity of the users; a smart store can recognize its regular customers and provide more personalized services; and a smart home can identify family members and remind them of important messages.

In most face recognition systems in smart environments, face samples need to be collected in a supervised manner to identify users. For example, the recognition system should store face images of known users beforehand, and then compare obtained face images with the stored face images to perform the recognition. However, in the context of smart environment applications, such as a smart TV or a smart store, it is not convenient and sometimes even impossible to collect the face samples of the users beforehand. So automatic face registration is needed, i.e. the face samples of each user should be collected and updated automatically while the system operates. This is very meaningful for many smart environment applications. For example, the smart TV can automatically collect the face samples of regular users and analyze their favorite programs according to their TV-watching histories; the smart store can automatically decide who the best customers are and provide them with personalized services.

Some works on face recognition have focused on the problem of automatic face registration. In one prior art, the recognition device starts with an empty face image database and learns various faces while interacting with persons. In this prior art, the recognition of unknown individuals is based on the Euclidean distance from already stored ones. A model for a new person is created if the highest relative match likelihood is lower than a given threshold.

In smart environment applications, there are usually great variations of head poses, lighting conditions and face expressions. These issues often make the existing methods fail in practical use.

SUMMARY OF THE INVENTION

The invention concerns a method of face recognition for a plurality of users, comprising: obtaining face samples of the plurality of users sequentially, and classifying the obtained face samples of at least one user to a face sample set of each user by a dynamic similarity measure.

The invention also concerns a face recognition system comprising: a face sample set classifying unit for obtaining face samples of a plurality of users sequentially, and classifying the obtained face samples of at least one user to a face sample set of each user by a dynamic similarity measure.

BRIEF DESCRIPTION OF DRAWINGS

These and other aspects, features and advantages of the present invention will become apparent from the following description of an embodiment in connection with the accompanying drawings:

Fig. 1 is a block diagram showing a configuration of a face recognition system according to an embodiment of the present invention;

Fig. 2 is an illustrative example showing two face images used for face recognition;

Fig. 3 is an illustrative example showing sample set updating according to the embodiment of the invention; and

Fig. 4 is a flow chart showing a method for face recognition according to the embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In the following detailed description, a system and method for face recognition are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one skilled in the art that the present invention may be practiced without these specific details or with equivalents thereof. In other instances, well known methods, procedures, components and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.

Fig. 1 is a block diagram showing a configuration of a face recognition system 100 according to an embodiment of the present invention. As shown in Fig. 1, the face recognition system 100 is provided with an image input unit 101, a face detector 102, a face alignment unit 103, a face sample set classifying unit 104, a face image database 105, and an updating unit 107. The image input unit 101 can be a camera or any other video device, from which images of users in front of the image input unit 101 are passed as input to the face detector 102. Here, the images are input to the face detector 102 continuously and sequentially for later processing. The face detector 102 detects whether the image contains a face or not by matching the obtained image with a template face picture, and obtains information regarding the location of faces within the visual range of the image input unit 101. The technology of detecting a face and obtaining its location information is known to one skilled in the art, and will not be described in detail. The technique is disclosed, e.g., in a report by V. Vapnik (V. Vapnik (1999), The Nature of Statistical Learning Theory, second edition, Springer).

The obtained face image will then be sent from the face detector 102 to the face alignment unit 103. In the face alignment unit 103, the position and size of the face image are normalized, for example, in such a way that the eye positions are (16,24) and (31,24) and the image size is 46*56 pixels. That is, for obtaining a correct difference between two face images in the face sample set classifying unit 104 to be described later, the face position needs to be found accurately. Moreover, since the positions of the eyes and the nose in a face differ from person to person, these positions need to be normalized.
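The eye-based normalization described above can be sketched as a similarity transform that maps the detected eye centres onto the canonical positions (16,24) and (31,24) of a 46*56 face chip. The following is a minimal illustrative sketch, not part of the claimed embodiment; the function name and interface are assumptions:

```python
import numpy as np

def eye_align_transform(left_eye, right_eye,
                        target_left=(16, 24), target_right=(31, 24)):
    """Similarity transform (scale, rotation, translation) that maps the
    detected eye centres onto canonical positions in the face chip.
    Returns (R, t) such that a point p maps to R @ p + t."""
    src_v = np.subtract(right_eye, left_eye).astype(float)
    dst_v = np.subtract(target_right, target_left).astype(float)
    scale = np.linalg.norm(dst_v) / np.linalg.norm(src_v)
    angle = np.arctan2(dst_v[1], dst_v[0]) - np.arctan2(src_v[1], src_v[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    t = np.asarray(target_left, float) - R @ np.asarray(left_eye, float)
    return R, t
```

Applying the returned transform to the image (e.g. with a warp routine) and cropping to 46*56 pixels yields the normalized face image.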

The normalized face images are sent to the face sample set classifying unit 104 for further processing. Fig. 2 is an illustrative example showing two face images used for face recognition and the difference feature extraction processed in the face sample set classifying unit 104. The difference feature extraction and sample set classifying process is executed in two stages: an offline stage and an online stage.

At the offline stage, the face sample set classifying unit 104 will learn related knowledge, i.e., the distributions of intra-personal and extra-personal difference features, from the face images in the face image database 105, which at this stage only includes public face sample sets and no real users' face sample sets. Here, the face images of different individuals have been divided into different face image sets, each set including at least one face image of the individual, and two classes of difference features are defined: intra-personal difference features, corresponding, for example, to different facial expressions of the same individual; and extra-personal difference features, corresponding to variations between different individuals. Moreover, the knowledge of distributions of difference features denotes the probability functions for the same individual and for different individuals with respect to a specific difference feature. A detailed description of these functions will be given later.

According to the embodiment, a local appearance based method is used by the face sample set classifying unit 104 to represent the face image sample. This method is based on appearances of local facial regions that are represented with discrete cosine transform coefficients. As shown in Fig. 2, each face image is divided into N blocks of 8*8 pixels. The discrete cosine transform (DCT) is applied on each block to obtain DCT coefficient vectors, from which the first M coefficients are selected to form the local feature vector c_i, i = 1, ..., N. The entire face image is represented by concatenating the selected local feature vectors of all blocks of the face image. The detailed local appearance based method is disclosed in a report by H.K. Ekenel, R. Stiefelhagen, "Local Appearance based Face Recognition using Discrete Cosine Transform", 13th European Signal Processing Conference, Turkey, 2005. For the two face images in Fig. 2, the left face image can be represented as I_m = [c_1m, c_2m, ..., c_Nm] and the right face image as I_n = [c_1n, c_2n, ..., c_Nn] by the above local appearance based method. The difference features fd_i between the two face images are represented as

fd_i = d_L1(c_im, c_in), where i = 1, ..., N,

and d_L1(c_im, c_in) is the distance between the two vectors. The L1 distance is known to one skilled in the art as the sum of the absolute values of the element-wise differences between the two vectors.
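The block-wise DCT representation and the per-block L1 difference features can be sketched as follows. This is an illustrative simplification: it keeps the first m coefficients in row-major order rather than the zig-zag scan commonly used with this method, and all names are assumptions:

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (plain NumPy)."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def local_features(image, block=8, m=5):
    """Split a (h, w) image into non-overlapping 8x8 blocks and keep the
    first m DCT coefficients of each block as its local feature vector c_i."""
    h, w = image.shape
    feats = []
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            coeffs = dct2(image[y:y + block, x:x + block].astype(float))
            feats.append(coeffs.flatten()[:m])
    return feats

def difference_features(feats_m, feats_n):
    """Per-block L1 distances fd_i = d_L1(c_im, c_in)."""
    return np.array([np.abs(cm - cn).sum() for cm, cn in zip(feats_m, feats_n)])
```

For a 46*56 normalized face image this yields one local feature vector per full 8*8 block, and `difference_features` gives the vector of fd_i values between two faces.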

According to the above mentioned method, two classes of difference features fd, intra-personal difference features and extra-personal difference features, are obtained from the public face images at the offline stage.

Then, in order to compute the similarity between two face image samples, we assume the difference features follow a Gaussian distribution with mean m and variance σ. According to the embodiment, the average (mean) of the difference features between obtained face image samples is taken as the mean m of the Gaussian distribution, and the variance σ of the Gaussian distribution is given by the following equation:

σ = [Σ (fd_i − m)²] / n, where n is the number of difference features fd.
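The mean and variance estimation above can be written directly; a minimal sketch with illustrative names:

```python
import numpy as np

def fit_gaussian(fds):
    """Mean m and variance sigma of a set of scalar difference features,
    matching sigma = [sum (fd_i - m)^2] / n from the equation above."""
    fds = np.asarray(fds, dtype=float)
    m = fds.mean()
    sigma = ((fds - m) ** 2).mean()   # note: divides by n, not n - 1
    return m, sigma
```

Fitting this separately to the intra-personal and extra-personal difference features yields the parameters of the two distributions described next.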

Then, from the public face images in the database at the offline stage, the distributions of intra-personal and extra-personal difference features G_I(fd) and G_E(fd) can be obtained according to the respective Gaussian distribution probability functions. That is, if the difference feature fd between two face images is calculated, the probability of belonging to the same person can be determined from G_I(fd), and the probability of belonging to different persons can be determined from G_E(fd). The Gaussian distribution of difference features for face images was described in a report by B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, 2000.

In order to classify the input face images into the image set (image cluster) of the same person, a threshold value Th1 is determined according to the above two distributions G_I(fd) and G_E(fd). For example, the threshold value can be 0.8-1.2. If G_I(fd)/G_E(fd) > Th1, then the two face images with difference feature fd belong to the same person. Here G_I(fd)/G_E(fd) is defined as a similarity measure between two face images. The original similarity measure can be obtained by the face sample set classifying unit 104 according to the difference features of the original face images in the database, and shared with the updating unit 107.
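The similarity measure G_I(fd)/G_E(fd) and the Th1 test can be sketched as follows, treating fd as a scalar and each distribution as a Gaussian parameterized by a (mean, variance) pair. This is a simplification for illustration; names are assumptions:

```python
import math

def gaussian_pdf(fd, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(fd - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def similarity(fd, intra, extra):
    """Dynamic similarity measure G_I(fd) / G_E(fd);
    intra and extra are (mean, variance) pairs."""
    return gaussian_pdf(fd, *intra) / gaussian_pdf(fd, *extra)

def same_person(fd, intra, extra, th1=1.0):
    """True if the two face images behind fd are judged to be the same person."""
    return similarity(fd, intra, extra) > th1
```

A small difference feature close to the intra-personal mean yields a ratio above Th1 (same person); a difference feature close to the extra-personal mean yields a ratio below it.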

According to the embodiment, denote the already collected face image sets at the offline or online stage as {S_1, ..., S_L}, where S_q is the q-th face image cluster for one person, and S_qr is the r-th sample in the q-th cluster.

Then, when a face image I_m is input via the image input unit 101, the face detector 102 and the face alignment unit 103 to the face sample set classifying unit 104, the nearest sample S_q*r* to the input image I_m shall be computed based on the following equation:

(q*, r*) = argmin_{q,r} d_E(I_m, S_qr)

Here, the Euclidean distance is used to obtain the nearest distance in the embodiment. In this equation, d_E(I_m, S_qr) is the Euclidean distance between the local appearance based representations of I_m and S_qr, and "argmin" denotes the arguments corresponding to the minimum value. After getting the nearest sample S_q*r*, the difference feature fd is extracted from I_m and S_q*r*. If G_I(fd)/G_E(fd) <= Th1, then a new cluster is created and L = L + 1, since they are not the same person. Otherwise, I_m is put in the cluster S_q*.

Considering the storage space issue in practical use, the oldest samples will be deleted to make sure each cluster has at most K samples.
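The online classification step, the nearest-sample search, the Th1 decision, and the at-most-K cap, can be sketched as follows. This is a simplified illustration with assumed names; `sim_fn` stands in for the similarity measure G_I(fd)/G_E(fd):

```python
import numpy as np

def classify(face_vec, clusters, sim_fn, th1=1.0, k_max=20):
    """Assign a face representation to a cluster, or open a new one.

    clusters: list of clusters, each a list of stored feature vectors.
    sim_fn:   maps a scalar difference feature fd to G_I(fd)/G_E(fd).
    Returns the index of the cluster that received the sample."""
    if not clusters:                      # very first sample
        clusters.append([face_vec])
        return 0
    # (q*, r*) = argmin d_E(I_m, S_qr): nearest stored sample
    best_q, best_s = min(
        ((q, s) for q, cl in enumerate(clusters) for s in cl),
        key=lambda qs: np.linalg.norm(face_vec - qs[1]))
    fd = float(np.abs(face_vec - best_s).sum())   # L1 difference feature
    if sim_fn(fd) <= th1:                 # not the same person: L = L + 1
        clusters.append([face_vec])
        return len(clusters) - 1
    clusters[best_q].append(face_vec)
    if len(clusters[best_q]) > k_max:     # keep at most K samples per cluster
        clusters[best_q].pop(0)           # drop the oldest
    return best_q
```

Feeding samples through `classify` in arrival order reproduces the sequential registration behavior described above.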

According to the embodiment, the face samples of users are classified to the face sample set of each user by a dynamic similarity measure. That is, the distribution functions of intra-personal and extra-personal difference features G_I(fd) and G_E(fd) are changed as a function of the online-collected input face samples by the updating unit 107, to make the system robust to varying environmental conditions.

For example, denote IS as the difference feature set used to adjust G_I(fd), and ES as the difference feature set used to adjust G_E(fd). Here, the difference feature sets are obtained by calculating difference features between the samples collected online sequentially during a specific period. Given the online-collected dataset at this period, the method for revising the difference feature distributions includes the following five steps:

Step 1. IS = ∅, ES = ∅.

Step 2. For each sample pair (S_im, S_jn), extract the difference feature fd from them. Here, S_i is the i-th face image cluster for one person, and S_im is the m-th sample in the i-th cluster; S_jn has a similar definition.

Step 3. If i = j and G_I(fd)/G_E(fd) > Th1, IS = IS ∪ {fd}. That is, put the difference feature fd into IS. If i ≠ j and G_I(fd)/G_E(fd) <= Th1, ES = ES ∪ {fd}. That is, put the difference feature fd into ES.

Step 4. Go to Step 2 until all the sample pairs have been compared.

Step 5. Set the mean of G_I(fd) as the mean of IS, and the mean of G_E(fd) as the mean of ES, so that the similarity measure G_I(fd)/G_E(fd) is updated for the next face samples. The updated similarity measure is provided to the face sample set classifying unit 104.

Finally, the system will adjust the collected sample sets in the database by the updating unit 107. This includes a sample deletion procedure and a cluster merging procedure for the sample clusters in the database 105, which comprise the following six steps:

Step 1: For each sample S_im, calculate

k = |{S_in | G_I(fd)/G_E(fd) <= Th1}|

where fd is the difference feature extracted from S_im and S_in; if G_I(fd)/G_E(fd) <= Th1, S_in is counted in k. Here k is the total number of samples in the sample set S_i matching G_I(fd)/G_E(fd) <= Th1.

Step 2: If k/|S_i| is bigger than a threshold, for example 0.8, delete sample S_im from set S_i. Here |A| is the number of elements of the set A. That is, S_im has a large distance to most of the other members in set S_i, so it is not a member of the set.

Step 3: Go to Step 1 until all the samples have been processed.

Step 4: For two clusters S_a and S_b, calculate

t = |{(S_ax, S_by) | a ≠ b, and G_I(fd)/G_E(fd) > Th1}|

where fd is the difference feature extracted from S_ax and S_by. Here t means the total number of sample pairs from the two clusters S_a and S_b that should belong to the same cluster based on G_I(fd)/G_E(fd) > Th1. That is, although these samples are divided into two clusters, they need to be merged into one cluster, or classified as the same person.

Step 5: If t/(|S_a| + |S_b|) is bigger than a threshold, for example 0.8, merge S_a and S_b. Delete the oldest samples to make sure the new cluster has at most K samples. Here K is the fixed maximum number of samples for one cluster.

Step 6: Go to Step 4 until no clusters can be merged.

Fig. 3 gives a typical result after adjusting the sample set. Note that the sample set adjusting procedure can correct mistakes, thus improving the robustness of the system. As shown in Fig. 3, two face samples are deleted from cluster 1 and cluster 2 respectively in Fig. 3(a) to get Fig. 3(b), and in Fig. 3(b) clusters 2 and 3 are merged to get Fig. 3(c).
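The six adjustment steps above can be sketched as a deletion pass followed by a merge loop. Scalar samples are used for brevity, names are illustrative, and `ratio` plays the role of the 0.8 threshold:

```python
def adjust_sample_sets(clusters, diff_fn, sim_fn, th1=1.0, ratio=0.8, k_max=20):
    """Sample deletion (Steps 1-3) then cluster merging (Steps 4-6)."""
    # Steps 1-3: drop samples that mismatch most members of their own cluster
    for cl in clusters:
        if len(cl) < 2:
            continue
        keep = []
        for idx, s in enumerate(cl):
            k = sum(1 for j, o in enumerate(cl)
                    if j != idx and sim_fn(diff_fn(s, o)) <= th1)
            if k / len(cl) <= ratio:
                keep.append(s)
        cl[:] = keep
    # Steps 4-6: merge cluster pairs whose cross-cluster pairs mostly match
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if not clusters[a] or not clusters[b]:
                    continue
                t = sum(1 for sa in clusters[a] for sb in clusters[b]
                        if sim_fn(diff_fn(sa, sb)) > th1)
                if t / (len(clusters[a]) + len(clusters[b])) > ratio:
                    # keep at most K samples, dropping the oldest first
                    clusters[a] = (clusters[a] + clusters[b])[-k_max:]
                    del clusters[b]
                    merged = True
                    break
            if merged:
                break
    return clusters
```

Running this periodically over the database reproduces the behavior illustrated in Fig. 3: outlier samples are removed first, then clusters that clearly hold the same person are merged.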

Fig. 4 is a flow chart showing a method for face recognition according to the embodiment of the invention. In the initial period, there is only a small quantity of samples in the sample sets. The system will classify the input face image directly based on the learned similarity measure, and update the sample set accordingly. When there are sufficient clusters, it will first update the similarity measure according to the collected sample sets, and then classify the input face. There will be a sample set adjusting procedure at last to help the system correct mistakes and adapt to changing environmental conditions. As shown in Fig. 4, the similarity measure of the face image clusters in the public database is obtained at the offline stage at step S401, and then, at the online stage, the face images of users are input at step S402. Here, the face images are obtained sequentially; for example, the face images can be obtained and processed at any time when there are users in front of the image input unit 101. That is, the face images obtained earlier are processed earlier. At step S403, the obtained input face images are classified based on the above similarity measure, and then the similarity measure is updated based on the difference features of the input face images obtained sequentially during a period at step S404. That is, the face samples of users are classified to the face sample set of each user by a dynamic similarity measure, and the original similarity measure is updated by the difference features of newly obtained face images. Then, at step S405, the face sample sets of users in the database are adjusted dynamically based on the dynamic similarity measure.

After classifying the face images into their respective clusters, each cluster will be annotated with a specific name or sign to distinguish it from the others, so that the smart environment can identify users according to the received face images.

To evaluate the effectiveness of our method, we use the FERET (Facial Recognition Technology) database as the offline face dataset and the Honda/UCSD video database as the online test dataset. The FERET database consists of 14051 face images representing 1199 individuals. The images contain variations in illumination conditions, facial expressions, head poses, etc. We use the frontal face images in the FERET database to learn the distributions of difference features as described above. The Honda/UCSD video database is commonly used in video-based face recognition research. It contains 75 video sequences of 20 individuals. These video sequences are recorded in an indoor environment at different times, and contain significant head rotations. Every individual is recorded in at least 2 videos. We assume these videos are the ones collected during the initial use of the system, and automatic face registration is performed on these videos. We extract all the images from the videos, and these images are supplied to the system in a sequence for face registration.

A successful face registration method should avoid the mistakes of placing an individual's faces into multiple clusters or grouping different persons' faces into one cluster. To make a quantitative evaluation of the face registration methods, we view the finally collected training set as obtained through a series of decisions. A true positive (TP) decision assigns two face images of the same person to the same cluster, and a true negative (TN) decision assigns two face images of different persons to different clusters. The false positive (FP) and false negative (FN) decisions are defined similarly. We compare two methods. The baseline method detects frontal faces and the two eyes, aligns the faces based on the two eyes, and recognizes an unknown person by thresholding the Euclidean distance between the local appearance based representation of the input face image and that of the already stored samples, just as in the existing methods. The second method is our method. In the experiment, when there are 10 clusters, the system begins to update the knowledge. We compute the TP rate of these methods at a FP rate of 10%. The results are shown in the following table:

From the table, we can see that the use of the dynamic similarity measure, together with the online sample set adjustment procedure, has a significant effect on improving the performance of face registration.

The foregoing merely illustrates the embodiment of the invention and it will thus be appreciated that those skilled in the art will be able to devise numerous alternative arrangements which, although not explicitly described herein, embody the principles of the invention and are within its spirit and scope.