Title:
FACE RECOGNITION SYSTEM AND METHOD
Document Type and Number:
WIPO Patent Application WO/2005/096213
Kind Code:
A1
Abstract:
An automatic face recognition system comprises a detector continuously acquiring images from a live video source, a face-of-interest (FOI) tracker for tracking multiple faces with random motion, filtering and selecting qualified faces, and a processor for recognizing whether detected faces are known or unknown faces, creating new datasets for recognized unknown faces, adaptively updating existing datasets in the database for recognized known faces, merging redundant face databases and removing face databases of non-interest. The system is highly efficient and is capable of dealing with multiple persons showing up at the same time, without any requirement for user interaction, human supervision or assistance.

Inventors:
Rothermel, Albrecht (Baumgartenstr. 38, Neu-Ulm, 89231, DE)
Mou, Dengpan (Gutenbergstr. 26/4, Ulm, 89073, DE)
Schweer, Rainer (Lärchenweg 12, Niedereschach, 78078, DE)
Application Number:
PCT/EP2005/001988
Publication Date:
October 13, 2005
Filing Date:
February 25, 2005
Assignee:
Thomson, Licensing (46 Quai A. le Gallo, Boulogne-Billancourt, F-92100, FR)
Rothermel, Albrecht (Baumgartenstr. 38, Neu-Ulm, 89231, DE)
Mou, Dengpan (Gutenbergstr. 26/4, Ulm, 89073, DE)
Schweer, Rainer (Lärchenweg 12, Niedereschach, 78078, DE)
International Classes:
G06K9/00; G06K9/00; (IPC1-7): G06K9/00; G06K9/62
Attorney, Agent or Firm:
Rossmanith, Manfred (Deutsche Thomson-Brandt GmbH, European Patent Operations Karl-Wiechert-Allee 74, Hannover, 30625, DE)
Claims:
1. A face recognition system comprising: a detector continuously obtaining images from video sequences and outputting information of whether or not face(s) and eyes are detected in the acquired images; a face of interest (FOI) tracker for tracking multiple detected faces with random motion, filtering and selecting faces; a processor for recognizing whether selected faces are known or unknown faces, creating new databases for recognized unknown faces, adaptively updating existing databases for recognized known faces, merging redundant face databases and removing face databases of non-interest.
2. The face recognition system according to claim 1, characterized in that the face of interest (FOI) tracker comprises: a face extractor, which is used to extract each detected face from a frame image; a face region separator, which is used to divide a frame image into multiple face check regions according to the number of extracted faces; a face boundary detector, which is used to examine whether a current frame contains any faces in boundaries corresponding to the face check region defined in a last preceding frame image; a face verifier, which is used to determine whether a face examined in the current frame image is the same as the corresponding face in the last preceding frame; a frame filter, which is used to buffer the results of the face boundary detector and the face verifier, and filter them to finally determine the number of faces existing within a predetermined length of frames and which one is which; and a quality selector, which is used to select a qualified face for further processing in the processor.
3. The face recognition system according to claim 2, characterized in that the extraction of face is based on the proportion between the eye distance and the face width and height.
4. The face recognition system according to claim 2, characterized in that each face check region is separated based on the eye distance and a predefined highest motion speed of human.
5. The face recognition system according to claim 2, characterized in that when the similarity between the current examined face and the corresponding face in the last preceding frame is no less than a predefined verifying similarity threshold (VST), the current examined face is determined by the face verifier to be the same as the corresponding face in the last preceding frame.
6. The face recognition system according to claim 5, characterized in that the selection of the quality selector is based on a predefined selecting similarity threshold (SST) , which is bigger than the threshold VST and smaller than 100%, a face is selected only when the interframe similarity of the face is less than the SST so as to keep less redundant information in the database.
7. The face recognition system according to any one of the preceding claims, characterized in that the processor comprises: a database creator, which is used to create a face database by enrolling face images selected from the face of interest (FOI) tracker, each face database being able to enrol multiple different face images of one face; a recogniser, which is used to compare the selected face with all face databases and output an instant similarity value (ISV) between the current selected face and the most similar face in face databases as a result of the comparison; a first filter defined with an adaptive recognizing threshold (ART), which is used to identify the selected face as a known face, when the ISV of the selected face is no less than the adaptive recognizing threshold (ART), or identify the selected face as an unknown face, when the ISV of the selected face is less than the adaptive recognizing threshold (ART); a second filter defined with an adaptive updating threshold (AUT), which is used to determine whether or not to update a database with the current selected face, wherein, if the ISV of the selected face is less than the AUT, the corresponding face database is updated with the selected face; a database merger, which is used to merge redundant face databases; and a database remover, which is used to delete occasional databases with only few face images enrolled as well as databases without being updated for a long time.
8. The face recognition system according to claim 7, characterized in that the first filter is designed so that if the following condition is fulfilled, the current face is identified: $\sum_{i=1}^{n} a_i \cdot S_{v,i} > \mathrm{ART}$, wherein $a_i$ is a coefficient, $n$ is a filter length, and $S_{v,i}$ denotes each similarity value of a certain image compared with a database.
9. The face recognition system according to claim 7, characterized in that the current face is enrolled if the following condition is fulfilled: $\mathrm{ART}_2 < S_v < \mathrm{AUT}$ and $n_e < n_{th}$, wherein AUT is decreasing for each database growing from an initialised state to a stable state, $\mathrm{ART}_2$ denotes a threshold which is only slightly smaller than ART, $n_e$ is the current number of enrolled face shots in a certain database, and $n_{th}$ is the threshold number of enrolled face shots which indicates the saturation of a database.
10. The face recognition system according to claim 7, characterized in that a maximum number of face images of the face database is predetermined.
11. The face recognition system according to claim 7, characterized in that a third filter defined with a threshold of time parameter is provided to be used to keep the face databases updating with the person's latest visage based on the time period parameter threshold.
12. The face recognition system according to claim 7, characterized in that the adaptive recognizing threshold (ART) is adaptively changeable according to the size of a database, the ART being set to be lower for a database with fewer number of enrolled faces and higher for the database with more faces, so as to ensure the privilege of a database with few number of faces to enrol more images than a database with sufficient number of faces.
13. The face recognition system according to claim 7, characterized in that the change of the ART further depends on the value of the false acceptance rate (FAR), which is the probability that face shots from another person are wrongly enrolled, and the false rejection rate (FRR), which is the probability that face shots from the same person are wrongly rejected, an ART being selected only when a low enough FAR is achieved.
14. The face recognition system according to claim 7, characterized in that the AUT is always bigger than the ART and inversely proportional to the size of a database.
15. The face recognition system according to claim 7, characterized in that the database merger calculates mutual similarity values (MSV) between each two databases and merges databases with an MSV bigger than a predefined threshold.
16. A method for recognizing multiple faces with random free motion, comprising the steps of: (a) acquiring images from video sequences and outputting information of whether or not face(s) and eyes are detected in the acquired frame images; (b) tracking multiple detected faces with random motion, and filtering and selecting qualified faces for further processing; (c) recognizing whether the selected faces are known or unknown faces; (d) creating new databases for recognized unknown faces; (e) adaptively updating existing databases for recognized known faces; and (f) merging redundant face databases and removing face databases of non-interest when no face image is acquired for a predefined period of time.
17. The method as claimed in claim 16, characterized in that step (b) further comprises steps of: extracting detected face(s) from a frame image; dividing the frame image into multiple face check regions according to the number of extracted faces; detecting whether a current frame contains any faces in a boundary corresponding to the face check region defined in a last preceding frame image; verifying whether a face examined in the current frame image is the same as the corresponding face in the last preceding frame; buffering and filtering the results of the detecting and verifying steps in order to determine the number of faces existing in the frames and which one is which; and selecting qualified face(s) for further processing.
18. The method as claimed in claim 17, characterized in that the detected face(s) is extracted based on the proportion between the eye distance and the face width and height.
19. The method as claimed in claim 17, characterized in that the frame image is divided into multiple face check regions based on the eye distance and a predefined highest motion speed of human.
20. The method as claimed in claim 16, characterized in that step (c) comprises steps of: calculating and outputting results of instant similarity value (ISV) between the selected face of current frame and all face databases; comparing the ISV results of the selected face with an adaptive recognizing threshold (ART) ; identifying the selected face as a known face, when the ISV of the selected face is no less than the ART, or identifying the selected face as an unknown face, when the ISV of the selected face is less than ART.
21. The method as claimed in claim 16, characterized in that the adaptive recognizing threshold (ART) is adaptively changeable according to the size of the database, the ART being set to be lower for a database with few number of enrolled faces and higher for the database with more faces, so as to ensure the privilege of a database with few number of faces to enrol more images than a database with sufficient number of faces.
22. The method as claimed in claim 16, characterized in that the change of the adaptive recognizing threshold (ART) further depends on the value of the false acceptance rate (FAR), which is the probability that face shots from another person are wrongly enrolled, and the false rejection rate (FRR), which is the probability that face shots from the same person are wrongly rejected, an ART being selected only when a low enough FAR is achieved.
23. The method as claimed in claim 16, characterized in that step (e) includes steps of: comparing the results of ISV of the selected current face with an adaptive updating threshold (AUT) ; updating a database with the selected current face, if the ISV result of the selected face is less than the adaptive updating threshold (AUT) .
24. The method as claimed in claim 23, characterized in that the adaptive updating threshold (AUT) is always bigger than the adaptive recognizing threshold (ART) and inversely proportional to the size of a database.
25. The method as claimed in claim 16, characterized in that step (f) includes: calculating mutual similarity values (MSV) between each two databases and merging databases with the MSV bigger than a predefined threshold.
26. The method as claimed in claim 16, characterized in that the method further comprises a step of: updating the face database with the person's latest visage based on a predefined time period parameter threshold.
Description:
Face Recognition System and Method

FIELD OF THE INVENTION The present invention relates to a face recognition system and method being capable of recognizing multiple human faces with random free motion.

BACKGROUND OF THE INVENTION Automatic face recognition from video has been studied for many years and is starting to be widely used in daily identification systems, e.g. automatic banking and access control etc., in communication systems, e.g. teleconferencing and video-phone etc., and in public security systems, e.g. criminal identification, digital driver license etc.

One important aspect of the face recognition system is the face detection. Using motion detection as a pre-processing step and then detecting faces by applying image-based face detection techniques is common in many video-based face processing systems. However, using motion as a first step for face detection is liable to run into difficulties in some real-world cases without any human supervision. Background subtraction and frame differencing are the two principal methods to detect motion.

The background subtraction method requires a static background. If the background is changing, however, there are unavoidable errors when subtracting the initialised background from the current frame. Complicated adaptive background models have to be introduced as compensation, at the expense of significant computational effort. The frame differencing method might not work properly when a face is not visibly moving, or when other moving objects exist in the background. Failure of the pre-processing step leads to the failure of face detection. Another important aspect of the automatic face recognition system is the face database, which is vital to the success of face recognition. A database composed of a sufficient number of qualified images per face performs much better than a database with a small number of randomly selected images per face. There are two conventional ways to construct face databases from video sequences.

One conventional way to build face databases for recognition needs a separate face registration or enrolment procedure. A human supervisor is normally required to store selected faces for each person as a certain format into databases for further recognition. This is also referred to as a pre-training procedure before recognition, in which the face images from a certain person are carefully selected by a human supervisor. Those chosen face shots are then encoded in a certain format and stored into the corresponding database. The selection criteria are dependent on the recognition methods. Most robust recognition systems collect various face shots from a certain person: under varying lighting conditions, multiple views, different head poses and expressions etc. Although those systems achieve good performance, the training procedure itself demands significant efforts for a human supervisor.

The other way for face database construction requires little effort from the human supervisor but mainly the co-operation of the subjects. This approach can still be annoying to the subjects. A supervisor normally asks them to change their poses and expressions from time to time during the training phase. An improved way to decrease the effort from a human supervisor is asking the subjects to follow some predefined rules. A completely automated system with no outside assistance and no requirements on the users is still a big challenge. Recently, there have been advances in automatic and unsupervised face recognition which greatly reduce the demands on supervisor and subjects. However, there are still several common limitations of the conventional automatic systems.

Firstly and most importantly, they assumed that there is always only one face existing in a certain sequence. The occurrence of multiple people or a sudden change from one person to another leads to failure. Although not likely to happen in video sequences from live cameras, the latter case often occurs in films or TV programs. The one-face assumption greatly decreases the complexity of the automatic procedure. But it is actually a very strict requirement for the environment and has few practical application prospects. Moreover, all the observed systems rely on certain existing face detection algorithms to detect faces, and the whole automatic procedure may get into trouble whenever the face detection step fails.

Additionally, all the observed systems achieve an equivalent or even a worse recognition rate than the specific applied face recognition methods taken alone. Another crucial point is that most systems are only concentrating on how to enroll databases to achieve reasonable recognition rate but little research is going on automatically keeping the efficiency of online databases, which includes updating databases by considering recent views, keeping the variety of enrolled face shot etc.

The most critical part of such an automatic system is the enrolment procedure, which generally makes two demands. One is how to guarantee that a certain face of interest (FOI) remains the same during enrolment. Several people may show up at the same time and the system should not be confused about who is who, although they might be all unknown or partly unknown to the databases. Different faces must not be enrolled as one person. The other is how to select images that are representative enough with a very limited number of images, and that also reflect the changing of the faces over time (including new makeup, new glasses and aging).

Another important part of such an automatic system is to keep the databases efficient. Redundant databases should be eliminated. It can happen quite often that one person is enrolled in several different face databases. In many applications, occasional faces that only happen to occur for a very short time are of no interest. Such databases should be examined and removed automatically.

Therefore, an intelligent face recognition system which can run automatically and unobtrusively with high quality performance is desirable.

SUMMARY OF THE INVENTION The face recognition system of the invention is capable of dealing with the above-mentioned problems and is therefore applicable to any commercial activities requiring face recognition.

The present invention suggests an automatic, unobtrusive and unsupervised real-time system recognizing multiple faces and creating adaptive databases from video signals, which comprises: a detector continuously obtaining images from video sequence(s) and outputting information of whether or not face(s) and eyes are detected in the acquired images; a face of interest (FOI) tracker being capable of tracking multiple detected faces with random motion, filtering and selecting qualified faces; and a processor for recognizing whether the selected faces are known or unknown faces, creating new databases for recognized unknown faces, adaptively updating existing databases for recognized known faces, merging redundant face databases and removing face databases of non-interest.

Advantageously, the inventive system not only can run as it is for any period of time, with no requirement for the user's behaviour, with no interruption to the user, and with no requirement of any human supervision or help, but also can automatically and passively recognize multiple persons showing up at the same time with free and random motion.

The face of interest (FOI) tracker of the inventive system comprises: a face extractor, which is used to extract each detected face from a frame image; a face region separator, which is intended to divide a current frame image into multiple face check regions, the number of the face check regions corresponding to the number of extracted face regions; a face boundary detector, which is used to examine whether a current frame contains any faces in boundaries corresponding to the face check region defined in a last preceding frame image; a face verifier, which is used to determine whether a face examined in the current frame image is the same as the corresponding face in the last preceding frame; a frame filter, which is used to buffer the results of the face boundary detector and the face verifier, and filter them to finally decide how many faces there are in the frames and which one is which; and a quality selector, which is used to select a qualified face for further processing in the processor.

Advantageously, the inventive system uses an image-based face detection as a main method and a temporal-based motion detection as a post-processing step which effectively decreases the failure rate of the face detection and reduces the complexity of the detection computation.

The processor of the inventive system comprises: a database creator, which is used to create a face database by enrolling face images selected from the face of interest (FOI) tracker, each face database being able to enrol multiple different images of one face; a recogniser, which is used to compare the selected face with all face databases and output an instant similarity value (ISV) between the current selected face and the most similar face in the face databases as a result of the comparison; a first filter defined with an adaptive recognizing threshold (ART), which is used to identify the selected face as a known face, when the ISV of the selected face is no less than the ART, or identify the selected face as an unknown face, when the ISV of the selected face is less than the ART; a second filter defined with an adaptive updating threshold (AUT), which is used to determine whether to update a face database with the current selected face, wherein, if the ISV of the selected face is less than the AUT, the corresponding face database is updated with the selected face; a database merger, which is used to merge redundant face databases; and a database remover, which is used to delete the occasional databases with only few face images enrolled as well as the databases without being updated for a long time.

Advantageously, the inventive system is continuously keeping the variety of enrolled faces, merging redundant faces and removing faces of non-interest, consequently a face database with sufficient number of qualified images per face is achieved and the recognition rate is enhanced, and therefore efficiency and reliability of the system are improved.

The processor of the inventive system further comprises a third filter defined with a time parameter threshold, which is used to keep the face database updating with person's latest visage based on the predefined time parameter threshold.

Advantageously, the inventive system is automatically and continuously updating the face databases by most recent views.

BRIEF DESCRIPTION OF THE DRAWINGS Fig. 1 shows a flow diagram explaining the conception of the intelligent face recognition system according to the present invention; Fig. 2 shows a flow diagram illustrating one embodiment of the intelligent face recognition system according to the present invention; Fig. 3 depicts estimation of the face check region in the next frame from the face location of the current frame; and Fig. 4 and Table 1 respectively show a part of the samples and the experimental comparison results of the image sets.

DETAILED DESCRIPTION WITH PREFERRED EMBODIMENTS According to the present invention, an intelligent face recognition system is provided, which automatically recognizes known/unknown multiple faces from real-time video signals, creating or adaptively updating the corresponding face databases, and particularly being capable of dealing with multiple persons showing up at the same time, without any requirement for any users and any human supervision or assistance.

The inventive system can be based on any reasonable image-based face detection and recognition method. As an exemplary embodiment, the image-based face detection and recognition technology used in FaceVACS of Cognitec Systems GmbH is applied in the present invention. The face recognition system according to the present invention generally comprises a detector, a face of interest (FOI) tracker and a processor. The detector is intended for continuously obtaining images from video sequence(s) and outputting information of whether or not face(s) and eyes are detected. The face of interest (FOI) tracker consists of a face extractor, a face region separator, a face boundary detector, a face verifier, a frame filter and a quality selector. The processor of the system mainly comprises a database creator, a recognizer, a first filter of adaptive recognizing threshold (ART), a second filter of adaptive updating threshold (AUT), a third filter of time parameter threshold, a database merger and a database remover.

The processing of the face recognition system according to the present invention essentially contains three main parts: face detection, face recognition and database construction parts.

As the first and crucial step, the process of face detection combines an image-based face detection with a temporal-based face detection. Instead of using motion detection as a pre-processing step and then detecting faces by applying image-based face detection techniques as mentioned in the foregoing prior art, the detector of the inventive system performs image-based face detection as the main method, and works together with the FOI tracker to perform a supplementary detection as a post-processing procedure by utilizing motion information of the face of interest.

Several benefits are achieved in this way. Apparently, as the first step, image-based face detection algorithms are generally the most robust and can achieve much higher face detection rate than common motion detection. However, when the first step fails, the corresponding motion information can still help to complete the face detection procedure, which further increases the detection rate.

The face of interest (FOI) tracker of the inventive system is based on: whole image difference between a current and a preceding image; face region difference between the current and the preceding image; eye region movement between the current and the preceding image.

After the image-based detector determines that a current frame contains any faces, the face extractor of the FOI tracker is used to extract each detected face region from the frame image with the corresponding eye positions. The extraction is based on the proportion between the eye distance and the face width and face height.
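The extraction step can be illustrated with a minimal Python sketch. The text only states that the face region follows from the proportion between eye distance and face width and height; the concrete ratios below (`width_ratio`, `height_ratio`, `eye_line`) and the function name are hypothetical values chosen for illustration, not figures from the application.

```python
import math


def face_box_from_eyes(left_eye, right_eye,
                       width_ratio=2.0, height_ratio=2.5, eye_line=0.4):
    """Estimate a face box (left, top, right, bottom) from two eye positions.

    width_ratio, height_ratio: assumed face width/height in eye distances.
    eye_line: assumed vertical position of the eyes as a fraction of the
    face height, measured from the top of the box.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    eye_distance = math.hypot(rx - lx, ry - ly)
    if eye_distance == 0:
        raise ValueError("eye positions must differ")
    # Centre the box horizontally on the midpoint between the eyes.
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0
    width = width_ratio * eye_distance
    height = height_ratio * eye_distance
    left = cx - width / 2.0
    top = cy - eye_line * height
    return (left, top, left + width, top + height)
```

In practice the ratios would be tuned to the detector in use; the point is only that a single measured quantity, the eye distance, scales the whole face region.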

Then the face region separator divides the current frame image into several face check regions, the number of the face check regions corresponding to the number of extracted face regions. Each face check region is defined based on two parameters. One is the eye distance. The other one is a predefined highest motion speed. Normally, a certain face cannot move beyond the corresponding check region boundary in the next coming frame, even with the highest possible speed. The separator guarantees that one face shown up in a certain defined face check region in a current frame must be in the same check region in next frame.
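A minimal Python sketch of how such a check region might be derived is given below. It assumes, purely for illustration, that the highest motion speed is expressed as eye distances travelled per frame (`max_speed`), so that the margin around the face scales with the apparent face size; the `Box` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def face_check_region(face: Box, eye_distance: float, max_speed: float,
                      frame_width: float, frame_height: float) -> Box:
    """Grow the face box by the farthest distance the face can move in one frame.

    max_speed is an assumed unit: eye distances per frame.
    The result is clipped to the frame boundaries.
    """
    margin = eye_distance * max_speed
    return Box(
        left=max(0.0, face.left - margin),
        top=max(0.0, face.top - margin),
        right=min(float(frame_width), face.right + margin),
        bottom=min(float(frame_height), face.bottom + margin),
    )
```

A face found in the next frame would then be attributed to this region if, for example, the midpoint between its eyes satisfies `region.contains(x, y)`.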

The face boundary detector is used to examine whether a current frame contains any faces in boundaries corresponding to the check regions defined in the last preceding frame image. If yes, the current face is considered to probably remain the same as in the last frame, and is to be further checked with the downstream face verifier. If not, either the face might be occluded by another object, or the face extractor may have failed due to excessive rotation, a sudden lighting change etc. In this case, the face check region is reserved for the next frame image and is to be further examined in the downstream frame filter.

Afterwards, the face verifier is used to further compare the current examined face with the corresponding face in the last frame. If the similarity between these two faces is no less than a predefined verifying similarity threshold (VST), they are determined to be the same face. If the similarity is below the threshold (VST) , the face is further checked in the downstream frame filter.

Next, the frame filter, with a length of certain frames (e.g. 7 frames), buffers the results of the face movement detector, i.e. the face boundary detector and the face verifier, and filters them to finally decide how many faces there are in those frames and which one is which. There are different cases that the filter has to deal with. Firstly, within the filter length, if the face verifier always verifies that a certain face check region contains the same face, the downstream quality selector is to select the best quality face among the frames for further processing. Secondly, compared with a certain face in the last frame, if the current face is not verified by the face verifier, the face is to be further checked in the next frame. Thirdly, if no face is detected in a certain face check region for more than a certain number of frames (e.g. 4), the old face as well as the face check region is removed from the filter buffer. Fourthly, although the face boundary detector fails to detect any face in a certain face check region in one or two frames, it detects a face in all the other frames. If the faces in those frames are verified by the face verifier, this is treated as the first case. If not, it is treated as the second case. Fifthly, if the face verifier indicates that, in one or more frames, the detected faces in a certain face region belong to a same face, while in other frames they belong to another new face, the old face is removed and the newly detected face is added in the filter buffer. This case may not happen for a video signal from a live camera, but it can exist in films or TV programs with a sudden shot change.
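The first four cases can be sketched as a small per-region buffer in Python. This is a simplified illustration under stated assumptions: the observation labels, the class name and the returned state names are hypothetical, "more than 4 frames" is interpreted as more than 4 consecutive frames, and the fifth case (replacement by a new face after a shot change) is not modelled.

```python
from collections import deque

FILTER_LENGTH = 7    # example filter length from the text
MAX_MISSING = 4      # example from the text; read here as consecutive frames
TOLERATED_GAPS = 2   # "one or two frames" without a detection


class RegionFilter:
    """Buffers per-frame results for one face check region."""

    VALID = ("same", "unverified", "missing")

    def __init__(self):
        self.buffer = deque(maxlen=FILTER_LENGTH)
        self.missing_run = 0

    def push(self, observation: str) -> str:
        if observation not in self.VALID:
            raise ValueError(f"unknown observation: {observation}")
        self.buffer.append(observation)
        # Count how many frames in a row had no face in this region.
        self.missing_run = self.missing_run + 1 if observation == "missing" else 0
        return self.state()

    def state(self) -> str:
        if self.missing_run > MAX_MISSING:
            return "remove"      # third case: drop face and check region
        if len(self.buffer) < FILTER_LENGTH:
            return "pending"     # not enough frames buffered yet
        gaps = self.buffer.count("missing")
        verified = self.buffer.count("same")
        if gaps <= TOLERATED_GAPS and gaps + verified == FILTER_LENGTH:
            return "stable"      # first and fourth case: hand over to selector
        return "pending"         # second case: keep checking in later frames
```

One `RegionFilter` would be kept per face check region, with `push` called once per frame using the combined result of the face boundary detector and the face verifier.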

Subsequently, the quality selector decides whether to select a certain face in the current frame for further processing in the processor. Selection is based on a predefined selecting similarity threshold (SST). The threshold SST is bigger than the threshold VST and smaller than 100% (which means exactly the same). If a certain face in different frames has little change, i.e. if the inter-frame similarity is above SST, it is not necessary to select the face in each frame, but only in one frame for comparing with the face databases. But whenever a face in a certain frame has an inter-frame similarity below SST, it is selected for further comparison. This selector guarantees that the faces selected for further comparison carry much less redundant information than taking every detected face.
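The interplay of the two inter-frame thresholds can be summarised in a short Python sketch. The numeric values of VST and SST and the names used here are assumptions for illustration only; the text fixes only the ordering VST < SST < 100%.

```python
from enum import Enum


class FrameDecision(Enum):
    UNVERIFIED = "unverified"          # below VST: left to the frame filter
    SAME_SELECTED = "same_selected"    # same face, different enough to keep
    SAME_REDUNDANT = "same_redundant"  # same face, nearly identical shot


def judge_interframe(similarity: float, vst: float = 0.60, sst: float = 0.95):
    """Classify an inter-frame similarity in [0, 1] against VST and SST.

    vst and sst defaults are hypothetical; only 0 < vst < sst < 1 is required.
    """
    if not 0.0 < vst < sst < 1.0:
        raise ValueError("thresholds must satisfy 0 < VST < SST < 1")
    if similarity < vst:
        return FrameDecision.UNVERIFIED
    if similarity < sst:
        return FrameDecision.SAME_SELECTED
    return FrameDecision.SAME_REDUNDANT
```

Only faces classified as `SAME_SELECTED` would be passed on for comparison with the face databases in this simplified reading; nearly identical consecutive shots are skipped.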

In the face recognition procedure, the database creator creates a face database by enrolling the current face selected from the FOI tracker. A certain database can contain many enrolled face images for a certain face. There is a maximum number of face images that can be enrolled into one face database. The recogniser is used to compare the current face selected from the FOI tracker with all face databases and outputs the instant similarity value (ISV) between the current face and the most similar face in the face databases.

Particularly, the first filter defined with an adaptive recognizing threshold (ART) is used to identify the selected face as a known face, when the ISV of the selected face is no less than the adaptive recognizing threshold (ART) , and to identify the selected face as an unknown face, when the ISV of a selected face is less than the adaptive recognizing threshold (ART) .When the selected face is identified as an unknown face, a new database is created by enrolling the selected face. The adaptive recognizing threshold (ART) is adaptively changeable according to the size of a certain database. For a database with few number of enrolled faces (from the same person) , ART is set to be lower. For the database with many faces (from the same person) enrolled, ART is set to be higher. That means a database with few faces enrolled has a privilege to enrol more images than a database with enough number of faces. ART is also robust enough to deal with the problem that a database with more images enrolled has a higher false recognition probability. Together with said FOI tracker, the filter of ART is introduced so that no face from other persons is wrongly enrolled into a certain database.

The choice of ART should rely on the values of two important parameters: the false acceptance rate (FAR), which is the probability that face shots from another person are wrongly enrolled, and the false rejection rate (FRR), which is the probability that face shots from the same person are wrongly rejected. FAR is much more harmful for automatic procedures. With only one wrong face shot from another person enrolled, a certain database may continuously enrol more erroneous mug shots from the same wrong person. Therefore, ART should be selected such that a sufficiently low FAR is achieved (e.g. such that more than 99% of face shots from other persons are rejected).

Furthermore, when a single frame is compared with a certain database, the probability of error is higher than when several images are compared. A simple filter is consequently designed. The current face is identified if the following condition is fulfilled:

$$\sum_{i=1}^{n} a_i \, S_{v,i} > ART$$

where $a_i$ is the coefficient, $n$ is the filter length, and $S_{v,i}$ denotes each similarity value of a certain image compared with a database.
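For illustration, the filter condition may be sketched as a weighted sum over the last n buffered similarity values. The function name and the choice of equal coefficients summing to one are assumptions; the description does not specify the coefficient values.

```python
def filtered_identification(similarities, coefficients, art):
    """Return True if the weighted sum of buffered similarity values exceeds ART."""
    if len(similarities) != len(coefficients):
        raise ValueError("one coefficient is needed per buffered similarity value")
    score = sum(a * s for a, s in zip(coefficients, similarities))
    return score > art


if __name__ == "__main__":
    equal_weights = [1 / 3] * 3  # assumed: equal coefficients summing to one
    # A single weak frame (0.5) does not prevent identification.
    print(filtered_identification([0.9, 0.5, 0.85], equal_weights, art=0.7))
```

Buffering several values in this way makes the decision less sensitive to a single outlying frame than a comparison of one frame alone.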

Besides the basic rule of a low FAR, ART is also adaptive and differs between databases. A database with more face shots enrolled corresponds to a higher ART. The adaptive recognizing threshold (ART) is introduced to guarantee the purity of a certain database even with multiple faces showing up at the same time. Purity means that the database for a certain person does not contain face images from any other person.

The second filter, defined with an adaptive updating threshold (AUT), is introduced to determine whether to update a certain database with the selected current face. If ISV is less than AUT, the current face is used to update the corresponding database. AUT is always bigger than ART, and is inversely proportional to the size of a certain database. The purpose of AUT is to keep the variety of a certain database so that the same person under different conditions can be correctly identified. To save computing power, once a database reaches its maximum capacity, it is no longer updated based on AUT but according to a time threshold.

Additionally, the third filter, a time parameter filter, is used to keep the database updated with the person's latest visage based on a predefined time parameter threshold, e.g. in units of hours or days.

The database merger is used to merge redundant face databases. Said FOI tracker and ART filter assure the purity of a certain database, but they might in turn create redundant databases, i.e. more than one database may exist for one person. The merger calculates mutual similarity values (MSV) between each pair of databases and combines those databases whose MSV is bigger than a predefined threshold.

The database remover can be used to delete occasional databases with only a few face images enrolled, as well as databases that have not been updated for a long time.

According to the face recognition system of the present invention, the construction of an automatic and online face database is performed passively and should be able to handle people moving at random. The constructed databases are designed to have the following features: purity (no face shot from any other person is allowed); variety (a database only enrols face shots that are sufficiently varied); rapidity (at the beginning of building a new database, a rapid growth of the database is always required); updatability (the database should be able to keep up with recent views of persons); and uniqueness (each person should have only one database, since multiple databases for one person might lead to confused identification).

Variety is required for a database to be complete enough to identify a person in different views, head poses or facial expressions. It is crucial for keeping the false rejection rate (FRR) low. But two distinct states are to be noted: an initialised state and a stable state. In the initialised state, when a new database is created, rapid growth is important. In principle, before saturation, the larger the number of face shots in a certain database, as long as they are not the same, the lower the false rejection rate that is achieved. A less strict rule should hence be used to enrol more face shots in this state. In the stable state, however, the selection of enrolled face shots should concentrate more on their variety. More varied face shots are to replace previously enrolled, more similar face shots. An adaptive updating threshold (AUT) is used to make the selection in a floating way. One mug shot is enrolled if the following condition is fulfilled:

$$\mathrm{ART2} < S_v < \mathrm{AUT} \quad \text{and} \quad n_e < n_{th}$$

where AUT decreases as each database grows from the initialised state to the stable state, ART2 denotes a threshold which is only slightly smaller than ART, $n_e$ is the current number of enrolled face shots in a certain database, and $n_{th}$ is the threshold number of enrolled face shots which indicates the saturation of a database. A database in saturation is assumed to have enough face shots enrolled for identifying the same person.
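The enrolment condition may be illustrated by the following sketch. The linear decrease of AUT, the function names and all numeric values are assumptions made for the example; the description only states that AUT decreases as the database grows and that ART2 lies slightly below ART.

```python
def adaptive_updating_threshold(n_enrolled, n_th=20, aut_start=0.98, aut_stable=0.88):
    """Assumed linear schedule: AUT falls from aut_start to aut_stable as the database grows."""
    fill = min(n_enrolled, n_th) / n_th
    return aut_start - (aut_start - aut_stable) * fill


def should_enrol(sv, art2, aut, n_e, n_th):
    """Enrol only if ART2 < Sv < AUT and the database is not yet saturated."""
    return art2 < sv < aut and n_e < n_th


if __name__ == "__main__":
    aut = adaptive_updating_threshold(n_enrolled=5)
    print(should_enrol(0.75, art2=0.68, aut=aut, n_e=5, n_th=20))   # varied enough, still growing
    print(should_enrol(0.97, art2=0.68, aut=aut, n_e=5, n_th=20))   # too similar to enrolled shots
```

The upper bound AUT rejects shots that are nearly identical to what is already enrolled, while the lower bound ART2 rejects shots that are too dissimilar to be trusted.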

The introduction of ART2 contributes to the database purity. Purity is the most important feature of the databases, and a robust recognition technique is needed. Hence, when the image-based face recogniser fails but a face is still identified, enrolment should be performed with care. Face shots with Sv much smaller than ART are discarded from enrolment to avoid bad quality face shots that might result in failure. As mentioned earlier in the recognition procedure, the filter with several Sv values buffered for recognition also contributes to the purity.

Another important feature for a successful database is to keep up with recent views of faces. Face shots taken on different days statistically differ much more than face shots from the same day. Enrolling a few positively recognized face shots every day can improve the FRR. A time parameter is introduced to trigger the update of a saturated database. To keep the information from earlier days, only part of the database is updated, i.e. only a certain number of face shots are selected to be replaced. The replaced face shots are those from the oldest days that have the most similar characterizing values when compared to the others.
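One possible reading of this replacement rule is sketched below: among the shots from the oldest day, the one most similar to the rest of the database is chosen for replacement. The record layout (a dictionary with the keys "day" and "mean_similarity") and the function name are hypothetical.

```python
def pick_shot_to_replace(shots):
    """Choose the enrolled shot to be replaced in a saturated database.

    Each shot is a dict with 'day' (integer enrolment day) and
    'mean_similarity' (assumed: average similarity to the other shots).
    """
    if not shots:
        raise ValueError("the database holds no shots")
    oldest_day = min(shot["day"] for shot in shots)
    oldest = [shot for shot in shots if shot["day"] == oldest_day]
    # The most redundant of the oldest shots contributes least to variety.
    return max(oldest, key=lambda shot: shot["mean_similarity"])


if __name__ == "__main__":
    database = [
        {"id": "a", "day": 1, "mean_similarity": 0.70},
        {"id": "b", "day": 1, "mean_similarity": 0.90},
        {"id": "c", "day": 3, "mean_similarity": 0.95},
    ]
    print(pick_shot_to_replace(database)["id"])
```

In this example shot "b" would be replaced: shot "c" is more redundant but newer, and shot "a" is equally old but more distinctive.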

Since a violation of uniqueness is less harmful than a violation of purity, the databases tolerate it during their construction. But such databases are to be merged after careful estimation. The mutual similarity value (MSV) between each pair of databases is calculated. If MSV is bigger than ART, each face shot from one database is further checked against the other. If a sufficient percentage of the face shots in one database is identified with the other, the two databases are merged. This further check avoids a wrong merge. Since the calculation for merging needs much more processing power than other procedures, it is only enabled during an idle period when no faces are detected for a certain length of time.
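The two-stage merge check may be sketched as follows for illustration. The function name, the boolean per-shot input and the value 0.8 for the required fraction are assumptions; the description does not quantify the "sufficient percentage".

```python
def should_merge(msv, art, shot_matches, min_fraction=0.8):
    """Two-stage merge decision for a pair of databases.

    msv          -- mutual similarity value of the two databases
    shot_matches -- one boolean per face shot of the first database, True if
                    that shot was identified against the second database
    min_fraction -- assumed value for the required share of identified shots
    """
    if msv <= art:           # stage 1: coarse check on the database pair
        return False
    if not shot_matches:     # nothing to verify, so do not merge
        return False
    return sum(shot_matches) / len(shot_matches) >= min_fraction


if __name__ == "__main__":
    print(should_merge(0.85, 0.80, [True] * 9 + [False]))
    print(should_merge(0.85, 0.80, [True, False, False, True]))
```

The second stage is what prevents two databases of different persons from being merged merely because their overall MSV happens to exceed ART.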

As shown in Fig. 1, the concept of the face recognition system according to the present invention is explained in a schematic flow diagram. New video images are continuously acquired from live videos 60, as indicated in step 62. The detector, combined with the face of interest (FOI) tracker, is introduced in step 64 to detect, track and select multiple coexisting faces in the received video sequences. The FOI tracker, combined with the face recogniser of the processor, is used to determine whether the selected faces are known or unknown faces, as indicated in step 66. For known faces, two online adaptive thresholds are applied as an update rule for updating the corresponding face databases. The face recogniser outputs instant similarity values (ISV) between the current face and the corresponding face databases. If it is determined in step 68 that the ISV of a face fulfils the update rule, the known face databases are updated in step 70. Subsequently, a time parameter threshold together with an adaptive updating threshold is also used to update the corresponding databases with the most recent face images. When unknown faces are detected in step 66, new databases are accordingly created by the database creator in step 72. The creation of a certain database is based on selective face images to reduce redundant information. During an idle period determined in step 74, i.e. when no face is detected for a predefined period of time, if redundant databases are determined to exist in step 76, they are merged by the database merger, which calculates the mutual similarity value (MSV) between every pair of face databases, and occasional databases are removed by the database remover based on the database size and the updating frequency in step 78.

Now the recognition method of the inventive system will be illustrated in detail through an exemplary embodiment system referring to the schematic flow diagram as shown in Fig.2.

In step 12, the detector continuously acquires new images from a live video 10. Here the detection technique of FaceVACS is applied as the image-based detection step. The detection method is rule-based. Each image is compared with a predefined face template to decide whether it contains any face(s) and eyes. With positive results, searches for the eye positions are performed.

From the detected eye positions, an actual face region of a current face can be estimated by the following equation and then extracted in step 14 for further processing. The face extractor of the FOI tracker may perform this extraction based on the proportion between the eye distance and the face width and face height:

Face Width = 2.5 × Eye Distance; Face Height = 3.5 × Eye Distance
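For illustration, the proportions above may be applied to two detected eye positions as sketched below. The factors 2.5 and 3.5 are taken from the description; the placement of the box relative to the eyes (horizontally centred, with the eye line at 40% of the face height from the top) and the function name are assumptions.

```python
import math


def face_region(left_eye, right_eye, eye_line_fraction=0.4):
    """Estimate a face box (left, top, width, height) from two eye positions in pixels.

    eye_line_fraction is an assumed placement parameter, not a value from the
    description.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    eye_distance = math.hypot(rx - lx, ry - ly)
    width = 2.5 * eye_distance
    height = 3.5 * eye_distance
    centre_x, centre_y = (lx + rx) / 2, (ly + ry) / 2
    left = centre_x - width / 2
    top = centre_y - eye_line_fraction * height
    return left, top, width, height


if __name__ == "__main__":
    # Eyes 40 pixels apart give a face box 100 pixels wide and 140 pixels high.
    print(face_region((100, 120), (140, 120)))
```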

After extraction of the detected faces, in step 16, the face region separator divides the current image into several face check regions. Then a face region movement is used to detect a sequence change between two successive frames by means of the face boundary detector in step 18.

In everyday life, most people do not move extremely fast. The average walking speed of an adult can be reasonably estimated as 4-5 km/hour, which equals 1.1-1.4 m/s. It is assumed that the face recognition system can process images at 20 frames per second (fps); the walking speed then corresponds to 5.5-7 cm/frame. The face movement boundary of a certain person between two successive images can be defined as $M_{fb}$. $M_{fb}$ is accordingly set to 5.5-7 cm/image. The vast majority of adults have an interpupillary distance in the range 5-7.5 cm. $M_{fb}$ can therefore be roughly estimated as the eye distance $D_{eye}$, as denoted by the following equation:

$$M_{fb} \le D_{eye}$$
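The arithmetic behind the per-frame displacement can be checked with the short sketch below. The function name is hypothetical; the speeds and the frame rate are the values given above.

```python
def displacement_cm_per_frame(speed_kmh, fps):
    """Convert a walking speed in km/hour into a displacement in cm per frame."""
    cm_per_second = speed_kmh * 100_000 / 3600  # 1 km = 100 000 cm, 1 hour = 3600 s
    return cm_per_second / fps


if __name__ == "__main__":
    for speed in (4, 5):
        print(speed, "km/h ->", round(displacement_cm_per_frame(speed, fps=20), 2), "cm/frame")
```

At 20 fps, 4 km/hour corresponds to about 5.56 cm/frame and 5 km/hour to about 6.94 cm/frame, both of which fall within the typical interpupillary range of 5-7.5 cm.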

Fig. 3 depicts the estimation of a boundary for a possible face region in the next frame from the face location of the current frame. The possible face region for the next frame is defined as a face check region (marked in light grey). Each extracted face defines a corresponding face check region.
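A face check region may, for example, be obtained by enlarging the extracted face box by the movement boundary, here taken to be the eye distance. Enlarging the box equally on all four sides is an assumption of this sketch, as are the function name and the box layout (left, top, width, height).

```python
def face_check_region(face_box, eye_distance):
    """Enlarge a face box by the movement boundary on every side.

    face_box is (left, top, width, height) in pixels; the movement boundary is
    assumed to equal the eye distance measured in the same frame.
    """
    left, top, width, height = face_box
    margin = eye_distance
    return (left - margin, top - margin, width + 2 * margin, height + 2 * margin)


if __name__ == "__main__":
    print(face_check_region((70, 64, 100, 140), eye_distance=40))
```

Because each extracted face yields its own check region, several faces can be followed independently in the same frame.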

When the image-based detector notifies that no face is found in a certain check region, motion detection inside this check region is applied in step 18. The face boundary detector is introduced to detect whether the current frame contains any faces within the boundary corresponding to the face check regions defined in the preceding frame. If yes, the current face is considered to probably remain the same as in the last frame. If not, either the face might be occluded by another object, or the face extractor might have failed to extract it because of excessive rotation, a sudden lighting change, etc. In this case, the face region is reserved for the next frame image and is to be further examined in the downstream frame filter.

In this detection, small motion in this area indicates that the face still exists, and only large motion agrees with the output of the image-based detector. To be computationally efficient and thus suitable for real-time use, motion detection is based on the simple image difference method, which calculates the number of motion pixels as a percentage of the total number of pixels.

Suppose that $I_{n-1}$ and $I_n$ denote successive images containing the check region only. Subtraction is calculated by:

$$I_d = I_n - I_{n-1}$$

where $n$ is the frame number of a certain sequence.

A pixel is defined as a motion pixel when the intensity change of the pixel is bigger than a predefined threshold $I_{th}$.

The motion parameter $m_f$ can be calculated by:

$$m_f = \frac{n_d}{n_{total}} \times 100\%$$

where $n_d$ represents the number of motion pixels and $n_{total}$ represents the total number of pixels in the face region. If $m_f$ is below a certain predefined threshold, the face region is assumed to contain the same face, independently of the image-based face detector results. For example, when a head rotates to a certain degree or shows only a profile, many image-based face detection algorithms fail to detect it while the temporal-based motion detection does work.
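The image difference method may be sketched in a few lines, as shown below for grey-level regions stored as nested lists. The function name and the intensity threshold of 25 are assumptions, and the sketch compares the absolute intensity change with the threshold.

```python
def motion_parameter(prev_region, curr_region, intensity_threshold=25):
    """Return the share of motion pixels, in percent, between two check-region images.

    Both regions are equally sized 2-D lists of grey levels. A pixel counts as
    a motion pixel when its absolute intensity change exceeds the threshold
    (the threshold value is an assumed example).
    """
    n_total = 0
    n_motion = 0
    for prev_row, curr_row in zip(prev_region, curr_region):
        for prev_pixel, curr_pixel in zip(prev_row, curr_row):
            n_total += 1
            if abs(curr_pixel - prev_pixel) > intensity_threshold:
                n_motion += 1
    if n_total == 0:
        raise ValueError("the check region contains no pixels")
    return 100.0 * n_motion / n_total


if __name__ == "__main__":
    previous = [[10, 10], [10, 10]]
    current = [[10, 200], [12, 10]]
    print(motion_parameter(previous, current))  # one of four pixels moved
```

A low value would indicate that the face is still present in the check region even when the image-based detector reports none.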

Afterwards, in step 20, the face verifier is used to further compare the current examined face with the corresponding face in the last frame. If the similarity between both faces is no less than a predefined verifying similarity threshold (VST), they are determined to be the same face. If the similarity is below the threshold, the face is further checked in the downstream frame filter. In step 22, the frame filter, with a length of a certain number of frames (e.g. 7 frames), buffers the results of the face movement detector and the face verifier and filters them to finally decide how many faces there are in those frames and which is which.

Next, in step 24 the quality selector is used to decide whether to select a certain face in the current frame for further processing in the processor. Since each check region is independently calculated from each detected face, the proposed temporal-based method has no trouble in handling multiple faces.

The face recognition is performed in steps 26 to 28. Any robust and fast image-based face recognition method might be used in the invention. In an exemplary embodiment, the FaceVACS technique of Cognitec Systems GmbH is chosen. It is a feature-based method which extracts local features from a face shot and transforms them into one vector. Each database is a vector set which consists of multiple vectors. The recogniser outputs the instant similarity value (ISV) Sv (lying between 0 and 1, with 0 meaning "no similarity at all" and 1 meaning "the same") between a current face shot and the most similar face in the database.

In step 28, a filter defined with an adaptive recognizing threshold (ART) is provided. Intuitively, if the instant similarity value (ISV) Sv is no less than the ART, the current mug shot is identified.

In the enrolment steps, if the current selected face is identified as an unknown face, then in step 40 a new face database is enrolled through the database creator. If the current selected face is identified as a known face, the method proceeds to step 30, which further determines whether or not to update a certain database with the current selected face. In this step, a filter defined with an adaptive updating threshold (AUT) is applied to keep enough variety in a certain database and thus maintain the robustness of face recognition.

If the database for a certain face has not reached the limited maximum number, as determined in step 32, more face shots will be continuously enrolled into the database as in step 36. When a sufficient number of face shots have been enrolled into a certain database, as determined in step 34, a filter with a time parameter threshold is used to keep updating the database with the person's latest visage based on a predefined time period parameter. If the time parameter does not fulfil the threshold, no updating will be performed and the current face shot will be discarded as in step 38. If the time parameter fulfils the threshold, the face database will be updated with the recent views of the face, as performed in step 36. In such a way, the system keeps on updating the database according to a time period parameter.
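The decision flow of steps 32 to 38 may be summarised by the following sketch. The function name, the returned labels and the 24-hour threshold are hypothetical, and "fulfilling the threshold" is assumed here to mean that at least the threshold time has elapsed since the last update.

```python
def update_action(n_enrolled, n_max, hours_since_update, hours_threshold=24.0):
    """Decide what happens to a face shot that has passed the AUT filter.

    Returns 'enrol' while the database is still growing (steps 32 and 36),
    'update' when a saturated database is due for a refresh (steps 34 and 36),
    and 'discard' otherwise (step 38).
    """
    if n_enrolled < n_max:
        return "enrol"
    if hours_since_update >= hours_threshold:
        return "update"
    return "discard"


if __name__ == "__main__":
    print(update_action(5, 20, hours_since_update=1.0))
    print(update_action(20, 20, hours_since_update=30.0))
    print(update_action(20, 20, hours_since_update=2.0))
```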

If no face is detected and extracted in steps 12 and 14, the method proceeds to step 42. In step 42, if the processor determines that the no-face-extracted status has lasted for a predefined period of time, an idle period is determined in step 44. If it is determined in step 46 that redundant databases exist, the redundant databases will be merged by the database merger and face databases of non-interest will be removed by the database remover.
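Idle period detection may be sketched as a small monitor that is fed once per frame, as shown below. The class name, the use of timestamps in seconds and the 60-second idle period are assumptions of the example.

```python
class IdleMonitor:
    """Report an idle period once no face has been extracted for long enough."""

    def __init__(self, idle_after_s=60.0):
        self.idle_after_s = idle_after_s  # assumed length of the predefined period
        self._no_face_since = None

    def update(self, now_s, face_extracted):
        """Feed one frame result; return True when merging and removal may run."""
        if face_extracted:
            self._no_face_since = None  # any face ends the no-face status
            return False
        if self._no_face_since is None:
            self._no_face_since = now_s  # first frame without a face
        return now_s - self._no_face_since >= self.idle_after_s


if __name__ == "__main__":
    monitor = IdleMonitor(idle_after_s=10.0)
    for t, face in [(0, True), (1, False), (11, False), (12, True)]:
        print(t, monitor.update(t, face))
```

Running the costly merge only when such a monitor reports an idle period keeps the per-frame processing path free for detection and recognition.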

Since standard testing sequences for evaluating online face recognition from video are difficult to find, an exemplary system was tested with more than 20 image sequences, each sequence with a random number of people and a random length. The result is compared with a system based on FaceVACS technology only, as seen in Fig. 4. Since FaceVACS performed best in recent FRVT tests, the comparison indicates the performance of the inventive face recognition system.

For simplicity of comparison, and due to the limitation of the FaceVACS detection technology applied, the inventive system was run so as to recognize one salient face in each frame; this does not lose generality, because multiple people may still be present in the sequence. The result shows that the inventive system can run successfully with multiple people in free, arbitrary motion, whereas FaceVACS alone produced less satisfactory results.

Overall, the invention explores general ways to further compensate for the limitations of face detection and face recognition by using the temporal information from video sequences. The invention further solves the existing challenges mentioned above and presents a system which can automatically and passively recognize persons in free and arbitrary motion and which runs in a completely unsupervised manner.