

Title:
METHOD OF MATCHING A SKETCH IMAGE TO A FACE IMAGE
Document Type and Number:
WIPO Patent Application WO/2017/174982
Kind Code:
A1
Abstract:
The performance of automated facial forensic sketch matching is improved by learning from examples of facial forgetting over time. Forensic facial sketch recognition is a key capability for law enforcement, but remains an unsolved problem. It is extremely challenging because there are three distinct contributors to the domain gap between forensic sketches and photos: the well-studied sketch-photo modality gap, and the less studied gaps due to (i) the forgetting process of the eye-witness and (ii) their inability to elucidate their memory. A database of forensic sketches created at different time delays is used to train a model to reverse the forgetting process. Surprisingly, this enables a model to systematically "un-forget" facial details. This model is applied to dramatically improve forensic sketch recognition in practice.

Inventors:
SONG YI-ZHE (GB)
HOSPEDALES TIMOTHY (GB)
Application Number:
PCT/GB2017/050951
Publication Date:
October 12, 2017
Filing Date:
April 05, 2017
Assignee:
UNIV LONDON QUEEN MARY (GB)
International Classes:
G06K9/00
Other References:
"Network and Parallel Computing", vol. 9004, 1 January 2015, SPRINGER INTERNATIONAL PUBLISHING, Cham, ISBN: 978-3-642-38997-9, ISSN: 0302-9743, article SHUXIN OUYANG ET AL: "Cross-Modal Face Matching: Beyond Viewed Sketches", pages: 210 - 225, XP055384858, 032548, DOI: 10.1007/978-3-319-16808-1_15
CHARLIE FROWD ET AL: "A Decade of Evolving Composite Techniques: Regression-and Meta-Analysis", 12 April 2015 (2015-04-12), XP055385330, Retrieved from the Internet [retrieved on 20170627]
PIYUSH RAI ET AL: "Multi-label Prediction via Sparse Infinite CCA", NIPS 2009, 1 January 2009 (2009-01-01), XP055385101, Retrieved from the Internet [retrieved on 20170626]
GONG YUNCHAO ET AL: "A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, NORWELL, US, vol. 106, no. 2, 2 October 2013 (2013-10-02), pages 210 - 233, XP035362706, ISSN: 0920-5691, [retrieved on 20131002], DOI: 10.1007/S11263-013-0658-4
SUN SHILIANG ED - ZENG ZHIGANG ET AL: "A survey of multi-view machine learning", NEURAL COMPUTING AND APPLICATIONS, SPRINGER LONDON, LONDON, vol. 23, no. 7, 17 February 2013 (2013-02-17), pages 2031 - 2038, XP035353972, ISSN: 0941-0643, [retrieved on 20130217], DOI: 10.1007/S00521-013-1362-6
H. S. BHATT; S. BHARADWAJ; R. SINGH; M. VATSA: "Memetically optimized MCWLD for matching sketches with digital face images", TIFS, 2012
E. V. BONILLA; K. M. A. CHAI; C. K. I. WILLIAMS: "Multi-task gaussian process prediction", NIPS, 2008
J. CHOI; A. SHARMA; D. W. JACOBS; L. S. DAVIS: "Data insufficiency in sketch versus photo face recognition", CVPR, 2012
C. FROWD, INTRODUCTION TO APPLIED PSYCHOLOGY, CHAPTER EYEWITNESSES AND THE USE AND APPLICATION OF COGNITIVE THEORY, 2011
BRITISH JOURNAL OF PSYCHOLOGY, 2007
C. FROWD; W. ERICKSON; J. LAMPINEN; F. SKELTON; A. MCINTYRE; P. HANCOCK: "A decade of evolving composite techniques: Regression-and meta-analysis.", JOURNAL OF FORENSIC PRACTICE, 2015
H. GALOOGAHI; T. SIM: "Inter-modality face sketch recognition", ICME, 2012
G. HU; Y. YANG; D. YI; J. KITTLER; W. CHRISTMAS; S. Z. LI; T. M. HOSPEDALES: "When face recognition meets with deep learning: an evaluation of convolutional neural networks for face recognition", ICCV WORKSHOPS CHALEARN LOOKING AT PEOPLE, 2015
R. G. U. JR; N. DA VICTORIA LOBO: "A framework for recognizing a facial image from a police sketch", CVPR, 1996
B. F. KLARE; A. K. JAIN: "Heterogeneous face recognition using kernel prototype similarities", TPAMI, 2013
B. F. KLARE; Z. LI; A. K. JAIN: "Matching forensic sketches to mug shot photos", TPAMI, 2011
A. KUMAR; H. D. III: "Learning task grouping and overlap in multi-task learning", ICML, 2012
N. KUMAR; A. C. BERG; P. N. BELHUMEUR; S. K. NAYAR: "Attribute and simile classifiers for face verification", ICCV, 2009
Z. LEI; M. PIETIKAINEN; S. Z. LI: "Learning discriminant face descriptor", TPAMI, 2014
P. LUO; X. WANG; X. TANG: "A deep sum-product architecture for robust facial attributes analysis", ICCV, 2013
S. OUYANG; T. HOSPEDALES; Y.-Z. SONG; X. LI; C. C. LOY; X. WANG: "A Survey on Heterogeneous Face Recognition: Sketch, Infra-red, 3D and Low-resolution", IMAGE AND VISION COMPUTING, 2016
S. OUYANG; T. M. HOSPEDALES; Y.-Z. SONG; X. LI: "Cross-modal face matching: Beyond viewed sketches", ACCV, 2014
C. E. RASMUSSEN; C. K. I. WILLIAMS, GAUSSIAN PROCESSES FOR MACHINE LEARNING, 2006
A. SHARMA; D. W. JACOBS: "Bypassing synthesis pls for face recognition with pose, low-resolution and sketch", CVPR, 2011
C. VONDRICK; A. KHOSLA; T. MALISIEWICZ; A. TORRALBA: "Hoggles: Visualizing object detection features", ICCV, 2013
N. WANG; D. TAO; X. GAO; X. LI; J. LI: "A comprehensive survey to face hallucination", IJCV, 2014
X. WANG; X. TANG: "Face photo-sketch synthesis and recognition", TPAMI, 2009
Y. YANG; T. M. HOSPEDALES: "A unified perspective on multi-domain and multi-task learning", ICLR, 2015
A. W. YOUNG; D. HAY; K. H. MCWEENY; B. M. FLUDE; A. W. ELLIS: "Matching familiar and unfamiliar faces on internal and external features", PERCEPTION, 1985
J. ZHANG; N. WANG; X. GAO; D. TAO; X. LI: "Face sketch-photo synthesis based on support vector regression", ICIP, 2011
Attorney, Agent or Firm:
LEEMING, John Gerard (GB)
Claims:
CLAIMS

1. A method of training a machine-learning algorithm using a training database comprising a plurality of records, each record comprising data representing:

a photographic image of a person; and

a plurality of sketches of the photographic image or the person, wherein the sketches include delayed sketches made after different time intervals from viewing the photographic image or person.

2. A method according to claim 1 wherein the delayed sketches include a first delayed sketch made between about 30 minutes and 2 hours after viewing and a second delayed sketch made between 12 and 36 hours after viewing.

3. A method according to claim 1 or 2 wherein the plurality of sketches include a viewed sketch made whilst viewing the photographic image or person.

4. A method according to claim 1, 2 or 3 wherein the plurality of sketches include an unviewed sketch made without viewing the photographic image or person on the basis of a description of the photographic image or person.

5. A method according to any one of the preceding claims wherein the plurality of sketches of one record are made by the same artist.

6. A method according to any one of the preceding claims wherein the plurality of sketches are made freehand or using composition software.

7. A method according to any one of the preceding claims wherein the machine learning algorithm is a Gaussian Process Regression algorithm.

8. A method according to any one of the preceding claims wherein the machine learning algorithm is a multi-task learning algorithm.

9. A method according to claim 8 wherein the multi-task learning algorithm is trained to match a second delayed sketch to a first delayed sketch, the second delayed sketch having been created after a longer time interval from viewing.

10. A method according to claim 8 or 9 wherein the multi-task learning algorithm is trained to match a delayed sketch to a viewed sketch.

11. A method according to claim 8, 9 or 10, wherein the multi-task learning algorithm is trained to build a single model trained to span a long-term memory gap, and to span at least one of a short-term memory gap, a modality gap and a communication gap as auxiliary tasks.

12. A method of matching an input sketch to a database of photographic images, the method comprising:

using a machine learning algorithm trained by the method of any one of the preceding claims to generate a reconstructed sketch from the input sketch; and

comparing the reconstructed sketch to the database of photographic images.

13. A method according to claim 11 wherein comparing the reconstructed sketch to the database of photographic images comprises a probabilistic comparison of the reconstructed sketch and the database of photographic images.

14. A method according to claim 11 wherein comparing the reconstructed sketch to the database of photographic images comprises performing nearest neighbour matching between the reconstructed sketch and the database of photographic images.

15. A method according to claim 13 wherein the probabilistic comparison takes account of the posterior confidence of a Gaussian Process.

16. A method according to any one of claims 11 to 14 wherein the photographic images and the reconstructed sketch are represented by a Histogram of Gradients.

Description:
METHOD OF IMAGE COMPARISON

Field of the Invention

[ 0001 ] The present invention relates to image comparison and image searching. In particular it relates to methods of matching artist's sketches to photographs.

Background

[ 0002 ] Facial sketch recognition is an important law enforcement tool for determining the identity of criminals where only an eyewitness account of the suspect is available. In this situation, a forensic sketch artist renders the face of the suspect by hand or with compositing software based on eyewitness description. The facial sketch is then disseminated in the media, but it would also be desirable to identify the suspect by matching it against a photo mugshot database.

[ 0003 ] Motivated by this, the computer vision [reference 12] and biometrics [reference 2] fields have extensively studied sketch to photo face matching. However, practical matching of forensic sketches to photo databases remains an unsolved question. This is because studies have primarily focused on matching viewed sketches (i.e. sketches created whilst the sketch artist views the matching mugshot) rather than the rarer forensic sketches (sketches created by a sketch artist on the basis of a witness's description of the subject). Viewed sketches such as those in the popular CUHK [reference 23] database are accurate renditions of the subject. The cross-modal sketch-photo gap is thus small, and viewed sketches are relatively easy to match, resulting in benchmark performance saturated at near-perfect [references 1, 2, 4, 12].

[ 0004 ] Because forensic sketches are drawn based on eyewitness description, possibly days after the event, they are relatively inaccurate and forensic sketch matching remains both relatively unstudied and unsolved. Matching forensic sketches to photos is a much harder and unsolved problem due to the sketch-photo gap being widened by: (i) forgotten / inaccurate memory of facial details (the memory gap)[reference 7], and (ii) imperfect communication to the police artist (the communication gap)[reference 5](whether to a human sketch-artist or software compositor).

[ 0005 ] In computer vision, facial sketch-photo matching has been studied extensively using a variety of approaches including invariant feature engineering [references 1, 2, 4, 12], cross-modal regression/synthesis [references 22, 23] and shared subspace learning [reference 20]. These contributions address the sketch/photo modality gap, but do not address the issues of forgotten memory and imperfect communication. In contrast, psychology [reference 25] and forensic psychology [reference 6] have studied the reliability of different facial features in human face matching, and the fading of memory with time (i.e., forgetting) [reference 7]. This has provided some insights into human recognition (internal facial features are more important overall), and the reliability of human memory, for example that memory fidelity drops rapidly after a few hours [reference 7]. This means that forensic sketches are very inaccurate in practice, because they are usually taken days after the event [references 6, 7].

[ 0006 ] Studies on matching facial sketches to photos can be classified based on the type of sketches used (viewed, semi-forensic or forensic) and on whether the sketches are hand drawn or computer composited. The majority of previous studies have focused on viewed sketches because this is an easier task with accessible benchmark databases. Representative approaches to viewed sketch recognition include bridging the gap with MRF-based photo-sketch synthesis [reference 23], learning a common subspace for comparison with PLS [reference 20], or engineering new invariant descriptors [reference 8]. For further details, we refer the reader to the survey in [reference 17]. Recognition rates on the main viewed sketch benchmarks [reference 23] have reached 100% [reference 8], so viewed sketch recognition can be considered solved.

[ 0007 ] One of the earliest studies to discuss automatically matching forensic sketches with photos was [reference 10]. It highlighted the importance, as well as complexity and difficulty of forensic sketch based face recognition. The first significant demonstration of automated forensic sketch matching was [reference 12], which combined feature engineering (SIFT and LBP) with a discriminative (LFDA) method to learn a weighting that maximised identification accuracy. Later studies such as [reference 2] improved these results, again combining feature engineering (Weber and Wavelet descriptors) plus the discriminative learning (genetic algorithms) strategy to maximise matching accuracy.

[ 0008 ] Unlike viewed sketch databases, forensic sketch databases are few and small in size. The main sketch/photo databases are 159 pairs identified by [reference 12], and 190 pairs in the IIIT-D database [reference 2]. A realistic evaluation of sketch-based face matching should also include a large pool of mugshots to match against, in addition to the true photo corresponding to each sketch. Despite this, only a few studies have evaluated forensic sketch matching algorithms in this way, notably [reference 12], which trained a matching model on viewed sketches and then tested matching 159 forensic sketches against corresponding photos and a 10,030 mugshot database.

[ 0009 ] Regression models are widely used in cross-domain face recognition [reference 17]. For facial sketch matching, regression models may provide facial sketch → photo synthesis [reference 22] to support matching, for example via support vector regression (SVR) [reference 26]. Alternatively, Partial Least Squares (PLS) models may be used to map images in each modality to a common subspace where they are more comparable [reference 20]. Although widely and effectively used, all prior work has focused on regression modelling to tackle the modality-gap problem rather than the memory-gap problem.

[ 0010 ] Study of facial attributes [references 14, 16] is a topical problem in computer vision. It is also relevant to forensic sketch recognition because encoding sketches and photos in terms of facial attributes can help to bridge the sketch/photo modality gap [reference 18], or prune the matching space [reference 12]. However, attributes are vulnerable to forgetting as well, so the attributes of a sketch may mismatch those of the corresponding photo even if they are perfectly detectable by computer vision techniques.

[ 0011 ] Studies have shown the ability of individuals to recognise faces depends on different facial features according to the level of familiarity [reference 25]. Internal facial features are important for identification of familiar faces, and external features for unfamiliar faces [reference 6]. With regards to the forgetting process, forensic psychology studies have found that memory fidelity drops dramatically between the first hour and first 24 hours after witnessing a face. However, in practice forensic sketches are rarely made within the first day [reference 7]. Thus, any mechanism capable of bridging this gap automatically is expected to have a large impact both on quantitative recognition performance and on forensic police work in practice.

SUMMARY

[ 0012 ] It is an aim of the present invention to improve matching of sketches, e.g. of faces, to photographs of the corresponding subjects. In particular the present invention aims to improve matching of forensic sketches to photographs.

[ 0013 ] According to the present invention, there is provided a method of training a machine-learning algorithm using a training database comprising a plurality of records, each record comprising data representing:

a photographic image of a person; and

a plurality of sketches of the photographic image or the person, wherein the sketches include delayed sketches made after different time intervals from viewing the photographic image or person.

[ 0014 ] Desirably the delayed sketches include at least some of: a first delayed sketch made between about 30 minutes and 2 hours after viewing; a second delayed sketch made between 12 and 36 hours after viewing; a viewed sketch made whilst viewing the photographic image or person; and an unviewed sketch made without viewing the photographic image or person on the basis of a description of the photographic image or person.

[ 0015 ] Desirably the plurality of sketches of one record are made by the same artist.

[ 0016 ] The plurality of sketches can be made freehand or using composition software.

[ 0017 ] In an embodiment the machine learning algorithm is a Gaussian Process Regression algorithm and desirably a multi-task learning algorithm.

[ 0018 ] The multitask learning algorithm is desirably trained to match a second delayed sketch to a first delayed sketch, the second delayed sketch having been created after a longer time interval from viewing; and/or to match a delayed sketch to a viewed sketch.

[ 0019 ] According to the present invention, there is also provided a method of matching an input sketch to a database of photographic images, the method comprising:

using a machine learning algorithm trained by the method of any one of the preceding claims to generate a reconstructed sketch from the input sketch; and

comparing the reconstructed sketch to the database of photographic images.

[ 0020 ] In an embodiment comparing the reconstructed sketch to the database of photographic images comprises performing nearest neighbour matching between the reconstructed sketch and the database of photographic images.

[ 0021 ] In an embodiment comparing the reconstructed sketch to the database of photographic images comprises a probabilistic comparison of the reconstructed sketch and the database of photographic images.

[ 0022 ] In an embodiment the probabilistic comparison takes account of the posterior confidence of a Gaussian Process.

[ 0023 ] In an embodiment the photographic images and the reconstructed sketch are represented by a Histogram of Gradients.

[ 0024 ] The present inventors have determined that the memory gap is the key underlying problem to improving matching of sketches, in particular forensic sketches, to photographs. The present invention applies learning and computer vision techniques to ameliorate the memory gap problem. To disentangle the three factors (cross-modal, forgetting, and imperfect communication) in the forensic sketch/photo gap, the present invention provides a new approach to training a machine learning algorithm using an approach to construction of a training database comprising, for each subject: a photo; a viewed sketch; at least one delayed sketch (e.g. a 1-hour delay sketch and a 24-hour delay sketch); and an unviewed sketch.

[ 0025 ] Memory transience could be random (i.e., all memory errors are equally likely), or there might be systematicity in the forgetting process (i.e., misremembered details occur with some kind of predictable pattern that can be exploited). Somewhat surprisingly, the present inventors have determined that it is possible for a machine learning model to input a forensic sketch, and to some extent reverse the forgetting process to produce a more accurate sketch that is easier to match.

[ 0026 ] Based on a memory gap database and model, the present invention aims to improve forensic sketch to mugshot matching by modelling the photo-sketch modality gap and the imperfect communication gap, and by modelling a map from memories of old to recently seen faces to correct misremembered facial details. Since forgetting dynamics differ across time periods [reference 7], it is unclear how to model the memory gap data: a single model covering forgetting across different time-periods is too coarse, but a distinct model of the forgetting in each time-slice of the database is too specific. Similarly, the overall forensic sketch matching task spans modality, communication and memory gaps. An intuitive approach would therefore be to apply in sequence multiple models trained to span each of these gaps.

[ 0027 ] While a sequence of multiple models is effective, the present invention proposes a better practical solution of applying multi-task learning [reference 24] to build a single model trained to span the longer 24h memory gap, but with the other gaps (short-term memory, modality and communication) as auxiliary tasks.

[ 0028 ] In embodiments of the invention, Gaussian Process regression is utilised to address both the memory-gap and the modality-gap components in forensic sketch matching.

BRIEF DESCRIPTION OF THE DRAWINGS

[ 0029 ] The present invention will be described further below with reference to exemplary embodiments and the accompanying drawings, in which:

Figure 1 is an overview of the database and approach to matching of an embodiment of the invention;

Figure 2 illustrates facial regions;

Figure 3 is a chart comparing root mean square error (RMSE) averaged across full face for learned reconstruction and original sketches;

Figure 4 shows qualitative results of matching in forensic sketch database;

Figure 5 shows qualitative results of MTL-GPR model;

Figure 6 shows CMC curves for matching Good (49) / All (195) forensic sketches against corresponding photos and 10,030 FSMD database mugshots; and

Figure 7 depicts a matching system according to an embodiment of the invention.

DESCRIPTION OF EMBODIMENTS

[ 0030 ] The overall scheme of an embodiment of the invention is shown in Figure 1. First a projection for "un-forgetting" is learnt, as well as modality and description gap (top). This projection is applied to improve (un-forget) forensic sketches before matching against photos (below). The reconstructed sketch (dotted box) is a closer match to the true photo (bottom left) than the input forensic sketch (bottom right) (visualisation with HOGgles [reference 21]).

[ 0031 ] The reconstructed sketch is derived by a machine-learning algorithm trained using a new memory gap facial sketch database with 100 subjects each with a photo and four sketches that disentangle different aspects of the forensic sketch gap (400 sketches in total). The present inventors have determined that there is systematicity in facial forgetting, and so inaccurate forensic facial sketches can be automatically improved by the machine learning algorithm trained to recover 'recent' from 'old' face memories. Thus an embodiment of the present invention can significantly outperform the previous state of the art [references 11, 12, 15] at matching forensic sketches against corresponding photos and a large 10,030 mugshot database.

[ 0032 ] The forensic sketch-photo matching task is complicated by three distinct challenges: photo/sketch modality change, forgetting, and communication (to sketch artist/compositing software) issues all contribute. The present invention uses a dataset designed to disentangle these issues. It contains N subjects, with photos $D^{p} = \{x_i^{p}\}_{i=1}^{N}$ and sketches $D^{s} = \{x_i^{t}\}_{i=1}^{N}$ drawn under different conditions $t \in \{v, 1, 24, u\}$: (v)iewed, (1) hour, (24) hour and (u)nviewed. The 1 and 24 hour sketches do not need to have been created at exactly those intervals from viewing the photo; some variation is allowable but consistency is desirable. Other intervals can also be used. Each image is assumed to be represented by a d-dimensional feature vector x. The task of nearest-neighbour (NN) matching a viewed sketch $x_*^{t=v}$ to a photo database would be

$$i^*_{NN} = \arg\min_i \left\| x_*^{v} - x_i^{p} \right\| \qquad (1)$$
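
The nearest-neighbour matching of Eq. 1 can be illustrated with a minimal sketch. The function name, the toy 4-dimensional features and the simulated "viewed sketch" (a photo feature plus small noise) are assumptions made for illustration only and do not come from the described database.

```python
import numpy as np

def nearest_neighbour_match(probe, gallery):
    """Return the index of the gallery row closest to the probe (Euclidean distance)."""
    distances = np.linalg.norm(gallery - probe, axis=1)
    return int(np.argmin(distances))

# Toy gallery: 5 "photos", each a 4-dimensional feature vector (illustrative values).
rng = np.random.default_rng(0)
photos = rng.normal(size=(5, 4))

# Simulated viewed sketch of subject 3: the photo feature plus a small perturbation.
sketch = photos[3] + 0.01 * rng.normal(size=4)

best = nearest_neighbour_match(sketch, photos)
print("matched photo index:", best)
```

Direct matching of this kind ignores every gap between sketch and photo, which is why it serves only as the simplest baseline.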

[ 0033 ] Studies focusing on bridging the modality gap by linear regression-based synthesis or linear subspace projection aim to solve a similar task, after learning a suitable regression matrix $W^{v}$ or projections $W^{v}$ and $W^{p}$ respectively:

$$i^*_{map} = \arg\min_i \left\| W^{v} x_*^{v} - W^{p} x_i^{p} \right\| \qquad (2)$$

[ 0034 ] Memory Modelling: Making use of the memory-gap database, we can separate contributing components of the forensic-sketch gap. For example, training $W^{v \to p}$ in

$$W^{v \to p} = \arg\min_W \sum_i \left\| x_i^{p} - W x_i^{v} \right\|^2 \qquad (3)$$

is the conventional task of learning to bridge the modality gap between photos and viewed sketches. Training $W^{u \to v}$ would be learning to correct the communication gap, while training $W^{24 \to v}$ in

$$W^{24 \to v} = \arg\min_W \sum_i \left\| x_i^{v} - W x_i^{24} \right\|^2 \qquad (4)$$

is learning to correct 24 hours' worth of transience, independent of the modality or communication gap. Given the conditions in the memory-gap database, there are a variety of potential tasks (10 in total) including: correcting the modality gap v→p or the short-term memory gap 1→v; reducing or completely correcting the long-term memory gap, 24→1 or 24→v respectively; and full forensic sketch matching u→p (a full list is presented below).
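
As an illustration of the linear mapping objective of Eq. 3, the following sketch fits a matrix W by least squares in closed form. The small L2 regularisation term, the function name and the synthetic data are assumptions added for numerical stability and demonstration; the source does not specify how W is solved for.

```python
import numpy as np

def fit_linear_map(X_src, X_tgt, reg=1e-3):
    """Closed-form solution of min_W sum_i ||x_tgt_i - W x_src_i||^2 + reg * ||W||_F^2.

    Rows of X_src and X_tgt are paired feature vectors (source and target condition).
    """
    d = X_src.shape[1]
    A = X_src.T @ X_src + reg * np.eye(d)
    W_T = np.linalg.solve(A, X_src.T @ X_tgt)  # solves for W transposed
    return W_T.T

# Synthetic check: targets are generated by a known linear map from the sources.
rng = np.random.default_rng(1)
W_true = rng.normal(size=(3, 3))
X_v = rng.normal(size=(200, 3))   # stand-in for viewed-sketch features
X_p = X_v @ W_true.T              # stand-in for photo features
W_vp = fit_linear_map(X_v, X_p, reg=1e-8)
print(np.round(W_vp - W_true, 6))
```

The same routine would apply unchanged to any other pair of conditions, for example 24 hour sketches as source and viewed sketches as target.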

[ 0035 ] Mapping Strategy: Rather than the most common linear projection approach to these learning tasks [reference 20], the present invention uses Gaussian Process Regression (GPR) [reference 19]. This approach is taken because: (i) GPR provides a more flexible nonlinear mapping, and importantly (ii) as a Bayesian regression framework, GPR provides a distribution over the reconstruction rather than a single point estimate. This uncertainty metric at each point of the reconstruction turns out to be important to improve matching performance, by automatically weighting each feature according to its reliability.

[ 0036 ] Exploiting Multiple Models: As mentioned earlier, the memory-gap database provides 10 potential modelling tasks. The most obvious ways to use these for practical forensic sketch matching would be: (i) apply the model learned for direct forensic sketch-photo matching u→p, or (ii) given multiple models trained to correct the different sources of error, sequentially apply them to correct each source of error in turn, e.g., u→24→1→v→p.

[ 0037 ] Clearly some of these tasks are related (e.g., tasks 1→ v, 24→ 1, 24→ v span different steps of forgetting). So an alternative approach that will turn out to be better is to learn all the tasks together in a multi-task learning framework. In this way each task shares information with (i.e. is regularised by) the others. Specifically, the present invention jointly learns the tasks with Multi-Task Gaussian Process Regression (MTL-GPR).

[ 0038 ] Single Task Modelling: GP regression can be applied to cross-modal/memory-gap problems such as those in Eqs. 2-4, but learning a non-linear projection. Denoting now features in input and target conditions as x and y respectively, the database provides training pairs D = {y, x}. For any query point $x_*$ the GPR prediction for $y_*$ is:

$$p(y_* \mid x_*, D) = \mathcal{N}\left( k_*^{T} K^{-1} y,\; k_{**} - k_*^{T} K^{-1} k_* \right) \qquad (5)$$

where matrix K is the covariances at all pairs of train points, vector $k_*$ is the train-test covariances, $k_* = [\kappa(x_*, x_1), \ldots, \kappa(x_*, x_N)]$, and $k_{**} = \kappa(x_*, x_*)$. The most common squared-exponential kernel $\kappa(x, x') = \exp\left(-\|x - x'\|^2 / (2\ell^2)\right)$ is used, and the kernel hyperparameter $\ell$ can be tuned by gradient on the marginal likelihood [reference 19].
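
A minimal sketch of the GPR prediction of Eq. 5 for a single target dimension is given below. It assumes the standard squared-exponential kernel with unit signal variance, a fixed length scale and a small jitter on the diagonal for numerical stability; hyperparameter tuning on the marginal likelihood is omitted, and all names and toy data are illustrative.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def gpr_predict(X_train, y_train, X_query, length_scale=1.0, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP at each query point."""
    K = sq_exp_kernel(X_train, X_train, length_scale) + jitter * np.eye(len(X_train))
    k_star = sq_exp_kernel(X_train, X_query, length_scale)   # shape (N, M)
    k_ss = np.ones(X_query.shape[0])                         # kappa(x*, x*) = 1 here
    mean = k_star.T @ np.linalg.solve(K, y_train)
    var = k_ss - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.maximum(var, 0.0)

# Toy 1-D regression: six training inputs, one query on a training point, one far away.
X_train = np.arange(6.0)[:, None]
y_train = np.sin(X_train[:, 0])
X_query = np.array([[2.0], [100.0]])
mean, var = gpr_predict(X_train, y_train, X_query)
print(mean, var)
```

Near the training data the posterior variance is close to zero, while far from it the prediction reverts to the prior (zero mean, unit variance). This per-point variance is the confidence signal that the matching step can exploit.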

[ 0039 ] Multi Task Modelling: In the present problem there are 10 distinct mapping tasks, which are learned together in a MTL-GPR framework. Following [reference 3], GP regression is learned with predictions for tasks $j$ and $k$ correlated as:

$$\langle f_j(x)\, f_k(x') \rangle = K^{f}_{jk}\, \kappa(x, x') \qquad (6)$$

[ 0040 ] Here $j$ and $k$ index any two tasks defined over the conditions in the memory-gap database, and $K^{f}$ is the 10 × 10 positive semi-definite (PSD) matrix of inter-task similarities. Standard GP predictions can then be made using this covariance. Importantly, with this approach, the key task similarity matrix $K^{f}$ can also be learned along with the kernel hyperparameters $\ell$ via the marginal likelihood [reference 3].
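
The multi-task covariance of Eq. 6 can be illustrated as follows. When every task is observed at the same inputs, the joint covariance over all (task, input) pairs is the Kronecker product of the task-similarity matrix and the input kernel matrix. The three-task similarity matrix and the inputs below are made-up values; in the described approach the task matrix would be learned rather than fixed by hand.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def multitask_covariance(K_task, X, length_scale=1.0):
    """Joint covariance with entry [(j, p), (k, q)] = K_task[j, k] * kappa(x_p, x_q)."""
    K_x = sq_exp_kernel(X, X, length_scale)
    return np.kron(K_task, K_x)   # task-major ordering of rows and columns

# Hypothetical similarities: tasks 0 and 1 are related, task 2 is independent of both.
L = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [0.0, 0.0, 1.0]])
K_task = L @ L.T                  # PSD by construction
X = np.array([[0.0], [1.0], [2.0], [3.0]])
K_full = multitask_covariance(K_task, X)
print(K_full.shape)
```

Because tasks 0 and 1 have a non-zero similarity, observations for one inform predictions for the other, whereas the zero block between tasks 0 and 2 means no information is shared between them.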

[ 0041 ] Correcting Inaccurate Memory: For any task provided by the memory-gap database, reconstruction is performed by computing the GP posterior of each target feature. For example, to improve an unviewed sketch u→v, we would compute the predictive distribution $p(x_*^{v} \mid x_*^{u}, D) = \mathcal{N}(\mu_{x_*^{v}}, \sigma^2_{x_*^{v}})$, as given by Eq. 5. The new sketch would then be given by the mean of the posterior normal $\mu_{x_*^{v}}$, and the confidence of each feature dimension by the corresponding variance $\sigma^2_{x_*^{v}}$.

[ 0042 ] Matching across Memory or Domain Gap: With this framework matching can be performed by calculating the likelihood of each mugshot in the gallery under the posterior predictive distribution of the probe sketch. For example, after training on the memory gap database D, we can use model u→p to match a forensic sketch $x_*^{u}$ against a database of mugshots $X^{p} = \{x_i^{p}\}_{i=1}^{N}$ as follows:

Compute the distribution over the expected photo corresponding to the forensic sketch:

$$p(x^{p} \mid x_*^{u}, D) = \mathcal{N}(\mu_{x_*^{p}}, \sigma^2_{x_*^{p}})$$

Based on this, the matching photo $x^{p}_{i^*}$ can be found by performing nearest neighbour matching of the prediction against the photo database as:

$$i^* = \arg\min_i \left\| \mu_{x_*^{p}} - x_i^{p} \right\|$$

where $\| \cdot \|$ indicates a suitable metric, such as Euclidean distance. However, better performance is achieved by taking into account the certainty estimate of the GP predictor. To achieve this, the photo that maximizes the likelihood of the GP predictive distribution is selected:

$$i^* = \arg\max_i \; p(x_i^{p} \mid x_*^{u}, D)$$

In practice, each dimension of the target is modelled independently with GPR, so this is equivalent to

$$i^* = \arg\min_i \sum_k \left( x_{i,k}^{p} - \mu_{x_*^{p},k} \right)^2 / \sigma^2_{x_*^{p},k}$$

where $x_{i,k}^{p}$, $\mu_{x_*^{p},k}$ and $\sigma^2_{x_*^{p},k}$ respectively are the k-th dimension of the target photo, posterior predicted photo mean and variance.

The above procedure describes finding the single best match. By sorting the photos in the database by their matching probability, a ranked list of a predetermined number of the most likely putative matches can be generated for manual inspection.
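
The confidence-weighted matching and ranked-list generation described above can be sketched as follows. The function name, the small constant guarding against division by zero and the two-dimensional toy gallery are assumptions for illustration. The example is chosen so that plain Euclidean matching and variance-weighted matching disagree.

```python
import numpy as np

def rank_gallery(pred_mean, pred_var, gallery, top_k=5, eps=1e-12):
    """Rank gallery rows by sum_k (x_k - mu_k)^2 / sigma_k^2, smallest score first."""
    scores = np.sum((gallery - pred_mean) ** 2 / (pred_var + eps), axis=1)
    return np.argsort(scores)[:top_k]

# Predicted photo features: dimension 0 is confident, dimension 1 is unreliable.
pred_mean = np.array([0.0, 0.0])
pred_var = np.array([0.01, 100.0])

gallery = np.array([
    [0.0, 3.0],   # agrees on the reliable dimension, differs on the unreliable one
    [1.0, 0.0],   # differs on the reliable dimension
    [5.0, 5.0],   # far away in both dimensions
])

ranked = rank_gallery(pred_mean, pred_var, gallery, top_k=3)
euclid_best = int(np.argmin(np.linalg.norm(gallery - pred_mean, axis=1)))
print("weighted ranking:", ranked.tolist(), "Euclidean best:", euclid_best)
```

Euclidean matching prefers photo 1, while the weighted score prefers photo 0 because the disagreement lies in a dimension the predictor itself marks as uncertain.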

[ 0043 ] The memory gap database and its creation procedure will now be described in more detail. 100 subjects were chosen from mugshots.com, which releases mugshots of real criminals. For each subject one frontal face photo is selected, and four types of sketches are drawn:

• Viewed: Sketches are drawn while the artist looks directly at the mugshot photos.

• 1 hour: Mugshot photos are viewed by the artist, and sketches are drawn one hour later. Thus, compared to viewed sketches, the sketch is 'corrupted' by one hour worth of memory transience.

• 24 hours: Mugshot photos are viewed by the artist, and drawn 24-hours later.

• Unviewed: Sketches are drawn by an artist based on the description of an eyewitness who has seen the mugshot photo immediately before (but does not view it during the sketching). The artist does not see the photo. In this case, the memory gap is negligible, but it is the only condition in the database where the communication gap of imperfect communication between the eyewitness and artist exists.

[ 0044 ] The reason for this design of the collection procedure is so that the modality and communication gaps can be isolated (in photo-viewed and viewed-unviewed respectively) from the memory gap (24h to 1h to viewed). This potentially enables specific models to be built to address each contributing factor of the forensic sketch challenge.

[ 0045 ] To build the memory gap database, over 20 art students were selected to contribute as both sketch artists and eyewitnesses. Each artist is asked to draw all four kinds of sketches for each subject. This way the sketches for each mugshot do not have inter-artist variability, but the drawing order is such that forensic sketches are fully unviewed.

[ 0046 ] Three databases were studied experimentally: the new Memory Gap Database (MGDB), in which each image is annotated with 40 binary facial attributes from the ontology provided by [reference 18]; a Forensic Composite Database with 51 forensic composite-photo pairs [reference 7]; and the Forensic Sketch and Mugshot Database (FSMD). The latter consists of two parts: 195 forensic sketch-photo pairs [references 2, 12] and a large background gallery of mugshots to search against, in order to replicate a real-world scenario where a law-enforcement agency would query a large gallery of mugshot images with a forensic sketch. The same 195 sketch-photo pairs as [references 12, 18] were used. The mugshot gallery used by [references 11, 12] was not released publicly, so this was simulated as well as possible by downloading 10,030 mugshots from mugshots.com (the same source used by [reference 12]).

[ 0047 ] Memory-Aware Model Training: All sketch and photo conditions (t = photo, viewed, 1 hour, 24 hour and unviewed) were used to exhaustively construct the 10 possible reconstruction tasks. For each task, sketches corresponding to two-thirds of the subjects served as training data, and the remainder served as testing data. The training subjects and the 10 tasks were used to jointly train 10 models via MTL-GPR.

[ 0048 ] Overall ten regression tasks were trained: 1) viewed sketch to photo, 2) 1 hour sketch to photo, 3) 24 hour sketch to photo, 4) unviewed sketch to photo, 5) 1 hour to viewed sketch, 6) 24 hour to viewed sketch, 7) unviewed to viewed sketch, 8) 24 hour to 1 hour sketch, 9) unviewed to 1 hour sketch and 10) unviewed to 24 hour sketch. Some of these are illustrated in Figure 1.
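The ten tasks listed above are the unordered pairs of the five conditions, each oriented from the more degraded condition to the less degraded one. A minimal sketch of that enumeration follows; the condition labels and the function name are illustrative and are not taken from the described embodiment.

```python
from itertools import combinations

# Conditions ordered from the ground truth (photo) to the most degraded sketch.
CONDITIONS = ["photo", "viewed", "1h", "24h", "unviewed"]

def reconstruction_tasks(conditions=CONDITIONS):
    """Return (source, target) pairs mapping a more degraded condition
    onto a less degraded one."""
    # combinations() yields (earlier, later); the later item is the source.
    return [(src, tgt) for tgt, src in combinations(conditions, 2)]

tasks = reconstruction_tasks()
for src, tgt in tasks:
    print(f"{src} -> {tgt}")
```

Five conditions give 5 x 4 / 2 = 10 pairs, which matches the ten regression tasks enumerated above.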

[ 0049 ] Features and settings: all photo and sketch images were normalised to 256x196 and aligned by normalising on interocular distance. Each image is then represented with HoG features. Dense HoG features were computed over a regular grid (16x16 step size), which results in a feature vector of dimension 5,952 for each image. For each image, 40 attributes are also detected using SVM detectors trained using the ground-truth attributes on the training split [reference 18].
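The stated dimension of 5,952 can be reproduced with simple arithmetic under two assumptions that the text does not state: that partial cells at the image border are discarded, and that each grid cell contributes a 31-dimensional descriptor (as in a commonly used HoG variant). The sketch below only checks this bookkeeping; it is not a HoG implementation.

```python
def hog_dimension(height, width, step, dims_per_cell):
    """Length of a dense grid descriptor, dropping partial border cells."""
    rows = height // step   # 256 // 16 = 16
    cols = width // step    # 196 // 16 = 12
    return rows * cols * dims_per_cell

# dims_per_cell=31 is an assumption, chosen because it matches the stated total.
print(hog_dimension(256, 196, 16, 31))  # 16 * 12 * 31 = 5952
```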

[ 0050 ] Baselines: In addition to the MTL-GPR memory-aware model, alternative regression methods that could potentially model the gaps across database contexts were considered as follows:

• Nearest Neighbour (NN): Direct matching. Ignore the gap.

• Linear Regression (LR): Linear (L2 regularised) regression is the simplest explicit mapping approach.

• Polynomial Support Vector Regression (SVR): SVR was used in [reference 26] to accomplish sketch-photo synthesis.

• Polynomial Multi-Task Learning: We use the [reference 24] implementation of the popular GO-MTL [reference 13] multi-task learner. By exploiting task relatedness, this may perform better than SVR. In initial experiments we found polynomial MTL significantly better than linear, so we report the former.

• (Single Task) Gaussian Process Regression (GPR) [reference 19]: Compared to the others, GPR provides a non-parametric probabilistic prediction with an estimate of uncertainty that can be used for matching.

• Sequential GPR: As above, this is the intuitive baseline of applying a number of the 10 GPR models in sequence to correct distinct error sources.
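To make the single-task GPR baseline concrete, the following is a minimal textbook-style sketch of Gaussian process regression that returns both a predictive mean and a predictive variance. The squared-exponential kernel, the hyperparameter values and the function names are assumptions made for illustration; the text does not specify the kernel or settings actually used.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gpr_predict(X_train, Y_train, X_test, noise=1e-2, length_scale=1.0):
    """Return the predictive mean (n_test, n_outputs) and variance (n_test,)."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train, length_scale)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} Y via two triangular solves on the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    # The prior variance of this kernel is 1; observation noise is added back.
    var = 1.0 - np.sum(v**2, axis=0) + noise
    return mean, var
```

The variance is small near the training inputs and approaches the prior variance far from them, which is the uncertainty estimate that can be used for matching.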

[0051 ] In this section, the MTL-GPR reconstruction of faces, as represented by HoG features, is analysed. The analysis could in principle be done with pixels, but this would be computationally expensive due to higher dimensionality. To help interpret the results, the facial HoG feature maps are divided into regions: external, internal, eyes, nose, mouth and chin [reference 25], as shown in Figure 2. To investigate whether the memory model helps to bridge the gap between photo and forensic sketch, the RMSE is calculated between each sketch or reconstructed sketch and the corresponding photo. The results are shown broken down by facial region and averaged over tasks (Table 1) and averaged over all regions broken down by tasks (Figure 3). From these we can see that: (i) Each learned projection task in the MGDB database reduces the sketch-photo RMSE. (ii) This demonstrates that sketches drawn at different delays contain some systematic shift that it is possible to reverse, or it would not be possible to learn a model that consistently improves RMSE. (iii) Reconstruction consistently improves RMSE for each distinct semantic facial region.
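A per-region RMSE of this kind can be sketched as follows, assuming each image is held as a grid of HoG cells with shape (rows, cols, dims) and each facial region is given as a boolean mask over the grid. The array layout and the mask are hypothetical; the actual region definitions are those of Figure 2.

```python
import numpy as np

def region_rmse(features_a, features_b, region_mask):
    """RMSE over the HoG cells selected by a boolean mask of shape (rows, cols)."""
    diff = features_a[region_mask] - features_b[region_mask]
    return float(np.sqrt(np.mean(diff**2)))

# Toy example: a hypothetical "external" region covering the top two grid rows.
photo = np.zeros((16, 12, 31))
sketch = np.full((16, 12, 31), 0.2)
external = np.zeros((16, 12), dtype=bool)
external[:2, :] = True
print(region_rmse(photo, sketch, external))
```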

[0052 ] Table 1. RMSE of sketch/reconstruction vs photo according to regions, averaged across all ten tasks in MGDB.

Region      Photo vs. Original Sketch    Photo vs. Projected Sketch
External    0.20±0.013                   0.16±0.025
Chin        0.20±0.014                   0.16±0.023
Internal    0.18±0.003                   0.16±0.015
Mouth       0.17±0.007                   0.16±0.012
Eyes        0.18±0.003                   0.15±0.023
Nose        0.18±0.011                   0.14±0.018

[ 0053 ] In this section face matching performance on the test split of the memory gap database is quantitatively evaluated. As outlined above, a variety of baselines are compared to our proposed MTL-GPR and the rank 1 (perfect match) accuracy for each of the 10 tasks is reported in Table 2. The row and column give the MGDB image pair (training task). The column gives the MGDB sketch input for testing, and the task is always to match against photos using the corresponding training model.

[ 0054 ] Table 2. Photo-sketch matching on the memory gap database (Rank 1 accuracy, %). Comparing MTL-GPR, GPR, Polynomial MTL, Polynomial SVR, Linear Regr. and NN. Sketch input is given by column and matched with the model trained on the corresponding cell of MGDB. Average accuracies over 15 random splits of 68 training and 32 testing subjects.

[ 0055 ] Efficacy of memory-aware models: From Table 2, the following conclusions can be drawn: (i) Sketch reconstruction with linear regression does not consistently improve on direct NN matching, suggesting that a linear projection is insufficient. (ii) Every non-linear approach to bridging the modality/memory gap performs better than direct NN matching with no memory gap model, but among the baseline memory gap models there is no clear winner or loser. (iii) The proposed MTL-GPR is the clear winner overall, often with significant margins over the next best (e.g., 87% vs 57% in the 24→v setting). (iv) That MTL-GPR outperforms regular GPR demonstrates that there is common information in each of the distinct tasks that can be extracted and shared. (v) In some cases the gain from an explicit un-forgetting model is vast: in the 24→v setting, performance triples from 29% to 87% when comparing NN matching with MTL-GPR.

[ 0056 ] Significance of Bayesian Memory Gap Model: One of the reasons for the GP methods' good performance is their ability to account for reconstructed feature reliability in matching. This is demonstrated in Table 3, where performance with and without the use of the reconstruction variance is compared. Clearly accounting for reconstruction reliability significantly benefits performance.

[ 0057 ] Table 3. The importance of Bayesian memory modelling: Rank 1 MGDB match results (%) without/with reconstruction confidence. Average accuracies over 15 random splits of 68 training and 32 testing subjects.

Accuracy    Viewed     1h         24h        Unviewed
Photo       86 / 99    85 / 96    60 / 90    50 / 86
Viewed      -          56 / 90    43 / 86    40 / 73
1h          -          -          38 / 69    36 / 63
24h         -          -          -          28 / 42

[ 0058 ] Qualitative Analysis: The average variance map across the database is shown in Figure 5(right). The model confidently predicts both internal (eyes, mouth) and external (hair, chin) facial regions [reference 25], while giving less weight to skin regions (forehead, cheeks), where texture may not be predictable from the sketch.

[ 0059 ] The MTL-GPR framework also aims to discover task relatedness. The learned task relatedness matrix Kf is shown in Fig. 5(left). The clear block structure here shows that the tasks with sketches as target context are much more related to each other than those with photos as the targets. The 24→1 task is also noticeable as sharing structure with many of the other sketch predictors (cross structure within the block).
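The role of a task relatedness matrix can be illustrated with the multi-task Gaussian process construction of [reference 3], in which the covariance between task l at input x and task k at input x' is the product of a task term Kf[l, k] and an input kernel k(x, x'). The sketch below builds that joint covariance as a Kronecker product; the toy values are illustrative, and whether the described embodiment uses exactly this form is an assumption.

```python
import numpy as np

def multitask_covariance(K_f, K_x):
    """Joint covariance over (task, input) pairs in task-major order.

    Block (l, k) of the result equals K_f[l, k] * K_x.
    """
    return np.kron(K_f, K_x)

K_x = np.array([[1.0, 0.5],
                [0.5, 1.0]])            # toy input kernel over two inputs
related = np.array([[1.0, 0.8],
                    [0.8, 1.0]])        # two strongly related tasks
independent = np.eye(2)                 # two unrelated tasks

print(multitask_covariance(related, K_x))
print(multitask_covariance(independent, K_x))
```

With an identity task matrix the off-diagonal blocks vanish and the tasks reduce to independent single-task GPs, so nothing learned for one task can inform another.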

[ 0060 ] Matching on Forensic Sketch Database: All ten learned memory-aware models were transferred to the forensic sketch database, which includes 195 forensic sketch-photo pairs. Few experiments have been done on the forensic sketch database, except [reference 18], which focused on using attributes to bridge the sketch/photo gap. To compare directly with [reference 18], we evaluate our models on the same 1/3 test split.

[ 0061 ] The results are shown in Table 4, from which we make the following observations: (i) All reconstruction models perform significantly better than the 9% obtained with HoG matching alone, and almost all outperform the 21% of [reference 18]. (ii) Comparing STL-GPR and MTL-GPR, the models trained with photo targets perform worse when learned jointly, i.e., they suffer negative transfer from the sketch targets. However, the models trained with sketch targets generally perform better, i.e., they successfully share information about bridging the memory gap. (iii) The best model overall is MTL-GPR's 24→1, suggesting that the biggest single contributor to the forensic sketch gap in practice is the longer term forgetting between 1 and 24 hours. The second best, 1→v, is also memory related.

[ 0062 ] Table 4. Matching results (Rank 1 accuracy, %) on forensic sketch database (1/3 test split) using MTL-GPR / STL-GPR. Compare: 21% from [18] and 9% by direct HoG matching. Average accuracies over 15 random splits of 68 training and 32 testing subjects.

Accuracy    Viewed     1h         24h        Unviewed
Photo       22 / 35    22 / 34    15 / 40    18 / 41
Viewed      -          65 / 48    40 / 50    33 / 48
1h          -          -          78 / 48    54 / 40
24h         -          -          -          65 / 42

[ 0063 ] An intuitive alternative way to exploit the tasks learned in MGDB for forensic sketch matching is to apply the models in sequence to correct the various sources of error in forensic sketches. This experiment was conducted for a variety of possible STL-GPR model sequences. The results in Table 5 show that while all outperform the 9% of direct matching, none of the multi-step configurations outperforms the best single task of 24→1, which is itself outperformed by our MTL-GPR 24→1 in Table 4. Based on this analysis, the preferred approaches are the two MTL-GPR memory models 1→v and 24→1, which we denote Early and Late and which are evaluated in the final large-scale benchmark experiments.

[ 0064 ] Table 5. Matching results (Rank 1 accuracy, %) on forensic sketch database (1/3 test split) using sequence of STL-GPR models.

Sequences: u→24, u→24→1, u→24→1→v, u→24→1→v→p
Accuracy: 54, 28, 20, 13

Sequences: 24→1, 24→1→v, 24→1→v→p
Accuracy: 56, 39, 16, 16

[ 0065 ] Matching on Forensic Sketch and Mugshot Database: The full problem of matching forensic sketches to a large database of mugshot photos is now addressed. The results of our Early and Late-Memory MTL-GPR models are compared to the results of the state of the art LFDA [reference 12] (whose authors also reported the results of a state of the art commercial system, FaceVACS), KPS [reference 11], and DFD [reference 15]. To provide an additional baseline, the best publicly available (photo) deep face recognition model [reference 9] is used to extract features for matching. As [reference 12] demonstrated the value of filtering by soft biometrics, the models are also combined with predicted attributes (trained on the memory gap database) using score-level fusion.
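Score-level fusion can be sketched as below, assuming one similarity score per gallery photo from the feature-based matcher and one from attribute agreement, with higher scores being better. The min-max normalisation, the weighted sum and the weight value are illustrative assumptions; the text does not state which fusion rule was used.

```python
import numpy as np

def min_max(scores):
    """Rescale scores to [0, 1]; a constant score vector maps to zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def fuse_scores(feature_scores, attribute_scores, weight=0.5):
    """Weighted sum of normalised scores (higher means a better match)."""
    return weight * min_max(feature_scores) + (1.0 - weight) * min_max(attribute_scores)

feature_scores = [0.2, 0.9, 0.5]      # hypothetical matcher scores for three photos
attribute_scores = [0.1, 0.3, 0.8]    # hypothetical attribute-agreement scores
fused = fuse_scores(feature_scores, attribute_scores)
print(np.argsort(-fused))             # gallery indices, best match first
```

Normalising before summing keeps one score source from dominating merely because it has a larger numeric range.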

[0066 ] In order to compare directly with [reference 12], who break down results by "good" and "bad" quality sketches, results in Table 6 focus on a good quality subset of sketches. In Figure 6, a cumulative match characteristic (CMC) curve is shown, including results for both all 195 sketches as well as the 49 good quality sketches against corresponding photos and 10,030 FSMD database mugshots.
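For reference, a cumulative match characteristic curve reports, for each rank k, the fraction of probe sketches whose true photo appears among the top k gallery results. A minimal sketch, assuming a similarity matrix with one row per probe and one column per gallery photo (the names and toy values are illustrative):

```python
import numpy as np

def cmc_curve(score_matrix, true_index, max_rank):
    """score_matrix[i, j] is the similarity of probe i to gallery item j
    (higher is better); true_index[i] is the column of probe i's true match."""
    scores = np.asarray(score_matrix, dtype=float)
    true_scores = scores[np.arange(len(scores)), true_index]
    # Rank of the true match = 1 + number of gallery items scoring strictly higher.
    ranks = 1 + np.sum(scores > true_scores[:, None], axis=1)
    return np.array([np.mean(ranks <= k) for k in range(1, max_rank + 1)])

scores = [[0.9, 0.1, 0.2],
          [0.3, 0.2, 0.8],
          [0.5, 0.6, 0.4]]
cmc = cmc_curve(scores, true_index=[0, 1, 1], max_rank=3)
print(cmc)
```

The first entry of the curve is the Rank 1 accuracy reported in the tables.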

[0067 ] From the results it can be seen that: (i) The memory-gap model significantly surpasses state of the art performance, demonstrating that the model learned on the memory gap database can dramatically improve real forensic sketch matching. (ii) Of the memory-aware models, the Late-Memory model trained on the 1-24 hour memory gap performs better, reflecting forensic psychology conclusions that the first day's forgetting is significant [reference 7]. (iii) Including predicted facial attributes improves performance further. (iv) Using modern deep features with direct matching now outperforms the commercial FaceVACS result, but it is significantly worse than both LFDA [reference 12] and ours, indicating that deep features alone are insufficient to address forensic sketch matching.

[0068 ] Table 6. State of the art comparison. Accuracy (%) of matching 49 good forensic sketches against corresponding photos and 10,030 FSMD database mugshots. *Not directly comparable, used a different 53 sketch probe set.

Accuracy                       Rank 1    Rank 10    Rank 50
MTL-GPR Early-Mem              23        23         33
MTL-GPR Early-Mem+Attr         25        25         35
MTL-GPR Late-Mem               33        33         39
MTL-GPR Late-Mem+Attr          38        42         45
LFDA [12]                      17        23         33
LFDA [12]+gender+race          19        27         45
FaceVACS (reported by [12])    2         4          8
KPS [11]*                      4         9          21
Deep Features [9]              2         6          15
DFD [15]                       6         13         19

[ 0069 ] Qualitative Examples: Some qualitative examples of the matching process of the present invention using the forensic database are shown in Figure 4. The memory reconstruction model trained on 24→1 hour sketches of MGDB is transferred to a forensic sketch database. Reconstruction variance improves matching by focusing on reliable features. These good sketches were both retrieved at Rank 1 of 10,225 (10,030+195). Bad sketches were retrieved at Rank 1592 and 1800 respectively. Photos and sketches are represented with HoG (Histogram of Gradients) features (visualised by HOGgles [reference 21]). The learned memory reconstruction model predicts the mean and variance of photo-HOGs. Photos are chosen by their likelihood under the predicted Gaussian distribution, allowing matching to take into account the prediction reliability of each feature.
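Matching by likelihood under a predicted Gaussian can be sketched as follows, assuming a diagonal covariance so that each feature has its own predicted mean and variance. The function names and toy numbers are illustrative and not taken from the described embodiment.

```python
import numpy as np

def gaussian_log_likelihood(gallery, mean, var):
    """Log-likelihood of each gallery vector under N(mean, diag(var))."""
    gallery = np.atleast_2d(gallery)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (gallery - mean) ** 2 / var, axis=1)

def rank_gallery(gallery, mean, var):
    """Gallery indices ordered from most to least likely."""
    return np.argsort(-gaussian_log_likelihood(gallery, mean, var))

mean = np.array([0.0, 0.0])
var = np.array([0.01, 100.0])     # first feature reliable, second unreliable
gallery = np.array([[0.0, 5.0],   # agrees on the reliable feature
                    [1.0, 0.0]])  # closer in Euclidean distance
print(rank_gallery(gallery, mean, var))
```

In this toy case the second photo is nearer in plain Euclidean distance, but the first ranks higher because it agrees on the feature that was predicted with low variance.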

[ 0070 ] Matching on Forensic Composite Database: Although the model of the present invention is trained on sketches rather than software composite faces, the learned model is general enough to improve forensic composite matching. Table 7 shows the results of retrieving 51 composites from among the same mugshot gallery. Clearly the model of the present invention still makes a significant impact on retrieval performance, despite the sketch-composite domain shift.

[ 0071 ] Table 7. Accuracy (%) of matching 51 forensic composites against corresponding photos and 10,030 FSMD database mugshots.

Accuracy            Rank 1    Rank 10    Rank 50
HOG                 6         14         20
DFD [15]            2         4          4
MTL-GPR Late-Mem    14        18         26

[ 0072 ] A sketch matching system according to an embodiment of the invention is shown schematically in Figure 7. An input sketch 10, which is to be matched against mugshots stored in image database 40, is processed in reconstruction unit 20, which contains a trained ML algorithm as described above, to generate a reconstructed sketch for matching purposes. The reconstructed sketch may be expressed in the form of a histogram of gradients. Matching unit 30 uses the reconstructed sketch to find one or more matching photographs in the image database 40. Metadata 11 relating to the sketch and/or the assumed subject of the sketch, can also be taken into account in the matching process. A ranked list 50 of potential matches is output.
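The data flow of Figure 7 can be sketched in code as below. The class and field names are hypothetical, the reconstruction unit is represented by any callable that returns a predicted mean and variance, and the metadata 11 is omitted for brevity; this is an illustration of the arrangement, not the described implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np

@dataclass
class SketchMatcher:
    # Reconstruction unit 20: maps sketch features to (mean, variance).
    reconstruct: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]
    # Image database 40: one feature row per stored photograph.
    gallery_features: np.ndarray
    gallery_ids: Sequence[str]

    def match(self, sketch_features: np.ndarray, top_k: int = 10) -> List[str]:
        """Matching unit 30: return the ranked list 50 of the top_k photo ids."""
        mean, var = self.reconstruct(sketch_features)
        # Variance-weighted squared distance; the Gaussian normalising term is
        # the same for every photo given one sketch, so it is left out.
        cost = np.sum((self.gallery_features - mean) ** 2 / var, axis=1)
        order = np.argsort(cost)[:top_k]
        return [self.gallery_ids[i] for i in order]

# Toy stand-in for a trained model: shift the features and report unit variance.
matcher = SketchMatcher(
    reconstruct=lambda x: (x + 1.0, np.ones_like(x)),
    gallery_features=np.array([[1.0, 1.0], [5.0, 5.0], [1.2, 0.9]]),
    gallery_ids=["a", "b", "c"],
)
print(matcher.match(np.zeros(2), top_k=2))
```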

[0073 ] The present invention addresses two problems: improving facial sketches whose quality is impacted by a large delay between seeing the face and making the sketch; and improving practical forensic sketch recognition. Embodiments of the present invention are able to improve facial sketches drawn after a time-delay, and this translates into the significantly improved performance on the important task of forensic sketch matching.

[0074 ] One limitation of the above described embodiments is that each HoG dimension is modelled independently, so cross-pixel correlation is not exploited. However, richer information sharing architectures, such as local patches, CRF smoothing, and multi-task learning among neighbouring pixels, can also be employed. Secondly, the above described embodiments address the cross-modal and communication gaps only implicitly via MTL sharing. A richer framework more explicitly modelling the contributing factors can also be used.

[0075 ] Having described embodiments of the present invention it will be appreciated that the invention is not limited by the foregoing description and variations of the described embodiments may be made within the scope of the appended claims.

[0076 ] References

[1] H. S. Bhatt, S. Bharadwaj, R. Singh, and M. Vatsa. On matching sketches with digital face images. In BTAS, 2010.

[2] H. S. Bhatt, S. Bharadwaj, R. Singh, and M. Vatsa. Memetically optimized MCWLD for matching sketches with digital face images. TIFS, 2012.

[3] E. V. Bonilla, K. M. A. Chai, and C. K. I. Williams. Multi-task gaussian process prediction. In NIPS, 2008.

[4] J. Choi, A. Sharma, D. W. Jacobs, and L. S. Davis. Data insufficiency in sketch versus photo face recognition. In CVPR, 2012.

[5] C. Frowd. Introduction to Applied Psychology, chapter Eyewitnesses and the use and application of cognitive theory. 2011.

[6] C. Frowd, V. Bruce, A. McIntyre, and P. Hancock. The relative importance of external and internal features of facial composites. British Journal of Psychology, 2007.

[7] C. Frowd, W. Erickson, J. Lampinen, F. Skelton, A. McIntyre, and P. Hancock. A decade of evolving composite techniques: Regression- and meta-analysis. Journal of Forensic Practice (in press), 2015.

[8] H. Galoogahi and T. Sim. Inter-modality face sketch recognition. In ICME, 2012.

[9] G. Hu, Y. Yang, D. Yi, J. Kittler, W. Christmas, S. Z. Li, and T. M. Hospedales. When face recognition meets with deep learning: an evaluation of convolutional neural networks for face recognition. In ICCV Workshops ChaLearn Looking at People, 2015.

[10] R. G. U. Jr. and N. da Victoria Lobo. A framework for recognizing a facial image from a police sketch. In CVPR, 1996.

[11] B. F. Klare and A. K. Jain. Heterogeneous face recognition using kernel prototype similarities. TPAMI, 2013.

[12] B. F. Klare, Z. Li, and A. K. Jain. Matching forensic sketches to mug shot photos. TPAMI, 2011.

[13] A. Kumar and H. D. III. Learning task grouping and overlap in multi-task learning. In ICML, 2012.

[14] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar. Attribute and simile classifiers for face verification. In ICCV, 2009.

[15] Z. Lei, M. Pietikainen, and S. Z. Li. Learning discriminant face descriptor. TPAMI, 2014.

[16] P. Luo, X. Wang, and X. Tang. A deep sum-product architecture for robust facial attributes analysis. In ICCV, 2013.

[17] S. Ouyang, T. Hospedales, Y.-Z. Song, X. Li, C. C. Loy, and X. Wang. A survey on heterogeneous face recognition: Sketch, infra-red, 3D and low-resolution. Image and Vision Computing, 2016.

[18] S. Ouyang, T. M. Hospedales, Y.-Z. Song, and X. Li. Cross-modal face matching: Beyond viewed sketches. In ACCV, 2014.

[19] C. E. Rasmussen and C. K. I. Williams. Gaussian processes for machine learning. In Gaussian Processes for Machine Learning, 2006.

[20] A. Sharma and D. W. Jacobs. Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch. In CVPR, 2011.

[21] C. Vondrick, A. Khosla, T. Malisiewicz, and A. Torralba. Hoggles: Visualizing object detection features. In ICCV, 2013.

[22] N. Wang, D. Tao, X. Gao, X. Li, and J. Li. A comprehensive survey to face hallucination. IJCV, 2014.

[23] X. Wang and X. Tang. Face photo-sketch synthesis and recognition. TPAMI, 2009.

[24] Y. Yang and T. M. Hospedales. A unified perspective on multi-domain and multi-task learning. In ICLR, 2015.

[25] A. W. Young, D. Hay, K. H. McWeeny, B. M. Flude, and A. W. Ellis. Matching familiar and unfamiliar faces on internal and external features. Perception, 1985.

[26] J. Zhang, N. Wang, X. Gao, D. Tao, and X. Li. Face sketch-photo synthesis based on support vector regression. In ICIP, 2011.




 