
Title:
SYSTEM AND METHOD FOR SKIN DETECTION IN IMAGES
Document Type and Number:
WIPO Patent Application WO/2022/150874
Kind Code:
A1
Abstract:
Described herein is a subject monitoring system (100). The system (100) includes a near infrared illumination source (108) configured to illuminate a scene with infrared light having a spatial beam characteristic to generate a spatial pattern and an image sensor (106) configured to capture one or more images of the scene when illuminated by the illumination source (108). System (100) also includes a processor (112) configured to process the captured one or more images by determining a degree of presence or modification of the spatial pattern by objects in the scene within pixel sub-regions of an image. Processor (112) also classifies one or more pixel sub-regions of the image as including human skin or other material based on the degree of modification of the spatial pattern identified in that pixel sub-region.

Inventors:
EDWARDS TIMOTHY (AU)
NOBLE JOHN (AU)
LIANG XUEJUN (AU)
WHICHELLO LACHLAN (AU)
Application Number:
PCT/AU2021/051563
Publication Date:
July 21, 2022
Filing Date:
December 24, 2021
Assignee:
SEEING MACHINES LTD (AU)
International Classes:
G06V40/40; B60W40/08; G01N21/359
Domestic Patent References:
WO2019038128A1 (2019-02-28)
Foreign References:
US20200394390A1 (2020-12-17)
US20190335098A1 (2019-10-31)
US20130342702A1 (2013-12-26)
Other References:
SONG, L. ET AL.: "Face Liveness Detection Based on Joint Analysis of RGB and Near-Infrared Image of Faces", ELECTRONIC IMAGING, IMAGING AND MULTIMEDIA ANALYTICS IN A WEB AND MOBILE WORLD 2018, 28 January 2018 (2018-01-28), pages 1 - 6, XP055856394, Retrieved from the Internet
ZHOU PEI, ZHU JIANGPING, YOU ZHISHENG: "3-D face registration solution with speckle encoding based spatial-temporal logical correlation algorithm", OPTICS EXPRESS, vol. 27, 12 July 2019 (2019-07-12), pages 21004 - 21019, XP055956663, Retrieved from the Internet
NOWARA, E.M.: "Camera-based Vital Signs: Towards Driver Monitoring and Face Liveness Verification", THESIS, 20 August 2018 (2018-08-20), pages 1 - 102, XP055956679, Retrieved from the Internet
TANG DI, ZHOU ZHE, ZHANG YINQIAN, ZHANG KEHUAN: "Face Flashing: a Secure Liveness Detection Protocol based on Light Reflections", ARXIV.ORG, 22 August 2018 (2018-08-22), pages 1 - 15, XP081260658, Retrieved from the Internet
BHOWMIK, M. K. ET AL.: "Thermal Infrared Face Recognition - a Biometric Identification Technique for Robust Security System", INTECH OPEN, 27 July 2011 (2011-07-27), pages 1 - 338, XP055956686, Retrieved from the Internet
Attorney, Agent or Firm:
PHILLIPS ORMONDE FITZPATRICK (AU)
Claims:

What is claimed is:

1. A subject monitoring system, including: a near infrared illumination source configured to illuminate a scene with infrared light having spatial beam characteristics to generate a spatial pattern; an image sensor configured to capture one or more images of the scene when illuminated by the illumination source; and a processor configured to process the captured one or more images by: determining a degree of presence or modification of the spatial pattern by objects in the scene within pixel sub-regions of an image; and classifying one or more pixel sub-regions of the image as including human skin or other material based on the degree of modification of the spatial pattern identified in that pixel sub-region.

2. The system according to claim 1 wherein determining a degree of presence or modification of the spatial pattern includes determining a spatial power spectral density of the pixel sub-regions.

3. The system according to claim 2 wherein classifying a pixel sub-region as including human skin includes determining a spatial frequency response of a spatial filter matched to the power spectral density of the spatial pattern.

4. The system according to claim 1 wherein the classification includes applying a threshold to the spatial frequency response of the spatial filter, wherein a spectral power of the spatial frequency response that is below the threshold is designated as including human skin.

5. The system according to claim 4 wherein the threshold is determined by a statistical analysis of the filter response against a database containing images of human skin versus other materials.

6. The system according to claim 2 wherein the processor includes a machine learned classifier module trained to detect human skin versus other materials based on the spatial power spectral density of pixel sub-regions under different illumination conditions.

7. The system according to claim 6 wherein the illumination conditions include an exposure time of the image sensor.

8. The system according to claim 6 wherein the illumination conditions include an intensity of the infrared light emitted from the illumination source.

9. The system according to any one of the preceding claims wherein the processor is configured to perform the step of determining whether or not a human subject is present in the imaged scene based on a detected level of human skin in the image.

10. The system according to any one of the preceding claims wherein the processor is configured to perform face detection to detect a facial pixel region in the one or more images and wherein the one or more pixel sub-regions includes the facial pixel region.

11. The system according to claim 9 wherein the processor is configured to estimate a distance to the face from the image sensor and wherein the threshold of spatial filtering is modified based on the estimated distance.

12. The system according to any one of the preceding claims wherein the infrared light has spectral power in the wavelength range from 650 nm to 1200 nm and wherein the spatial pattern includes a spatial wavelength in the range of 1 mm to 10 mm.

13. The system according to any one of the preceding claims wherein the illumination source includes a laser.

14. The system according to claim 13 wherein the laser is a vertical cavity surface emitting laser (VCSEL).

15. The system according to claim 13 or claim 14 wherein the spatial pattern includes a speckle pattern produced from diffuse reflection of the laser source from a surface.

16. The system according to claim 15 wherein the processor is configured to control an exposure time of the image sensor and/or an illumination time of the illumination source to enhance or reduce the amount of speckle detected in the images.

17. The system according to claim 15 wherein the VCSEL is configured to be driven in different operating modes to increase or decrease an amount of speckle pattern appearing on objects imaged in the scene.

18. The system according to claim 17 wherein the different operating modes include a multi-transverse mode emission regime or an incoherent regime.

19. The system according to any one of claims 1 to 12 wherein the illumination source includes a light emitting diode (LED) emitting a beam that is spatially encoded with the spatial pattern.

20. The system according to any one of the preceding claims wherein the illumination source is configured to selectively adjust an intensity of the spatial pattern.

21. The system according to any one of the preceding claims including a diffractive element disposed adjacent the output of the illumination source to generate all or part of the spatial pattern.

22. The system according to any one of the preceding claims wherein determining a degree of modification of the spatial pattern includes capturing multiple images of the scene under different illumination conditions and comparing the spectral profiles of pixel sub-regions across the multiple images.

23. The system according to any one of the preceding claims wherein determining a degree of modification of the spatial pattern includes capturing multiple images of the scene under different illumination conditions and comparing the intensity profiles of pixel sub-regions across the multiple images.

24. The system according to any one of the preceding claims wherein the subject monitoring system is a driver monitoring system of a vehicle.

25. The system according to any one of claims 1 to 23 wherein the subject monitoring system is an occupant monitoring system of a vehicle.

26. A subject monitoring system, including: a near infrared laser source configured to illuminate a scene with infrared light; an image sensor configured to capture one or more images of the scene when illuminated by the laser source; and a processor configured to process the captured one or more images by: determining a degree of presence or modification of a laser speckle pattern by objects in the scene within pixel sub-regions of an image; and classifying one or more pixel sub-regions of the image as including human skin or other material based on the degree of presence or modification of the speckle pattern identified in that pixel sub-region.

27. A method of detecting human skin in images captured under illumination from an infrared light source with infrared light having spatial beam characteristics to generate a spatial pattern, the method including the steps: receiving one or more of the images; processing, at a processor, the one or more images to determine a degree of presence or modification of the spatial pattern by objects in the images within pixel sub-regions of the one or more images; and classifying one or more pixel sub-regions of the one or more images as including human skin or other material based on the degree of presence or modification of the spatial pattern identified in that pixel sub-region.

28. The method according to claim 27 wherein the infrared light source is a laser and wherein the spatial pattern includes a speckle pattern produced from diffuse reflection of the laser source from a surface.

Description:
SYSTEM AND METHOD FOR SKIN DETECTION IN IMAGES

FIELD OF THE INVENTION

[0001] The present application relates to subject imaging systems and in particular to liveness detection in images.

[0002] Embodiments of the present invention are particularly adapted for detecting liveness of subjects in images such as in facial recognition systems to identify real persons from fakes. However, it will be appreciated that the invention is applicable in broader contexts and other applications.

BACKGROUND

[0003] Systems that monitor people by observing faces and/or eyes under near-infrared (NIR) lighting conditions are becoming widely used in society. A common application is a biometric system that identifies an individual’s identity from the appearance of their face, or pattern of their iris, or indeed any other measurable biometric “signature”. These systems are now available in mobile phones that allow unlocking of the phone with only the user’s face.

[0004] A second application that is becoming more common in society relates to systems to monitor the interior of vehicles such as cars, trucks, trains and aircraft, for safety, comfort and/or convenience purposes. By way of example, these systems include monitoring regions inside or outside the vehicle for occupancy and checking driver attention for signs of driver impairment. These systems are particularly important for safety as they are able to detect whether the driver of a car is able to perform the driving task adequately.

[0005] For economic reasons, these kinds of systems are more competitive if they can be achieved with only a single 2D image sensor and near-infrared light source. Additional sensing apparatus, such as 3D depth sensing approaches, is typically less competitive due to the additional cost of complex electronic components.

[0006] A current limitation of these infrared monitoring systems today is that they may be fooled into operating with fake faces, such as pictures of faces, masks or puppets/mannequins that show realistic face appearances when viewed on a conventional 2D image sensor.

[0007] This is a particularly common issue in the case of biometric security access control systems, where an unauthorized party may attempt to gain access by spoofing a facial recognition system. However, the issue also exists in monitoring systems that are not only concerned with individual identity, but which are intended to operate on people in general, and so conversely, should not operate (ignore) when presented with a non-person.

[0008] For example, a semi-autonomous vehicle may drive automatically under limited conditions, including where the driver is in a supervisory role and required to pay sufficient attention to the road and the car’s automatic control system to maximize safety. In this example, an infrared driver monitoring system is typically employed to monitor the driver’s attention.

[0009] A plausible scenario is a driver attempting to fool such a system into deciding that he or she is paying adequate attention to the road, when they may wish to take a nap, look at their phone etc. Such a scenario may be detected by “liveness” detection of the driver being monitored. There are already existing solutions in the category of liveness detection.

[0010] A common technique to detect liveness is to observe the person for signs of life-like movement, such as facial expressions, eye-movements, blinking. These methods may be referred to as “behaviour-based” liveness detection methods.

[0011] Another class of method employs "deep-learned" neural networks trained on large quantities of real versus fake images of humans to make decisions in shorter time periods. However, this technique requires time and cost to build up the dataset and train the network.

[0012] Any discussion of the background art throughout the specification should in no way be considered as an admission that such art is widely known or forms part of common general knowledge in the field.

SUMMARY OF THE INVENTION

[0013] In accordance with a first aspect of the present invention, there is provided a subject monitoring system, including: a near infrared illumination source configured to illuminate a scene with infrared light having spatial beam characteristics to generate a spatial pattern; an image sensor configured to capture one or more images of the scene when illuminated by the illumination source; and a processor configured to process the captured one or more images by: determining a degree of modification of the spatial pattern by objects in the scene within pixel sub-regions of an image; and classifying one or more pixel sub-regions of the image as including human skin or other material based on the degree of modification of the spatial pattern identified in that pixel sub-region.

[0014] In the present specification, use of the term "human skin" is intended to refer to live human skin, being a real human surface identified in images. This is in contrast to non-live human skin such as a paper image or photograph of a human that is imaged by a camera.

[0015] In some embodiments, determining a degree of modification of the spatial pattern includes determining a spatial power spectral density of the pixel sub-regions. Classifying a pixel sub-region as including human skin may include determining a spatial frequency response of a spatial filter matched to the power spectral density of the spatial pattern.

[0016] In some embodiments, the classification includes applying a threshold to the spatial frequency response of the spatial filter, wherein a spectral power of the spatial frequency response that is below the threshold is designated as including human skin. In some embodiments, the threshold is determined by a statistical analysis of the filter response against a database containing images of human skin versus other materials.

[0017] In some embodiments, the processor includes a machine learned classifier module trained to detect human skin versus other materials based on the spatial power spectral density of pixel sub-regions under different illumination conditions. In some embodiments, the illumination conditions include an exposure time of the image sensor. In some embodiments, the illumination conditions include an intensity of the infrared light emitted from the illumination source.

[0018] In some embodiments, the processor is configured to perform the step of determining whether or not a human subject is present in the imaged scene based on a detected level of human skin in the image.

[0019] In some embodiments, the processor is configured to perform face detection to detect a facial pixel region in the one or more images and wherein the one or more pixel sub-regions includes the facial pixel region. In some embodiments, the processor is configured to estimate a distance to the face from the image sensor and wherein the threshold of spatial filtering is modified based on the estimated distance.

[0020] Preferably, the infrared light has spectral power in the wavelength range from 650 nm to 1200 nm and wherein the spatial pattern includes a spatial wavelength in the range of 1 mm to 10 mm.

[0021] In some embodiments, the illumination source includes a laser. Preferably, the laser is a vertical cavity surface emitting laser (VCSEL). In some embodiments, the spatial pattern includes a speckle pattern produced from diffuse reflection of the laser source from a surface.

[0022] In some embodiments, the processor is configured to control an exposure time of the image sensor and/or an illumination time of the illumination source to enhance or reduce the amount of speckle detected in the images.

[0023] In some embodiments, the illumination source includes a light emitting diode (LED) spatially encoded with the spatial pattern.

[0024] In some embodiments, the illumination source is configured to selectively adjust an intensity of the spatial pattern.

[0025] In some embodiments, the system includes a diffractive element disposed adjacent the output of the illumination source to generate all or part of the spatial pattern.

[0026] In some embodiments, the subject monitoring system is a driver monitoring system of a vehicle. In other embodiments, the subject monitoring system is an occupant monitoring system of a vehicle.

[0027] In accordance with a second aspect of the present invention, there is provided a subject monitoring system, including: a near infrared laser source configured to illuminate a scene with infrared light; an image sensor configured to capture one or more images of the scene when illuminated by the laser source; and a processor configured to process the captured one or more images by: determining a degree of modification of a laser speckle pattern by objects in the scene within pixel sub-regions of an image; and classifying one or more pixel sub-regions of the image as including human skin or other material based on the degree of modification of the speckle pattern identified in that pixel sub-region.

[0028] In accordance with a third aspect of the present invention, there is provided a method of detecting human skin, the method including the steps: illuminating a scene from a light source with infrared light having spatial beam characteristics to generate a spatial pattern; controlling an image sensor to capture one or more images of the scene when illuminated by the illumination source; and processing, by a processor, the captured one or more images by: determining a degree of presence or modification of the spatial pattern by objects in the scene within pixel sub-regions of an image; and classifying one or more pixel sub-regions of the image as including human skin or other material based on the degree of presence or modification of the spatial pattern identified in that pixel sub-region.

[0029] In accordance with a fourth aspect of the present invention, there is provided a method of detecting human skin, the method including the steps: illuminating a scene from a near infrared light source with infrared light; controlling an image sensor to capture one or more images of the scene when illuminated by the laser source; and processing, by a processor, the captured one or more images by: determining a degree of modification of a laser speckle pattern by objects in the scene within pixel sub-regions of an image; and classifying one or more pixel sub-regions of the image as including human skin or other material based on the degree of modification of the speckle pattern identified in that pixel sub-region.

[0030] In some embodiments, the step of determining a presence or modification of a laser speckle pattern or an encoded spatial pattern in the pixel sub-regions includes applying a matched filter to the spectral or spatial pixel data of the pixel sub-regions. In other embodiments, this step is performed by a machine learned classifier.

[0031 ] In some embodiments, the methods include the step of performing, by the processor, face detection on the one or more images to detect one or more facial regions and designating one or more of the pixel sub-regions as being facial pixel sub-regions which fall wholly or partially within the facial region.

[0032] In some embodiments, the methods include the step of transforming image pixel data for the pixel sub-regions to a spatial frequency domain using a Fast Fourier Transform (FFT) or other spectral transform method.

[0033] In accordance with a fifth aspect of the present invention, there is provided a subject monitoring system, including: a near infrared illumination source configured to illuminate a scene with infrared light having a predefined spatial optical pattern; an image sensor configured to capture one or more images of the scene when illuminated by the illumination source; and a processor configured to process the captured one or more images by: determining one or more properties of the spatial optical pattern within pixel sub-regions of an image based on reflection of the infrared light from surfaces in the scene; and classifying one or more pixel sub-regions of the image as including a surface of human skin based on the one or more properties of the spatial optical pattern identified in the one or more pixel sub-regions.

[0034] In accordance with a sixth aspect of the present invention, there is provided a method of detecting human skin in images captured under illumination from an infrared light source with infrared light having spatial beam characteristics to generate a spatial pattern, the method including the steps: receiving one or more of the images; processing, at a processor, the one or more images to determine a degree of presence or modification of the spatial pattern by objects in the images within pixel sub-regions of the one or more images; and classifying one or more pixel sub-regions of the one or more images as including human skin or other material based on the degree of presence or modification of the spatial pattern identified in that pixel sub-region.

[0035] In some embodiments, the infrared light source is a laser and wherein the spatial pattern includes a speckle pattern produced from diffuse reflection of the laser source from a surface.

BRIEF DESCRIPTION OF THE FIGURES

[0036] Example embodiments of the disclosure will now be described, by way of example only, with reference to the accompanying drawings in which:

Figure 1 is a perspective view of the interior of a vehicle having a driver monitoring system including a camera and a light source installed therein;

Figure 2 is a driver’s perspective view of an automobile dashboard having the driver monitoring system of Figure 1 installed therein;

Figure 3 is a schematic functional view of a driver monitoring system according to Figures 1 and 2;

Figure 4 is an image of a person next to a mannequin illustrating a laser speckle effect visible on the mannequin but not the person;

Figure 5 is an image of a person holding a piece of paper illustrating a laser speckle effect visible on the paper but not the person;

Figure 6 is a plan view of the driver monitoring system of Figures 1 to 3 showing a camera field of view and a VCSEL illumination field on a subject;

Figure 7 is a process flow diagram illustrating the primary steps in a method of detecting human skin in images based on detection of a laser speckle pattern;

Figure 8 is an illustration of an image of a subject divided into an array of pixel sub-regions;

Figure 9 is an illustration of the image of Figure 8 with pixel sub-regions corresponding to the subject’s face region being shaded;

Figure 10 is a plan view of a driver monitoring system showing a camera field of view and an LED beam encoded with a spatial pattern using a diffractive optical element to illuminate a subject;

Figure 11 is a process flow diagram illustrating the primary steps in a method of detecting human skin in images based on detection of an encoded spatial pattern; and

Figure 12 is a process flow diagram illustrating the primary steps in an image processing algorithm to detect human skin in images.

DESCRIPTION OF THE INVENTION

[0037] The embodiments of the invention described herein relate to detection of skin in a scene imaged by a digital image sensor. The embodiments will be described with specific reference to subject monitoring systems such as driver monitoring systems. One example is monitoring a driver or passengers of an automobile or, for example, other vehicles such as a bus, train or airplane. Additionally, the described system and method may be applied to other systems that image humans such as biometric security systems.

System overview

[0038] Referring initially to Figures 1 to 3, there is illustrated a driver monitoring system 100 for capturing images of a vehicle driver 102 during operation of a vehicle 104. System 100 is further adapted for performing various image processing algorithms on the captured images such as facial detection, facial feature detection, facial recognition, facial feature recognition, facial tracking or facial feature tracking, such as tracking a person’s eyes. Example image processing routines are described in US Patent 7,043,056 to Edwards et al. entitled “Facial Image Processing System” and assigned to Seeing Machines Pty Ltd, the contents of which are incorporated herein by way of cross-reference.

[0039] As best illustrated in Figure 2, system 100 includes an imaging camera 106 that is positioned on or in the instrument display of the vehicle dash 107 and oriented to capture images of the driver's face in the infrared wavelength range to identify, locate and track one or more human facial features.

[0040] Camera 106 may be a conventional CCD or CMOS based digital camera having a two dimensional array of photosensitive pixels and optionally the capability to determine range or depth (such as through one or more phase detect elements). The photosensitive pixels are capable of sensing electromagnetic radiation in the infrared range. Camera 106 may also be a three dimensional camera such as a time-of-flight camera or other scanning or range-based camera capable of imaging a scene in three dimensions. In other embodiments, camera 106 may be replaced by a pair of like cameras operating in a stereo configuration and calibrated to extract depth. Although camera 106 is preferably configured to image in the infrared wavelength range, it will be appreciated that, in alternative embodiments, camera 106 may image in the visible range.

[0041] Referring still to Figure 2, system 100, in a first embodiment, also includes an infrared light source 108 such as a Vertical Cavity Surface Emitting Laser (VCSEL), Light Emitting Diode (LED) or other light source. In further embodiments, multiple VCSELs, LEDs or other light sources may be employed to illuminate driver 102. In some embodiments, other low powered coherent infrared light sources may be used as light source 108. Light source 108 is preferably located proximate to the camera on vehicle dash 107, such as within a distance of 5 mm to 50 mm.

[0042] Light source 108 is adapted to illuminate driver 102 with infrared radiation, during predefined image capture periods when camera 106 is capturing an image, so as to enhance the driver's face to obtain high quality images of the driver's face or facial features. Operation of camera 106 and light source 108 in the infrared range reduces visual distraction to the driver. Operation of camera 106 and light source 108 is controlled by an associated controller 112 which comprises a computer processor or microprocessor and memory for storing and buffering the captured images from camera 106.

[0043] As best illustrated in Figure 2, camera 106 and light source 108 may be manufactured or built as a single unit 111 having a common housing. The unit 111 is shown installed in a vehicle dash 107 and may be fitted during manufacture of the vehicle or installed subsequently as an after-market product. In other embodiments, the driver monitoring system 100 may include one or more cameras and light sources mounted in any location suitable to capture images of the head or facial features of a driver, subject and/or passenger in a vehicle. By way of example, cameras and light sources may be located on a steering column, rearview mirror, center console or driver's side A-pillar of the vehicle. In the illustrated embodiment, the light source includes a single VCSEL or LED. In other embodiments, the light source (or each light source in the case of multiple light sources) may each include a plurality of individual VCSELs and/or LEDs.

[0044] Turning now to Figure 3, the functional components of system 100 are illustrated schematically. A system controller 112 acts as the central processor for system 100 and is configured to perform a number of functions as described below. Controller 112 is located within the dash 107 of vehicle 104 and may be connected to or integral with the vehicle onboard computer. In another embodiment, controller 112 may be located within a housing or module together with camera 106 and light source 108. The housing or module is able to be sold as an after-market product, mounted to a vehicle dash and subsequently calibrated for use in that vehicle. In further embodiments, such as flight simulators, controller 112 may be an external computer or unit such as a personal computer.

[0045] Controller 112 may be implemented as any form of computer processing device or portion of a device that processes electronic data, e.g., from registers and/or memory to transform that electronic data into other electronic data that, e.g., may be stored in registers and/or memory. As illustrated in Figure 3, controller 112 includes a microprocessor 114 (or multiple microprocessors, integrated circuits or chips operating in conjunction with each other), executing code stored in memory 116, such as random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and other equivalent memory or storage systems as should be readily apparent to those skilled in the art.

[0046] Microprocessor 114 of controller 112 includes a vision processor 118 and a device controller 120. Vision processor 118 and device controller 120 represent functional elements which are both performed by microprocessor 114. However, it will be appreciated that, in alternative embodiments, vision processor 118 and device controller 120 may be realized as separate hardware such as microprocessors in conjunction with custom or specialized circuitry.

[0047] Vision processor 118 is configured to process the captured images to perform the driver monitoring; for example, to determine a three dimensional head pose and/or eye gaze position of the driver 102 within the monitoring environment. To achieve this, vision processor 118 utilizes one or more eye gaze determination algorithms. This may include, by way of example, the methodology described in US Patent 7,043,056 to Edwards et al. entitled "Facial Image Processing System" and assigned to Seeing Machines Pty Ltd. Vision processor 118 may also perform various other functions including determining attributes of the driver 102 such as eye closure, blink rate and tracking the driver's head motion to detect driver attention, sleepiness or other issues that may interfere with the driver safely operating the vehicle.

[0048] The raw image data, gaze position data and other data obtained by vision processor 118 is stored in memory 116.

[0049] Device controller 120 is configured to control camera 106 and to selectively actuate light source 108 in a sequenced manner in sync with the exposure time of camera 106. In the case where two VCSELs or LEDs are provided, the light sources may be controlled to activate alternately during even and odd image frames to perform a strobing sequence. Other illumination sequences may be performed by device controller 120, such as L,L,R,R,L,L,R,R... or L,R,0,L,R,0,L,R,0... where "L" represents a left mounted light source, "R" represents a right mounted light source and "0" represents an image frame captured while both light sources are deactivated. Light source 108 is preferably electrically connected to device controller 120 but may also be controlled wirelessly by controller 120 through wireless communication such as Bluetooth™ or WiFi™ communication.
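By way of illustration only, the strobing sequences described in paragraph [0049] could be generated as in the following Python sketch. The sequence-string format and the function names are assumptions of this sketch and are not taken from the patent.

```python
# Sketch of the alternating illumination sequencing described in [0049].
# "L" = left mounted source, "R" = right mounted source, "0" = both off.
from itertools import cycle

def illumination_schedule(pattern="L,R,0"):
    """Yield which light source to fire for each successive image frame,
    mirroring sequences such as L,L,R,R,... or L,R,0,L,R,0,..."""
    return cycle(pattern.split(","))

# Example: schedule ten frames with the L,L,R,R,... sequence.
for frame_idx, source in zip(range(10), illumination_schedule("L,L,R,R")):
    print(f"frame {frame_idx}: activate {source}")
```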

[0050] Thus, during operation of vehicle 104, device controller 120 activates camera 106 to capture images of the face of driver 102 in a video sequence. Light source 108 is activated and deactivated in synchronization with consecutive image frames captured by camera 106 to illuminate the driver during image capture. Working in conjunction, device controller 120 and vision processor 118 provide for capturing and processing images of the driver to obtain driver state information such as drowsiness, attention and gaze position during ordinary operation of vehicle 104.

[0051] Additional components of the system may also be included within the common housing of unit 111 or may be provided as separate components according to other additional embodiments. In one embodiment, the operation of controller 112 is performed by an onboard vehicle computer system which is connected to camera 106 and light source 108.

Skin detection based on laser speckle detection

[0052] Lasers such as VCSELs are highly spatially and temporally coherent and can produce monochromatic light. When incident onto a diffuse or rough surface, laser beams can produce a "speckle" pattern. A speckle pattern is produced as a result of mutual interference of coherent wavefronts reflected off the diffuse surface. At some points, the coherent wavefronts constructively interfere, producing bright spots, while, at other points, the coherent wavefronts destructively interfere, producing dark spots. The resultant speckle pattern is observed as a random pattern of bright and dark spots across a diffuse surface being imaged under laser light.

[0053] Although other operating regimes are possible, VCSELs typically emit optical power at wavelengths in the range of 650 nm to 1200 nm, which is predominantly in the near infrared (NIR) range. Thus, by illuminating a scene with a laser such as a VCSEL, the scene is illuminated with near infrared light having an inherent coherent spatial pattern which, when incident onto different surfaces, produces different optical effects.

[0054] Reflection of NIR light from biological tissues such as skin includes penetration and scattering from sub-surface tissues in addition to surface tissues. This multilayer reflection means that frequencies of spatially encoded information in the light (such as a projected noise pattern) are averaged out and effectively low-pass filtered by the reflection process. The process of scattering decoheres (blurs) the information encoded in the light. Furthermore, the reflections off multiple layers of surface and sub-surface tissue act to average out any speckle patterns that are produced by each diffuse layer of tissue. Thus, biological tissues such as human skin have a low pass filtering effect. Spatial frequencies that are similar in dimension to the penetration depth of the light will be more highly attenuated by this physical process. The penetration depth of NIR light in human skin is typically in the range of 1-5 mm but may be up to 10 mm.

[0055] The inventors have identified that this low-pass filtering effect can be utilised to distinguish human skin from other surfaces, which has practical applications in detecting fakes such as masks in subject monitoring systems. This effect is illustrated in Figure 4, in which a real person and a dummy are illuminated with 940 nm light from a VCSEL. A speckle effect can be clearly seen on the surface of the dummy but not on the skin of the real person. Similarly, Figure 5 illustrates a real person holding a piece of paper to demonstrate that the speckle effect can also be observed in paper. In both examples, the 940 nm wavelength light penetrates the skin layers to several mm and this greatly reduces the speckle intensity in the received image for human skin versus materials that NIR does not penetrate.

[0056] Embodiments of the present invention leverage this low-pass filtering effect of NIR wavelengths, as described below.

[0057] Referring to Figure 6, there is illustrated a plan view of subject monitoring system 600 configured to classify image regions as containing human skin or other predefined material. System 600 includes many features similar to system 100 and corresponding features are designated with like reference numerals for simplicity.

[0058] System 600 is configured to perform a method 700 of detecting human skin as illustrated in Figure 7, and the operation of system 600 will be described herein with reference to the steps of method 700. However, it will be appreciated that system 600 may be capable of performing a wide variety of other functions such as drowsiness and attention monitoring based on detection of eye gaze vectors, facial features and/or head pose vectors of a subject detected in images.

[0059] In system 600, the light source is a VCSEL 108a. VCSEL 108a represents a near infrared laser source that, at step 701, is configured to illuminate a scene, including subject 102, with highly coherent infrared light such as NIR in the range of 650 nm to 1200 nm. The infrared light from the VCSEL has spatial beam characteristics that produce a speckle pattern on diffuse surfaces due to the effect described above. Typically these spatial beam characteristics include a high spatial coherence which is inherent to VCSELs. In some embodiments, the infrared light includes, as spatial beam characteristics, a spatial pattern such as a spatially encoded pattern or a natural spatial pattern inherent to the VCSEL.

[0060] At step 702, the image sensor of camera 106 captures one or more images of the scene, including subject 102, when illuminated by VCSEL 108a. Vision processor 118 is configured to process the captured images at steps 703 and 704. The image processing includes, at step 703, determining a degree of presence or modification of the speckle pattern by objects in the scene within predefined pixel sub-regions of an image. The presence of the speckle pattern may refer simply to the positive detection of a speckle pattern within a pixel sub-region or it may refer to an amount (e.g. an intensity) of speckle pattern detected. In some embodiments, a threshold intensity of speckle pattern may be set before vision processor 118 determines that a speckle pattern is present in that pixel sub-region. This is discussed in more detail below. A modification of the speckle pattern may include a change in intensity or spatial distribution of the speckle pattern relative to other pixel sub-regions or relative to one or more predefined or reference speckle pattern parameters.

[0061] In some embodiments, the pixel sub-regions include regions of the image divided up by pixel position. For example, as illustrated in Figure 8, an image 800 may be divided into a grid of square or rectangular sub-regions 801 and a degree of presence or modification of the speckle pattern is determined for each sub-region. In some embodiments, vision processor 118 is configured to perform face detection to detect a facial pixel region in the one or more images and one or more of the pixel sub-regions can be characterised as being within or outside the facial pixel region. The facial pixel region may be represented as a single pixel sub-region or divided into multiple pixel sub-regions corresponding to the detected facial region. Figure 9 illustrates schematically an example of a plurality of pixel sub-regions being characterised as being within a facial pixel region, as indicated by the shaded pixel regions.
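A minimal Python sketch of the grid division of Figure 8 and the facial-region flagging of Figure 9 might look as follows. The tile size, the face-box format and all function names are illustrative assumptions; the face box itself is assumed to come from a separate face detector.

```python
import numpy as np

def grid_subregions(image, block=64):
    """Yield (row, col, tile) pixel sub-regions covering the image."""
    h, w = image.shape[:2]
    for r in range(0, h, block):
        for c in range(0, w, block):
            yield r, c, image[r:r + block, c:c + block]

def overlaps_face(r, c, block, face_box):
    """True if a tile falls wholly or partially within the face box."""
    top, left, bottom, right = face_box
    return not (r + block <= top or r >= bottom or
                c + block <= left or c >= right)

img = np.zeros((480, 640), dtype=np.uint8)   # placeholder NIR frame
face = (120, 200, 360, 440)                  # (top, left, bottom, right)
facial_tiles = [(r, c) for r, c, _ in grid_subregions(img)
                if overlaps_face(r, c, 64, face)]
```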

[0062] Finally, at step 704, vision processor 118 classifies one or more pixel sub-regions of the image as including human skin or other material based on the degree of presence or modification of the speckle pattern identified in that pixel sub-region. Processor 118 classifies a pixel sub-region as containing human skin when a sufficient presence of a speckle pattern is detected within the pixel sub-region. This may be determined by a threshold value or a confidence measure, which may involve pattern recognition and/or spectral analysis as described below.

[0063] Determining a degree of presence or modification of the speckle pattern at step 703 may include vision processor 118 determining a spatial power spectral density of the pixel sub-regions. The power spectral density may be obtained by a number of known image processing techniques such as the Fast Fourier Transform method. This spectral technique involves determining the power distribution (or pixel intensity) as a function of spatial frequency across the pixels corresponding to the or each pixel sub-region. The speckle pattern will be imaged as a spatial noise pattern across the pixels and the size, intensity and location of the speckle dots will be dependent on:

• The wavelength of incident light;

• The angle of the incident light onto a surface of the object(s) that are being imaged;

• The distance from the camera to the object(s) being imaged;

• The intensity of the incident light; and

• The structure and composition of the object(s) that are being imaged.

[0064] The speckle pattern will have power at a range of spatial frequencies or wavelengths, and the filtering effect of human skin will have prominent effects on wavelengths in the range of 1 mm to 10 mm. By way of example, the speckle pattern may be characterised by a white spatial noise in the image having a substantially flat power spectral density, or a pink spatial noise in the image having a 1/k shaped power spectral density (where k represents a spatial frequency parameter). Thus, determining a presence of the speckle pattern may include determining a level or shape of the noise component of the power spectral density present in a pixel sub-region across a range of spatial frequencies or wavelengths.
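One plausible implementation of the FFT-based power spectral density estimate referenced above is sketched below. The reduction of the 2-D spectrum to a radially averaged 1-D spectrum is an implementation choice of this sketch, not something the patent specifies.

```python
import numpy as np

def spatial_psd(tile):
    """Radially averaged spatial power spectral density of one tile."""
    tile = tile.astype(float) - tile.mean()              # remove DC term
    psd2d = np.abs(np.fft.fftshift(np.fft.fft2(tile))) ** 2
    h, w = tile.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    # Average power over annuli of equal spatial frequency.
    sums = np.bincount(radius.ravel(), weights=psd2d.ravel())
    counts = np.bincount(radius.ravel())
    return sums / np.maximum(counts, 1)
```

A flat radially averaged spectrum would correspond to the white spatial noise case mentioned above, and a 1/k-shaped curve to the pink noise case.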

[0065] Determining a modification of a speckle pattern may include capturing multiple images of the scene under different illumination conditions (e.g. illumination intensity, illumination time, camera exposure time or driving VCSEL 108a in different modes) and comparing the power spectral density of the different pixel sub-regions across the different images.

[0066] The classification at step 704 may be performed in a number of different ways. In some embodiments, classifying a pixel sub-region as including human skin includes developing an image filter that is matched to the speckle induced noise pattern observed in the received image (or at least the one or more pixel sub-regions). In this manner, a frequency response of the spatial filter can be matched to the power spectral density of the spatial noise pattern resulting from the speckle pattern. The image filter may be implemented in code as part of the image processing algorithm.

[0067] The filter may be configured to have a reduced output spatial frequency response (versus a majority of alternate materials) when the image region contains human skin. A threshold may be applied to the output spatial frequency response of the image filter that determines if the received image is human skin or not. Here, a detected spectral power of the spatial frequency response that is below the threshold may be designated as including human skin. The threshold may be pre-determined by statistical analysis of the filter’s response against a database containing images of human skin versus other materials under the specific illumination conditions of the system, in order to improve the classification performance.
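A hedged sketch of this matched filter and threshold test follows. The template is assumed to be a PSD measured from known speckle on a non-skin reference surface, and the threshold is assumed to have been calibrated offline against a labeled database, as the paragraph above suggests.

```python
import numpy as np

def matched_filter_response(psd, speckle_template):
    """Project a tile's PSD onto a unit-norm template matched to speckle;
    a high response indicates strong speckle content."""
    n = min(len(psd), len(speckle_template))
    t = speckle_template[:n] / np.linalg.norm(speckle_template[:n])
    return float(np.dot(psd[:n], t))

def classify_tile(psd, speckle_template, threshold):
    """Below-threshold response is designated human skin, per [0067]."""
    if matched_filter_response(psd, speckle_template) < threshold:
        return "skin"
    return "other"
```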

[0068] In other embodiments, vision processor 118 includes a machine learned classifier module trained to detect human skin versus other materials based on the spatial power spectral density of pixel sub-regions under different illumination conditions. This machine learned classifier is used to perform the classification at step 704. In some embodiments, the classifier may be an artificial neural network, decision tree or probabilistic classifier such as a Bayesian network. The classifier may be trained by feeding a dataset of images of actual human faces (including facial hair and sunglasses), images of face masks, photographs and images of other materials commonly used to spoof a human such as paper, plastics and fabrics. The images may be captured under differing illumination conditions such as at different wavelengths of light, angles of incidence, ambient light conditions, intensity of emitted light and exposure time of the image sensor.
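By way of a non-authoritative example, such a classifier could be trained on PSD feature vectors augmented with illumination-condition features. The patent does not fix a model class, so the random forest and feature layout below are assumptions of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_skin_classifier(psd_features, labels, illum_features):
    """psd_features: (N, K) PSD vectors, one per tile.
    illum_features: (N, 2) columns for sensor exposure time and
    illumination intensity, the conditions named in [0017].
    labels: 1 = human skin, 0 = other material."""
    X = np.hstack([psd_features, illum_features])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf
```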

[0069] The detection and classification may also be performed in the spatial domain. The presence of a speckle pattern may be characterised as an intensity modulation added to the image of the actual scene or objects within the scene. Determining a presence or modification of the speckle pattern may include detection of the amplitude of this intensity modulation across the pixel sub-regions.

[0070] Given that the prominence of speckle in images is dependent on the illumination conditions, in some embodiments, device controller 120 is configured to control an exposure time of the image sensor of camera 106 and/or an illumination time of VCSEL 108a to enhance or reduce the potential amount of speckle detected in the images. In addition, device controller 120 may also be configured to control an illumination intensity of VCSEL 108a to enhance or reduce the potential amount of speckle detected in images.

[0071] In some embodiments, at step 704, a confidence or likelihood estimate is also calculated. This confidence estimate may be based on the number and/or location of the pixel sub-regions classified as including human skin. The output of classification step 704 may also include this confidence measure and, if the confidence measure is below a threshold, classification step 704 may output a negative result on the basis that the confidence is too low to confirm the presence of human skin.

[0072] In some embodiments, a threshold number of pixels or pixel sub-regions is required to classify the image as including human skin to achieve a threshold confidence level. In some embodiments, the locations of the pixel sub-regions classified as including human skin are taken into account to determine a likelihood that human skin has been detected. For example, if a cluster of adjacent pixel sub-regions is classified as including human skin, then a higher confidence is output.
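One plausible confidence heuristic along these lines, favouring clusters of adjacent skin-classified sub-regions, is sketched below; the specific measure (fraction of tiles in the largest connected cluster) is an assumption of the sketch.

```python
import numpy as np
from scipy import ndimage

def skin_confidence(skin_mask):
    """Confidence from the clustering of skin-classified tiles.

    skin_mask: 2-D boolean array with one entry per pixel sub-region.
    Returns the fraction of all tiles in the largest connected cluster."""
    labeled, n_clusters = ndimage.label(skin_mask)
    if n_clusters == 0:
        return 0.0
    largest = np.bincount(labeled.ravel())[1:].max()
    return largest / skin_mask.size
```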

[0073] As mentioned above, vision processor 118 may be configured to perform face detection to detect a facial pixel region in the one or more images. Detection of this facial region may be used as an initial filter to only focus the classification on pixels or pixel sub-regions within this facial region (see Figure 9 for an exemplary facial region).

[0074] Alternatively, detection of the facial region may be used as a subsequent step to improve the classification of pixel sub-regions being human skin or not. In these embodiments, the classification of skin versus non-skin in pixel sub-regions is performed independently of any facial regions and any pixel sub-regions classified as including human skin are then correlated with their position relative to a face detected in the image. If a pixel sub-region that is classified as including human skin is located within a region identified as a facial region, then that pixel sub-region may be designated with a higher likelihood of being human skin. Alternatively, if a pixel sub-region classified as including human skin is located outside a region identified as a facial region, then that pixel sub-region may be designated with a lower likelihood of being human skin.

[0075] In some embodiments, vision processor 118 is configured to estimate a distance to a face or other object identified in an image. As the speckle pattern is an interference pattern, this varies with distance to an imaged object, and information on the distance may be fed to the classifier or image filter for improved classification of the object material based on the presence or modification of a speckle pattern. In some embodiments, the threshold of spatial filtering may be modified based on the estimated distance.

[0076] In some embodiments, the output of method 700 may solely be used to perform a liveness assessment to detect the presence of a real human in the captured images. In other embodiments, the output of method 700 provides an input to a broader liveness detection algorithm or system that is configured to determine whether or not a human subject is present in the imaged scene. In this regard, the broader liveness detection algorithm may take in other inputs such as eye and head movements of the subject over a sequence of images and collate the inputs to provide an estimate or probability that there is a human present in the images.

[0077] As the speckle pattern generally adds spatial noise to an image, it is advantageous to reduce speckle when performing subject monitoring so that facial features can be more clearly distinguished. For this reason, it is desirable to be able to actively switch the VCSEL between a high speckle mode for detecting human skin and a low speckle mode for performing subject monitoring. This can be achieved as described below.

[0078] VCSELs are coherent laser light sources. They emit light from the surface of a semiconductor. A single VCSEL device typically has many individual sources. In any VCSEL device design, these sources can be intentionally designed to be synchronized and in-phase, or synchronized and out-of-phase (by some degree), or unsynchronized (incoherent).

[0079] In this regard, it is possible to reduce the speckle by driving the VCSELs into a multi-transverse mode emission regime to reduce the amount of mode overlap, or even into an incoherent regime where transverse modes break down and individual emitters support multiple beamlets. In such a mode, the coherence of the laser output is reduced, which reduces the coherent backscattering that results in the speckle effect. The exposure time of camera 106 and/or illumination pulse time of VCSEL 108a may also be controlled to selectively increase or decrease the amount of speckle effect from diffuse surfaces being imaged. Therefore, by selectively controlling these parameters, system 100 may be switched between a high speckle skin detection mode and a low speckle subject monitoring mode.

Skin detection based on encoded spatial pattern

[0080] The incoherent light (having random phase) produced by LEDs makes them a good ambient illumination source, particularly for subject monitoring systems.

[0081] Broadband light sources such as LEDs have a low coherence and produce individual speckle or noise patterns for each wavelength. The different speckle patterns tend to average each other out and, as a result, no distinct speckle pattern can generally be observed in images illuminated by LEDs. However, noting that biological tissues perform a low-pass filtering effect for the reasons mentioned above, it is possible to design a system in which light from an incoherent LED can be used to distinguish human skin from other materials in images.

[0082] To achieve this, the LED may be driven by device controller 120 to actively encode a spatial and/or temporal pattern into an illumination beam. This avoids the need to use a laser as the light source, and LEDs may be used in place of light source 108. Such a system 1000 is illustrated in Figure 10, wherein light source 108 is represented as NIR emitting LED 108b. System 1000 is configured to perform method 1100 illustrated in Figure 11.

[0083] In system 1000, a diffractive optical element 1002 such as a pattern generator is positioned in the path of LED beam 1004 to produce a structured beam 1006 having an encoded spatial pattern. By way of example, the encoded pattern may include a grid of dots or collimated beamlets or some other periodic structure. The spatial pattern encoded into beam 1006 preferably includes a spatial structure having spatial wavelengths in the range of 1 mm to 10 mm, which is the range of primary absorption of human skin for NIR light. Although shown as a separate device, diffractive optical element 1002 may be integral with LED 108b.

[0084] In some embodiments, device controller 120 is configured to control LED 108b to temporally modulate LED beam 1004 to encode a temporal pattern into beam 1004.

[0085] During operation of system 1000, at step 1101, the subject 102 is illuminated with NIR light from structured beam 1006 having the encoded spatial pattern. At step 1102, device controller 120 controls the image sensor of camera 106 to image the returned light from subject 102 and the surrounding scene. At step 1103, vision processor 118 is configured to process the captured images and determine a degree of presence or modification of the encoded spatial and/or temporal pattern by objects in the scene within pixel sub-regions of an image.

[0086] Similar processes as described above for detecting speckle may be used to detect the spatial pattern here. For example, a spectral analysis may be performed to determine the spatial power spectral density of pixel sub-regions. The filtering effect of human skin on NIR light will be apparent as a dip in spectral power within the spatial wavelength range of 1 mm to 10 mm.
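The dip could be quantified, for example, as the fraction of spectral power falling in the 1 mm to 10 mm spatial-wavelength band, with a low fraction indicating skin. In this sketch, the conversion from PSD bins to cycles per millimetre is assumed to be supplied externally, since it depends on the camera's mm-per-pixel scale at the estimated subject distance.

```python
import numpy as np

def band_power_ratio(psd, freqs_cyc_per_mm, lo_mm=10.0, hi_mm=1.0):
    """Fraction of total spectral power inside the 1-10 mm
    spatial-wavelength band (i.e. 0.1 to 1.0 cycles/mm)."""
    band = (freqs_cyc_per_mm >= 1.0 / lo_mm) & (freqs_cyc_per_mm <= 1.0 / hi_mm)
    return psd[band].sum() / psd.sum()
```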

[0087] Finally, as with the speckle detection, at step 1104, vision processor 118 is configured to classify one or more pixel sub-regions of the image as including human skin or other material based on the degree of presence or modification of the spatial pattern identified in that pixel sub-region. The classification may include developing an image filter that is matched to the spatial noise pattern. Alternatively or in addition, vision processor 118 may use a machine learned classifier module trained to detect human skin versus other materials based on the spatial power spectral density of pixel sub-regions under different illumination conditions. In some embodiments, the machine learned classifier may be trained to detect intensity variations in the spatial pattern across different pixel sub-regions in the images.

[0088] Following the above description, Figure 12 illustrates the primary steps of an exemplary algorithm 1200 to detect the presence of human skin in images. At step 1201, device controller 120 controls camera 106 to capture images of subject 102 and the surrounding scene under illumination from controlled light source 108 such as VCSEL 108a or LED 108b described above. At step 1202, the captured images (or a subset thereof) are processed by vision processor 118 to determine pixel sub-regions within the captured images. At step 1203, face detection is optionally performed to detect a facial region of subject 102 and designate one or more of the pixel sub-regions as being facial pixel sub-regions which fall wholly or partially within the facial region.

[0089] At step 1204, vision processor 118 may optionally transform the image pixel data for the pixel sub-regions to the spatial frequency domain using an FFT or similar spectral transform method. In the case where optional step 1203 has determined facial pixel sub-regions, step 1204 and subsequent steps may be performed only on the facial pixel sub-regions. Where optional step 1204 is performed, subsequent steps may be performed based on the spectral data in the spectral domain.

[0090] At step 1205, a matched filter is applied to the spectral or spatial pixel data of the pixel sub-regions to capture a presence or modification of a laser speckle pattern or an encoded spatial pattern. Alternatively, a machine learned classifier could be used to perform step 1205.

[0091] At step 1206, threshold analysis is performed on the output of the matched filter or classification of step 1205 to determine the degree of presence or modification of laser speckle or the encoded spatial pattern in the pixel sub-regions. At step 1207, the pixel sub-regions are classified as including human skin or not based on the threshold analysis of step 1206. If the output of the matched filter is less than a predetermined threshold level (overall spectral power or spectral power across a predefined spectral range), then the pixel sub-region is classified as including human skin. The classification of pixel sub-regions may be performed sequentially on individual pixel sub-regions.
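As described, the per-sub-region decision of steps 1206 and 1207 reduces to a comparison such as the one below; the threshold value is illustrative and would be calibrated for the particular illumination setup.

```python
SKIN_THRESHOLD = 0.1  # illustrative value; calibrated per illumination setup

def classify_tile(filter_output: float,
                  threshold: float = SKIN_THRESHOLD) -> bool:
    # Per the text: filter output below the threshold (pattern absorbed)
    # classifies the pixel sub-region as including human skin.
    return filter_output < threshold
```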

[0092] Finally, at step 1208, based on the detection of human skin in the pixel sub-regions, a determination is made as to whether or not a captured image includes a human. This may be based on the number and/or location of pixel sub-regions determined to include human skin and may include a confidence measure.
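Step 1208's image-level determination could, for example, aggregate the per-sub-region decisions into a fraction-based confidence measure, as in this sketch; the minimum fraction is an assumed parameter, not a value from this disclosure.

```python
def image_contains_human(skin_flags, min_fraction: float = 0.3):
    """Illustrative step-1208 aggregation over per-tile skin decisions."""
    flags = list(skin_flags)
    if not flags:
        return False, 0.0
    confidence = sum(flags) / len(flags)  # fraction of skin sub-regions
    return confidence >= min_fraction, confidence
```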

[0093] It will be appreciated that step 1201 of method 1200 is performed by device controller 120 while steps 1202 to 1208 are performed by vision processor 118, all of which may form part of broader processor 112.

[0094] The above-described invention provides a new sensing technique that addresses the real-world problem of liveness detection, through the detection of human skin versus non-skin materials, without additional sensor component costs.

INTERPRETATION

[0095] The term “infrared” is used throughout the description and specification. Within the scope of this specification, infrared refers to the general infrared area of the electromagnetic spectrum which includes near infrared, infrared and far infrared frequencies or light waves.

[0096] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "processing," "computing," "calculating," "determining", "analyzing" or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities into other data similarly represented as physical quantities.

[0097] In a similar manner, the term “controller” or "processor" may refer to any device or portion of a device that processes electronic data, e.g., from registers and/or memory to transform that electronic data into other electronic data that, e.g., may be stored in registers and/or memory. A “computer” or a “computing machine” or a "computing platform" may include one or more processors.

[0098] Reference throughout this specification to “one embodiment”, “some embodiments” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases “in one embodiment”, “in some embodiments” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.

[0099] As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.

[00100] In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.

[00101] It should be appreciated that in the above description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, Fig., or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claims require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this disclosure.

[00102] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the disclosure, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.

[00103] In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

[00104] Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limited to direct connections only. The terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. "Coupled" may mean that two or more elements are either in direct physical, electrical or optical contact, or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.

[00105] Embodiments described herein are intended to cover any adaptations or variations of the present invention. Although the present invention has been described and explained in terms of particular exemplary embodiments, one skilled in the art will realize that additional embodiments can be readily envisioned that are within the scope of the present invention.