

Title:
SYSTEM FOR GESTURE RECOGNITION
Document Type and Number:
WIPO Patent Application WO/2017/062263
Kind Code:
A1
Abstract:
In the present disclosure, accessory equipment for gesture recognition is provided. The equipment can be used with a camera on an electronic device such as a multifunction portable device. The equipment includes in part a first filtering layer and a second filtering layer. The first filtering layer filters out visible light from the ambient light to obtain invisible light. The invisible light then arrives at the second filtering layer where the spectrum of the invisible light is shifted to a range of visible spectrum. Then the shifted invisible light is received by the camera for gesture recognition. By recognizing gestures based on such images, the background noise can be significantly reduced or eliminated.

Inventors:
RUI YONG (US)
LI ZHIWEI (US)
CAI RUI (US)
Application Number:
PCT/US2016/054567
Publication Date:
April 13, 2017
Filing Date:
September 30, 2016
Assignee:
MICROSOFT TECHNOLOGY LICENSING LLC (US)
International Classes:
G02B5/20; G02B23/12; G06F3/01; H01L31/14; H04N5/33
Domestic Patent References:
WO2015003721A12015-01-15
Foreign References:
US7312434B12007-12-25
US20080048936A12008-02-28
Other References:
None
Attorney, Agent or Firm:
MINHAS, Sandip et al. (US)
Claims:
CLAIMS

1. A system comprising:

an optical filtering portion, including

a first filtering layer for filtering out visible light from ambient light to obtain invisible light, and

a second filtering layer for shifting spectrum of the invisible light to a range of visible spectrum;

a camera configured to generate an image based on the shifted invisible light in the range of visible spectrum;

a display configured to render content to a user; and

a processor configured to recognize a gesture performed by the user based on the generated image and to control the rendered content based on the recognized gesture.

2. The system of claim 1, wherein the first and second filtering layers are integrated as a single film.

3. The system of claim 1, wherein the first filtering layer is a first film, and wherein the second filtering layer is a second film that is different from the first film, the first and second films fitting to each other.

4. The system of claim 1, wherein at least one of the first and second filtering layers is made of glass, resin, or gel.

5. The system of claim 1, wherein at least one of the first and second filtering layers includes optical coatings with different refractive indexes for reinforcing the invisible light and interfering with the visible light.

6. The system of claim 1, wherein at least one of the first and second filtering layers includes a compound for absorbing the visible light and transmitting the invisible light.

7. The system of claim 1, wherein the display and the camera are located at opposite sides of a device.

8. The system of claim 1, wherein the content rendered on the display includes virtual reality (VR) content.

9. The system of claim 8, wherein a screen area of the display is divided into a plurality of parts for rendering the VR content.

10. Equipment for use with a camera comprising:

a first filtering layer for filtering out visible light from ambient light to obtain invisible light; and

a second filtering layer for shifting spectrum of the invisible light to a range of visible spectrum, the shifted invisible light being used by the camera to generate an image with reduced noise for recognizing a gesture.

11. The equipment of claim 10, wherein at least one of the first and second filtering layers includes a film.

12. The equipment of claim 10, wherein the first and second filtering layers are integrally formed as a film.

13. The equipment of claim 10, wherein the first filtering layer is a first film, and wherein the second filtering layer is a second film that is different from the first film, the first and second films fitting to each other.

14. The equipment of claim 10, wherein at least one of the first and second filtering layers is made of glass, resin, or gel.

15. The equipment of claim 10, wherein at least one of the first and second filtering layers includes optical coatings with different refractive indexes for reinforcing the invisible light and interfering with the visible light.

Description:
SYSTEM FOR GESTURE RECOGNITION

BACKGROUND

[0001] Gesture recognition allows a device to recognize gestures that originate from a bodily motion or state, for example, from the face or hand of a user. In this way, the human-machine interface (HMI) of the device enables the user to communicate and interact with the device naturally, without mechanical input devices. For instance, while the user performs a gesture with his/her finger(s), the device recognizes the gesture and then acts accordingly. As an example, in response to a swipe gesture by the user, a cursor or another object may be moved on the display screen of the device.

[0002] In general, gesture recognition is accomplished using techniques of computer vision and image processing. In gesture recognition, it is usually necessary to separate the foreground, such as the finger, from the background. This fundamental process is very challenging for some cameras, such as the RGB cameras on mobile phones, especially when the background contains abundant texture.

SUMMARY

[0003] In accordance with implementations of the subject matter described herein, there is provided accessory equipment for gesture recognition. The equipment may be used with a camera on a primary device such as a multifunction portable electronic device. The equipment includes in part a first filtering layer and a second filtering layer. The first filtering layer filters out visible light from the ambient light to obtain invisible light. The invisible light then arrives at the second filtering layer, where the spectrum of the invisible light is shifted to a range of the visible spectrum. Then the shifted invisible light is received by the camera for gesture recognition. That is, the imaging at the camera is performed on the basis of light in the form of visible light but containing information of the invisible spectrum. By recognizing gestures based on such images, the background noise can be significantly reduced or eliminated.

[0004] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] FIG. 1 shows a block diagram of an environment where implementations of the subject matter described herein can be implemented;

[0006] FIGS. 2A, 2B and 2C show schematic diagrams of example accessory equipment in accordance with one implementation of the subject matter described herein;

[0007] FIG. 3 shows a block diagram of the optical filtering portion of the accessory equipment in accordance with implementations of the subject matter described herein;

[0008] FIG. 4 shows an example image of a user's hand which is obtained based on the light filtered by the accessory equipment in accordance with one implementation of the subject matter described herein;

[0009] FIG. 5 shows a block diagram of a device that can be used with the accessory equipment in accordance with implementations of the subject matter described herein;

[0010] FIG. 6 shows a block diagram of an integrated headset in accordance with implementations of the subject matter described herein; and

[0011] FIG. 7 shows a flowchart of a method for using the accessory equipment in accordance with one implementation of the subject matter described herein.

[0012] Throughout the drawings, the same or similar reference symbols are used to indicate the same or similar elements.

DETAILED DESCRIPTION

[0013] The subject matter described herein will now be discussed with reference to several example implementations. It should be understood that these implementations are discussed only to enable those skilled in the art to better understand and thus implement the subject matter described herein, rather than to suggest any limitations on the scope of the subject matter.

[0014] As used herein, the term "includes" and its variants are to be read as open terms that mean "includes, but is not limited to." The term "or" is to be read as "and/or" unless the context clearly indicates otherwise. The term "based on" is to be read as "based at least in part on." The terms "one implementation" and "an implementation" are to be read as "at least one implementation." The term "another implementation" is to be read as "at least one other implementation." The terms "first," "second," "third" and the like may refer to different or same objects. Other definitions, explicit and implicit, can be included below.

[0015] In accordance with implementations of the subject matter described herein, accessory equipment for gesture recognition is provided. The equipment may be used with a camera on an electronic device such as a multifunction portable device. The equipment includes in part a first filtering layer and a second filtering layer. The first filtering layer filters out visible light from the ambient light to obtain invisible light. The invisible light is transmitted to the second filtering layer, which shifts the spectrum of the invisible light to a range of the visible spectrum. Then the shifted invisible light is captured by the camera for gesture recognition. That is, the camera captures the image in the form of visible light but containing information of the invisible light. By recognizing gestures using such images, the background noise can be significantly reduced or eliminated.

[0016] FIG. 1 illustrates a block diagram of an environment where implementations of the subject matter described herein can be implemented. As shown, accessory equipment 100 can be used with a primary device 110. In this example, the primary device 110 is at least partially contained in the accessory equipment 100. In alternative implementations, the accessory equipment 100 and the primary device 110 may work together in other suitable manners. The accessory equipment 100 includes an optical filtering portion 102, and the primary device 110 includes a camera 112. In accordance with implementations of the subject matter described herein, the optical filtering portion 102 is arranged in such a way that the ambient light arrives at the camera 112 of the primary device 110 via the optical filtering portion 102. That is, the optical filtering portion 102 is located in the optical path toward the camera 112. In this way, when a user performs a gesture, the light will first be received by the optical filtering portion 102 of the accessory equipment 100. The light filtered by the optical filtering portion 102 is then sensed by the camera 112 for imaging and gesture recognition, as will be discussed below.

[0017] In some implementations, the accessory equipment 100 is a container which can contain at least a part of the primary device 110. FIGS. 2A-2C show schematic diagrams of an example implementation of the accessory equipment 100. In this example, the accessory equipment 100 is a box-shaped container and the primary device 110 is a smart phone which can be contained in the accessory equipment 100. The screen area of the display of the primary device 110 can be divided into two or more screen parts, for example. In the shown example, there are two screen parts 114 and 116. The accessory equipment 100 includes one or more viewing holes. In the shown example, there are two viewing holes 124 and 126 which allow the user to view the content rendered on the screen parts 114 and 116, respectively. By rendering virtual reality (VR) content for the right and left eyes on the screen parts 114 and 116, respectively, the accessory equipment 100 can be worn and used by the user as a VR headset.

[0018] As shown in FIG. 2C, the accessory equipment 100 further includes a camera hole 128 via which the camera 112 of the primary device 110 can receive light. In those implementations where the camera 112 and the display are located on opposite sides of the primary device 110, the camera hole 128 and the viewing holes 124 and 126 are located on opposite sides of the accessory equipment 100 when the accessory equipment 100 encapsulates the primary device 110. Other relative locations of the camera hole 128 and the viewing holes 124 and 126 are possible as well. It is to be understood that the primary device 110 may include more than one camera 112 in some implementations. Accordingly, the accessory equipment 100 may have more than one camera hole 128.

[0019] In some implementations, the optical filtering portion 102 is in the form of one or more thin films. In such implementations, the thin film(s) may be arranged on the accessory equipment 100 to cover the camera hole 128. In this way, the light will pass through the film(s) before reaching the camera 112. Other arrangements are possible as well. It is to be understood that the optical filtering portion 102 does not necessarily have to be implemented as film(s). For example, one or more optical filters can act as the optical filtering portion 102. Example implementations will be described in the following paragraphs.

[0020] FIG. 3 shows a block diagram of the optical filtering portion 102 of the accessory equipment 100 in accordance with implementations of the subject matter described herein. In general, the optical filtering portion 102 includes two filtering layers 310 and 320. The ambient light first arrives at the first filtering layer 310 and then reaches the second filtering layer 320. The first filtering layer 310 is a visible light filtering layer which is used to filter out the visible light from the received ambient light. As a result, the light reaching the second filtering layer 320 only contains the invisible light, such as infrared light. The second filtering layer 320 is used to shift the spectrum of the received invisible light to a range of the visible spectrum. In one implementation, the wavelength of the invisible light may be shifted into the range of 380 nm to 760 nm, for example. Various techniques for spectrum shifting, whether currently known or to be developed in the future, can be applied, as will be described below.
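The two-stage filtering described above can be sketched numerically. The following is an illustrative sketch only, not part of the patent: the light is modeled as a mapping from wavelength (in nm) to power, and the -200 nm shift applied by the hypothetical second layer is an assumed value chosen so that example infrared bands land inside the 380-760 nm visible range mentioned in paragraph [0020].

```python
# Illustrative sketch (not from the source): the two filtering layers
# modeled as operations on a sampled spectrum. Wavelengths are in nm.

VISIBLE_MIN, VISIBLE_MAX = 380, 760  # visible range per paragraph [0020]

def first_layer(spectrum):
    """Filter out visible light, keeping only invisible wavelengths."""
    return {wl: power for wl, power in spectrum.items()
            if not (VISIBLE_MIN <= wl <= VISIBLE_MAX)}

def second_layer(spectrum, shift=-200):
    """Shift the remaining (e.g. infrared) spectrum toward the visible range.
    The -200 nm shift is a hypothetical value chosen for illustration."""
    return {wl + shift: power for wl, power in spectrum.items()}

ambient = {450: 1.0, 550: 0.8, 850: 0.5, 940: 0.3}  # blue, green, two IR bands
invisible = first_layer(ambient)   # only the 850 nm and 940 nm bands remain
shifted = second_layer(invisible)  # bands now fall inside the visible range
assert all(VISIBLE_MIN <= wl <= VISIBLE_MAX for wl in shifted)
```

The camera then images the shifted bands as ordinary visible light, while the power values still describe the original infrared scene.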

[0021] In some implementations, the first and second filtering layers 310 and 320 may be implemented as one or more films, as described above. In one implementation, the filtering layers 310 and 320 are implemented as two separate layers for filtering the visible light and shifting the invisible light, respectively. These layers may fit to each other. In another implementation, instead of the dual-film configuration, the first and second filtering layers 310 and 320 may be integrally formed as a single film that not only filters out the visible light but also shifts the spectrum of the remaining invisible light. Instead of or in addition to the thin film(s), in other implementations, the first and/or second filtering layers 310 and 320 may be implemented as optical filters, lenses, and/or other suitable optical devices.

[0022] The first and/or second filtering layers 310 and 320 can be made of any suitable material that is able to filter out visible light of certain wavelengths and/or shift the invisible spectrum. For example, in some implementations, the first and/or second filtering layers 310 and 320 may be implemented as absorptive filters to which various inorganic or organic compounds are added. The compounds are used to absorb visible light of certain wavelengths while allowing the invisible light to transmit. Examples of the compounds include oxides. In one implementation, the compounds are added on a glass substrate. Alternatively, the compounds can be added to plastic (often polycarbonate or acrylic) to produce gel filters which are lighter and cheaper than glass-based filters. In yet another implementation, resin can be used to form the first and/or second filtering layers 310 and 320.

[0023] Alternatively, or in addition, the first and/or second filtering layers 310 and 320 can be implemented as interference filters. Optical coatings with different refractive indexes are built up upon a substrate which can be made of glass or resin, for example. The interfaces between the coating layers of different refractive index produce phased reflections, thereby selectively reinforcing the invisible light and interfering with the visible light. The coating layers can be added by vacuum deposition. By controlling the thickness and number of the coating layers, the wavelength of the passband of the filtering layers can be tuned and made as wide or narrow as desired. Any other suitable implementations are possible as well.
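To make the coating-thickness tuning mentioned above concrete, here is a minimal sketch, not from the source, of the standard quarter-wave design rule commonly used for interference stacks: each layer is given an optical thickness of one quarter of the target wavelength. The 850 nm design wavelength and the TiO2/SiO2 refractive indices are assumed example values, not values specified by the patent.

```python
# Illustrative sketch (not from the source): quarter-wave layer thickness
# for an interference coating, d = lambda / (4 * n).

def quarter_wave_thickness(wavelength_nm, refractive_index):
    """Physical thickness (nm) of a quarter-wave coating layer."""
    return wavelength_nm / (4.0 * refractive_index)

target = 850.0  # hypothetical infrared design wavelength, in nm
for name, n in (("high-index (e.g. TiO2)", 2.4), ("low-index (e.g. SiO2)", 1.46)):
    d = quarter_wave_thickness(target, n)
    print(f"{name}: n={n}, thickness = {d:.1f} nm")
```

Alternating high- and low-index layers of these thicknesses produce the phased reflections the paragraph describes; adjusting the layer count and thicknesses widens or narrows the passband.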

[0024] By filtering and processing the light by the optical filtering portion 102 as described above, the camera 112 of the primary device 110 may sense and capture light in the form of visible light but carrying the information of invisible light. In this way, even off-the-shelf cameras can be used directly with the accessory equipment 100. As is known, conventional cameras usually include internal filters which filter out infrared light. By shifting the infrared light into the range of the visible spectrum, the cameras are able to directly sense and process the received light within the visible spectrum, such that the imaging process can be correctly completed.

[0025] In the meantime, by generating images based on the essentially invisible light, noise in the ambient light can be significantly reduced or eliminated. FIG. 4 shows an example image 400 of a user's hand which is generated by the camera 112 of the primary device 110 based on the light filtered by the optical filtering portion 102 of the accessory equipment 100. The image 400 can be considered an infrared image. It can be seen that the infrared image contains little noise. The primary device 110 can apply a gesture recognition process to the image 400 to recognize the user's gesture and act accordingly. It would be appreciated that, compared with normal images obtained directly based on the ambient light, the foreground and background can be separated more accurately and efficiently by use of the infrared image 400, which in turn improves the accuracy and efficiency of the gesture recognition.
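As a rough illustration of why the low-noise infrared image eases foreground/background separation, the sketch below applies a plain intensity threshold to a tiny made-up image. This is not the patent's recognition algorithm; the pixel values and the threshold of 128 are invented for illustration only.

```python
# Illustrative sketch (not from the source): with little background texture
# in the filtered infrared image, a simple intensity threshold can separate
# the bright hand (foreground) from the dark background.

def segment_foreground(image, threshold):
    """Return a binary mask: 1 where a pixel is bright enough to be foreground."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]

ir_image = [   # hypothetical 3x4 infrared image; large values = hand pixels
    [10,  12, 200, 210],
    [ 9, 190, 220,  11],
    [ 8,  10, 180,  12],
]
mask = segment_foreground(ir_image, threshold=128)
# mask -> [[0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 1, 0]]
```

On a textured RGB image the same threshold would misclassify many background pixels, which is the difficulty noted in paragraph [0002].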

[0026] FIG. 5 shows a block diagram of a primary device 110 that can be used with the accessory equipment 100 in accordance with implementations of the subject matter described herein. Examples of the primary device 110 include, but are not limited to, a multifunction portable device such as a mobile phone or a tablet computer, a desktop personal computer (PC), or the like. It is to be understood that the primary device 110 is not intended to suggest any limitation as to the scope of use or functionality of the subject matter described herein, as various implementations may be implemented in diverse general-purpose or special-purpose computing environments.

[0027] As shown, the primary device 110 includes at least one processing unit (or processor) 510, a memory 520, storage 530, one or more input devices 540, one or more output devices 550, and one or more communication connections 560. Specifically, the input device(s) 540 may include the camera 112. The camera 112 is configured to receive light filtered by the optical filtering portion 102 of the accessory equipment 100 and to generate one or more images such as infrared images. The processing unit 510 executes computer-executable instructions and may be a real or a virtual processor. The processing unit 510 may recognize the user's gestures based on the generated one or more images. Then the processing unit 510 may control the primary device 110 to act according to the recognized gestures.

[0028] The one or more output devices 550 may include a display 552 such as a touch-sensitive screen display for rendering content to the user. In some implementations, the screen area of the display 552 may be divided into at least two parts for rendering VR content for the right and left eyes of the user, respectively. The primary device 110 can be put into the accessory equipment 100, which can be manufactured as a container, as shown in FIGS. 2A-2C. The user may wear the accessory equipment 100 as a headset and view the VR content through at least one viewing hole of the accessory equipment 100, such as the viewing holes 124 and 126. The camera 112 is aligned with the camera hole 128 covered by the optical filtering portion 102. In this way, the user may perform gestures using his/her fingers to operate the primary device 110. For example, the user is enabled to manipulate the VR content rendered on the display 552.

[0029] The storage 530 may be removable or non-removable, and may include computer-readable storage media such as flash drives, magnetic disks or any other medium which can be used to store information and which can be accessed within the primary device 110. The communication connection(s) 560 enables communication over a communication medium to another computing entity. Additionally, functionality of the components of the primary device 110 may be implemented in a single computing machine or in multiple computing machines that are able to communicate over communication connections.

[0030] In the examples described above, the accessory equipment 100 and primary device 110 are separate from one another. In alternative implementations, the accessory equipment 100 and primary device 110 can be integrated into a single device. That is, the subject matter as described herein can be implemented as either separate devices or an integrated device or system such as a headset that has all the necessary elements built therein. FIG. 6 shows a block diagram of such a headset in accordance with implementations of the subject matter described herein.

[0031] As shown, the headset 600 includes not only the camera 112 but also the optical filtering portion 102. The user can directly wear the headset 600 without assembling the primary device with the accessory equipment as in the example shown in FIGS. 1 and 2A-2C. In some implementations, the filtering layers 310 and 320 of the optical filtering portion 102 may be removably attached to the lens of the camera 112. It is also possible to directly integrate the filtering layers 310 and 320 into the lens of the camera 112. Further, the headset 600 may include a display 610 for rendering content such as VR content, a processing unit (not shown) for controlling the elements of the headset 600, and any other necessary elements. In this example, the display 610 and the camera 112 are located at opposite sides of the headset 600. Other arrangements are possible depending on the form factor of the headset 600. It is to be understood that all the features described above with reference to FIGS. 1-5 apply to the example shown in FIG. 6.

[0032] FIG. 7 shows a flowchart of a method 700 for using the accessory equipment in accordance with implementations of the subject matter described herein. As shown, in step 710, visible light is filtered out from ambient light to obtain invisible light. Next, in step 720, the spectrum of the invisible light is shifted to a range of visible spectrum. As described above, the shifted invisible light can be used by the camera to generate an image with reduced noise for recognizing a gesture. In this way, accuracy of the gesture recognition is improved. It is to be understood that all the features described above with reference to FIGS. 1-6 apply to the method 700 shown in FIG. 7.
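The two steps of method 700 can be summarized as a small pipeline. This is an illustrative sketch with hypothetical helper names (`filter_visible`, `shift_spectrum`) and an assumed -200 nm shift, not an implementation taken from the source.

```python
# Illustrative sketch (not from the source): steps 710 and 720 of method 700
# applied to a list of sampled wavelengths (nm).

def filter_visible(wavelengths_nm):
    """Step 710: keep only invisible (here, infrared > 760 nm) wavelengths."""
    return [wl for wl in wavelengths_nm if wl > 760]

def shift_spectrum(wavelengths_nm, shift_nm=-200):
    """Step 720: shift the invisible wavelengths into the visible range."""
    return [wl + shift_nm for wl in wavelengths_nm]

ambient = [450, 550, 700, 850, 940]
visible_for_camera = shift_spectrum(filter_visible(ambient))
print(visible_for_camera)  # [650, 740]
```

The camera then images only the shifted infrared bands, yielding the reduced-noise image used for recognizing the gesture.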

[0033] The functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

[0034] Program code for carrying out methods of the subject matter described herein may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

[0035] In the context of this disclosure, a machine readable medium may be any tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

[0036] Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the subject matter described herein, but rather as descriptions of features that may be specific to particular implementations. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination.

[0037] Some example implementations of the subject matter described herein are listed below.

[0038] In some implementations, a system is provided. The system comprises an optical filtering portion including a first filtering layer for filtering out visible light from ambient light to obtain invisible light, and a second filtering layer for shifting spectrum of the invisible light to a range of visible spectrum. The system further comprises a camera configured to generate an image based on the shifted invisible light in the range of visible spectrum; a display configured to render content to a user; and a processor configured to recognize a gesture performed by the user based on the generated image and to control the rendered content based on the recognized gesture.

[0039] In some implementations, the first and second filtering layers are integrated as a single film. In some implementations, the first filtering layer is a first film, and wherein the second filtering layer is a second film that is different from the first film. The first and second films fit to each other. In some implementations, at least one of the first and second filtering layers is made of glass, resin, or gel. In some implementations, at least one of the first and second filtering layers includes optical coatings with different refractive indexes for reinforcing the invisible light and interfering with the visible light. In some implementations, at least one of the first and second filtering layers includes a compound for absorbing the visible light and transmitting the invisible light.

[0040] In some implementations, the display and the camera are located at opposite sides of a device. In some implementations, the content rendered on the display includes virtual reality (VR) content. In some implementations, a screen area of the display is divided into a plurality of parts for rendering the VR content.

[0041] In some implementations, equipment for use with a camera is provided. The equipment comprises a first filtering layer for filtering out visible light from ambient light to obtain invisible light; and a second filtering layer for shifting spectrum of the invisible light to a range of visible spectrum, where the shifted invisible light is captured and used by the camera to generate an image with reduced noise for recognizing a gesture.

[0042] In some implementations, at least one of the first and second filtering layers includes a film. In some implementations, the first and second filtering layers are integrally formed as a film. In some implementations, the first filtering layer is a first film, and wherein the second filtering layer is a second film that is different from the first film. The first and second films fit to each other. In some implementations, at least one of the first and second filtering layers is made of glass, resin, or gel. In some implementations, at least one of the first and second filtering layers includes optical coatings with different refractive indexes for reinforcing the invisible light and interfering with the visible light. In some implementations, at least one of the first and second filtering layers includes a compound for absorbing the visible light and transmitting the invisible light.

[0043] In some implementations, the equipment comprises a container for containing a multifunction portable device including the camera, the container having a camera hole that allows the camera to capture the shifted invisible light, the first and second filtering layers covering the camera hole. In some implementations, the equipment is a wearable headset that allows a user to control the multifunction portable device by the gesture. In some implementations, the equipment further comprises at least one viewing hole that allows a user to view content rendered on a display of the multifunction portable device.

[0044] In some implementations, a method is provided. The method comprises filtering out visible light from ambient light to obtain invisible light; and shifting spectrum of the invisible light to a range of visible spectrum, the shifted invisible light being captured by a camera for recognizing a gesture.

[0045] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.