

Title:
WIRELESS FULL BODY MOTION CONTROL SENSOR
Document Type and Number:
WIPO Patent Application WO/2017/061890
Kind Code:
A1
Abstract:
A portable apparatus for three-dimensional sensing comprises: (i) a depth-sensing module for obtaining depth images; (ii) a processing module for processing the depth images to generate user-tracking data; (iii) a communication module configured to transmit the user-tracking data to a remote or host device; and (iv) a housing for housing the depth-sensing module, the processing module, and the communication module. The user-tracking data includes three-dimensional skeletal tracking data and user segmentation data. The user-tracking data may optionally include user gesture data. Thus, the apparatus provides internal processing of depth images to generate corresponding tracking data for further communication. It allows for reducing the bandwidth of a communication link between the apparatus and the remote device, and enhancing functionality of the apparatus, because the remote device can use the received tracking data without any additional processing.

Inventors:
VALIK ANDREY VLADIMIROVICH (RU)
STOLYAR ALEXEY VALERIEVICH (RU)
MOROZOV DMITRY ALEKSANDROVICH (RU)
Application Number:
PCT/RU2015/000654
Publication Date:
April 13, 2017
Filing Date:
October 08, 2015
Assignee:
3DIVI COMPANY (RU)
International Classes:
G06F3/0346; G06F3/0488; G06T13/00; G06V10/422
Foreign References:
US20150212585A1 (2015-07-30)
US20080212836A1 (2008-09-04)
US20030235341A1 (2003-12-25)
Attorney, Agent or Firm:
SVETLOV, Ilya Aleksandrovich (RU)
Claims:
CLAIMS

1. An apparatus for three-dimensional sensing, the apparatus comprising:

a depth-sensing module configured to obtain a set of depth images;

a processing module configured to process the set of depth images to generate user-tracking data, wherein the user-tracking data includes three-dimensional skeletal tracking data associated with one or more users, and wherein the user-tracking data further includes user segmentation data associated with each of the users;

a communication module configured to transmit the user-tracking data to at least one remote device; and

a housing, wherein the depth-sensing module, the processing module, and the communication module are secured within the housing.

2. The apparatus of claim 1, wherein the three-dimensional skeletal tracking data includes a plurality of three-dimensional skeletal joint locations associated with each of the users.

3. The apparatus of claim 1, wherein the plurality of three-dimensional skeletal joint locations are associated with at least seventeen skeletal joints of each of the users.

4. The apparatus of claim 1, wherein the three-dimensional skeletal tracking data has a frame rate of at least 30 frames per second.

5. The apparatus of claim 1, wherein the processing module is further configured to process the set of depth images to identify gestures of each of the users and generate user gesture data, wherein the user-tracking data includes the user gesture data.

6. The apparatus of claim 1, wherein the processing module is configured to identify only basic user gestures, wherein the basic user gestures include at least one of a wave gesture, a static gesture, and a virtual touch screen gesture.

7. The apparatus of claim 1, wherein the processing module is further configured to process the set of depth images to determine a three-dimensional location of a Head-Mounted Display (HMD) within a three-dimensional environment, wherein the user-tracking data includes the three-dimensional location of the HMD.

8. The apparatus of claim 1, wherein the processing module is further configured to process the set of depth images to generate six-degree-of-freedom (6DoF) motion data associated with the HMD, wherein the user-tracking data includes the 6DoF motion data.

9. The apparatus of claim 8, wherein the generation of the 6DoF motion data is further based on orientation data obtained from the HMD, wherein the orientation data include pitch, yaw, and roll data associated with motions of the HMD.

10. The apparatus of claim 1, wherein the processing module is further configured to process the set of depth images to generate three-dimensional reconstruction data of a three-dimensional environment within a field of view of the apparatus, and wherein the communication module is further configured to transmit the three-dimensional reconstruction data of the three-dimensional environment to the at least one remote device.

11. The apparatus of claim 1, wherein the processing module is further configured to process the set of depth images to generate three-dimensional reconstruction data of one or more tangible objects, and wherein the communication module is further configured to transmit the three-dimensional reconstruction data of one or more tangible objects to the at least one remote device.

12. The apparatus of claim 1, wherein the processing module is further configured to compress the user-tracking data prior to transmission to the at least one remote device.

13. The apparatus of claim 1, wherein the user-tracking data includes cross-platform user interface control commands.

14. The apparatus of claim 1, wherein the user-tracking data includes control commands for virtual reality applications.

15. The apparatus of claim 1, wherein the communication module is configured to wirelessly communicate with the at least one remote device.

16. The apparatus of claim 1, wherein the communication module is configured to communicate with the at least one remote device using a Bluetooth protocol or IEEE 802.11 protocol.

17. The apparatus of claim 1, wherein the communication module is configured to communicate with the at least one remote device using Human Interface Device (HID) protocol.

18. The apparatus of claim 1, wherein the communication module is configured to connect with the at least one remote device via a wire.

19. The apparatus of claim 1, wherein the communication module is configured to transmit the set of depth images to the at least one remote device.

20. The apparatus of claim 1, further comprising a power supply module configured to supply electrical power to the depth-sensing module, the processing module, and the communication module, wherein the power supply module is secured within the housing.

21. The apparatus of claim 20, wherein the power supply module is further configured to receive one or more batteries.

22. The apparatus of claim 1, wherein the depth-sensing module includes one or more time-of-flight sensors.

23. The apparatus of claim 1, wherein the depth-sensing module includes one or more digital video cameras or one or more stereo cameras.

24. The apparatus of claim 1, wherein the depth-sensing module includes one or more projectors for projecting structured light patterns and one or more image sensors for sampling reflected light patterns.

25. The apparatus of claim 1, further comprising at least one video camera for capturing still or motion images of a three-dimensional environment or a tangible object, and wherein the communication module is further configured to transmit the still or motion images to the at least one remote device.

Description:
WIRELESS FULL BODY MOTION CONTROL SENSOR

TECHNICAL FIELD

This disclosure relates generally to human-computer interfaces involving depth sensing. More particularly, this disclosure relates to a portable apparatus for three-dimensional (3D) sensing, also referred to herein as a wireless full body motion control sensor, which is configured to internally process depth images to generate meaningful data or commands that can be further communicated to a remote (host) device without a need for the remote device to perform any additional processing.

DESCRIPTION OF RELATED ART

The approaches described in this section could be pursued, but are not necessarily approaches that have previously been conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

The field of human-computer interaction has evolved significantly over the last several decades. Today, many electronic devices, such as computers and game consoles, can be controlled through a wide range of input devices and associated interfaces. Keyboards, keypads, pointing devices, joysticks, remote controllers, and touchscreens are just some examples of input devices that can be used to interact with electronic devices. One of the rapidly growing technologies in the field of human-computer interaction is gesture recognition technology, which enables users to interact with electronic devices naturally, using body language rather than mechanical devices. Gesture recognition technology allows users to generate computer-readable inputs or commands using gestures or motions of hands, arms, fingers, legs, and many other body parts. For example, using the concept of gesture recognition, it is possible to point a finger at a computer screen and move the finger to cause a cursor displayed on the computer screen to move accordingly.

The gesture recognition technology is based on the use of depth maps generated by a depth-sensing device (or simply a depth sensor). The depth maps may be processed and interpreted by a control system, such as a computer or a game console, to generate various commands based on identification of user gestures or motions. The gesture recognition technology is successfully used in gaming software applications and virtual reality applications. In either application, depth maps may be processed to generate a user avatar and translate user motions or gestures into motions and gestures of the avatar being displayed on a display screen.

Conventional depth sensors are generally simple in nature, because they merely produce depth images and transmit "raw" depth data to a connected computing device for further processing. The computing device can use the raw depth data to identify users and track their motions. This approach, however, is not suitable for those instances where the depth sensor is connected to a smart phone or a tablet computer, because these devices may not have enough computational or memory resources to run processing algorithms against depth data, especially when the depth data is of high definition. These circumstances place significant constraints on the use of traditional depth sensors.

Another disadvantage of conventional depth sensors is that they are not fully portable, because they require a permanent connection to a power supply and to a computing device for transmitting obtained depth data. For these reasons, conventional depth sensors are kept near computing devices, such as game consoles or smart television devices. In view of at least the foregoing drawbacks, there is still a need in the art for improvements of depth sensors.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described in the Detailed Description below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

The present disclosure provides a portable apparatus for three-dimensional sensing, also referred to herein as a wireless full body motion control sensor or, for simplicity, a depth sensor. One objective of this apparatus is to overcome at least some of the drawbacks known in the art. More specifically, one objective is to provide a portable apparatus for three-dimensional motion sensing configured to obtain depth images and process them internally to generate meaningful commands or signals such that there is no need for a recipient of these commands or signals to process the depth images. Another objective is to enhance functionality and achieve complete portability. Other objectives and benefits shall be apparent from the reading of the following description.

According to an aspect of this disclosure, the portable apparatus for three-dimensional sensing comprises: (i) a depth-sensing module configured to obtain a set of depth images; (ii) a processing module configured to process the set of depth images to generate user-tracking data, wherein the user-tracking data includes three-dimensional skeletal tracking data associated with one or more users and user segmentation data; (iii) a communication module configured to transmit the user-tracking data to at least one remote device; and (iv) a housing for housing the depth-sensing module, the processing module, and the communication module.

In some embodiments, the three-dimensional skeletal tracking data includes a plurality of three-dimensional skeletal joint locations associated with each of the users. This plurality of three-dimensional skeletal joint locations can be associated with at least seventeen skeletal joints of each of the users. Moreover, in certain embodiments, the three-dimensional skeletal tracking data can have a frame rate of at least 30 frames per second.

In some embodiments, the processing module is further configured to process the set of depth images to identify gestures of each of the users and generate user gesture data, wherein the user-tracking data includes the user gesture data. In some embodiments, the processing module is configured to identify only basic user gestures, wherein the basic user gestures include at least one of a wave gesture, a static gesture, and a virtual touch screen gesture.

In some embodiments, the processing module is further configured to process the set of depth images to determine a three-dimensional location of a Head-Mounted Display (HMD) within a three-dimensional environment, wherein the user-tracking data includes the three-dimensional location of the HMD. The processing module can be further configured to process the set of depth images to generate six-degree-of-freedom (6DoF) motion data associated with the HMD, wherein the user-tracking data includes the 6DoF motion data. The generation of the 6DoF motion data can be based on orientation data obtained from the HMD, wherein the orientation data include pitch, yaw, and roll data associated with motions of the HMD.

In some embodiments, the processing module is further configured to process the set of depth images to generate three-dimensional reconstruction data of a three-dimensional environment within a field of view of the apparatus, and wherein the communication module is further configured to transmit the three-dimensional reconstruction data of the three-dimensional environment to the at least one remote device.

In yet more embodiments, the processing module can be further configured to process the set of depth images to generate three-dimensional reconstruction data of one or more tangible objects, and wherein the communication module is further configured to transmit the three-dimensional reconstruction data of one or more tangible objects to the at least one remote device.

In some embodiments, the processing module is further configured to compress the user-tracking data prior to transmission to the at least one remote device.

In certain embodiments, the user-tracking data includes cross-platform user interface control commands. The user-tracking data may include control commands for virtual reality applications.

In certain embodiments, the communication module is configured to wirelessly communicate with the at least one remote device. In yet other embodiments, however, the communication module can be also configured to connect with the at least one remote device via a wire. The communication module can be configured to communicate with the at least one remote device using a Bluetooth protocol or IEEE 802.11 protocol. The communication module can be also configured to communicate with the at least one remote device using Human Interface Device (HID) protocol. The communication module can be also configured to transmit the set of depth images to the at least one remote device.

In certain embodiments, the apparatus further comprises a power supply module configured to supply electrical power to the depth-sensing module, the processing module, and the communication module. The power supply module can be secured within the housing. In some embodiments, the power supply module is further configured to receive one or more batteries.

In certain embodiments, the depth-sensing module can use one or more time-of-flight sensors, structured light projectors and image sensors for sampling reflected light patterns, stereo cameras, or any other type of technology for depth sensing.

In certain embodiments, the apparatus further comprises at least one video camera for capturing color still or motion images of a three-dimensional environment or a tangible object, and wherein the communication module is further configured to transmit the color still or motion images to the at least one remote device.

The technical effects of the portable apparatus include enhancing functionality of the apparatus, making the apparatus fully portable and wireless, and decreasing the size of data transmitted via a communication link between the apparatus and a remote device. The enhancement of functionality is achieved by providing embedded processing of depth images to generate various computer-readable commands or signals rather than sending unprocessed data, by making the apparatus compatible with HMDs, by ensuring that generated commands are cross-platform commands, and by making the apparatus suitable for communication with a wide range of remote or host devices.

Additional objects, advantages, and novel features of the examples will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following description and the accompanying drawings or may be learned by production or operation of the examples. The objects and advantages of the concepts may be realized and attained by means of the methodologies, instrumentalities and combinations particularly pointed out in the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments are illustrated by way of example, and not by limitation, in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 shows a perspective view of a depth sensor according to one example embodiment;

FIG. 2 shows an example three-dimensional environment illustrating suitable uses of a depth sensor in conjunction with other electronic devices;

FIG. 3 shows an example user-centered coordinate system suitable for tracking user motions according to one embodiment;

FIG. 4 shows a simplified view of a virtual skeleton as can be generated by a depth sensor; and

FIG. 5 shows a high-level block diagram of a depth sensor and a computing environment.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

The embodiments of this disclosure relate to a portable apparatus for three-dimensional sensing, which can be used as a three-dimensional motion sensing input device for computer-human interactions based on gesture commands. The apparatus is configured to generate depth images of a 3D environment and process them internally to produce user-tracking data, which include 3D skeletal tracking data, user segmentation data, and optionally user gesture data. The apparatus can be completely portable, wireless, and power-independent, and can have a small form factor.

The user-tracking data can serve as input commands and be transmitted by the apparatus to one or more remote devices, such as game consoles, head-mounted displays (HMDs), computers, servers, or computing clouds.

Moreover, the user-tracking data can be compressed before being sent to the remote devices. Thus, there is no need for the remote device to perform processing of "raw" depth images received from a conventional depth sensor to produce any meaningful information for software applications. Instead, the apparatus has embedded processing capabilities and performs all necessary processing of depth images, which saves computational and storage resources of the remote devices, and also reduces the bandwidth requirements of the communication link between the apparatus and the remote device. The user-tracking data can be effectively used by the remote device for a computer game, virtual reality, augmented reality, or any other applicable software application.

Moreover, the apparatus can be effectively used in combination with HMDs worn by one or more users. In this case, the apparatus can determine a location and motion of the HMD and include this information into the user-tracking data for enhancing virtual reality or gaming experience. The apparatus can also be configured to identify and interpret certain user gestures to generate user commands and transmit them to the remote device and/or HMD. These user commands can be in accordance with HID protocol, which ultimately provides cross-platform interfacing. Thus, the apparatus can be used with computing devices running any operating system, such as Windows®, Android®, and iOS®.

In this disclosure, the term "apparatus for three-dimensional sensing" is also referred to herein as a "depth sensor," for simplicity. These terms are substantially equivalent to the following terms: a depth-sensing device, depth sensitive camera, 3D camera, motion sensing input device, motion controller, 3D motion sensor, and so forth. The term "depth image," as used herein, may refer to an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint, i.e., from the depth sensor. This term may also be analogous to, or referred to as, a depth map, depth buffer, Z-buffer, Z-buffering, Z-depth, and so forth. The "Z" in these latter terms relates to a convention that the central axis of view of the depth sensor is in the direction of the sensor's Z-axis (normal to the XY-plane), and may not relate to the absolute Z-axis of a scene. Accordingly, in this disclosure, the depth images may include a plurality of pixel values {X, Y, Z}, where X and Y represent values of two-dimensional (2D) coordinates associated with an XY orthogonal coordinate system, and where Z represents a depth value for a given XY coordinate.
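
As an informal illustration of this convention, the following Python sketch represents a depth image as a two-dimensional array of Z values and flattens it into the {X, Y, Z} pixel triples described above; the resolution and the millimetre depth units are assumptions for illustration, not values taken from this disclosure.

```python
import numpy as np

# Hypothetical sensor resolution; Z values stored as 16-bit millimetres.
WIDTH, HEIGHT = 640, 480
depth_image = np.zeros((HEIGHT, WIDTH), dtype=np.uint16)

def to_xyz_triples(depth: np.ndarray) -> np.ndarray:
    """Flatten a depth map into {X, Y, Z} pixel triples, where X and Y are
    the 2D pixel coordinates and Z is the depth value at that pixel."""
    ys, xs = np.indices(depth.shape)
    return np.stack([xs.ravel(), ys.ravel(), depth.ravel()], axis=1)

triples = to_xyz_triples(depth_image)   # shape: (HEIGHT * WIDTH, 3)
```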

The term "remote device" or "host device," as used herein, may refer to any suitable computing system. Some examples of remote device may include a desktop computer, laptop computer, tablet computer, gaming console, audio system, video system, cellular phone, smart phone, personal digital assistant, set-top box, television set, smart television system, in- vehicle computer, infotainment system, HMD, head-coupled display, helmet- mounted display, cardboard HMD, wearable computer having a display (e.g., a head-mounted computer with a display or a smart watch), and so forth. In certain embodiments, the remote device may refer to an aggregation of multiple computing devices, such as a game console connected to a television device, and the like.

The following description includes references to the accompanying drawings. The drawings show illustrations in accordance with example embodiments. These example embodiments, which are also referred to herein as "examples," are described in enough detail to enable those skilled in the art to practice the present subject matter.

FIG. 1 shows a perspective view of depth sensor 100 (also referred to herein as an "apparatus for three-dimensional sensing") according to one example embodiment. As shown in this figure, depth sensor 100 can be implemented as a stand-alone assembly having a housing 105, where all internal elements are secured. Incorporation of all the elements into a single housing enhances the overall usability and functionality of the device.

Housing 105 can be of any suitable material, including, for example, polymeric materials or metals. Housing 105 is also supported by stand 110 having a substantially V-shaped design. In certain embodiments, there can be provided a joint connection between housing 105 and stand 110 such that a user can adjust a position of housing 105. The front face of housing 105 is provided with transparent plate 115 to prevent blinding or damaging of internal sensing modules or other elements. Housing 105 can also include one or more openings or cavities 120, behind which one or more optical or sensing modules can be located. As shown in this figure, cavities 120 have a cone-like shape serving as lens hoods to reduce lens flare and protect from damage. Depth sensor 100 can also be provided with light indicators, buttons, selectors, ports, connectors, and any other interface elements.

FIG. 2 shows an example three-dimensional environment 200 (or scene) illustrating suitable uses of a depth sensor in conjunction with other electronic devices. In particular, there is shown user 205, who optionally wears HMD 210. User 205 is present within a field of view of depth sensor 100. Since depth sensor 100 includes a depth-sensing module, user 205 can be present in depth images generated by the depth-sensing module. In certain embodiments, depth sensor 100 may optionally include a digital video camera to assist in tracking user 205 and identifying his motions, emotions, etc. User 205 may stand on a floor (not shown) or on an omnidirectional treadmill (not shown). Depth sensor 100 may also receive "three-degrees of freedom" (3DoF) orientation data from HMD 210 as generated by its internal orientation sensors (e.g., gyros, accelerometers, and/or magnetometers). The 3DoF data provides rotational information of HMD 210, including tilting of the HMD forward/backward (pitching), turning left/right (yawing), and tilting side-to-side (rolling).

Depth sensor 100 may be in communication with one or more remote devices, such as game console 220 and display device 230. In some embodiments, the remote devices may include, but are not limited to, computers, cellular telephones, smart phones, entertainment systems, modems, routers, infotainment systems, and so forth. The communication between depth sensor 100 and any of the remote devices, and between depth sensor 100 and HMD 210, can be wired or wireless. For example, Bluetooth radio, Wi-Fi radio, IEEE 802.11-based, or Ethernet-based communication can be established.

According to yet more embodiments, user 205 may optionally hold and use one or more input devices to generate commands for depth sensor 100, game console 220, or any other remote device. As shown in the figure, user 205 may hold handheld device 240 (such as a gamepad, smart phone, remote control, etc.) to generate specific inputs or commands, for example, shooting or moving commands when user 205 plays a video game. Handheld device 240 may also wirelessly transmit data and user inputs to depth sensor 100 and/or game console 220 for further processing.

In certain embodiments, handheld device 240 may also include one or more sensing devices (gyros, accelerometers, and/or magnetometers) for generating 3DoF orientation data. The 3DoF orientation data may be transmitted to depth sensor 100 for further processing. In certain embodiments, depth sensor 100 may determine a location and orientation of handheld device 240 within a user-centered coordinate system or any other secondary coordinate system.

In certain embodiments, depth sensor 100 may also simulate (or facilitate simulation by a remote device of) a virtual reality and generate a virtual world. Depth sensor 100 may determine a location and/or orientation and/or motion of the user's head, arms, legs, and body and, based on this information, cause rendering of a corresponding graphical representation of a field of view and transmit it to HMD 210 for presenting to user 205. In other words, HMD 210 can display the virtual world to user 205, where the movements and gestures of the user or his body parts are tracked by the depth sensor such that any user movement or gesture is translated into a corresponding movement of user 205 within the virtual world. For example, if user 205 wants to go around a virtual object, user 205 needs to make a circling movement in the real world, and while he makes this movement, depth sensor 100 acquires depth images, processes them to generate user-tracking data, and sends this data to a remote device, such as game console 220 or HMD 210, for displaying the virtual reality reflecting the motions of the user in the real world.

This technology may also be used to generate a virtual avatar of the user based on user-tracking data generated by depth sensor 100. The avatar can also be presented to user 205 via HMD 210. Accordingly, user 205 may play third-party games, such as third-person shooters, and see his avatar making translated movements and gestures from the sidelines.

In yet more embodiments, depth sensor 100 may determine a user height or a distance between HMD 210 and a floor. This information allows for more accurate simulation of a virtual floor. One should understand that the present technology may also be used for other applications or features of virtual reality simulation or augmented reality simulation. It should be clear to those skilled in the art that 3D environment 200 may include more than one user 205. Accordingly, if there are several users 205, depth sensor 100 may identify each user separately and track their movements and gestures independently.

Still referring to FIG. 2, depth sensor 100 can be configured to provide 3D reconstruction of environment 200 or any tangible objects within the field of view of depth sensor 100. 3D reconstruction means the process of capturing the shape of real objects or their parts. According to some embodiments, depth sensor 100 can acquire depth images of environment 200 or any objects and process them to build corresponding geometric 3D models or representations. These geometric 3D models can then be communicated to one or more remote devices. The option of 3D reconstruction enhances the overall functionality of the depth sensor and makes it suitable for many applications.
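
For illustration only, the following Python sketch shows one simple way such geometric 3D models might be accumulated: back-projected depth points from successive frames are binned into a sparse voxel grid whose occupied cells approximate the shape of the scene. The voxel size and the dictionary-based grid are illustrative assumptions, not details taken from this disclosure.

```python
import numpy as np

def accumulate_occupancy(points_xyz: np.ndarray,
                         grid: dict,
                         voxel_size: float = 0.05) -> dict:
    """Add one frame's 3D points (in metres) to a sparse voxel grid;
    the occupied voxels form a coarse geometric model of the scene."""
    keys = np.floor(np.asarray(points_xyz, dtype=float) / voxel_size).astype(int)
    for key in map(tuple, keys):
        grid[key] = grid.get(key, 0) + 1
    return grid

# Usage with stand-in data for one frame of back-projected depth pixels.
frame_points = np.random.rand(1000, 3) * 4.0
model = accumulate_occupancy(frame_points, {})
```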

FIG. 3 shows an example user-centered coordinate system 310 suitable for tracking user motions. User-centered coordinate system 310 may be created by depth sensor 100 at initial steps of depth image processing. In particular, once user 205 appears in front of depth sensor 100 while it is in an activated state, depth sensor 100 processes depth images, generates a virtual skeleton (see FIG. 4) of the user, and further tracks motions of the user. If depth sensor 100 has low resolution, it may not reliably identify HMD 210 worn by user 205. In this case, the user may need to make an input (e.g., a voice command) to inform depth sensor 100 that user 205 has HMD 210. Alternatively, user 205 may need to make a gesture (e.g., a nod motion or any other motion of the user's head) to indicate that there is HMD 210. In this case, depth sensor 100 processes the depth images it generates to retrieve motion data associated with the gesture ("first motion data"), and depth sensor 100 also receives from HMD 210 motion data related to the same gesture as generated by the internal sensors of HMD 210 ("second motion data"). By comparing the first and second motion data, depth sensor 100 may unambiguously identify that the user wears HMD 210. Further, HMD 210 may be assigned the coordinates of those virtual skeleton joints that relate to the user's head. Thus, the initial location of HMD 210 may be determined by depth sensor 100.

In some embodiments, depth sensor 100 may also identify an orientation of HMD 210 to improve the overall functionality. This may be performed in a number of different ways. In one example, the orientation of HMD 210 may be bound to the orientation of the user's head or the line of vision of user 205. The orientation of the user's head may be determined by analysis of coordinates related to specific virtual skeleton joints. Alternatively, the line of vision or user head orientation may be determined by processing RGB-images of the user taken by a video camera, which processing may involve locating pupils, nose, ears, etc. In yet another example, the user may need to make a predetermined gesture, such as a nod motion or a hand motion. By tracking motion data associated with such predetermined gestures, depth sensor 100 may identify the user head orientation. In yet another example embodiment, the user may merely provide a corresponding input (e.g., a voice command or text input) to identify the orientation of HMD 210.

Thus, the orientation and location of HMD 210 may become known to depth sensor 100. The user-centered coordinate system 310, such as a 3D Cartesian coordinate system, may then be bound to this initial orientation and location of HMD 210. For example, as shown in FIG. 3, the origin of the user-centered coordinate system 310 may be set to the instant location of HMD 210. Directions of the axes of the user-centered coordinate system 310 may be bound to the user head orientation or the line of vision. For example, the X axis of the user-centered coordinate system 310 may coincide with the line of vision 320 of the user. Further, the user-centered coordinate system 310 is fixed, and all successive motions and movements of user 205 and HMD 210 are tracked with respect to this fixed user-centered coordinate system 310. It should be noted that, in certain embodiments, an internal coordinate system used by HMD 210 may be bound to or coincide with the user-centered coordinate system 310. In this regard, the location and orientation of HMD 210 may be further tracked in one and the same coordinate system.
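
A minimal Python sketch of this binding step is given below: the initial HMD location becomes the origin and the line of vision becomes the X axis of a fixed frame, and subsequent sensor-space points are expressed in that frame. The choice of up vector and the axis conventions are assumptions made only for illustration.

```python
import numpy as np

def build_user_frame(hmd_position, line_of_vision, up=(0.0, 0.0, 1.0)):
    """Fix a user-centered frame: origin at the initial HMD location,
    X axis along the line of vision (cf. FIG. 3). Assumes the up vector
    is not parallel to the line of vision."""
    x_axis = np.asarray(line_of_vision, dtype=float)
    x_axis = x_axis / np.linalg.norm(x_axis)
    y_axis = np.cross(np.asarray(up, dtype=float), x_axis)
    y_axis = y_axis / np.linalg.norm(y_axis)
    z_axis = np.cross(x_axis, y_axis)
    rotation = np.stack([x_axis, y_axis, z_axis])   # sensor -> user rotation
    origin = np.asarray(hmd_position, dtype=float)
    return rotation, origin

def to_user_coordinates(point, rotation, origin):
    """Express a sensor-space point in the fixed user-centered system."""
    return rotation @ (np.asarray(point, dtype=float) - origin)
```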

FIG. 4 shows a simplified view of an exemplary virtual skeleton 400 as can be generated by depth sensor 100 by processing depth images. As shown in the figure, virtual skeleton 400 comprises a plurality of virtual "joints" 410 (also known as "skeleton key points") interconnecting virtual "bones." The bones and joints, in combination, may represent user 205 in real time so that every motion, movement, or gesture of the user corresponds to motions, movements, or gestures of the bones and/or joints. As will be discussed below, the coordinates of skeleton joints 410 constitute 3D skeletal tracking data, which is a part of the user-tracking data.

According to various embodiments, each of joints 410 may be associated with certain coordinates in a coordinate system defining its exact location within the 3D space. Hence, any motion of the user's body parts, such as an arm or the head, may be interpreted by a plurality of coordinates or coordinate vectors related to the corresponding joint(s) 410. By tracking user motions utilizing the virtual skeleton model, motion data can be generated for every limb movement. This motion data may include exact coordinates per period of time, velocity, direction, acceleration, and so forth.

Virtual skeleton 400 may generally include any reasonable number of skeletal joints. In one example, there can be at least seventeen skeletal joints. For some applications, it is more preferable to include at least nineteen skeletal joints. These skeletal joints can represent the head, neck, collar, left and right shoulders, right and left elbows, left and right wrists, waist, pelvis, right and left hips, right and left knees, and right and left ankles. In other embodiments, virtual skeleton 400 includes at least nineteen skeletal joints, which include all of the above joints plus the right hand and left hand. In yet more embodiments, virtual skeleton 400 includes at least twenty-one skeletal joints, which include all of the above joints and also the right foot and left foot. If there are multiple users 205 in the scene, depth sensor 100 can generate 3D skeletal tracking data for each of the users.
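
As an illustration of the nineteen-joint variant, the following Python sketch defines one possible per-frame data structure for the 3D skeletal tracking data; the joint naming and container layout are assumptions rather than a format prescribed by this disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# The nineteen joints listed above (head through left/right hand); the
# exact names and ordering here are illustrative only.
JOINT_NAMES = (
    "head", "neck", "collar",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hand", "right_hand",
    "waist", "pelvis",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
)

@dataclass
class VirtualSkeleton:
    """One user's 3D skeletal tracking data for a single frame."""
    user_id: int
    joints: Dict[str, Tuple[float, float, float]] = field(default_factory=dict)

    def joint_count(self) -> int:
        return len(self.joints)
```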

In certain embodiments, when user 205 wears HMD 210, depth sensor 100 can determine this fact and then assign the location (coordinates) of HMD 210. In this case, a corresponding label can be associated with virtual skeleton 400. According to various embodiments, depth sensor 100 can acquire orientation data of HMD 210. As discussed above, the orientation data of HMD 210 may be determined by one or more internal sensors of HMD 210 and then transmitted to depth sensor 100. In this case, the orientation of HMD 210 may be represented as a vector 420, as shown in FIG. 4. Similarly, depth sensor 100 may further determine a location and orientation of handheld device 240 held by user 205 in one or two hands. The orientation of handheld device 240 may also be presented as one or more vectors.

FIG. 5 shows a high-level block diagram of depth sensor 100 and computing environment 500. As shown in this figure, there is provided depth sensor 100, which comprises at least one depth-sensing module 510 configured to dynamically capture depth images. In various embodiments, depth-sensing module 510 may include an infrared (IR) projector for generating modulated or structured light patterns and an IR camera for sampling reflected light patterns. Alternatively, depth-sensing module 510 may include one or more stereo cameras for sampling stereo images and calculating depth maps. In yet another embodiment, depth-sensing module 510 may include a time-of-flight sensor.

In some embodiments, depth-sensing module 510 can be a high definition (HD) depth-sensing device, which provides a resolution of depth images of at least 600 x 600 pixels, and more preferably at least 1,280 x 720 pixels. Moreover, depth-sensing module 510 can generate depth images at a rate of 30 frames per second (fps), and more preferably at a rate of at least 60 fps. Accordingly, user-tracking data generated based on the depth images can also have a refresh rate of at least 30 fps.
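
A rough back-of-the-envelope comparison, sketched below in Python, illustrates why transmitting user-tracking data instead of raw depth images reduces the load on the communication link; the 16-bit pixel depth and the per-joint payload are assumed values for illustration.

```python
# Bandwidth of raw HD depth images vs. skeletal tracking data (assumed sizes).
width, height, fps = 1280, 720, 30
bits_per_pixel = 16                       # assumed 16-bit depth values

raw_depth_mbps = width * height * bits_per_pixel * fps / 1e6
# ~442 Mbit/s of raw depth data

joints, floats_per_joint, bytes_per_float = 19, 3, 4
skeleton_kbps = joints * floats_per_joint * bytes_per_float * 8 * fps / 1e3
# ~55 kbit/s of 3D skeletal tracking data per user

print(f"raw depth: {raw_depth_mbps:.0f} Mbit/s")
print(f"skeleton:  {skeleton_kbps:.0f} kbit/s")
```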

In some embodiments, depth sensor 100 may optionally include proximity-sensing module 530, which can detect the presence of a user within a predetermined range from depth sensor 100. When a user is detected within a predetermined distance from depth sensor 100, depth-sensing module 510 can be activated to generate depth images. Otherwise, when there are no users within the predetermined distance from depth sensor 100, depth-sensing module 510 can be in a deactivated state to reduce power consumption.

In yet more embodiments, depth sensor 100 may optionally include one or more color video cameras 520 for capturing a series of two-dimensional (2D) RGB-images in addition to the 3D imagery already created by depth-sensing module 510. The series of 2D images captured by color video camera 520 may be used to facilitate identification of users, user gestures, user emotions, and so forth. In yet more embodiments, only color video camera 520 can be used, without depth-sensing module 510.

Furthermore, depth sensor 100 may also comprise processing module 540, such as a computer, processor, controller, Central Processing Unit (CPU), ASIC logic, and the like. Processing module 540 may also include a memory for storing processor-readable instructions or software configured to implement the functionality described herein. Processing module 540 is generally configured to process depth images, 3DoF data, user inputs, and voice commands, and to determine 6DoF location and orientation data of HMD 210 and, optionally, the location and orientation of handheld device 240, as described herein. According to various embodiments of this disclosure, processing module 540 is also configured to receive and process, in real time, depth images to produce user-tracking data, which includes 3D skeletal tracking data, user segmentation data, and optionally user gesture data. Accordingly, in some embodiments, processing module 540 is configured to identify users and recognize their motions and gestures. Once a gesture is recognized, its descriptor or identifier is generated and included in the user-tracking data. In some examples, processing module 540 can identify only basic user gestures, which include a wave gesture, a static gesture, a virtual touch screen gesture, and the like. User segmentation means the process of determining whether or not a pixel of a depth image corresponds to a specific user. Accordingly, the user segmentation data may simply include an array of coordinates that correspond to an identified user. The production of user-tracking data, which includes both user segmentation and 3D skeletal tracking data, enhances the functionality of depth sensor 100.
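
Purely as an illustration, the following Python sketch bundles the three components of the user-tracking data (skeletal joints, segmentation coordinates, and optional gesture identifiers) into a single message; the JSON encoding and the field names are assumptions, not a format defined by this disclosure.

```python
import json
from typing import Dict, List, Optional, Tuple

def build_user_tracking_packet(
    skeletons: Dict[int, Dict[str, Tuple[float, float, float]]],
    segmentation_masks: Dict[int, List[Tuple[int, int]]],
    gestures: Optional[Dict[int, str]] = None,
) -> bytes:
    """Bundle per-user skeletal joints, user-segmentation pixel coordinates,
    and optional gesture identifiers into one message for a remote device."""
    packet = {
        "skeletal_tracking": skeletons,
        "user_segmentation": segmentation_masks,
    }
    if gestures:
        packet["user_gestures"] = gestures      # e.g. {user_id: "wave"}
    return json.dumps(packet).encode("utf-8")
```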

In yet more embodiments, processing module 540 can also be configured to process depth images to perform 3D reconstruction of a 3D environment or at least one tangible object in the field of view of depth sensor 100. Accordingly, 3D reconstruction data can be generated and further included in transmissions to remote devices. This option also enhances the functionality of depth sensor 100.

In some embodiments, processing module 540 can also be configured to process depth images to generate computer input commands so that depth sensor 100 can effectively be used as a computer peripheral device. In other words, processing module 540 can identify and track user gestures and generate control commands corresponding to the gestures. For example, a user hand motion from right to left can cause generation of a command for opening a next page. When a user points a finger at a screen, this can cause generation of a pointer track command, similar to what is generated by a computer mouse. It shall be understood that there can be multiple predetermined user gestures corresponding to certain computer inputs. In some embodiments, these computer inputs are provided in the format of the Human Interface Device (HID) protocol (also known as the USB-HID specification). In some embodiments, similar HID class protocols can be used, such as Bluetooth HID, Serial HID, HID over I2C, and so forth. Thus, the user-tracking data can be provided in the format of the HID standard, or, alternatively, HID-based commands are generated in lieu of or in addition to the user-tracking data. Thus, user-tracking data can include cross-platform interface control commands, which allows depth sensor 100 to be used in conjunction with Windows® devices, iOS® devices, Android® devices, and so forth.
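
The sketch below illustrates, in Python, how a recognized gesture might be mapped onto a keyboard-style HID input report; the gesture names and the chosen key mappings are illustrative assumptions and are not mandated by this disclosure.

```python
# Illustrative mapping from recognized gestures to keyboard-style HID
# usage IDs; gesture names and key choices are assumptions for this sketch.
GESTURE_TO_HID_USAGE = {
    "swipe_left": 0x4F,    # Right Arrow, e.g. "open next page"
    "swipe_right": 0x50,   # Left Arrow
    "wave": 0x29,          # Escape
}

def gesture_to_hid_report(gesture: str) -> bytes:
    """Build a minimal 8-byte keyboard-style input report for a gesture:
    one modifier byte, one reserved byte, then up to six key usages."""
    usage = GESTURE_TO_HID_USAGE.get(gesture, 0x00)
    return bytes([0x00, 0x00, usage, 0x00, 0x00, 0x00, 0x00, 0x00])

report = gesture_to_hid_report("swipe_left")
```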

In yet more embodiments, processing module 540 can also be configured to pre-process user-tracking data. The pre-processing may include compression and/or encryption. Compression may be required to reduce data traffic and reduce the need for high-bandwidth communication links. Data compression can vary depending on the type of information to be transmitted. For example, the 3D skeletal tracking data can be subjected to higher compression than the user segmentation data. For these purposes, any data compression algorithms known in the art can be used.
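
One illustrative way to apply different treatment to the different data types is sketched below in Python: joint coordinates are quantized to millimetres, while the bulkier segmentation mask is run through a general-purpose lossless codec. Both choices are assumptions made for illustration; the disclosure does not prescribe specific algorithms.

```python
import struct
import zlib
from typing import Dict, Tuple

def compress_skeleton(joints: Dict[str, Tuple[float, float, float]]) -> bytes:
    """Quantize joint coordinates (metres) to signed 16-bit millimetres,
    about six bytes per joint; assumes coordinates stay within ~32 m."""
    return b"".join(
        struct.pack("<hhh", *(int(round(c * 1000)) for c in xyz))
        for xyz in joints.values()
    )

def compress_segmentation(mask_bytes: bytes) -> bytes:
    """Segmentation masks are large but repetitive, so a lossless
    general-purpose codec is applied instead of quantization."""
    return zlib.compress(mask_bytes, level=6)
```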

According to yet more embodiments, processing module 540 can be further configured to process depth images to determine a location and/or orientation of at least one HMD within a three-dimensional environment. With reference to FIG. 5, the HMD can be one of remote devices 590. The location and/or orientation of the HMD can be further included in the user-tracking data and sent to the HMD or any other remote device 590. This location and/or orientation of the HMD can be further used in generating virtual or augmented reality.

Furthermore, processing module 540 can be configured to generate six-degree-of-freedom (6DoF) motion data associated with HMD 210. The 6DoF data can provide or assist in 360-degree full-body virtual reality simulation. Thus, the 6DoF data allow, for example, translating user motions and gestures into corresponding motions of a user's avatar in the simulated virtual reality.

The 6DoF data can be generated by processing module 540 as follows.

Processing module 540 dynamically receives and processes depth images generated by depth-sensing module 510. As a result of this processing, processing module 540 may identify one or more users present within the 3D scene, generate a virtual skeleton of each user, and optionally identify the location and orientation of HMDs (if used). In some embodiments, an HMD's location and orientation may be identified by processing depth images and/or RGB-images of the scene. In other embodiments, a user with an HMD may need to take certain actions to assist processing module 540 in determining a location and orientation of the HMD. For example, the user may be required to make a user input or make a predetermined gesture informing processing module 540 that there is an HMD worn by the user. In certain embodiments, when a predetermined gesture is made, the depth images may provide corresponding first motion data related to this gesture, while the HMD may send to depth sensor 100 corresponding second motion data related to the same gesture as sensed by its internal sensors. By comparing the first and second motion data, processing module 540 may identify that the HMD is worn by the user, and thus the known location of the user's head may be assigned to the HMD. In other words, it may be established that the location of the HMD is the same as the location of the user's head. To this end, coordinates of those virtual skeleton joints that relate to the user's head may be assigned to the HMD. Thus, the location of the HMD may be dynamically tracked within the 3D environment by mere processing of depth images, and corresponding 3DoF location data of the HMD may be generated. In particular, the 3DoF location data may include heave, sway, and surge data related to a move of the HMD within the 3D environment.
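
A minimal Python sketch of the comparison of first and second motion data follows; it uses a normalized correlation over two equal-length motion traces, and the similarity threshold is an assumed value rather than one specified in this disclosure.

```python
import numpy as np

def motions_match(first_motion: np.ndarray,
                  second_motion: np.ndarray,
                  threshold: float = 0.8) -> bool:
    """Compare the gesture motion seen in the depth images ("first motion
    data") with the motion reported by the HMD's internal sensors ("second
    motion data"). Assumes both traces are 1D, equal-length, and time-aligned."""
    a = first_motion - first_motion.mean()
    b = second_motion - second_motion.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return False
    return float(np.dot(a, b) / denom) >= threshold
```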

Further, processing module 540 may dynamically (i.e., in real time) combine the 3DoF orientation data and the 3DoF location data to generate 6DoF data representing the location and orientation of the HMD within the 3D environment. The 6DoF data may then be used by the HMD or any other remote device 590 in simulating virtual reality and rendering corresponding field-of-view images/video that can be displayed on the display device worn by or attached to the user. In one example embodiment, the 3DoF orientation data and the 3DoF location data may relate to two different coordinate systems. In another example embodiment, both the 3DoF orientation data and the 3DoF location data may relate to one and the same coordinate system. In the latter case, processing module 540 may establish and fix the user-centered coordinate system prior to many of the operations discussed herein. For example, processing module 540 may set an origin of the user-centered coordinate system at the initial position of the user's head based on the processing of depth images. The direction of the axes of this coordinate system may be set based on a line of vision of the user or the user head orientation, which may be determined by a number of different approaches.

In one example, processing module 540 may determine an orientation of the user's head based on depth images or RGB-images, which may be used for assuming the line of vision of the user. One of the coordinate system axes may then be bound to the line of vision of the user. In another example, a virtual skeleton of the user may be generated based on depth images. A relative position of two or more virtual skeleton joints (e.g., pertaining to the user's shoulders) may be used for selecting directions of the coordinate system axes. In yet another example, the user may be prompted to make a gesture such as a motion of his hand in the direction from his head towards depth sensor 100. The motion of the user causes production of motion data, which in turn may serve as a basis for selecting directions of the coordinate system axes. In yet another example, RGB-images can be used to identify user head elements such as pupils, nose, ears, etc. Based on the positions of these elements, processing module 540 may determine the line of vision and then set directions of the coordinate system axes based thereupon. Accordingly, once the user-centered coordinate system is set, all other motions of the HMD may be tracked within this coordinate system, making it easy to utilize 6DoF data generated later on.

More specifically, processing module 540 may generate 6DoF data of the HMD based on a combination of 3DoF orientation data acquired from the HMD and 3DoF location data obtained by processing of depth images. In one example, the depth images are processed to retrieve heave, sway, and surge data. The depth maps may be processed so as to create a virtual skeleton of the user including multiple skeletal joints associated with the user's legs and at least one skeletal joint associated with the user's head. Accordingly, when the user walks or runs, the skeletal joints associated with the user's legs may be dynamically tracked and analyzed so that sway and surge data (2DoF location data) can be generated. Similarly, the skeletal joint(s) associated with the user's head may be dynamically tracked and analyzed by processing of depth images so that heave data (1DoF location data) may be generated. Thus, processing module 540 may combine the heave, sway, and surge data to generate 3DoF location data. As discussed above, the 3DoF location data may be combined with the 3DoF orientation data acquired from the HMD to create 6DoF data. The 3DoF orientation data includes pitch, yaw, and roll data associated with motions of the HMD.
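
The fusion step can be pictured with the short Python sketch below, in which the tracked head-joint position supplies the 3DoF location (heave, sway, surge) and the HMD supplies the 3DoF orientation (pitch, yaw, roll); the axis-to-translation mapping shown is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SixDofPose:
    # 3DoF location derived from depth-image processing (user-centered frame);
    # which axis maps to sway/surge/heave is an illustrative choice here.
    sway: float
    surge: float
    heave: float
    # 3DoF orientation reported by the HMD's internal sensors.
    pitch: float
    yaw: float
    roll: float

def combine_6dof(head_joint_xyz: Tuple[float, float, float],
                 hmd_orientation_pyr: Tuple[float, float, float]) -> SixDofPose:
    """Fuse the tracked head-joint position with the HMD-reported
    pitch/yaw/roll into a single 6DoF sample."""
    x, y, z = head_joint_xyz
    pitch, yaw, roll = hmd_orientation_pyr
    return SixDofPose(sway=x, surge=y, heave=z, pitch=pitch, yaw=yaw, roll=roll)
```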

In some embodiments, processing module 540 may also generate virtual reality, i.e., render 3D images of a virtual reality simulation, which images can be shown to a user via the HMD. In yet other embodiments, processing module 540 may implement a computer game or any other software application. Further, processing module 540 may also generate a virtual avatar of a user and present it to the user via the HMD.

Referring back to FIG. 5, depth sensor 100 also includes communication module 550 configured to communicate with one or more remote devices 590, which can refer to at least one of the following: a host device, game console, HMD 210, handheld device 240, and so forth. The communication can be wired or wireless. More specifically, communication module 550 may be configured to receive orientation data from the HMD and orientation data from the handheld device, and to transmit user-tracking data to one or more remote devices 590.

Depth sensor 100 may also include power supply module 560 configured to supply electrical power to all electrical modules of depth sensor 100. Power supply module 560 may be configured to receive external electrical power, transform it to the levels and forms suitable for consumption by the modules of depth sensor 100, and distribute it to the modules of depth sensor 100. For example, power supply module 560 may receive an alternating current of 110 Volts and transform it into a direct current of 9 Volts. In another example, power supply module 560 may receive a direct current of 9 Volts and simply distribute it to the modules of depth sensor 100.

In certain embodiments, power supply module 560 may include a battery module for receiving one or more batteries (e.g., AA batteries). Thus, in this embodiment, depth sensor 100 may be power-independent, which further enhances functionalities of the depth sensor.

In yet more embodiments, power supply module 560 may include a voltage regulator circuit comprising at least a rectifier circuit for accepting power, filtering it, and converting it into a direct current to be outputted. Power supply module 560 may also include an uninterruptible power supply (UPS) module to provide emergency power when the input power source fails.

Referring back to FIG. 5, depth sensor 100 may also include communication bus 570 for interconnecting depth-sensing module 510, color video camera 520, proximity-sensing module 530, processing module 540, and communication module 550.

As discussed above, communication module 550 is configured to communicate with one or more remote devices 590 and/or one or more web servers 580. Communication module 550 communicates user-tracking data, which include skeletal tracking data, user segmentation data, and optionally user gesture data. Communication module 550 can also communicate 3DoF or 6DoF data associated with the HMD. Moreover, in some embodiments, communication module 550 can communicate computer control commands generated by processing module 540 in the HID format.

Communication module 550 can also communicate 3D reconstruction data. In some embodiments, some or all of the data to be communicated can be compressed by processing module 540 before transmission. Communication module 550 can also receive information from one or more remote devices 590 and/or one or more web servers 580. This information may include 3DoF location or orientation data, control commands, setting instructions, and so forth.

According to some embodiments, depth sensor 100 can also be configured to transmit "raw" data to one or more remote devices 590 and/or one or more web servers 580. In one example, communication module 550 can transmit to a remote device 590 unprocessed depth images generated by depth-sensing module 510. This option can be used for setup or calibration purposes.

In yet more embodiments, communication module 550 can transmit unprocessed depth images to one or more web servers 580 to allow web servers 580 to process the depth images to produce any meaningful information as discussed herein and deliver it to one or more remote devices 590. For example, depth sensor 100 may produce depth images of a three-dimensional environment (e.g., a room) and send them to web server 580, which processes these images to generate 3D reconstruction data, such as a 3D geometric model of the three-dimensional environment.

Moreover, in some embodiments, communication module 550 can transmit to remote devices 590 and/or web servers 580 still or motion images captured by color video camera 520. These still or motion images (also referred to as RGB-images) can be unprocessed or pre-processed (e.g., compressed) before transmission. Thus, in this example, depth sensor 100 enhances its functionality by serving as a stand-alone video camera.

Communication module 550 can communicate with remote devices 590 using any suitable data transmission protocols. For example, HID protocol or proprietary protocol can be used for data transmission.

Still referring to FIG. 5, remote devices 590 can refer, in general, to any electronic device configured to receive and use user-tracking data received from depth sensor 100. Some examples of remote devices 590 include, but are not limited to, host devices, computers (e.g., laptop computers, tablet computers, desktop computers), cellular phones, smart phones, wearable computers, HMDs, electronic handheld devices, smart watches, fitness trackers, displays, audio systems, video systems, gaming consoles, entertainment systems, home appliances, infotainment devices, robots, and so forth. The communication between depth sensor 100, remote devices 590, and/or web servers 580 can be performed via a communications network. The communications network can be a wireless or wired network, or a combination thereof. For example, the communications network may include the Internet, a local intranet, a PAN (Personal Area Network), LAN (Local Area Network), WAN (Wide Area Network), MAN (Metropolitan Area Network), virtual private network (VPN), DSL (Digital Subscriber Line) connection, Ethernet connection, ISDN (Integrated Services Digital Network) line, cable modem, ATM (Asynchronous Transfer Mode) connection, or an FDDI (Fiber Distributed Data Interface) or CDDI (Copper Distributed Data Interface) connection. Furthermore, communications may also include links to any of a variety of wireless networks, including WAP (Wireless Application Protocol), GPRS (General Packet Radio Service), GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access) or TDMA (Time Division Multiple Access), CDPD (Cellular Digital Packet Data), Bluetooth radio, or an IEEE 802.11-based radio frequency network.

Thus, a portable apparatus for three-dimensional sensing has been described. One of the key differences of depth sensor 100 from prior art solutions is its embedded processing capability. Unlike depth sensors known in the art, which require connection to a host device, such as a personal computer or game console, the depth sensor disclosed herein includes an internal processing module. The processing module is configured to process, in real time, high definition depth images generated by the depth-sensing module and produce user-tracking data, which include full-body 3D skeletal tracking data (including, for example, 17 or more skeletal joints), user segmentation data, optionally user gesture data, and optionally HID-based user commands. The user-tracking data can be provided at a rate of at least 30 fps and can include data related to two or more users.

Thus, the depth sensor described herein has enhanced functionality, and it can be fully portable and wireless. The depth sensor can be applied in mobile virtual reality, in Android and iOS-based TV entertainment, and in digital-out-of-home (DOOH) advertising applications. The depth sensor can also enable 6DoF HMD location tracking and full body avatar animation in addition to gesture control. In DOOH applications, the depth sensor can enable audience analytics and interactivity for digital signage.

Although embodiments of this disclosure have been described with reference to specific example embodiments and drawings, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. Moreover, the embodiments described herein can be combined, other embodiments can be utilized, or structural, logical, and electrical changes can be made without departing from the scope of what is claimed. In this document, the term "or" is used to refer to a nonexclusive "or," such that "A or B" includes "A but not B," "B but not A," and "A and B," unless otherwise indicated.