


Title:
THREE DIMENSIONAL HAND POSE ESTIMATOR
Document Type and Number:
WIPO Patent Application WO/2023/281299
Kind Code:
A1
Abstract:
An apparatus is provided to generate a three-dimensional representation of a pose. The apparatus includes a communications interface to receive raw data from an external source. The raw data includes a representation of an object. In addition, the apparatus includes a memory storage unit to store the raw data. Furthermore, the apparatus includes a pre-processing engine to identify a plurality of visible keypoints from the raw data. The apparatus also includes a keypoint analysis engine to identify a connector between a first visible keypoint from the plurality of visible keypoints and a second visible keypoint from the plurality of visible keypoints. The apparatus further includes a pose estimation engine to generate a vector to represent the connector. The vector has a normalized magnitude.

Inventors:
GONZALEZ GARCIA ABEL (CA)
Application Number:
PCT/IB2021/056150
Publication Date:
January 12, 2023
Filing Date:
July 08, 2021
Assignee:
HINGE HEALTH INC (US)
International Classes:
G06T7/00; G06T13/40
Other References:
JAN WÖHLKE; SHILE LI; DONGHEUI LEE: "Model-based Hand Pose Estimation for Generalized Hand Shape with Appearance Normalization", arXiv.org, Cornell University Library, 2 July 2018 (2018-07-02), XP081110251
LI TIANXING; XIONG XI; XIE YIFEI; HITO GEORGE; YANG XING-DONG; ZHOU XIA: "Reconstructing Hand Poses Using Visible Light", Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 3, 11 September 2017 (2017-09-11), pages 1-20, XP058484802, DOI: 10.1145/3130937
Attorney, Agent or Firm:
COLEMAN, Brian et al. (US)
Claims:
What is claimed is:

1. An apparatus comprising: a communications interface to receive raw data from an external source, wherein the raw data includes a representation of an object; a memory storage unit to store the raw data; a pre-processing engine to identify a plurality of visible keypoints from the raw data; a keypoint analysis engine to identify a connector between a first visible keypoint from the plurality of visible keypoints and a second visible keypoint from the plurality of visible keypoints; and a pose estimation engine to generate a vector to represent the connector, wherein the vector has a normalized magnitude.

2. The apparatus of claim 1, wherein the plurality of visible keypoints is a plurality of joints, wherein each joint of the plurality of joints represents a rotation point of the object.

3. The apparatus of claim 1 or 2, wherein the keypoint analysis engine uses a keypoint definition to identify the connector.

4. The apparatus of any one of claims 1 to 3, wherein the keypoint analysis engine uses a known structure to identify the connector.

5. The apparatus of any one of claims 1 to 4, wherein the keypoint analysis engine identifies a plurality of connectors associated with the object from the plurality of visible keypoints.

6. The apparatus of any one of claims 1 to 5, wherein the pre-processing engine estimates a position of an invisible keypoint.

7. The apparatus of claim 6, wherein the keypoint analysis engine estimates a connection between the invisible keypoint and a visible keypoint selected from the plurality of visible keypoints.

8. The apparatus of claim 7, wherein the keypoint analysis engine estimates the connection between the invisible keypoint and the visible keypoint based on a kinematic chain.

9. The apparatus of any one of claims 1 to 8, further comprising a pose generator to apply a template to generate a pose of the object based on the vector.

10. A method comprising: receiving raw data from an external source via a communications interface, wherein the raw data includes a representation of a hand; storing the raw data in a memory storage unit; identifying a plurality of visible keypoints from the raw data with a pre-processing engine; identifying a connector between a first visible keypoint from the plurality of visible keypoints and a second visible keypoint from the plurality of visible keypoints; and generating, with a pose estimation engine, a vector to represent a direction of the connector, wherein the vector has a normalized magnitude.

11. The method of claim 10, wherein the plurality of visible keypoints is a plurality of joints, wherein each joint of the plurality of joints represents a rotation point of the hand.

12. The method of claim 10 or 11, wherein identifying the connector comprises using a keypoint definition.

13. The method of any one of claims 10 to 12, wherein identifying the connector is based on a known structure of the hand.

14. The method of any one of claims 10 to 13, further comprising identifying a plurality of connectors associated with the hand from the plurality of visible keypoints.

15. The method of any one of claims 10 to 14, further comprising estimating a position of an invisible keypoint.

16. The method of claim 15, further comprising estimating a connection between the invisible keypoint and a visible keypoint selected from the plurality of visible keypoints.

17. The method of claim 16, wherein estimating the connection between the invisible keypoint and the visible keypoint is based on a kinematic chain.

18. The method of any one of claims 10 to 17, further comprising applying a template to generate a pose of the hand based on the vector.

19. A non-transitory computer readable medium encoded with codes, wherein the codes are to direct a processor to: receive raw data from an external source via a communications interface, wherein the raw data includes a representation of a hand; store the raw data in a memory storage unit; identify a plurality of visible keypoints from the raw data with a pre-processing engine; identify a connector between a first visible keypoint from the plurality of visible keypoints and a second visible keypoint from the plurality of visible keypoints; and generate, with a pose estimation engine, a vector to represent a direction of the connector, wherein the vector has a normalized magnitude.

20. The non-transitory computer readable medium of claim 19, wherein the plurality of visible keypoints is a plurality of joints, wherein each joint of the plurality of joints represents a rotation point of the hand.

21. The non-transitory computer readable medium of claim 19 or 20, wherein the codes are to direct the processor to identify the connector using a keypoint definition.

22. The non-transitory computer readable medium of any one of claims 19 to 21, wherein the codes are to direct the processor to identify the connector based on a known structure of the hand.

23. The non-transitory computer readable medium of any one of claims 19 to 22, wherein the codes are to direct the processor to identify a plurality of connectors associated with the hand from the plurality of visible keypoints.

24. The non-transitory computer readable medium of any one of claims 19 to 23, wherein the codes are to direct the processor to estimate a position of an invisible keypoint.

25. The non-transitory computer readable medium of claim 24, wherein the codes are to direct the processor to estimate a connection between the invisible keypoint and a visible keypoint selected from the plurality of visible keypoints.

26. The non-transitory computer readable medium of claim 25, wherein the codes are to direct the processor to estimate the connection between the invisible keypoint and the visible keypoint based on a kinematic chain.

27. The non-transitory computer readable medium of any one of claims 19 to 26, wherein the codes are to direct the processor to apply a template to generate a pose of the hand based on the vector.

Description:
THREE DIMENSIONAL HAND POSE ESTIMATOR

BACKGROUND

[0001] Pose estimation is used to extract a pose of an object from an image. For example, an image of a hand may show it in one of many possible poses: the hand may be clenched in a fist, the hand may be held open, a finger may be pointing, etc. In this example, the pose of the hand may be described using keypoints representing joints and connectors between some keypoints representing bones. The pose may then be used in a three-dimensional model to animate the object.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] Reference will now be made, by way of example only, to the accompanying drawings in which:

[0003] Figure 1 is a schematic representation of the components of an example apparatus to generate a three-dimensional representation of a pose;

[0004] Figure 2 is an example of raw data representing an image received at the apparatus of figure 1;

[0005] Figure 3 is a two-dimensional keypoint map of the image of figure 2;

[0006] Figure 4 is a three-dimensional vector representation of a pose of the image of figure 2;

[0007] Figure 5 is a schematic representation of a computer network system to generate a three-dimensional representation of a pose;

[0008] Figure 6 is a schematic representation of the components of another example apparatus to generate a three-dimensional representation of a pose; and

[0009] Figure 7 is a flowchart of an example of a method of generating a three-dimensional representation of a pose.

DETAILED DESCRIPTION

[0010] As used herein, any usage of terms that suggest an absolute orientation (e.g. “top”, “bottom”, “up”, “down”, “left”, “right”, “low”, “high”, etc.) may be for illustrative convenience and refer to the orientation shown in a particular figure. However, such terms are not to be construed in a limiting sense as it is contemplated that various components will, in practice, be utilized in orientations that are the same as, or different than those described or shown.

[0011] Computer animation is used in a broad range of different sectors to provide motion to various objects, such as people. In many examples of computer animation, a three-dimensional representation of an object in one or more poses is created and moved. In some examples, the object may have multiple poses. The poses are not particularly limited and may be dependent on what the object is and the expected motions and range of motion about certain keypoints. For example, if the object is a hand of a person, the hand may have multiple poses and may be animated to move between different poses. Furthermore, it is to be appreciated by a person of skill with the benefit of this description that a hand may not be identical among different characters. However, the hand may be expected to have similar poses that may be considered to look natural. In other examples, the object may be another part of the body, such as a foot. In further examples, the object may not be a person and may be any other type of object having multiple poses.

[0012] Accordingly, the object to be animated is to be represented by a plurality of keypoints. The position as well as the available range of motion at each keypoint will provide the object with the appearance of natural poses. However, in some applications, a pose may be applied to different objects with different dimensions or scales. Continuing with the example above, a pose of a hand may be extending the index finger while curling the remaining fingers to signal the number one. To apply this pose to a different hand having a different size or a different proportion between the keypoints, such as a hand with longer fingers, a model representing the pose in a scale-agnostic manner may be used.

[0013] An apparatus and method are provided for generating a three-dimensional pose for an object using normalized direction vectors, representing the pose in a manner that is agnostic to the scale and shape of the object. The scale-agnostic representation of the pose focuses on the actual three-dimensional pose of the object rather than its scale or shape. Accordingly, the scale-agnostic representation of the pose may be subsequently used in a downstream process, such as rotation estimation about a keypoint or application to similar objects having different scales.

[0014] In the present description, the models and techniques discussed below are generally applied to a hand of a person. It is to be appreciated by a person of skill with the benefit of this description that the examples described below may be applied to other objects, such as other parts of a person or to animals and machines.

[0015] Referring to figure 1, a schematic representation of an apparatus to generate a three-dimensional representation of a pose is generally shown at 50. The apparatus 50 may include additional components, such as various additional interfaces and/or input/output devices such as indicators to interact with a user of the apparatus 50. The interactions may include viewing the operational status of the apparatus 50 or the system in which the apparatus 50 operates, updating parameters of the apparatus 50, or resetting the apparatus 50. In the present example, the apparatus 50 is to receive raw data, such as raw data representing an image 100 as shown in figure 2, and to process the raw data to generate a three-dimensional representation of the pose of the object in the image 100. In the present example, the apparatus 50 includes a communications interface 55, a memory storage unit 60, a pre-processing engine 65, a keypoint analysis engine 70, and a pose estimation engine 75.

[0016] The communications interface 55 is to communicate with an external source to receive raw data representing an object. In the present example, the communications interface 55 may communicate with the external source over a network, which may be a public network shared with a large number of connected devices, such as a WiFi network or cellular network. In other examples, the communications interface 55 may receive data from an external source via a private network, such as an intranet or a wired connection with other devices. As another example, the communications interface 55 may connect to another proximate device via a wired connection, a Bluetooth connection, radio signals, or infrared signals. In particular, the communications interface 55 is to receive raw data from the external source to be stored on the memory storage unit 60.

[0017] The external source from which the raw data is received is not particularly limited. For example, the external source may be an image capturing system such as a camera, smartphone, or tablet. In other examples, the external source may be a database of the object in a pose. Furthermore, the external source may be substituted with a computer-generated pose of the object.

[0018] The memory storage unit 60 is to store the raw data received via the communications interface 55. In particular, the memory storage unit 60 may store raw data including two-dimensional images representing objects from which a three-dimensional representation of a pose is to be generated. The representation of the object received by the communications interface is not particularly limited. In the present example, the object is a hand of a person. It is to be appreciated by a person of skill with the benefit of this description that a hand may have many different poses. Accordingly, the image 100 is to capture the hand in one of those poses, such as with the index finger extended from a fist position. In other examples, the image 100 may be substituted with a hand in another pose. In further examples, the object may be another body part of a person such as a foot or face. In another example, the object may not be part of a person and may be another type of object such as an animal or machine.

[0019] The memory storage unit 60 may also be used to store additional data to be used by the apparatus 50. For example, the memory storage unit 60 may store various reference data sources, such as templates and model data. Accordingly, it is to be appreciated that the memory storage unit 60 may be a physical computer readable medium used to maintain multiple databases, or may include multiple mediums that may be distributed across one or more external servers, such as in a central server or a cloud server. In some examples, the memory storage unit 60 may be preloaded with data, such as training data or instructions to operate components of the apparatus 50. In other examples, the instructions may be loaded via the communications interface 55 or by directly transferring the instructions from a portable memory storage device connected to the apparatus 50, such as a memory flash drive.

[0020] In the present example, the memory storage unit 60 includes a non-transitory machine-readable storage medium that may be any electronic, magnetic, optical, or other physical storage device. The memory storage unit 60 may be used to store information such as data received from external sources via the communications interface 55, template data, training data, pre-processed data from the pre-processing engine 65, results from the keypoint analysis engine 70, or results from the pose estimation engine 75. In addition, the memory storage unit 60 may be used to store instructions for general operation of the apparatus 50. For example, the memory storage unit 60 may store an operating system that is executable by a processor to provide general functionality to the apparatus 50 such as functionality to support various applications. The memory storage unit 60 may additionally store instructions to operate the pre-processing engine 65, the keypoint analysis engine 70, and the pose estimation engine 75. The memory storage unit 60 may also store control instructions to operate other components and any peripheral devices that may be installed with the apparatus 50, such as cameras and user interfaces.

[0021] The pre-processing engine 65 is to identify keypoints from the raw data to generate a keypoint map 105 from the image 100 as shown in figure 3. In particular, the pre-processing engine 65 is to identify the keypoints that are visible in the raw data image 100 to generate keypoint coordinates in two dimensions on the raw data image 100. The manner by which the keypoints are identified is not particularly limited. For example, the keypoints may be extracted from the two-dimensional image based on matching a set of predefined keypoint definitions from a known structure of the object.

[0022] In particular, the pre-processing engine 65 may generate a keypoint heatmap for each keypoint. The keypoint heatmap generated by the pre-processing engine 65 is to generally provide a representation of the position of a point on the object. In the present example, the keypoint heatmap is a two-dimensional map. The point of interest on the object is a keypoint which may correspond to a joint or position where the object carries out relative motions between different portions of the object. Continuing with the present example of a hand as the object, a keypoint may represent a joint in the hand, such as a knuckle or finger joint. The keypoint heatmap includes a confidence value for each pixel of the image 100 to indicate the likelihood that the pixel is where the keypoint of interest is located. Accordingly, the keypoint heatmap typically shows a single hotspot where the pre-processing engine 65 has determined the predefined keypoint of interest to be located. It is to be appreciated that in some examples, the pre-processing engine 65 may be part of an external system providing pre-processed data or the pre-processed data may be generated by other methods, such as manually by a user.
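As a concrete illustration of the heatmap described above, decoding a two-dimensional keypoint heatmap into a pixel coordinate can be sketched as follows. This is a minimal sketch, not the application's implementation; the heatmap size and confidence values are hypothetical.

```python
import numpy as np

def decode_heatmap(heatmap):
    """Return the (x, y) pixel coordinate and confidence value of the
    hottest cell in a 2-D keypoint heatmap."""
    # Locate the cell with the maximum confidence value.
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (col, row), heatmap[row, col]

# A toy 5x5 heatmap with a single hotspot where the joint was detected.
hm = np.zeros((5, 5))
hm[1, 3] = 0.9
(x, y), conf = decode_heatmap(hm)  # hotspot at x=3, y=1 with confidence 0.9
```

In practice, the heatmap would span the full image resolution, with one such map per predefined keypoint.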

[0023] It is to be appreciated by a person of skill with the benefit of this description that a separate keypoint heatmap is generated for each of the predefined keypoints. In the specific example of a hand of a person, multiple keypoints may be predefined and represent points where the hand may have relative motion, such as a joint which rotates bones connected to it. It is to be further understood that for each keypoint, a certain range of motion or characteristics about the keypoint may be approximated. For example, a keypoint representing a knuckle may have a predefined range of motion, such as 90 degrees, and a limited degree of freedom to be within a plane. In the present example, it is also to be understood by a person of skill with the benefit of this description that identifying more predefined keypoints for an object allows for a more accurate and realistic depiction of the hand.

[0024] In the present specific example, the pre-processing engine 65 is configured to identify and locate 21 keypoints in a human hand. In particular, the 21 keypoints include the joints listed in Table 1 below.

TABLE 1

[0025] The generation of each keypoint heatmap is not particularly limited and may involve various image processing engines. In the present example, a computer vision-based human pose system such as the wrnchAI engine is used to identify each keypoint and to assign a confidence value to the position of the keypoint in the raw data. In other examples, other types of human pose systems may be used, such as OpenPose, Google Blaze Pose, Mask R-CNN, or human pose systems such as Microsoft Kinect or Intel RealSense. In further examples, the human pose may alternately be annotated by hand in an appropriate keypoint annotation tool such as Keymakr.

[0026] In some examples of poses, it is to be appreciated by a person of skill with the benefit of this description that some of the predefined keypoints that the pre-processing engine 65 is to identify may not be visible or present in the image 100 of the raw data. For example, the keypoint may be a predefined joint that is invisible or occluded by other portions of the object. In such examples, the pre-processing engine 65 may estimate the position of the invisible keypoint. The manner by which the pre-processing engine 65 estimates the position of the invisible keypoint is not particularly limited. In one example, the invisible keypoint may be inferred using a known kinematic chain if proximate keypoints in the kinematic chain are visible.
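One simple way to infer an occluded keypoint from a kinematic chain, as the paragraph above describes, is to extrapolate along the last segment between visible predecessors in the chain. This is an illustrative sketch; the chain layout, joint names, and extrapolation rule are assumptions, not taken from the application.

```python
import numpy as np

def infer_occluded(chain, positions, visible):
    """Estimate occluded keypoints in an ordered kinematic chain by
    extrapolating along the segment between the two nearest known
    predecessors in the chain."""
    estimated = dict(positions)
    for i, keypoint in enumerate(chain):
        if keypoint in visible or i < 2:
            continue
        a, b = chain[i - 2], chain[i - 1]
        if a in estimated and b in estimated:
            pa = np.asarray(estimated[a], float)
            pb = np.asarray(estimated[b], float)
            # Continue the chain in the direction of the last known segment.
            estimated[keypoint] = tuple(pb + (pb - pa))
    return estimated

# One finger chain with the fingertip keypoint ("dip") occluded.
chain = ["wrist", "mcp", "pip", "dip"]
positions = {"wrist": (0.0, 0.0), "mcp": (0.0, 2.0), "pip": (0.0, 4.0)}
est = infer_occluded(chain, positions, visible={"wrist", "mcp", "pip"})
```

A learned model would typically replace the straight-line extrapolation, but the chain-walking structure is the same.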

[0027] Upon determining the heatmaps for the keypoints in two dimensions, the three-dimensional positions of the keypoints are inferred. The manner by which the three-dimensional positions of the keypoints are determined is not particularly limited. In the present example, a lifting approach may be used to regress a depth value for each keypoint.

[0028] In other examples, a depth heatmap may be generated for each of the keypoint heatmaps. Each depth heatmap may be determined using a trained machine learning model, such as a convolutional neural network, a recurrent neural network, a random forest model, or a deep neural network. In a specific example of a convolutional neural network, the architecture may include three-dimensional convolutional layers that extract features (such as object boundaries, occlusions, spatial orderings, texture and color patterns, and lighting features) from the raw data image 100 to generate the depth heatmap. Upon determination of the depth heatmap, the depth heatmap may be combined with the associated keypoint heatmap to generate a three-dimensional heatmap of the position of the associated predefined keypoint. The three-dimensional position of the predefined keypoint may then be chosen to be a position of high probability.
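The combination of a two-dimensional keypoint heatmap with a depth heatmap can be sketched as an elementwise product over discretized depth bins, with the highest-probability cell taken as the three-dimensional position. The shapes, binning, and combination rule here are illustrative assumptions rather than the application's specified method.

```python
import numpy as np

def lift_to_3d(kp_heatmap, depth_heatmap):
    """Combine a 2-D keypoint heatmap (H x W) with a depth heatmap
    (D x H x W) into a 3-D likelihood volume, then pick the cell of
    highest probability as the keypoint's (x, y, z) position."""
    volume = kp_heatmap[None, :, :] * depth_heatmap  # broadcast to (D, H, W)
    z, y, x = np.unravel_index(np.argmax(volume), volume.shape)
    return int(x), int(y), int(z)

# Toy example: 2-D peak at (x=1, y=2), depth most likely in bin 2.
kp = np.zeros((4, 4))
kp[2, 1] = 1.0
depth = np.full((3, 4, 4), 0.2)
depth[2, 2, 1] = 0.9
position = lift_to_3d(kp, depth)
```

The direct lifting approach mentioned first would instead regress a single depth scalar per keypoint, skipping the volume entirely.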

[0029] The keypoint analysis engine 70 is to identify a connector between two keypoints identified by the pre-processing engine 65. The manner by which the keypoint connectors are identified is not particularly limited. For example, the keypoint analysis engine 70 may use the definitions for each of the keypoints identified by the pre-processing engine 65 to assign a connector between two keypoints. The definitions for each keypoint may include information about neighboring keypoints. As a specific example, each keypoint may represent a joint in a human hand. Accordingly, the definition of a joint may include its relative position in the hand and the neighboring joint or joints connected by a bone. In particular, the keypoint 106 may be associated with a proximal interphalangeal joint in the hand and defined to be connected to a metacarpophalangeal joint and a distal interphalangeal joint. The keypoint analysis engine 70 will then connect the keypoint 106 to the keypoint 107 associated with the metacarpophalangeal joint and the keypoint 108 associated with the distal interphalangeal joint based on this definition. This process may be iterated until all connections in the keypoint definitions have been made based on the keypoint heatmaps generated by the pre-processing engine 65. In other examples, the keypoint analysis engine 70 may use a known structure to identify the keypoint connectors identified by the pre-processing engine 65 instead of making a determination at each keypoint.
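The iteration over keypoint definitions described above might look like the following sketch, where each definition lists a keypoint's neighbors and a connector is emitted for every neighboring pair. The definition format and abbreviated joint names are hypothetical.

```python
def identify_connectors(definitions):
    """Build the list of connectors (bones) from per-keypoint definitions,
    where each definition lists the neighbouring keypoints joined to it."""
    connectors = set()
    for keypoint, neighbours in definitions.items():
        for neighbour in neighbours:
            # Store each connector once, regardless of direction.
            connectors.add(tuple(sorted((keypoint, neighbour))))
    return sorted(connectors)

# A single finger chain: metacarpophalangeal ("mcp") joint connected to the
# proximal ("pip") joint, which is connected to the distal ("dip") joint.
finger = {"mcp": ["pip"], "pip": ["mcp", "dip"], "dip": ["pip"]}
bones = identify_connectors(finger)
```

Deduplicating via sorted pairs means the definitions may list each neighbor relationship from both ends without producing duplicate connectors.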

[0030] In the present example, the keypoint analysis engine 70 is to identify keypoint connectors that are completely visible in the image 100. A keypoint connector is deemed visible when the end keypoints of the keypoint connector are both visible. In some examples, the invisible keypoints may be used to estimate a keypoint connector between a visible keypoint and an invisible keypoint. In other examples, a keypoint connector may be estimated between two invisible keypoints as determined by the pre-processing engine 65. The manner by which invisible keypoint connectors are estimated is not particularly limited. For example, a keypoint connector may be estimated based on a calculation from a break in the kinematic chain to estimate the keypoint connector that completes the kinematic chain. In the example of a hand of a person, it is to be appreciated by a person of skill with the benefit of this description that self-occlusion may be common in various poses. For example, only the keypoints corresponding to distal joints of the fingers may be seen when a hand is holding a large object or clenched in a fist.
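The visibility rule stated above, that a connector is visible only when both of its end keypoints are visible, can be expressed directly as a partition of the connector list. The joint names are illustrative.

```python
def split_by_visibility(connectors, visible):
    """Partition connectors into those whose two end keypoints are both
    visible and those with at least one occluded end keypoint."""
    fully_visible = [c for c in connectors
                     if c[0] in visible and c[1] in visible]
    occluded = [c for c in connectors if c not in fully_visible]
    return fully_visible, occluded

# Two bones of a finger; the distal ("dip") joint is occluded in this pose,
# so the second connector must be estimated rather than observed.
connectors = [("mcp", "pip"), ("pip", "dip")]
seen, hidden = split_by_visibility(connectors, visible={"mcp", "pip"})
```

The occluded group is exactly the set of connectors the kinematic-chain estimation described above would then attempt to complete.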

[0031] The pose estimation engine 75 is to generate a vector for each keypoint connector to represent the keypoint connector. The vector includes a direction and is to be normalized to a fixed magnitude, such as one. Each vector is to start from the same reference point, for example, by subtracting the three-dimensional position of its starting keypoint. Accordingly, the pose estimation engine 75 generates a three-dimensional representation 110 of a pose with a plurality of vectors as shown in figure 4. The vectors generated by the pose estimation engine 75 will be a scale-agnostic representation from which the object may be manipulated, such as via animation of the different connectors. The visual representation may be reconstructed using the vectors and any scaling for the object.
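Computing a normalized direction vector for each connector, as the pose estimation engine does, can be sketched as follows. The keypoint names and positions are illustrative.

```python
import numpy as np

def connector_vectors(positions, connectors):
    """Represent each connector as a unit-length (normalized) direction
    vector from its first keypoint to its second, discarding scale."""
    vectors = {}
    for a, b in connectors:
        v = np.asarray(positions[b], float) - np.asarray(positions[a], float)
        vectors[(a, b)] = v / np.linalg.norm(v)  # magnitude normalized to one
    return vectors

# Two keypoints 5 units apart; only the direction survives normalization,
# so the same pose is produced regardless of the hand's size.
vecs = connector_vectors({"a": (0, 0, 0), "b": (0, 3, 4)}, [("a", "b")])
```

Because every vector has unit length, two hands of different sizes in the same pose map to the same set of vectors, which is what makes the representation scale-agnostic.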

[0032] It is to be appreciated by a person of skill with the benefit of this description that this scale-agnostic representation of the pose of an object, such as a hand of a person, may be used to generate a set of 3D vectors of unit length that represent the pose from the image 100. The manner by which the apparatus 50 generates this representation is not particularly limited. For example, a neural network with a combination of convolutional and fully connected layers similar to those used for direct regression of the three-dimensional keypoints may be used.

[0033] By using the scale-agnostic representations, a corresponding pose may be generated by applying a structure in the associated keypoint connector definitions. Starting at a root and following each kinematic chain, the vectors corresponding to the hand may be accumulated along the direction of each vector. Since the scale-agnostic representation has unitary vectors, it will result in an estimated pose in which all connectors have unit length, which does not represent an anatomically correct hand. The pose estimation engine 75 may then scale the keypoint connectors in accordance with a set of bone lengths defined in a template of the object. When employing these realistic scales, the resulting pose will be visually pleasing. The template is not particularly limited and it is to be appreciated that the template may be modified to achieve different shapes and scales.

[0034] Referring to figure 5, a schematic representation of a computer network system is shown generally at 200. It is to be understood that the system 200 is purely exemplary and it will be apparent to those skilled in the art that a variety of computer network systems are contemplated. The system 200 includes the apparatus 50 to generate a three-dimensional representation of a pose, a plurality of external sources 20-1 and 20-2 (generically, these external sources are referred to herein as "external source 20" and collectively they are referred to as "external sources 20"), and a plurality of content requesters 25-1 and 25-2 (generically, these content requesters are referred to herein as "content requester 25" and collectively they are referred to as "content requesters 25") connected by a network 210. The network 210 is not particularly limited and may include any type of network, such as the Internet, an intranet or a local area network, a mobile network, or a combination of any of these types of networks. In some examples, the network 210 may also include a peer-to-peer network.

[0035] In the present example, the external sources 20 may be any type of computing device used to communicate with the apparatus 50 over the network 210 for providing raw data, such as an image 100 of an object, such as a hand of a person in a pose. For example, the external source 20-1 may be a smartphone. It is to be appreciated by a person of skill with the benefit of this description that the external source 20-1 may be substituted with a laptop computer, a portable electronic device, a gaming device, a mobile computing device, a portable computing device, a tablet computing device, or the like. In some examples, the external source 20-2 may be a camera to capture an image. The raw data may be generated from an image or video received or captured at the external source 20. In other examples, it is to be appreciated that the external source 20 may be a personal computer or smartphone on which content may be created such that the raw data is generated automatically from the content. The content requesters 25 may also be any type of computing device used to communicate with the apparatus 50 over the network 210 for receiving a three-dimensional representation of a pose to animate. For example, a content requester 25 may be a computer animator searching for a scale-agnostic representation of a pose to apply to an object, such as a hand.

[0036] Referring to figure 6, another schematic representation of an apparatus 50a to generate a three-dimensional representation of a pose is generally shown. Like components of the apparatus 50a bear like reference to their counterparts in the apparatus 50, except followed by the suffix “a”. In the present example, the apparatus 50a includes a communications interface 55a, a memory storage unit 60a, and a processor 80a. In the present example, the processor 80a includes a pre-processing engine 65a, a keypoint analysis engine 70a, a pose estimation engine 75a, and a pose generator 77a.

[0037] In the present example, the memory storage unit 60a maintains databases to store various data used by the apparatus 50a. For example, the memory storage unit 60a may include a database 300a to store raw data images received from an external source, a database 310a to store the data generated (i.e., the identified keypoints) by the pre-processing engine 65a, a database 320a to store the keypoint connectors identified by the keypoint analysis engine 70a, a database 330a to store the normalized representations of the pose of the object generated by the pose estimation engine 75a, and a database 335a to store rendered poses from the normalized representation generated by the pose generator 77a. In addition, the memory storage unit 60a may include an operating system 340a that is executable by the processor 80a to provide general functionality to the apparatus 50a. Furthermore, the memory storage unit 60a may be encoded with codes to direct the processor 80a to carry out specific steps to perform a method described in more detail below. The memory storage unit 60a may also store instructions to carry out operations at the driver level as well as other hardware drivers to communicate with other components and peripheral devices of the apparatus 50a, such as various user interfaces to receive input or provide output.

[0038] In the present example, the processor 80a is to operate the pose generator 77a. The pose generator 77a is to apply a template to generate or render a pose of an object based on the normalized representation of the pose. The manner by which the pose of an object is rendered from the normalized representation is not particularly limited. For example, the normalized three-dimensional representation of a pose may be fit to a template skeleton. A root of the object may be defined as a starting point for rendering the pose of an object. The root is not particularly limited and may be chosen to be a keypoint corresponding to a wrist of a hand. Accordingly, by following each directional vector from the root along a kinematic chain, the vectors may accumulate to represent the corresponding keypoint connector directions. This will result in an estimated pose in which all keypoint connectors have the same length, which may not represent an anatomically correct pose. Accordingly, the keypoint connectors may be scaled in accordance with the definitions of the end keypoints of each connector or bone. When employing these realistic scales, the resulting pose may be as aesthetic and anatomically correct as the original three-dimensional pose, even if the scale and proportions of the object may be different from the original object.
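The accumulation of directional vectors along a kinematic chain described above may be illustrated by the following sketch. The connector topology, keypoint indices, and bone lengths are hypothetical examples chosen for illustration and are not taken from the application; the application does not prescribe any particular implementation.

```python
import math

# Hypothetical kinematic chain: each connector is (parent keypoint, child keypoint),
# e.g. a chain rooted at a wrist keypoint (index 0).
CONNECTORS = [(0, 1), (1, 2), (2, 3)]
# Assumed per-connector bone lengths used to rescale the unit directions.
BONE_LENGTHS = {(0, 1): 4.0, (1, 2): 3.0, (2, 3): 2.5}

def render_pose(unit_vectors, root=(0.0, 0.0, 0.0)):
    """Walk the chain from the root: each child keypoint is its parent's
    position plus the connector's unit direction scaled to its bone length."""
    positions = {0: root}
    for (parent, child), (x, y, z) in zip(CONNECTORS, unit_vectors):
        n = math.sqrt(x * x + y * y + z * z)
        ux, uy, uz = x / n, y / n, z / n  # ensure unit magnitude
        px, py, pz = positions[parent]
        length = BONE_LENGTHS[(parent, child)]
        positions[child] = (px + length * ux, py + length * uy, pz + length * uz)
    return positions

# One normalized direction per connector, in chain order.
pose = render_pose([(1, 0, 0), (1, 0, 0), (0, 1, 0)])
```

Because the directions are normalized, the same representation may be applied to skeletons with different bone lengths, consistent with the scale-agnostic property described above.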

[0039] Referring to figure 7, a flowchart of an example method of generating a three-dimensional representation of a pose from a single two-dimensional image is generally shown at 300. In order to assist in the explanation of method 300, it will be assumed that method 300 may be performed by the apparatus 50. Indeed, the method 300 may be one way in which the apparatus 50 may be configured. Furthermore, the following discussion of method 300 may lead to a further understanding of the apparatus 50 and its components. In addition, it is to be emphasized that method 300 may not be performed in the exact sequence as shown, and various blocks may be performed in parallel rather than in sequence, or in a different sequence altogether.

[0040] Beginning at block 310, the apparatus 50 receives raw data from an external source via the communications interface 55. In the present example, the raw data includes a two-dimensional representation of a hand. However, in other examples, the raw data may include a representation of another object capable of having multiple poses. Once received at the apparatus 50, the raw data is to be stored in the memory storage unit 60 at block 320.

[0041] Block 330 involves identifying a plurality of visible keypoints in the image with the pre-processing engine 65. The visible keypoints are to generally provide a representation of the pose of the object in the raw image. The manner by which the keypoints are identified is not particularly limited. For example, the keypoints may be extracted from the two-dimensional image based on matching a set of predefined keypoint definitions from a known structure of the object. At block 340, the keypoint analysis engine 70 is to identify connectors between each keypoint identified by the pre-processing engine 65.
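The connector identification at block 340 may be illustrated by the following sketch, in which a predefined structure pairs keypoints of a known object and only connectors whose two end keypoints were both identified are retained. The keypoint names and the partial hand structure below are hypothetical illustrations, not definitions from the application.

```python
# Hypothetical predefined structure for a hand: each entry names the two
# end keypoints of one connector (e.g. a bone between two joints).
HAND_STRUCTURE = [
    ("wrist", "thumb_base"),
    ("thumb_base", "thumb_tip"),
    ("wrist", "index_base"),
    ("index_base", "index_tip"),
]

def identify_connectors(visible_keypoints, structure=HAND_STRUCTURE):
    """Keep only the connectors whose two end keypoints were both
    identified by the pre-processing stage."""
    return [(a, b) for (a, b) in structure
            if a in visible_keypoints and b in visible_keypoints]

# Keypoints identified from the image (illustrative 2D coordinates).
detected = {"wrist": (0, 0), "thumb_base": (1, 2), "index_base": (2, 1)}
connectors = identify_connectors(detected)
```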

[0042] Block 350 comprises the pose estimation engine generating a vector for each keypoint connector. The vector includes a direction and is to be normalized to a magnitude, such as one. Accordingly, the normalized vectors may be combined to generate a normalized representation of the pose of the hand from the image received at block 310.

[0043] Various advantages will now become apparent to a person of skill in the art. For example, the scale-agnostic representation of a pose may be used to estimate the pose of an object such as a hand. In particular, the input for the apparatus 50 may be an image containing a hand and the output may be a set of three-dimensional vectors of unit length that represent the directions of the connectors in an object, such as bones in the hand. Furthermore, once the normalized representation is annotated or estimated by the model, a corresponding three-dimensional pose may be generated by applying the predefined structure in the associated keypoint definition.
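The vector generation at block 350 may be illustrated by the following sketch, assuming three-dimensional estimates of the two end keypoints of a connector are available. The function name and the sample coordinates are hypothetical illustrations; the application does not limit how the normalization is carried out.

```python
import math

def connector_to_unit_vector(start, end):
    """Return the direction from one end keypoint to the other,
    normalized to unit magnitude so the result is scale-agnostic."""
    dx, dy, dz = (e - s for s, e in zip(start, end))
    magnitude = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (dx / magnitude, dy / magnitude, dz / magnitude)

# Illustrative connector end keypoints in three dimensions.
v = connector_to_unit_vector((0.0, 0.0, 0.0), (3.0, 0.0, 4.0))
```

Only the direction of each connector survives the normalization, which is what makes the combined representation independent of the size of the imaged hand.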

[0044] It should be recognized that features and aspects of the various examples provided above may be combined into further examples that also fall within the scope of the present disclosure.