

Title:
METHOD, APPARATUS AND SYSTEM FOR VIRTUAL CLOTHES MODELLING
Document Type and Number:
WIPO Patent Application WO/2014/081394
Kind Code:
A1
Abstract:
There is provided a method for virtual clothes modelling including receiving a plurality of frames of an input image stream of a user, generating an avatar based on one or more characteristics of the user, superimposing virtual clothes on the avatar, and synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes, wherein generating the avatar includes configuring the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user includes locating facial features of the user, extracting a colour of a facial area of the user, and applying a colour to the avatar computed based on the colour of the facial area, and/or wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and synthesizing the plurality of frames with the clothed avatar to generate the output image stream includes aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames. There is also provided an apparatus and a system for virtual clothes modelling.

Inventors:
YUAN MIAOLONG (SG)
KHAN ISHTIAQ RASOOL (SG)
FARBIZ FARZAM (SG)
Application Number:
PCT/SG2013/000496
Publication Date:
May 30, 2014
Filing Date:
November 22, 2013
Assignee:
AGENCY SCIENCE TECH & RES (SG)
International Classes:
G06T19/20; G06T13/40
Domestic Patent References:
WO2012110828A1 2012-08-23
Foreign References:
US20120287122A1 2012-11-15
EP1959394A2 2008-08-20
US6546309B1 2003-04-08
Attorney, Agent or Firm:
SPRUSON & FERGUSON (ASIA) PTE LTD (Robinson Road Post Office, Singapore 1, SG)
Claims:
CLAIMS

1. A method for virtual clothes modelling comprising:

receiving a plurality of frames of an input image stream of a user;

generating an avatar based on one or more characteristics of the user;

superimposing virtual clothes on the avatar; and

synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes,

wherein said generating the avatar comprises configuring the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user comprises:

locating facial features of the user;

extracting a colour of a facial area of the user; and

applying a colour to the avatar computed based on the colour of the facial area, and/or

wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and said synthesizing the plurality of frames with the clothed avatar to generate the output image stream comprises aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames.

2. The method according to claim 1, wherein the one or more characteristics further include a body size of the user.

3. The method according to claim 1 or 2, wherein superimposing the virtual clothes on the avatar comprises:

segmenting the virtual clothes into a plurality of sections;

scaling each section of the virtual clothes to match a size of each corresponding section of the avatar; and

aligning the scaled virtual clothes with respect to the avatar to superimpose the virtual clothes on the avatar.

4. The method according to any one of claims 1 to 3, wherein said applying a colour to the avatar comprises:

converting the colour of the facial area and an initial colour of the avatar into respective CIELAB colour spaces, each including a lightness (l) channel, a first colour axis (a) channel and a second colour axis (b) channel;

computing a mean and a standard deviation of each channel of the colour of the facial area and the initial colour of the avatar; and

determining the colour to apply to the avatar based on said mean and said standard deviation of each of said channels.

5. The method according to claim 4, wherein said determining the colour to apply to the avatar comprises:

subtracting said mean computed from each corresponding channel of the initial colour of the avatar according to:

$$l_u^* = l_u - \langle l_u \rangle, \quad a_u^* = a_u - \langle a_u \rangle, \quad b_u^* = b_u - \langle b_u \rangle$$

where: $u$ denotes the initial colour of the avatar;

$l_u$, $a_u$, and $b_u$ denote the $l$ channel, $a$ channel, and $b$ channel of the initial colour of the avatar, respectively,

$\langle l_u \rangle$, $\langle a_u \rangle$, and $\langle b_u \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the initial colour of the avatar, respectively,

$l_u^*$, $a_u^*$, and $b_u^*$ denote intermediate values of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively, and

computing the colour to apply to the avatar according to:

$$l_u' = \frac{\sigma_f^l}{\sigma_u^l}\, l_u^* + \langle l_f \rangle, \quad a_u' = \frac{\sigma_f^a}{\sigma_u^a}\, a_u^* + \langle a_f \rangle, \quad b_u' = \frac{\sigma_f^b}{\sigma_u^b}\, b_u^* + \langle b_f \rangle$$

where: $f$ denotes the colour of the facial area;

$l_f$, $a_f$, and $b_f$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour of the facial area, respectively,

$\sigma_f^l$, $\sigma_f^a$, and $\sigma_f^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively,

$\sigma_u^l$, $\sigma_u^a$, and $\sigma_u^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively,

$\langle l_f \rangle$, $\langle a_f \rangle$, and $\langle b_f \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively, and

$l_u'$, $a_u'$, and $b_u'$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour to apply to the avatar, respectively.

6. The method according to any one of claims 1 to 5, wherein the facial area is a cheek area of the face of the user.

7. The method according to any one of claims 1 to 6, wherein said aligning the clothed avatar comprises computing a projection matrix (P) for projecting each of the three-dimensional points of the avatar to a two-dimensional point on a current frame (j) of the plurality of frames corresponding to each respective two-dimensional point of the user.

8. The method according to claim 7, wherein the three-dimensional points of the avatar and the two-dimensional points of the user comprise points located at joints and along bones of a skeleton model of the avatar and the user.

9. The method according to claim 8, wherein the projection matrix (P) for the jth frame is computed by minimising residual errors according to:

$$\min_{P^j} \; w_J E_J + w_B E_B + \lambda \left\| P^j - P^{j-1} \right\|^2$$

where:

$E_J = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in J_{Avatar}$, $m_i \in J_{User}$, and $E_B = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in B_{Avatar}$, $m_i \in B_{User}$;

$M_i$ denotes the three-dimensional point of the avatar and $m_i$ the corresponding two-dimensional point of the user;

$\zeta(P^j, M_i)$ denotes the projection of the three-dimensional points of the avatar given the projection matrix $P^j$ for the jth frame;

$J_{Avatar}$ and $J_{User}$ denote points located at said joints from the skeleton models of the avatar and the user, respectively;

$B_{Avatar}$ and $B_{User}$ denote points located along said bones of the skeleton models of the avatar and the user, respectively;

$\lambda$ denotes a predetermined value functioning as a regularizer; and $w_J$ and $w_B$ are predetermined weights for $E_J$ and $E_B$, respectively.

10. The method according to claim 8 or 9, wherein the three-dimensional points of the avatar and the two-dimensional points of the user further comprise points located at facial features and silhouette contours of the avatar and the user.

11. The method according to any one of claims 1 to 10, configured to provide one or more of the following options:

the avatar is invisible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the user wearing the virtual clothes,

the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the clothed avatar resembling the user wearing the virtual clothes, or

a body portion of the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the body portion of the avatar wearing the virtual clothes while a head portion of the user remains in the output image stream aligned on the body portion of the avatar.

12. The method according to any one of claims 1 to 11, wherein the image stream generated is a real-time image stream of the user trying on the virtual clothes.

13. An apparatus for virtual clothes modelling comprising:

a receiver module for receiving a plurality of frames of an input image stream of a user;

an avatar generator module for generating an avatar based on one or more characteristics of the user;

a superimposing module for superimposing virtual clothes on the avatar; and

a synthesizing module for synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes,

wherein the avatar generator module is operable to configure the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user comprises:

locating facial features of the user;

extracting a colour of a facial area of the user; and

applying a colour to the avatar computed based on the colour of the facial area, and/or

wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and said synthesizing the plurality of frames with the clothed avatar to generate the output image stream comprises aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames.

14. The apparatus according to claim 13, wherein the one or more characteristics further include a body size of the user.

15. The apparatus according to claim 13 or 14, wherein the superimposing module is configured to:

segment the virtual clothes into a plurality of sections;

scale each section of the virtual clothes to match a size of each corresponding section of the avatar; and

align the scaled virtual clothes with respect to the avatar to superimpose the virtual clothes on the avatar.

16. The apparatus according to any one of claims 13 to 15, wherein said applying a colour to the avatar comprises:

converting the colour of the facial area and an initial colour of the avatar into respective CIELAB colour spaces, each including a lightness (l) channel, a first colour axis (a) channel and a second colour axis (b) channel;

computing a mean and a standard deviation of each channel of the colour of the facial area and the initial colour of the avatar; and

determining the colour to apply to the avatar based on said mean and said standard deviation of each of said channels.

17. The apparatus according to claim 16, wherein said determining the colour to apply to the avatar comprises:

subtracting said mean computed from each corresponding channel of the initial colour of the avatar according to:

$$l_u^* = l_u - \langle l_u \rangle, \quad a_u^* = a_u - \langle a_u \rangle, \quad b_u^* = b_u - \langle b_u \rangle$$

where: $u$ denotes the initial colour of the avatar;

$l_u$, $a_u$, and $b_u$ denote the $l$ channel, $a$ channel, and $b$ channel of the initial colour of the avatar, respectively,

$\langle l_u \rangle$, $\langle a_u \rangle$, and $\langle b_u \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the initial colour of the avatar, respectively,

$l_u^*$, $a_u^*$, and $b_u^*$ denote intermediate values of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively, and

computing the colour to apply to the avatar according to:

$$l_u' = \frac{\sigma_f^l}{\sigma_u^l}\, l_u^* + \langle l_f \rangle, \quad a_u' = \frac{\sigma_f^a}{\sigma_u^a}\, a_u^* + \langle a_f \rangle, \quad b_u' = \frac{\sigma_f^b}{\sigma_u^b}\, b_u^* + \langle b_f \rangle$$

where: $f$ denotes the colour of the facial area;

$l_f$, $a_f$, and $b_f$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour of the facial area, respectively,

$\sigma_f^l$, $\sigma_f^a$, and $\sigma_f^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively,

$\sigma_u^l$, $\sigma_u^a$, and $\sigma_u^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively,

$\langle l_f \rangle$, $\langle a_f \rangle$, and $\langle b_f \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively, and

$l_u'$, $a_u'$, and $b_u'$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour to apply to the avatar, respectively.

18. The apparatus according to any one of claims 13 to 17, wherein the facial area is a cheek area of the face of the user.

19. The apparatus according to any one of claims 13 to 18, wherein said aligning the clothed avatar comprises computing a projection matrix (P) for projecting each of the three-dimensional points of the avatar to a two-dimensional point on a current frame (j) of the plurality of frames corresponding to each respective two-dimensional point of the user.

20. The apparatus according to claim 19, wherein the three-dimensional points of the avatar and the two-dimensional points of the user comprise points located at joints and along bones of a skeleton model of the avatar and the user, respectively.

21. The apparatus according to claim 20, wherein the projection matrix (P) for the jth frame is computed by minimising residual errors according to:

$$\min_{P^j} \; w_J E_J + w_B E_B + \lambda \left\| P^j - P^{j-1} \right\|^2$$

where:

$E_J = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in J_{Avatar}$, $m_i \in J_{User}$, and $E_B = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in B_{Avatar}$, $m_i \in B_{User}$;

$M_i$ denotes the three-dimensional point of the avatar and $m_i$ the corresponding two-dimensional point of the user;

$\zeta(P^j, M_i)$ denotes the projection of the three-dimensional points of the avatar given the projection matrix $P^j$ for the jth frame;

$J_{Avatar}$ and $J_{User}$ denote points located at said joints from the skeleton models of the avatar and the user, respectively;

$B_{Avatar}$ and $B_{User}$ denote points located along said bones of the skeleton models of the avatar and the user, respectively;

$\lambda$ denotes a predetermined value functioning as a regularizer; and

$w_J$ and $w_B$ are predetermined weights for $E_J$ and $E_B$, respectively.

22. The apparatus according to claim 20 or 21, wherein the three-dimensional points of the avatar and the two-dimensional points of the user further comprise points located at facial features and silhouette contours of the avatar and the user, respectively.

23. The apparatus according to any one of claims 13 to 22, configured to provide one or more of the following options:

the avatar is invisible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the user wearing the virtual clothes,

the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the clothed avatar resembling the user wearing the virtual clothes, or

a body portion of the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the body portion of the avatar wearing the virtual clothes while a head portion of the user remains in the output image stream aligned on the body portion of the avatar.

24. The apparatus according to any one of claims 13 to 23, wherein the output image stream generated is a real-time image stream of the user trying on the virtual clothes.

25. A system for virtual clothes modelling comprising:

the apparatus according to any one of claims 13 to 24,

an image capturing unit communicatively coupled to the apparatus for capturing an image stream of the user and sending the image stream to the apparatus, and

a display communicatively coupled to the image capturing unit for displaying the image stream generated by the apparatus to the user.

26. A computer program product, embodied in a computer-readable storage medium, comprising instructions executable by a computer processor to perform a method for virtual clothes modelling comprising:

receiving a plurality of frames of an input image stream of a user;

generating an avatar based on one or more characteristics of the user;

superimposing virtual clothes on the avatar; and

synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes,

wherein said generating the avatar comprises configuring the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user comprises:

locating facial features of the user;

extracting a colour of a facial area of the user; and

applying a colour to the avatar computed based on the colour of the facial area, and/or

wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and said synthesizing the plurality of frames with the clothed avatar to generate the output image stream comprises aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames.

Description:
METHOD, APPARATUS AND SYSTEM FOR VIRTUAL CLOTHES MODELLING

FIELD OF INVENTION

The present invention generally relates to a method for virtual clothes modelling such as for virtual try-on of clothes by a user in retail shopping. The present invention also relates to an apparatus and a system for virtual clothes modelling.

BACKGROUND

Physical try-on of clothes can be a time consuming procedure in retail shopping. It typically takes several tries before a shopper can decide on the design, color and size of the apparel that suits the shopper. Virtual try-on of clothes can help to speed-up this process as the shopper is able to view the clothes on his/her body without physically wearing them. This may eliminate the need for physical try-on or at least narrow down his/her selections before physical try-on.

Some conventional systems use image processing techniques, such as changing the color of the clothes worn by the user or segmenting the clothes worn by the user and re-texturing them using extracted shading and shape deformation information based on the clothes which the user would like to try on. Some other conventional systems reconstruct a look-alike 3D model of the user and the clothes are simulated on the reconstructed 3D model. In this regard, scanning or computer vision-based systems can be used to reconstruct the user's 3D model. For conventional retexturing-based systems, a major disadvantage is that the user must physically wear the clothes and that only the clothes color/texture can be changed. Furthermore, such retexturing-based systems would also likely produce undesirable side effects, such as unintentionally changing the colour and texture of other parts of the image (e.g., the user's face, body, etc.), due to the typical inaccuracy of image segmentation algorithms resulting in a poor quality or unpleasant image. For conventional 3D reconstruction-based systems, it is difficult and time consuming to reconstruct a user look-alike 3D model. Typically, the reconstructed 3D model does not look like the actual user, and in particular, lacks realistic facial animation, eye-contact, etc.

In recent years, some interactive virtual try-on systems using augmented reality (AR) technique have been reported. However, a major challenging issue in AR based systems is the requirement of accurate pose recovery for fitting virtual clothes or other accessories on the real user's body image. More recently, sensing technologies capable of providing high quality synchronized videos of both color and depth have been released such as MICROSOFT KINECT produced by MICROSOFT. Some conventional systems using the MICROSOFT KINECT use deformable 3D cloth models which follow the user's movements with related cloth simulation. However, MICROSOFT KINECT is normally used for games and other applications, and such conventional systems using MICROSOFT KINECT have not been found to be sufficiently accurate for alignment of the virtual clothes with the user's body.

A need therefore exists to provide a method, an apparatus and a system with enhanced virtual clothing experience which seek to overcome, or at least ameliorate, one or more of the deficiencies of the conventional art mentioned above. It is against this background that the present invention has been developed.

SUMMARY

According to a first aspect of the present invention, there is provided a method for virtual clothes modelling comprising:

receiving a plurality of frames of an input image stream of a user;

generating an avatar based on one or more characteristics of the user;

superimposing virtual clothes on the avatar; and

synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes,

wherein said generating the avatar comprises configuring the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user comprises:

locating facial features of the user; extracting a colour of a facial area of the user; and

applying a colour to the avatar computed based on the colour of the facial area, and/or

wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and said synthesizing the plurality of frames with the clothed avatar to generate the output image stream comprises aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames.

Preferably, the one or more characteristics further include a body size of the user.

Preferably, superimposing the virtual clothes on the avatar comprises:

segmenting the virtual clothes into a plurality of sections;

scaling each section of the virtual clothes to match a size of each corresponding section of the avatar; and

aligning the scaled virtual clothes with respect to the avatar to superimpose the virtual clothes on the avatar.

Preferably, said applying a colour to the avatar comprises:

converting the colour of the facial area and an initial colour of the avatar into respective CIELAB colour spaces, each including a lightness (l) channel, a first colour axis (a) channel and a second colour axis (b) channel;

computing a mean and a standard deviation of each channel of the colour of the facial area and the initial colour of the avatar; and

determining the colour to apply to the avatar based on said mean and said standard deviation of each of said channels.

Preferably, said determining the colour to apply to the avatar comprises:

subtracting said mean computed from each corresponding channel of the initial colour of the avatar according to:

$$l_u^* = l_u - \langle l_u \rangle, \quad a_u^* = a_u - \langle a_u \rangle, \quad b_u^* = b_u - \langle b_u \rangle$$

where: $u$ denotes the initial colour of the avatar;

$l_u$, $a_u$, and $b_u$ denote the $l$ channel, $a$ channel, and $b$ channel of the initial colour of the avatar, respectively,

$\langle l_u \rangle$, $\langle a_u \rangle$, and $\langle b_u \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the initial colour of the avatar, respectively,

$l_u^*$, $a_u^*$, and $b_u^*$ denote intermediate values of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively, and

computing the colour to apply to the avatar according to:

$$l_u' = \frac{\sigma_f^l}{\sigma_u^l}\, l_u^* + \langle l_f \rangle, \quad a_u' = \frac{\sigma_f^a}{\sigma_u^a}\, a_u^* + \langle a_f \rangle, \quad b_u' = \frac{\sigma_f^b}{\sigma_u^b}\, b_u^* + \langle b_f \rangle$$

where: $f$ denotes the colour of the facial area;

$l_f$, $a_f$, and $b_f$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour of the facial area, respectively,

$\sigma_f^l$, $\sigma_f^a$, and $\sigma_f^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively,

$\sigma_u^l$, $\sigma_u^a$, and $\sigma_u^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively,

$\langle l_f \rangle$, $\langle a_f \rangle$, and $\langle b_f \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively, and

$l_u'$, $a_u'$, and $b_u'$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour to apply to the avatar, respectively.

Preferably, the facial area is a cheek area of the face of the user.

Preferably, said aligning the clothed avatar comprises computing a projection matrix (P) for projecting each of the three-dimensional points of the avatar to a two-dimensional point on a current frame (j) of the plurality of frames corresponding to each respective two-dimensional point of the user.

Preferably, the three-dimensional points of the avatar and the two-dimensional points of the user comprise points located at joints and along bones of a skeleton model of the avatar and the user.

Preferably, the projection matrix (P) for the jth frame is computed by minimising residual errors according to:

$$\min_{P^j} \; w_J E_J + w_B E_B + \lambda \left\| P^j - P^{j-1} \right\|^2$$

where:

$E_J = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in J_{Avatar}$, $m_i \in J_{User}$, and $E_B = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in B_{Avatar}$, $m_i \in B_{User}$;

$M_i$ denotes the three-dimensional point of the avatar and $m_i$ the corresponding two-dimensional point of the user;

$\zeta(P^j, M_i)$ denotes the projection of the three-dimensional points of the avatar given the projection matrix $P^j$ for the jth frame;

$J_{Avatar}$ and $J_{User}$ denote points located at said joints from the skeleton models of the avatar and the user, respectively;

$B_{Avatar}$ and $B_{User}$ denote points located along said bones of the skeleton models of the avatar and the user, respectively;

$\lambda$ denotes a predetermined value functioning as a regularizer; and

$w_J$ and $w_B$ are predetermined weights for $E_J$ and $E_B$, respectively.

Preferably, the three-dimensional points of the avatar and the two-dimensional points of the user further comprise points located at facial features and silhouette contours of the avatar and the user.

Preferably, the method is configured to provide one or more of the following options:

the avatar is invisible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the user wearing the virtual clothes,

the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the clothed avatar resembling the user wearing the virtual clothes, and

a body portion of the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the body portion of the avatar wearing the virtual clothes while a head portion of the user remains in the output image stream aligned on the body portion of the avatar.

Preferably, the image stream generated is a real-time image stream of the user trying on the virtual clothes.

According to a second aspect of the present invention, there is provided an apparatus for virtual clothes modelling comprising:

a receiver module for receiving a plurality of frames of an input image stream of a user;

an avatar generator module for generating an avatar based on one or more characteristics of the user;

a superimposing module for superimposing virtual clothes on the avatar; and

a synthesizing module for synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes,

wherein the avatar generator module is operable to configure the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user comprises:

locating facial features of the user;

extracting a colour of a facial area of the user; and

applying a colour to the avatar computed based on the colour of the facial area, and/or

wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and said synthesizing the plurality of frames with the clothed avatar to generate the output image stream comprises aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames.

Preferably, the one or more characteristics further include a body size of the user.

Preferably, the superimposing module is configured to:

segment the virtual clothes into a plurality of sections;

scale each section of the virtual clothes to match a size of each corresponding section of the avatar; and

align the scaled virtual clothes with respect to the avatar to superimpose the virtual clothes on the avatar.

Preferably, said applying a colour to the avatar comprises:

converting the colour of the facial area and an initial colour of the avatar into respective CIELAB colour spaces, each including a lightness (l) channel, a first colour axis (a) channel and a second colour axis (b) channel;

computing a mean and a standard deviation of each channel of the colour of the facial area and the initial colour of the avatar; and

determining the colour to apply to the avatar based on said mean and said standard deviation of each of said channels.

Preferably, said determining the colour to apply to the avatar comprises:

subtracting said mean computed from each corresponding channel of the initial colour of the avatar according to:

$$l_u^* = l_u - \langle l_u \rangle, \quad a_u^* = a_u - \langle a_u \rangle, \quad b_u^* = b_u - \langle b_u \rangle$$

where: $u$ denotes the initial colour of the avatar;

$l_u$, $a_u$, and $b_u$ denote the $l$ channel, $a$ channel, and $b$ channel of the initial colour of the avatar, respectively,

$\langle l_u \rangle$, $\langle a_u \rangle$, and $\langle b_u \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the initial colour of the avatar, respectively,

$l_u^*$, $a_u^*$, and $b_u^*$ denote intermediate values of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively, and

computing the colour to apply to the avatar according to:

$$l_u' = \frac{\sigma_f^l}{\sigma_u^l}\, l_u^* + \langle l_f \rangle, \quad a_u' = \frac{\sigma_f^a}{\sigma_u^a}\, a_u^* + \langle a_f \rangle, \quad b_u' = \frac{\sigma_f^b}{\sigma_u^b}\, b_u^* + \langle b_f \rangle$$

where: $f$ denotes the colour of the facial area;

$l_f$, $a_f$, and $b_f$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour of the facial area, respectively,

$\sigma_f^l$, $\sigma_f^a$, and $\sigma_f^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively,

$\sigma_u^l$, $\sigma_u^a$, and $\sigma_u^b$ denote said standard deviation of the $l$ channel, $a$ channel and $b$ channel of the colour of the avatar, respectively,

$\langle l_f \rangle$, $\langle a_f \rangle$, and $\langle b_f \rangle$ denote said mean values of the $l$ channel, $a$ channel and $b$ channel of the colour of the facial area, respectively, and

$l_u'$, $a_u'$, and $b_u'$ denote the $l$ channel, $a$ channel, and $b$ channel of the colour to apply to the avatar, respectively.

Preferably, the facial area is a cheek area of the face of the user.

Preferably, said aligning the clothed avatar comprises computing a projection matrix (P) for projecting each of the three-dimensional points of the avatar to a two-dimensional point on a current frame (j) of the plurality of frames corresponding to each respective two-dimensional point of the user.

Preferably, the three-dimensional points of the avatar and the two-dimensional points of the user comprise points located at joints and along bones of a skeleton model of the avatar and the user, respectively.

Preferably, the projection matrix (P) for the jth frame is computed by minimising residual errors according to:

$$\min_{P^j} \; w_J E_J + w_B E_B + \lambda \left\| P^j - P^{j-1} \right\|^2$$

where:

$E_J = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in J_{Avatar}$, $m_i \in J_{User}$, and $E_B = \sum \left\| \zeta(P^j, M_i) - m_i \right\|^2$ over $M_i \in B_{Avatar}$, $m_i \in B_{User}$;

$M_i$ denotes the three-dimensional point of the avatar and $m_i$ the corresponding two-dimensional point of the user;

$\zeta(P^j, M_i)$ denotes the projection of the three-dimensional points of the avatar given the projection matrix $P^j$ for the jth frame;

$J_{Avatar}$ and $J_{User}$ denote points located at said joints from the skeleton models of the avatar and the user, respectively;

$B_{Avatar}$ and $B_{User}$ denote points located along said bones of the skeleton models of the avatar and the user, respectively;

$\lambda$ denotes a predetermined value functioning as a regularizer; and $w_J$ and $w_B$ are predetermined weights for $E_J$ and $E_B$, respectively.

Preferably, the three-dimensional points of the avatar and the two-dimensional points of the user further comprise points located at facial features and silhouette contours of the avatar and the user, respectively.

Preferably, the apparatus is configured to provide one or more of the following options:

the avatar is invisible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the user wearing the virtual clothes,

the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the clothed avatar resembling the user wearing the virtual clothes, and

a body portion of the avatar is visible, therefore, synthesizing the plurality of frames with the clothed avatar generates the output image stream that on display shows the body portion of the avatar wearing the virtual clothes while a head portion of the user remains in the output image stream aligned on the body portion of the avatar.

Preferably, the output image stream generated is a real-time image stream of the user trying on the virtual clothes.

According to a third aspect of the present invention, there is provided a system for virtual clothes modelling comprising:

the apparatus according to the second aspect of the present invention described herein,

an image capturing unit communicatively coupled to the apparatus for capturing an image stream of the user and sending the image stream to the apparatus, and

a display communicatively coupled to the image capturing unit for displaying the image stream generated by the apparatus to the user.

According to a fourth aspect of the present invention, there is provided a computer program product, embodied in a computer-readable storage medium, comprising instructions executable by a computer processor to perform a method for virtual clothes modelling comprising:

receiving a plurality of frames of an input image stream of a user;

generating an avatar based on one or more characteristics of the user;

superimposing virtual clothes on the avatar; and synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes,

wherein said generating the avatar comprises configuring the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user comprises:

locating facial features of the user;

extracting a colour of a facial area of the user; and

applying a colour to the avatar computed based on the colour of the facial area, and/or

wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and said synthesizing the plurality of frames with the clothed avatar to generate the output image stream comprises aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:

Fig. 1 depicts a flow diagram generally illustrating a method for virtual clothes modelling according to an example embodiment of the present invention;

Fig. 2 depicts a schematic block diagram of an example system including an example apparatus for virtual clothes modelling configured to perform the method shown in Fig. 1 according to an example embodiment of the present invention;

Figs. 3A to 3C depict three exemplary try-on scenarios according to embodiments of the present invention;

Fig. 4 depicts a block diagram illustrating a detailed architecture of an apparatus for virtual clothes modelling according to an example embodiment of the present invention;

Fig. 5A depicts an example generic avatar generated by the generic avatar generator module;

Figs. 5B to 5F depict example customized avatars configured to have various sizes based on the body size of different users;

Figs. 6A and 6B depict the clothed avatars generated for two different users, respectively;

Figs. 7A to 7C illustrate a technique for extracting a facial area of interest to be used as a suitable exemplar patch for colour transfer to the avatar;

Figs. 8A to 8C illustrate an example of aligning a 3D avatar with a 2D image of the user according to an example embodiment of the present invention;

Figs. 9A and 9B illustrate another example of aligning a 3D avatar with a 2D image of the user according to another example embodiment of the present invention;

Fig. 10 illustrates a conventional general purpose computer which may be used to implement the method described herein in example embodiments of the present invention;

Figs. 11 to 13 illustrate exemplary output images for various poses of a first, a second and a third scenario produced by the system according to example embodiments of the present invention.

DETAILED DESCRIPTION

As described in the background of the present specification, physical try-on of clothes can be a time consuming procedure in retail shopping. It typically takes several tries before a shopper can decide on the design, color and size of the apparel that suits the shopper. Virtual try-on of clothes can help to speed-up this process as the shopper is able to view the clothes on his/her body without physically wearing them. This may eliminate the need for physical try-on or at least narrow down his/her selections before physical try-on.

Embodiments of the present invention seek to provide a method, an apparatus and a system for virtual clothes modelling which can be used, e.g., for virtual try-on of clothes by a user/customer in retail shopping. Details of the method, apparatus and system according to exemplary embodiments of the present invention will be described hereinafter.

Some portions of the description which follows are explicitly or implicitly presented in terms of algorithms and functional or symbolic representations of operations on data within a computer memory. These algorithmic descriptions and functional or symbolic representations are the means used by those skilled in the data processing arts to convey most effectively the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities, such as electrical, magnetic or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated.

Unless specifically stated otherwise, and as apparent from the following, it will be appreciated that throughout the present specification, discussions utilizing terms such as "scanning", "calculating", "determining", "replacing", "generating", "initializing", "outputting", or the like, refer to the action and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical quantities within the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.

The present specification also discloses apparatus for performing the operations of the methods. Such apparatus may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose machines may be used with programs in accordance with the teachings herein. Alternatively, the construction of more specialized apparatus to perform the required method steps may be appropriate. The structure of an exemplary conventional general purpose computer will be described later with reference to Fig. 10.

In addition, the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code. The computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein. Moreover, the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.

Furthermore, one or more of the steps of the computer program may be performed in parallel rather than sequentially. Such a computer program may be stored on any computer readable medium. The computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer. The computer program when loaded and executed on such a general-purpose computer effectively results in an apparatus that implements the steps of the method described herein.

The invention may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the system can also be implemented as a combination of hardware and software modules.

Fig. 1 depicts a flow diagram generally illustrating a method 100 for virtual clothes modelling according to an example embodiment of the present invention. The method 100 comprises a step 104 of receiving a plurality of frames of an input image stream of a user, a step 108 of generating an avatar based on one or more characteristics of the user, a step 112 of superimposing virtual clothes on the avatar, and a step 116 of synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes. It will be appreciated by a person skilled in the art that the above-described steps may be performed in another order as appropriate and are not limited to the order presented. Furthermore, the above steps are not intended to be construed to necessitate individual/separate steps and may be combined or performed in one step where appropriate without deviating from the scope of the present invention.
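The following is a minimal sketch, in Python, of how the four steps of the method 100 fit together for a stream of frames; the function and parameter names are illustrative assumptions rather than the patent's API, and the three callables stand in for steps 108, 112 and 116.

```python
# Illustrative sketch only: the callables are placeholders for an avatar generator,
# a clothes-superimposing routine, and a frame-synthesis routine (steps 108/112/116).
def virtual_try_on(frames, generate_avatar, superimpose_clothes, synthesize,
                   user_characteristics, virtual_clothes):
    """Yield output frames that appear to show the user trying on the virtual clothes."""
    avatar = generate_avatar(user_characteristics)                   # step 108
    clothed_avatar = superimpose_clothes(avatar, virtual_clothes)    # step 112
    for frame in frames:                                             # step 104: input stream
        yield synthesize(frame, clothed_avatar)                      # step 116: output stream
```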

In the example embodiment, the user is a person/customer who is trying on the virtual clothes (virtual try-on) and wishes to see how the clothes would look on himself/herself when worn. The virtual clothes may be digital versions of the physical clothes at a retail shop and the user may indicate the physical clothes that he/she is interested in trying and tries on the corresponding virtual clothes instead of the physical clothes. In another embodiment, the user may browse through a digital collection of virtual clothes (e.g., online shopping) and try on any of the virtual clothes of interest. The image stream may be generated by an image capturing unit such as a digital camera/sensor, preferably an RGB-D camera/sensor such as MICROSOFT KINECT produced and made commercially available by MICROSOFT. The avatar is a digital three-dimensional object representing or resembling the user or a part thereof.

In a preferred embodiment, the avatar is configured to match one or more characteristics of the user, including a skin colour of the user. Preferably, configuring the avatar to match the skin colour of the user includes locating facial features of the user, extracting a colour of a facial area of the user, and applying a colour to the avatar computed based on the colour of the facial area. In addition, the avatar may also be configured to match a body size of the user. The manner in which the avatar is configured will be described in further detail later below.

In another preferred embodiment of the present invention, the avatar is three-dimensional and the input and output image streams are two-dimensional, whereby synthesizing the plurality of frames with the clothed avatar to generate the output image stream includes aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames. This alignment technique will be described in further detail later below.
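As an illustration of the 3D-to-2D alignment idea, the Python sketch below estimates a 3x4 projection matrix for one frame by minimising weighted reprojection errors over joint and bone correspondences, with a simple temporal regularizer towards the previous frame's matrix. It assumes NumPy and SciPy are available; the weights, regularizer form and least-squares solver are illustrative choices, not necessarily those used in the patent.

```python
# Minimal sketch of per-frame 3D-to-2D alignment via weighted reprojection error.
import numpy as np
from scipy.optimize import least_squares

def project(P, X):
    """Project Nx3 points X with a 3x4 matrix P (homogeneous pinhole model)."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])   # N x 4 homogeneous points
    x = Xh @ P.T                                    # N x 3
    return x[:, :2] / x[:, 2:3]                     # N x 2 image points

def residuals(p, joints3d, joints2d, bones3d, bones2d, P_prev, w_J, w_B, lam):
    P = p.reshape(3, 4)
    r_J = np.sqrt(w_J) * (project(P, joints3d) - joints2d).ravel()   # joint term
    r_B = np.sqrt(w_B) * (project(P, bones3d) - bones2d).ravel()     # bone term
    r_reg = np.sqrt(lam) * (P - P_prev).ravel()     # temporal smoothness (assumed form)
    return np.concatenate([r_J, r_B, r_reg])

def align_frame(joints3d, joints2d, bones3d, bones2d, P_prev,
                w_J=1.0, w_B=0.5, lam=0.1):
    """Return the projection matrix for the current frame, starting from P_prev."""
    sol = least_squares(residuals, P_prev.ravel(),
                        args=(joints3d, joints2d, bones3d, bones2d,
                              P_prev, w_J, w_B, lam))
    return sol.x.reshape(3, 4)
```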

An apparatus 200 for virtual clothes modelling configured to perform the method 100 described above according to an example embodiment of the present invention is schematically illustrated in Fig. 2. In the example embodiment, the apparatus 200 comprises a receiver module 204 for receiving a plurality of frames of an input image stream 203 of a user 202, an avatar generator module 208 for generating an avatar based on one or more characteristics of the user 202, a superimposing module 212 for superimposing virtual clothes on the avatar, and a synthesizing module 216 for synthesizing the plurality of frames with the clothed avatar to generate an output image stream 220 that on display appears to resemble the user 202 trying on the virtual clothes. It will be appreciated by a person skilled in the art that the apparatus 200 for virtual clothes modelling may be specially constructed for the required purposes, or may be implemented in a general purpose computer or other devices. For example, each of the above modules may be software modules realised by a computer program or a set of instructions executable by a computer processor 1018 to perform the required functions, or may be hardware modules being a functional hardware unit designed to perform the required functions. It will also be appreciated that a combination of hardware and software modules may be implemented in the apparatus 200.

Fig. 2 also schematically illustrates a system 250 for virtual clothes modelling comprising the apparatus 200, an image capturing unit 254 communicatively coupled to the apparatus 200 (in particular, the receiver module 204) for capturing an input image stream 203 of the user 202 and sending the image stream 203 to the apparatus 200, and a display unit 258 communicatively coupled to the apparatus 200 (in particular, the synthesizing module 216) for displaying the output image stream 220 generated by the apparatus 200 to the user 202. In an embodiment, the image capturing unit 254 is a digital camera/sensor, preferably an RGB-D camera/sensor such as MICROSOFT KINECT produced and made commercially available by MICROSOFT. The display unit 258 can be any device with a display screen capable of receiving and displaying an image stream, such as but not limited to a TV, a projector, or a computer device with a display screen.

For clarity and illustration purposes, the above exemplary embodiments will now be described in further detail with respect to preferred examples. However, it will be appreciated by a person skilled in the art that the present invention is not limited to the preferred examples, and certain aspects of the preferred examples may be modified or varied as appropriate while still falling within the scope of the exemplary embodiments as described in Figs. 1 and 2.

Figs. 3A to 3C illustrate three exemplary try-on scenarios preferred in the embodiments of the present invention. In a first try-on scenario illustrated in Fig. 3A, the virtual clothes 304 are arranged so as to appear on a body of the user 202. In this scenario, the virtual clothes 304 are superimposed on the avatar of the user 202 but the avatar is made invisible in the output image 220, thus providing the effect that the user is wearing the virtual clothes. In a second try-on scenario illustrated in Fig. 3B, the avatar wearing the virtual clothes 304 (i.e., clothed avatar 308) is visible and shown entirely in the output image, including the body and head portions. In this scenario, the user 202 is able to see the clothed avatar 308 resembling the user 202 wearing the virtual clothes 304. In the third scenario illustrated in Fig. 3C, a headless clothed avatar 312 (i.e., the clothed avatar 308 as shown in Fig. 3B but without the head portion 309) is shown and is aligned/superimposed with an image of the user's head portion. That is, in the output image 220 generated, the body portion 312 of the clothed avatar 308 is shown while the head portion 309 of the avatar 308 is hidden. Therefore, the head portion 316 of the user 202 remains in the output image 220 aligned on the body portion 312 of the clothed avatar 308. Accordingly, in these examples, the virtual clothes 304 are simulated on the avatar 308 and follow the user's 202 movements in real time to provide an enhanced virtual try-on experience close to a real-life situation where the user stands in front of a mirror.

The apparatus 200 or the system 250 may be configured to provide one or more of the above-described try-on scenarios. An apparatus 400 according to a preferred embodiment, which efficiently integrates the above three try-on scenarios and advantageously provides the user or operator with the option of selecting one or more of them for the virtual try-on experience, will now be described with reference to Fig. 4. Although preferred, it will be appreciated that the apparatus 200 may be configured to provide more or fewer try-on options/scenarios without deviating from the scope of the present invention.
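To make the three scenarios concrete, the short sketch below (names are illustrative, not from the patent) expresses them as rendering options that determine which layers are composited into each output frame.

```python
# Illustrative rendering options for the three try-on scenarios of Figs. 3A to 3C.
from enum import Enum

class TryOnMode(Enum):
    CLOTHES_ONLY = 1      # Fig. 3A: avatar hidden, only the virtual clothes over the user
    FULL_AVATAR = 2       # Fig. 3B: clothed avatar shown in place of the user
    HEADLESS_AVATAR = 3   # Fig. 3C: avatar body shown, user's own head kept in the frame

def visible_layers(mode):
    """Return which layers are rendered into the output frame for a given scenario."""
    if mode is TryOnMode.CLOTHES_ONLY:
        return {"clothes"}
    if mode is TryOnMode.FULL_AVATAR:
        return {"clothes", "avatar_body", "avatar_head"}
    return {"clothes", "avatar_body"}   # HEADLESS_AVATAR: the user's head stays visible
```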

Fig. 4 depicts a block diagram illustrating an architecture of the apparatus 400 according to the example embodiment. The apparatus 400 comprises an avatar generator module 404 including a generic avatar generator module 406 for generating a generic/original avatar, an avatar body customization module 408 for customizing a body size of the avatar based on the body size of the user 202, and a skin tone matching module 410 for customizing the colour of the avatar based on a skin colour of the user 202.

The apparatus 400 also comprises a receiver module 420 for receiving a plurality of frames of an input image stream 203 of the user 202. The receiver module 420 includes a body measurement module 422 for determining a body size of the user 202 based on the input image 203 of the user 202 received from the image capturing unit 426 and sending the measurements to the avatar body customization module 408, and a user segmentation module 424 for extracting a skin area of the input image of the user 202 and sending the extracted skin area to the skin tone matching module 410. Accordingly, a customized avatar 412 may be generated based on a body size and a skin colour of the user 202.

The apparatus 400 further comprises a superimposing and synthesizing module 430 for superimposing virtual clothes on the customized avatar, and synthesizing the plurality of frames of the input image stream 203 with the clothed avatar 308 to generate an output image stream 220 that on display appears to resemble the user trying on the virtual clothes in real time. In particular, the superimposing and synthesizing module 430 comprises a user pose detection module 432 for detecting a pose of the user 202, a two-dimensional to three-dimensional (2D-3D) alignment module 434 for aligning the customized avatar from the avatar generator module 404 with the pose of the user 202, and a clothes simulation module 436 for superimposing the virtual clothes 304 on the customized avatar 412. With the customized avatar aligned with respect to the user 202 and the virtual clothes 304 superimposed on the customized avatar, the rendering module 438 may generate the output image stream 220, which on display appears to resemble the user trying on the virtual clothes in real time.

For enhanced realism, the image capturing unit 426 may also capture a background image 320 behind the user 202 and the superimposing and synthesizing module 430 may comprise a background reconstruction module 442 for reconstructing the background image 320 in the output image stream 220.

In the example embodiment, the superimposing and synthesizing module 430 may further comprise a first avatar modification module 446 for hiding the avatar (i.e., making the avatar invisible) as described above in the first try-on scenario illustrated in Fig. 3A such that the virtual clothes 304 appear to the user 202 to be on his/her body, and a second avatar modification module 450 for hiding a part of the avatar 308, in particular, the head portion 309 as described above in the third try-on scenario illustrated in Fig. 3C such that the body portion 312 of the clothed avatar 308 is shown while the head portion 316 of the user 202 remains in the output image 220 aligned on the body portion 312 of the clothed avatar 308. For the second try-on scenario illustrated in Fig. 3B, the output image stream 220 from the rendering module 438 does not need to be modified as shown in Fig. 4 since the entire clothed avatar 308 resembling the user 202 is displayed. In all three scenarios, the virtual clothes 304 on the avatar follow the user's movements, therefore giving the user 202 a perception of trying on the virtual clothes 304 in front of a mirror in real time.

Fig. 5A illustrates an example generic/original avatar generated by the generic avatar generator module 406. Figs. 5B to 5F illustrate example customized/modified avatars configured to have various sizes based on the body size of different users 202. In the example embodiment, the size of the customized avatar is configured so as to match the body size of the user 202, i.e., the same size or in proportion thereto. For further details of one technique for configuring the avatar based on the body size of the user 202, reference is made to A. Niswar, I. Khan, and F. Farbiz, "Avatar Customization based on Human Body Measurements", Proceedings of the SIGGRAPH Asia 2012 Posters, the contents of which are hereby incorporated by cross-reference.

In a preferred embodiment of the present invention, a scaling technique is implemented which scales different sections of the virtual clothes separately and then suitably translates them to retain the connectivity of the different sections. This is because scaling of the virtual clothes to appear to fit well on the user's body is an important requirement for a realistic virtual try-on. In this regard, a global scaling generally would not provide a satisfactory result since different users usually have different body shapes. Accordingly, in the preferred embodiment, superimposing the virtual clothes 304 on the avatar to produce a clothed avatar 308 comprises segmenting the virtual clothes 304 into a plurality of sections, scaling each section of the virtual clothes 304 to match a size of each corresponding section of the avatar, and aligning the scaled virtual clothes 304 with respect to the avatar to superimpose the virtual clothes 304 on the avatar. For example, the increase in the circumference of the avatar at each point along the vertical axis is applied to scale the virtual clothes 304 worn by the avatar. As a result, the virtual clothes 304 are scaled to fit on the avatar to generate a clothed avatar 308. The clothed avatar 308 may be generated in a setup or an offline stage (e.g., each of the plurality of virtual clothes 304 is pre-fitted/superimposed onto a respective avatar before the user virtual try-on to produce a plurality of clothed avatars 308). Therefore, when the avatar is customized according to the size of the user during the user virtual try-on, the virtual clothes 304 on the avatar are also automatically scaled proportionally since the virtual clothes 304 have already been fitted on the avatar. For illustration, Figs. 6A and 6B show the clothed avatars 308 generated for two different users 202, respectively. In particular, the clothed avatar 308 is customised according to the size of the user 202 and thus the virtual clothes 304 on the user 202 are also scaled proportionally.
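A minimal NumPy sketch of the per-section scaling idea described above is given below. The section labelling, the per-section scale factors and the vertical-offset bookkeeping used to keep adjacent sections connected are illustrative assumptions; a real implementation would derive the scale factors from the avatar's circumference at each height.

```python
# Illustrative per-section garment scaling: scale each section separately, then
# translate sections vertically so the scaled pieces stay connected (bottom to top).
import numpy as np

def scale_clothes_sections(vertices, section_ids, scale_xy, scale_y=None):
    """
    vertices    : (N, 3) garment vertices (x, y, z), y being the vertical axis
    section_ids : (N,) integer section label per vertex, increasing from bottom to top
    scale_xy    : dict {section_id: horizontal scale factor (circumference ratio)}
    scale_y     : optional dict {section_id: vertical scale factor}
    """
    out = vertices.copy()
    offset_y = 0.0
    for sid in sorted(set(section_ids)):
        mask = section_ids == sid
        sec = out[mask]
        centre = sec.mean(axis=0)
        base_y = sec[:, 1].min()
        # scale around the section centre in the horizontal plane
        sec[:, [0, 2]] = centre[[0, 2]] + scale_xy[sid] * (sec[:, [0, 2]] - centre[[0, 2]])
        if scale_y is not None:
            sec[:, 1] = base_y + scale_y[sid] * (sec[:, 1] - base_y)
        # shift this section up/down by the accumulated change of the sections below it
        sec[:, 1] += offset_y
        out[mask] = sec
        offset_y = sec[:, 1].max() - vertices[mask][:, 1].max()
    return out
```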

The functions of the skin tone matching module 410 will now be described in further detail. In the example embodiment, besides altering the mesh geometry of the avatar 308 according to the size of the user 202, skin-tone matching is also implemented to generate a look-alike avatar resembling the user 202. In a preferred embodiment, the method of transferring the user's skin tone to the avatar 308 generally comprises the following three steps. In a first step, the facial features of the user 202 are located using a facial feature extraction technique such as the Active Shape Model (ASM) technique. The ASM technique is known in the art and thus will not be described in detail herein. In a second step, piecewise linear curves are used to represent facial areas of the user 202 and one or more facial areas of interest are extracted. In a preferred embodiment, the facial area of interest is the cheek area 712 of the user 202 and the cheek patches 716 are extracted from the image of the user 202 as shown in Figs. 7A to 7C and described in further detail below. In a third step, a global colour transfer method is applied to transfer the colour of the extracted facial area to the avatar 308.
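The second step can be sketched as follows in Python with OpenCV; the landmark array and the specific cheek landmark indices are assumptions for illustration (the example embodiment below uses 76 ASM landmarks, 20 of which enclose the two cheek areas).

```python
# Illustrative extraction of cheek-patch pixels from a polygon of facial landmarks.
import cv2
import numpy as np

def extract_cheek_pixels(image_bgr, landmarks, cheek_indices):
    """
    image_bgr     : H x W x 3 face image
    landmarks     : (K, 2) array of (x, y) facial landmarks, e.g. from an ASM fit
    cheek_indices : indices of the landmarks enclosing one cheek, in drawing order
    """
    polygon = landmarks[cheek_indices].astype(np.int32)
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [polygon], 255)          # piecewise-linear cheek region
    return image_bgr[mask == 255]               # (M, 3) array of cheek-patch pixels
```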

Conventionally, a common technique to acquire face skin patches is to transform the RGB image of the user into the YCbCr colour space and threshold the chrominance components. In particular, the pixels of the image whose chromatic values fall within a certain range are classified as skin-colour pixels. However, in uncontrolled environments, some background clothes or hair may have similar colours as the face. Moreover, the lips and the areas between the eyebrows and the upper eyelids (or even the eyebrows in some cases) may be misclassified as skin areas, which are not ideal/suitable skin patches to be used for skin tone transfer. Another problem in skin tone extraction is that the appearance of the skin is usually quite different under different viewing and lighting conditions. Additionally, due to the different scattering properties of skin and the cellular and 3D topographic structures of the face, there are usually highlights in the forehead, nose and chin areas. In a preferred embodiment, as illustrated in Fig. 7A, it is noted that the cheek area 712 is the largest flat skin area on the face 704, and is least affected by shadows from the nose, forehead and hair. The cheek area 712 generally reflects the illuminating light more uniformly compared to other areas on the face 704. Therefore, the cheek area 712 is chosen as a suitable or an ideal exemplar patch 716. In the preferred embodiment, in a first step, facial features of the face 704 are detected using ASM, in which the face shape is formed with a set of landmarks that represent distinguishable points such as the eye pupils, nose and lips, etc. In a second step, piecewise linear curves 708 are used to represent the facial areas of the face 704 of the user as shown in Fig. 7B, and the cheek patches 716 are then extracted from the image of the face 704 as shown in Fig. 7C. In the example, a total of 76 landmarks are marked on the image of the face 704 and joined by lines 708 in a predetermined order so as to obtain a piecewise linear representation of the face 704. Fig. 7B shows the 20 landmarks and their connecting lines 708 used to enclose the left and right cheek areas 712. Fig. 7C shows the two cheek patches 716 extracted from the image of the face 704. In the example embodiment, to implement the colour transfer, the RGB colour components of the extracted face skin patches (i.e., the extracted cheek patches 716) are decomposed into a lightness (l) layer and chrominance layers (a first colour axis (a) channel, e.g., in the range from cyan to magenta/red, and a second colour axis (b) channel, e.g., in the range from blue to yellow) by converting them to the CIELab colour space. The CIELab colour space is known in the art and thus will not be described in detail herein. The "l" channel is the lightness layer, and the "a" and "b" channels are the colour layers. The CIELab colour space is chosen because it performs better than other colour spaces in separating lightness from colour, and is approximately perceptually uniform. In addition, the colour (i.e., the initial skin colour) of the avatar 308 is also converted into the CIELab colour space, having the lightness (l) channel, the first colour axis (a) channel and the second colour axis (b) channel.
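As a rough illustration of the cheek-patch extraction and colour-space conversion described above, the following sketch collects the pixels enclosed by the cheek landmark polygons and converts them to Lab using OpenCV. It assumes the cheek landmark subsets have already been obtained from an ASM fit of the face landmarks; the function name and data layout are illustrative only.

```python
import cv2
import numpy as np

def extract_cheek_lab_pixels(bgr_image, left_cheek_pts, right_cheek_pts):
    """Return the Lab pixels inside the two cheek polygons (illustrative sketch).

    bgr_image: user's image as an OpenCV BGR array.
    left_cheek_pts, right_cheek_pts: (N, 2) landmark subsets enclosing each cheek,
    assumed to be available from an ASM fit of the face landmarks.
    """
    mask = np.zeros(bgr_image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [left_cheek_pts.astype(np.int32),
                        right_cheek_pts.astype(np.int32)], 255)
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)      # l, a, b channels
    return lab[mask > 0].astype(np.float32)               # (M, 3) cheek pixels in Lab
```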

Subsequently, a mean and a standard deviation of each channel of the colour of the facial area (i.e., the user's cheek patches 716) and of the initial colour of the avatar (i.e., the avatar's UV map) are computed. Intermediate values ($l_u^*$, $a_u^*$, $b_u^*$) of the colour of the avatar are then computed by subtracting the mean of each corresponding channel from the initial skin colour of the avatar 308 according to the following equations:

$l_u^* = l_u - \langle l_u \rangle$, \quad $a_u^* = a_u - \langle a_u \rangle$, \quad $b_u^* = b_u - \langle b_u \rangle$   (1)

where: "u" denotes the initial skin colour of the avatar (i.e., the UV map of the avatar); $l_u$, $a_u$, and $b_u$ denote the l channel, a channel, and b channel of the initial skin colour of the avatar, respectively; $\langle l_u \rangle$, $\langle a_u \rangle$, and $\langle b_u \rangle$ denote the mean values of the l channel, a channel, and b channel of the initial skin colour of the avatar, respectively; and $l_u^*$, $a_u^*$, and $b_u^*$ denote the intermediate values of the l channel, a channel, and b channel of the skin colour of the avatar, respectively.

The above computed intermediate values of the skin colour of the avatar are then scaled with weighting factors (i.e., $\sigma_f^l / \sigma_u^l$, $\sigma_f^a / \sigma_u^a$, and $\sigma_f^b / \sigma_u^b$) determined by the respective standard deviations, and the mean/average values of the user's face skin are then added to the scaled intermediate values to generate new "l", "a", and "b" colour components to be converted to the RGB colour space and applied to the avatar 308. In the example embodiment, the new skin colour for the avatar is determined according to the following equations:

$l_u' = \frac{\sigma_f^l}{\sigma_u^l} l_u^* + \langle l_f \rangle$, \quad $a_u' = \frac{\sigma_f^a}{\sigma_u^a} a_u^* + \langle a_f \rangle$, \quad $b_u' = \frac{\sigma_f^b}{\sigma_u^b} b_u^* + \langle b_f \rangle$   (2)

where: "f" denotes the colour of the facial area of the user; $l_f$, $a_f$, and $b_f$ denote the l channel, a channel, and b channel of the colour of the facial area, respectively; $\sigma_f^l$, $\sigma_f^a$, and $\sigma_f^b$ denote the standard deviations of the l channel, a channel, and b channel of the colour of the facial area, respectively; $\sigma_u^l$, $\sigma_u^a$, and $\sigma_u^b$ denote the standard deviations of the l channel, a channel, and b channel of the skin colour of the avatar 308 (i.e., the UV map of the avatar 308), respectively; $\langle l_f \rangle$, $\langle a_f \rangle$, and $\langle b_f \rangle$ denote the mean values of the l channel, a channel, and b channel of the colour of the facial area, respectively; and $l_u'$, $a_u'$, and $b_u'$ denote the l channel, a channel, and b channel of the skin colour to apply to the avatar 308, respectively. Once the new skin colour components ($l_u'$, $a_u'$, $b_u'$) for the avatar 308 are computed, they are converted back to the RGB colour space to give a new UV map which has a similar colour tone as the face skin (i.e., the cheek area) of the user 202.
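The mean/standard-deviation colour transfer of Equations (1) and (2) can be sketched compactly as follows. The sketch assumes the avatar's UV map and the user's cheek pixels are already in the Lab colour space (for example, as produced by the extraction sketch above) and ignores the channel scaling conventions of any particular Lab conversion; the function and variable names are illustrative only.

```python
import numpy as np

def transfer_skin_tone(avatar_lab_uv, cheek_lab_pixels):
    """Mean/std colour transfer in Lab, following Equations (1) and (2).

    avatar_lab_uv: (H, W, 3) Lab UV map of the avatar (initial skin colour, "u").
    cheek_lab_pixels: (M, 3) Lab pixels of the user's cheek patches ("f").
    Returns a new Lab UV map carrying the user's skin tone.
    """
    flat_u = avatar_lab_uv.reshape(-1, 3)
    u_mean, u_std = flat_u.mean(axis=0), flat_u.std(axis=0)
    f_mean, f_std = cheek_lab_pixels.mean(axis=0), cheek_lab_pixels.std(axis=0)

    intermediate = avatar_lab_uv - u_mean                 # Equation (1): subtract the avatar's means
    new_uv = (f_std / u_std) * intermediate + f_mean      # Equation (2): scale and add the user's means
    return new_uv
```

The resulting map would then be converted back to the RGB colour space (for example, with the inverse Lab conversion in OpenCV) before being applied to the avatar, as described above.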

The functions of the superimposing and synthesizing module 430 will now be described in further detail according to an example embodiment of the present invention.

Real-time 3D rigid tracking is relatively easy to achieve. However, for articulated objects, it is difficult to achieve accurate tracking results with real-time performance due to the changing poses. In the example embodiment, a technique is provided which seeks to efficiently align the 3D avatar 308 with the user's image in real-time by using an RGB-D sensor, such as MICROSOFT KINECT. For stably aligning the 3D life-size avatar 308 with the user's 2D image sequence/stream, in the example embodiment, a set of 3D-2D (i.e., 3D to 2D) point correspondences between the 3D avatar model 308 and the current frame j of the user's 2D image is established. Manual intervention is often required to obtain some initial 3D-2D point correspondences. However, automatic initialization is preferred or necessary in a virtual try-on system. To achieve this, in the example embodiment, the user 202 may stand and look straight at a display with one of a number of predetermined poses, such as a T-pose, at the beginning. Two skeleton models 804, 808 are then employed and aligned with respect to each other, i.e., a skeleton model 808 for the avatar 810, which is preferably already built into the generated 3D avatar, and a skeleton model 804 for the user 202, which may be obtained or generated based on the image from the RGB-D camera. After the skeleton model 808 of the avatar is aligned with the skeleton 804 of the user 202, the user 202 keeps this pose for a short period of time (e.g., a few seconds) and the current frame may then be set as the key frame for initialization. In the example embodiment, a 2D image point of the user 202 is denoted by $m = (u, v)^T$ and a 3D point of the avatar 308 is denoted by $M = (X, Y, Z)^T$. Furthermore, homogeneous vectors are used to represent the 2D image point ($m$) and the 3D point ($M$) as follows:

$m = (u, v, 1)^T$, \quad $M = (X, Y, Z, 1)^T$   (3)

The relationship between a 3D point ($M$) and its 2D image point projection ($m$) (i.e., its corresponding 2D image point) may be expressed as follows:

$\rho m = K[R \mid t]M$   (4)

where $\rho$ is an arbitrary scale factor, $P = [R \mid t]$ is the projection matrix, and $K$ is the intrinsic matrix of the camera 254. For the 3D-2D alignment according to the example embodiment, the parameters $P^j$ are estimated for the current incoming frame $j$ that minimize the sum of squared distances between the projections of the 3D points of the avatar 308 and the corresponding 2D image points of the user 202. $M_i = (X_i, Y_i, Z_i, 1)^T$ denotes a 3D vertex on the avatar 308 and $m_i = (u_i, v_i, 1)^T$ denotes the corresponding 2D image point of the user 202. In the example embodiment, $P^j$ is estimated through minimizing the actual residual errors and can be determined as follows:

$P^j := \arg\min_{P^j} \left( w_J E_J + w_B E_B + \lambda E_R \right)$   (5)

where $E_J$ and $E_B$ are defined as follows:

$E_J = \sum_i \left\| \zeta(P^j M_i^{J_{Avatar}}) - m_i^{J_{User}} \right\|^2$   (6)

$E_B = \sum_i \left\| \zeta(P^j M_i^{B_{Avatar}}) - m_i^{B_{User}} \right\|^2$   (7)

In Equations (6) and (7), $\zeta(P^j M_i)$ denotes the projection of the 3D avatar points given the parameters $P^j$ in the $j$-th incoming/current frame. As shown in Figures 8A and 8B, $J_{Avatar}$ and $J_{User}$ are points 830, 828 located at the joints of the skeletons 808, 804 of the avatar 308 and the user 202, respectively. $B_{Avatar}$ and $B_{User}$ are points 838, 834 located along the bones of the skeletons 808, 804 of the avatar 810 and the user 202, respectively. In the example embodiment, only the sampling points on the upper arms, the shoulders and the spine are used, due to their stable poses captured from the RGB-D camera and the different hip positions of the two skeletons 808, 804. However, it will be appreciated that the number of sampling points may be reduced or increased as appropriate. $E_R$ functions as a regularizer which is used to avoid over-fitting, and $\lambda$ is set as 0.0002 in the example embodiment, which is determined experimentally. $\lambda$ controls the effect of the regularizer and is typically a small value. $w_J$ and $w_B$ are the weights for $E_J$ and $E_B$, respectively, and are determined experimentally. In this example, $w_J$ and $w_B$ are each set as 0.5. For example, Equation (5) can be solved using the Levenberg-Marquardt (LM) minimization method. Fig. 8C shows an example of the initial alignment based on the above-described technique. The LM minimization method is known in the art and thus will not be described in detail herein. In a further embodiment, for robustly estimating the parameters $P^j$ under various user poses, more 3D-2D point correspondences $\{M_i, m_i\}$ between the 3D avatar 308 and the incoming frame $j$ of the user 202 are established. In this embodiment, feature points and silhouette contours are used to establish more 3D-2D point correspondences.
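For illustration only, the minimization of Equation (5) may be sketched as below using SciPy's Levenberg-Marquardt solver. The sketch assumes the per-frame parameters reduce to a six-degree-of-freedom rigid pose (rotation vector plus translation) and that the regularizer simply penalizes deviation from the previous frame's pose; both are simplifying assumptions about the general parameters $P^j$ described above, and all function and parameter names are illustrative.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def project_points(pose, points_3d, K):
    """Project (N, 3) avatar points with pose = (rotation vector, translation)."""
    rvec, tvec = pose[:3], pose[3:6]
    proj, _ = cv2.projectPoints(points_3d.astype(np.float64), rvec, tvec, K, None)
    return proj.reshape(-1, 2)

def estimate_frame_parameters(J_avatar, J_user, B_avatar, B_user, K, prev_pose,
                              w_j=0.5, w_b=0.5, lam=0.0002):
    """Sketch of Equation (5): joint and bone reprojection terms plus a regularizer.
    The pose parameterization and the previous-frame regularizer are assumptions."""
    def residuals(pose):
        r_j = np.sqrt(w_j) * (project_points(pose, J_avatar, K) - J_user).ravel()
        r_b = np.sqrt(w_b) * (project_points(pose, B_avatar, K) - B_user).ravel()
        r_reg = np.sqrt(lam) * (pose - prev_pose)        # assumed regularizer form
        return np.concatenate([r_j, r_b, r_reg])

    return least_squares(residuals, prev_pose, method='lm').x   # Levenberg-Marquardt
```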

In relation to the silhouette contours, contour correspondences are established between the projected surface of the avatar 308 and the user's 2D image silhouette by searching for the 3D vertices of the avatar 308 whose projections are nearest to the user's contour points. To determine the user's contour points, the image of the user 202 from the RGB-D camera 254 is segmented and the user silhouette contours are then extracted. Various known methods may be used to extract the user silhouette contours from an image and thus will not be described in detail herein.
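One possible sketch of this contour correspondence search is given below, using a k-d tree over the projected avatar vertices so that each user contour point is matched to its nearest projection; the distance threshold is an assumed illustrative value.

```python
import numpy as np
from scipy.spatial import cKDTree

def contour_correspondences(projected_vertices, user_contour, max_dist=15.0):
    """For each point on the user's silhouette contour, find the avatar vertex whose
    projection lies nearest; distant matches are discarded (threshold is assumed).

    projected_vertices: (N, 2) image projections of the avatar's 3D vertices.
    user_contour: (M, 2) points on the user's extracted silhouette contour.
    Returns a list of (vertex_index, contour_index) pairs.
    """
    tree = cKDTree(projected_vertices)
    dists, nearest = tree.query(user_contour)
    return [(int(v), int(c)) for c, (v, d) in enumerate(zip(nearest, dists)) if d < max_dist]
```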

For the feature point correspondences, we need to establish the 2D-2D correspondences in real-time between the keyframe $r$ and the current incoming frame $j$. In the example embodiment, a machine learning-based keypoint recognition method is used, in which matching is formulated as an image classification problem, where each class is trained with all the possible appearances that an image feature may take under large perspective and scale variations. For further details of a machine learning-based keypoint recognition method, reference is made to V. Lepetit and P. Fua, "Keypoint recognition using randomized trees", IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1465-1497, 2006, the contents of which are hereby incorporated by cross-reference. A major advantage of the learning-based matching method is that it is very fast for image tracking problems and is therefore suitable for applications that require real-time performance.

However, for the alignment of the customized clothed avatar 308 with the user's image, only the feature points below the user's head are considered, since the size of the avatar's head is usually different from the size of the user's actual head. In both cases, the 2D correspondences are associated with a 3D vertex $V_i$ on the avatar 308, thus yielding the 3D-2D correspondences $\{M_i, m_i\}$. After obtaining $\{M_i, m_i\}$ between the avatar 308 and the incoming frame $j$, the above-described Equation (5) may be expanded by integrating them to estimate the projection matrix $P^j$ as follows:

$P^j := \arg\min_{P^j} \left( w_J E_J + w_B E_B + w_C E_C + w_F E_F + \lambda E_R \right)$   (8)

The definitions of $E_C$ and $E_F$ are the same as those of $E_J$ and $E_B$, but relate to silhouette contours and feature points instead of joints and bones, respectively. We estimate $P^j$ using Equation (8) and robust M-estimators. Figs. 9A and 9B show an example of the alignment based on the above-described technique further involving the feature points and silhouette contours.
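Equation (8) combined with a robust M-estimator may be sketched as follows, here using SciPy's built-in Huber loss as one possible choice of M-estimator. The residual function is assumed to stack the joint, bone, contour and feature-point reprojection residuals (each pre-multiplied by the square root of its weight) together with the regularizer term, in the manner of the earlier sketch; the Huber scale value is an assumed, illustrative setting.

```python
from scipy.optimize import least_squares

def estimate_frame_parameters_robust(residual_fn, init_pose, huber_scale=2.0):
    """Sketch of solving Equation (8) with a robust M-estimator (Huber loss).

    residual_fn: callable returning the stacked weighted reprojection residuals
    (joints, bones, contours, feature points) plus the regularizer term.
    """
    result = least_squares(residual_fn, init_pose, loss='huber', f_scale=huber_scale)
    return result.x
```

The contour and feature-point correspondences produced by the earlier sketches can simply be appended to the residual vector, so that outlying correspondences are down-weighted by the robust loss rather than dominating the fit.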

The method of the example embodiments described herein can be implemented on a computer system 1000, schematically shown in Fig. 10. It may be implemented as software, such as a computer program being executed within the computer system 1000, and instructing the computer system 1000 to conduct the method of the example embodiment. The computer system 1000 comprises a computer module 1002, input modules such as a keyboard 1004 and mouse 1006 and a plurality of output devices such as a display 1008, and printer 1010. The computer module 1002 is connected to a computer network 1012 via a suitable transceiver device 1014, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN). The computer module 1002 in the example includes a processor 1018, a Random Access Memory (RAM) 1020 and a Read Only Memory (ROM) 1022. The computer module 1002 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 1024 to the display 1008, and I/O interface 1026 to the keyboard 1004. The components of the computer module 1002 typically communicate via an interconnected bus 1028 and in a manner known to the person skilled in the relevant art.

The application program may be supplied to the user of the computer system 1000 encoded on a data storage medium such as a CD/DVD-ROM or flash memory carrier and read utilising a corresponding data storage medium drive of a data storage device 1030. The application program may also be supplied to the user of the computer system 1000 via a network such as the Internet, LAN, or WAN. The application program is read and controlled in its execution by the processor 1018. Intermediate storage of program data may be accomplished using RAM 1020.

In particular, in an embodiment, a computer program product, embodied in a computer-readable storage medium 1030, comprising instructions executable by a computer processor 1018 to perform a method for virtual clothes modelling comprising receiving a plurality of frames of an input image stream of a user, generating an avatar based on one or more characteristics of the user, superimposing virtual clothes on the avatar, and synthesizing the plurality of frames with the clothed avatar to generate an output image stream that on display appears to resemble the user trying on the virtual clothes, wherein generating the avatar comprises configuring the avatar to match the one or more characteristics of the user including a skin colour of the user, and configuring the avatar to match the skin colour of the user comprises locating facial features of the user, extracting a colour of a facial area of the user, and applying a colour to the avatar computed based on the colour of the facial area, and/or wherein the avatar is three-dimensional and the input and output image streams are two-dimensional, and said synthesizing the plurality of frames with the clothed avatar to generate the output image stream comprises aligning the clothed avatar with each of the plurality of frames based on a set of three-dimensional to two-dimensional point correspondences between the clothed avatar and the user in the plurality of frames.

In an exemplary implementation, the virtual try-on apparatus 200 of the system 250 described herein was implemented on a 2.53 GHz Intel Xeon(R) with 24 GB RAM running Visual Studio 2010. A MICROSOFT KINECT camera was used as the RGB-D sensor for user image collection, pose detection, body measurements, user segmentation and detection of the user's face skin colour.

At the beginning, the user 202 stands in front of a display with a standard or predetermined pose (such as a T-pose). The system 250 automatically establishes the relevant 3D-2D correspondences for the keyframe according to the techniques described above. In addition, various sizes of the user's body (e.g., the user's height, shoulder width, waist height and arm lengths) and the face skin colour are extracted using the RGB-D camera 254. The avatar is then customized according to the extracted sizes and the skin colour. The user 202 may also key additional body sizes into the system 250 for a more accurate avatar customization. Next, the user 202 can select his/her favourite virtual clothes for virtual try-on. The selected virtual clothes are superimposed on the customized avatar and synthesized with the input image stream 203 of the user 202 in real-time such that, on display, the output appears to resemble the user 202 trying on the virtual clothes 304 and following the user's movements.

Figs. 11 to 13 illustrate exemplary output images for various poses of the first, second and third scenarios produced by the system 250, respectively. In particular, Fig. 11 illustrates the first try-on scenario, where the virtual clothes 304 are arranged so as to appear on the body of the user 202 (the virtual clothes 304 are superimposed on the avatar of the user 202 but the avatar is made invisible, thus providing the effect that the user is wearing the virtual clothes). Fig. 12 illustrates the second try-on scenario, where the avatar 308 wearing the virtual clothes 304 (i.e., the clothed avatar 308) is visible and shown entirely, including the body and head portions (the user 202 is able to see the clothed avatar 308 resembling the user 202 wearing the virtual clothes 304). Fig. 13 illustrates the third try-on scenario, where a headless clothed avatar 312 (i.e., the clothed avatar 308 as shown in Fig. 3B but without the head portion) is shown and is superimposed with an image of the user's head portion.

Accordingly, the method 100 and system 250 for virtual clothes modelling described herein according to the example embodiments are efficient and sufficiently accurate in aligning the virtual clothes 304 with the user's body. This provides an enhanced virtual clothing experience. This also advantageously avoids, or at least minimises, the need for physical try-ons, which may thus result in increased sales by a retail shop. Additional advantages include the possibility of side-by-side comparison of various clothes and simultaneous viewing of outfits from different angles for further enhancing the user experience. Interactive virtual try-on can also be an interesting feature for advertisements and/or attracting crowds. It will be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.