


Title:
AN APPARATUS AND METHOD FOR ANIMATING EMOTIONALLY DRIVEN VIRTUAL OBJECTS
Document Type and Number:
WIPO Patent Application WO/2008/087621
Kind Code:
A1
Abstract:
A mobile communication terminal for allowing a user to participate in a communication session with one or more participants. The mobile communication terminal comprises a recognition module that receives a communication stream from one or more of the participants and identifies emotional expressions in the communication stream. The mobile communication terminal further comprises a scene-animating module for managing a scene having one or more virtual objects and for applying an animation on one or more of the virtual objects according to the emotional expressions. The mobile communication terminal further comprises an output module for outputting the animated virtual scene.

Inventors:
HERSZKOWICZ KOREN (IL)
YASHINSKI YARON (IL)
Application Number:
PCT/IL2007/001289
Publication Date:
July 24, 2008
Filing Date:
October 25, 2007
Assignee:
MOBILESOLID LTD (IL)
HERSZKOWICZ KOREN (IL)
YASHINSKI YARON (IL)
International Classes:
H04M1/72427
Domestic Patent References:
WO2005099262A1 (2005-10-20)
Foreign References:
EP1326445A2 (2003-07-09)
US20050172001A1 (2005-08-04)
GB2423905A (2006-09-06)
Attorney, Agent or Firm:
G. E. EHRLICH (1995) LTD. et al. (Ramat Gan, IL)
Claims:

CLAIMS

What is claimed is:

1. A mobile communication terminal for allowing a user to participate in a network-based communication session with at least one other participant, comprising: a recognition module configured for receiving a communication stream from the at least one other participant and identifying at least one emotional expression from said communication stream; a scene-animating module, associated with said recognition module, configured for managing a scene having at least one virtual object and for applying an animation on said at least one virtual object wherein said animation is selected according to said at least one identified emotional expression; and an output module configured for outputting said animated scene for viewing by the user as a visual enhancement to the communication session.

2. The mobile communication terminal of claim 1, wherein said communication stream comprises a member of the group consisting of: an audio stream, a video stream, and an audiovisual stream.

3. The mobile communication terminal of claim 1, wherein said scene-animating module comprises a directing sub-module configured for automatically adjusting said animated scene according to said at least one emotional expression.

4. The mobile communication terminal of claim 3, wherein said adjusting comprises automatically adjusting a member of the group consisting of: the point of view (POV) of said animated scene, the illumination of said animated scene, and the rate of animating said scene.

5. The mobile communication terminal of claim 3, wherein said scene-animating module is configured for setting a resolution level of said animated scene according to said adjusted animated scene.

6. The mobile communication terminal of claim 1, wherein said scene-animating module comprises a directing sub-module configured for automatically adjusting a point of view (POV) of said animated scene according to said at least one identified emotional expression, tracking the movement of said virtual object in said scene.

7. The mobile communication terminal of claim 1, wherein said at least one virtual object comprises a plurality of body parts, each said body part being separately animated according to said at least one identified expression.

8. The mobile communication terminal of claim 1, wherein said scene-animating module comprises a sound sub-module configured for identifying an audio stream being played simultaneously with said communication stream, said scene-animating module being configured for adjusting said animation according to said audio stream identification.

9. The mobile communication terminal of claim 1, further comprising a viseme detection module, associated with said scene-animating module, configured for identifying at least one viseme of the at least one other participant in said at least one communication stream, said scene-animating module being configured for further animating said at least one virtual object with said at least one viseme.

10. The mobile communication terminal of claim 1, further comprising a user interface configured for allowing a user to input instructions for adjusting a point of view (POV) of said animated scene.

11. The mobile communication terminal of claim 1, wherein said recognition module is configured for receiving a plurality of communication streams from a plurality of participants and identifying respective at least one emotional expression from each said communication stream, said scene-animating module is configured for managing a scene having a plurality of virtual objects and for animating each said virtual object according to respective said at least one emotional expression of one of said plurality of communication streams.

12. The mobile communication terminal of claim 1, further comprising a repository for storing a plurality of animation instructions of said at least one object, said animation being applied according to a match between said at least one emotional expression and at least one of said plurality of animation instructions.

13. The mobile communication terminal of claim 12, further comprising a user interface for allowing the user to rank the accuracy of said animated scene and a learning module configured for weighting said plurality of animations according to said user ranking, thereby adjusting the selection of said animation.

14. The mobile communication terminal of claim 1, wherein said recognition module is configured for identifying an intensity of at least one of said at least one emotional expression, said scene-animating module being configured for applying said animation according to a respective intensity.

15. The mobile communication terminal of claim 1, further comprising a caller identification module configured for identifying an identifier of the at least one other participant, said at least one virtual object being selected according to said identifier.

16. A method for allowing a user to participate in a network-based communication session with at least one other participant, comprising: a) locally managing a scene having at least one virtual object for viewing by said user as a visual enhancement to said communication session; b) receiving a communication stream of the at least one other participant via a communication network; c) identifying at least one emotional expression in said communication stream; and d) locally animating said at least one virtual object according to said at least one identified emotional expression.

17. The method of claim 16, wherein said locally animating comprises blending a predefined animation associated with said identified emotional expression into a current animation of said at least one virtual object.

18. The method of claim 16, further comprising automatically adjusting said animated scene according to said at least one emotional expression.

19. The method of claim 18, wherein said adjusting comprises adjusting a member of the group consisting of: a point of view (POV) of said animated scene, the illumination of said animated scene, and the rate of animating said scene.

20. The method of claim 16, further comprising automatically adjusting said animated scene according to said local animation.

21. The method of claim 16, further comprising adjusting said animation according to music identified in said communication stream.

22. The method of claim 16, further comprising identifying at least one viseme of the at least one other participant in respective said communication stream; said locally animating comprising further animating said at least one virtual object according to said at least one viseme.

23. The method of claim 16, further comprising receiving a plurality of communication streams from a plurality of participants and identifying respective at least one emotional expression from each said communication stream, wherein said managing comprises managing a scene having a plurality of virtual objects and said animating comprises animating each said virtual object according to said respective at least one emotional expression.

24. The method of claim 16, further comprising receiving a plurality of communication streams from a plurality of participants in the communication session, each said virtual object being animated according to said at least one emotional expression from one of said plurality of communication streams.

25. The method of claim 16, wherein said animating comprises blending said local animation into a current animation of said at least one virtual object.

Description:

AN APPARATUS AND METHOD FOR ANIMATING EMOTIONALLY DRIVEN VIRTUAL OBJECTS

RELATIONSHIP TO EXISTING APPLICATIONS

The present application claims priority from Provisional US Patent Application No. 60/880,431, filed on January 16, 2007, the content of which is hereby incorporated by reference.

FIELD AND BACKGROUND OF THE INVENTION

The present invention relates to an apparatus and a method for animating one or more virtual objects according to an audio stream and/or a video stream and, more particularly, but not exclusively to a mobile communication terminal and a method for animating one or more virtual objects, such as avatars, according to emotional expressions in an audio stream and/or a video stream.

During the last decade, cellular networks have become a major factor in communication, providing numerous communication solutions and experiences. One new communication feature is the ability to customize a personal experience to a network subscriber by customizing his or her mobile device. Such customizing can be implemented by providing wallpaper entertainment downloads, ring tones, sounds, etc.

One example of such a customized experience is a service that allows customers to download personalized avatars to mobile devices. Such a service allows users to use their mobile device for accessing a website that allows them to create and download user-defined avatars for local use on their mobile devices.

The mobile avatar can take many different shapes depending on the user's desires, for example, a talking head, a cartoon, an animal or a three-dimensional picture of the user. To other users in the virtual world, the mobile avatar is a graphical representation of the user, a caller, and/or a callee. The avatar may be used in virtual reality when the user controlling the avatar logs on to, or interacts with, the virtual world, e.g., via a personal computer or mobile telephone.

As mentioned above, a talking head may be a three-dimensional representation of a person's head whose lips move in synchronization with speech. Talking heads can be used to create an illusion of a visual interconnection, even though the connection used is a speech channel. A talking avatar may be used for a variety of applications, such as model-based image compression for video telephony, presentations, avatars in virtual meeting rooms, intelligent computer-user interfaces such as e-mail reading and games, and many other operations. An example of such an intelligent user interface is a mobile video communication system that uses a talking head to express transmitted audio messages.

For example, US Patent Application Number 2006/029446, published on December 28, 2006, discloses a method and system for creating mobile avatars, customizing or personalizing mobile avatars and distributing mobile avatars, including real-time updates thereof, across a wireless network. A mobile avatar of a service subscriber is stored in mobile clients across the wireless network, and the mobile avatar is retrieved from the mobile clients and displayed upon receipt of a call from the service subscriber. An additional example is provided in US Patent Application Number 2004/0235531, published on November 25, 2004, which discloses a transmitting cell phone with a character image database, a user designator, a character image generator, and a communicator. A plurality of character images are preliminarily stored in the character image database. The user designator designates an expression or movement of a character image to be transmitted to a receiving cell phone. The character image generator acquires one character image out of the plurality of character images stored in the character image database and uses the character image to generate a character image with the expression or movement designated by the user designator. The communicator transmits the generated character image to the receiving cell phone.

SUMMARY OF THE INVENTION

According to one aspect of the present invention there is provided a mobile communication terminal for allowing a user to participate in a network-based communication session with at least one other participant. The mobile communication terminal comprises a recognition module configured for receiving a communication stream from the at least one other participant and identifying at least one emotional expression from the communication stream. The mobile communication terminal further comprises a scene-animating module for managing a scene having at least one virtual object and for applying an animation on the at least one virtual object. The animation is selected according to the at least one identified emotional expression. The mobile communication terminal further comprises an output module configured for outputting the animated scene for viewing by the user as a visual enhancement to the communication session.

Optionally, the communication stream comprises a member of the group consisting of: an audio stream, a video stream, and an audiovisual stream.

Optionally, the scene-animating module comprises a directing sub-module configured for automatically adjusting the animated scene according to the at least one emotional expression.

More optionally, the adjusting comprises automatically adjusting a member of the group consisting of: the point of view (POV) of the animated scene, the illumination of the animated scene, and the rate of animating the scene.

More optionally, the scene-animating module is configured for setting a resolution level of the animated scene according to the adjusted animated scene.

Optionally, the scene-animating module comprises a directing sub-module configured for automatically adjusting a point of view (POV) of the animated scene according to the at least one identified emotional expression, tracking the movement of the virtual object in the scene. Optionally, the at least one virtual object comprises a plurality of body parts, each body part being separately animated according to the at least one identified expression.

Optionally, the scene-animating module comprises a sound sub-module configured for identifying an audio stream being played simultaneously with the communication stream; the scene-animating module is configured for adjusting the animation according to the audio stream identification.

Optionally, the mobile communication terminal further comprises a viseme detection module, associated with the scene-animating module, configured for identifying at least one viseme of the at least one other participant in the at least one communication stream, the scene-animating module is configured for further animating the at least one virtual object with the at least one viseme.

Optionally, the mobile communication terminal further comprises a user interface configured for allowing a user to input instructions for adjusting a point of view (POV) of the animated scene.

Optionally, the recognition module is configured for receiving a plurality of communication streams from a plurality of participants and identifying respective at least one emotional expression from each communication stream. The scene-animating module is configured for managing a scene having a plurality of virtual objects and for animating each virtual object according to the respective at least one emotional expression of one of the plurality of communication streams. Optionally, the mobile communication terminal further comprises a repository for storing a plurality of animation instructions of the at least one object, the animation being applied according to a match between the at least one emotional expression and at least one of the plurality of animation instructions.

More optionally, the mobile communication terminal further comprises a user interface for allowing the user to rank the accuracy of the animated scene and a learning module configured for weighting the plurality of animations according to the user ranking, thereby adjusting the selection of the animation.

Optionally, the recognition module is configured for identifying an intensity of at least one of the at least one emotional expression, the scene-animating module being configured for applying the animation according to a respective intensity.

Optionally, the mobile communication terminal further comprises a caller identification module configured for identifying an identifier of the at least one other participant, the at least one virtual object is selected according to the identifier.

According to one aspect of the present invention there is provided a method for allowing a user to participate in a network-based communication session with at least one other participant. The method comprises: a) locally managing a scene having at least one virtual object for viewing by the user as a visual enhancement to the communication session, b) receiving a communication stream of the at least one other participant via a communication network, c) identifying at least one emotional expression in the communication stream, and d) locally animating the at least one virtual object according to the at least one identified emotional expression.

Optionally, the locally animating comprises blending a predefined animation associated with the identified emotional expression into a current animation of the at least one virtual object.

Optionally, the method further comprises automatically adjusting the animated scene according to the at least one emotional expression.

More optionally, the adjusting comprises adjusting a member of the group consisting of: a point of view (POV) of the animated scene, the illumination of the animated scene, and the rate of animating the scene.

Optionally, the method further comprises automatically adjusting the animated scene according to the local animation.

Optionally, the method further comprises adjusting the animation according to music identified in the communication stream.

Optionally, the method further comprises identifying at least one viseme of the at least one other participant in the respective communication stream; the locally animating comprises further animating the at least one virtual object according to the at least one viseme.

Optionally, the method further comprises receiving a plurality of communication streams from a plurality of participants and identifying respective at least one emotional expression from each communication stream, wherein the managing comprises managing a scene having a plurality of virtual objects and the animating comprises animating each virtual object according to the respective at least one emotional expression.

Optionally, the method further comprises receiving a plurality of communication streams from a plurality of participants in the communication session, each virtual object being animated according to the at least one emotional expression from one of the plurality of communication streams.

Optionally, the animating comprises blending the local animation into a current animation of the at least one virtual object.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.

Implementation of the method and the apparatus of the present invention involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and the apparatus of the present invention, several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and the apparatus of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in order to provide what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how the several forms of the invention may be embodied in practice. In the drawings:

Fig. 1 is a schematic illustration of a mobile communication terminal, such as a cellular phone, for animating a scene having one or more virtual objects, such as avatars, during a call, according to a preferred embodiment of the present invention;

Fig. 2 is a schematic illustration of the mobile communication terminal of Fig. 1, according to a preferred embodiment of the present invention;

Fig. 3 is a schematic illustration of a method for converting the weighted vector to a set of animation instructions that defines how to animate each one of the body parts of the virtual object, according to one embodiment of the present invention;

Fig. 4 is a schematic illustration of a state machine defining the sound modes of a decision manager module, according to one embodiment of the present invention;

Fig. 5 is a flowchart of a method for animating virtual objects according to identified emotional expressions and identified visemes, according to one embodiment of the present invention; and

Fig. 6 is a schematic illustration of a list of possible animations of the mouth of the virtual object, according to one embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Some embodiments of the present invention comprise a mobile communication terminal, such as a cellular phone or a Smartphone, and a method for animating a scene having one or more virtual objects, such as avatars, during a communication session, such as an audio and/or video call. The animation is applied according to emotional expressions, such as anger, happiness, disgust, sadness, fear, and surprise, which are identified in a communication data stream that is received from one of the participants in the call.

The mobile communication terminal comprises a recognition module that receives a communication stream, optionally during the communication session, and identifies one or more emotional expressions therein. Optionally, the recognition module identifies the emotional expressions by analyzing the voice, the body movements, the facial expressions, and/or the gestures of a communication session participant, as documented in the received communication stream.

The mobile communication terminal further comprises a scene-animating module that manages the scene and applies the aforementioned animation thereon. The animated scene is forwarded and/or rendered to a display unit, such as an integrated screen that presents the animated scene to a user.

Optionally, the scene-animating module comprises a directing sub-module that automatically adjusts the presentation of the animated scene according to one or more of the identified emotional expressions. Adjusting the presentation of the scene according to the identified emotional expressions allows the scene-animating module to display the animated scene, or a section thereof, in a manner that emphasizes the identified emotional expressions. The presentation adjustment may be understood as adjusting the point of view of the animated scene, adjusting the illumination of the animated scene, adjusting the resolution, and/or adjusting the rate of animating the scene.

Each one of the virtual objects is optionally a full-body, emotionally driven, lip-synchronized 3D animated avatar. The animation is optionally applied separately to each one of the body parts of each avatar. In such a manner, the animation is not bound to a predefined sequence and may vary according to the intensity of one or more of the identified emotional expressions, as further described below.

Optionally, the communication stream includes audio and/or video streams that have been recorded as messages to the user, optionally in an internal and/or a remote message bank, such as an answering machine module or a voicemail service.

The principles and operation of an apparatus and method according to the present invention may be better understood with reference to the drawings and accompanying description.

Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. In addition, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

Reference is now made to Fig. 1, which is a schematic illustration of a mobile communication terminal 1, such as a cellular phone, for animating a scene having one or more virtual objects, such as avatars, according to one embodiment of the present invention. The virtual objects are animated according to emotional expressions, which are locally identified in a communication stream, such as an audio, a video, and/or an audiovisual stream that has been received via a communication interface 6, as described below. The communication stream is received via a wireless communication network 5, such as a cellular network. The communication stream is received from a participant 7 in a communication session that is held with the user 10 of the mobile communication terminal 1, such as an audio call, a video call, a conference call, video conference call, etc.

The mobile communication terminal 1 comprises an emotion recognition module 2, which may be referred to as a recognition module 2, a scene-animating module 3, and an output module 6 that is optionally connected to an integrated display of the mobile communication terminal 1. The emotion recognition module 2 receives the communication stream, optionally during an audio call, a video call, and/or an audiovisual call, and identifies and/or estimates one or more emotional expressions therein. Optionally, the emotion recognition module 2 generates a vector, optionally weighted, that represents the identified emotional expressions and optionally the mouth movements of the communication session participant 7. This process is performed immediately or substantially immediately after the receiving of the communication stream. The emotion recognition module 2 forwards the vector to the scene-animating module 3.

Optionally, the emotion recognition module 2 comprises a voice analysis module that is designed to segment the audio stream of the communication stream into frames and to analyze them, as further described below.

Optionally, the emotion recognition module 2 uses the Mel-frequency Cepstrum coefficients (MFCC) extraction technique for capturing the speech from the audio stream. Optionally, the emotion recognition module 2 uses the method for emotion recognition that is presented in O.-W. Kwon et al., Emotion Recognition by Speech Signals, Eurospeech 2003, pages 125-128, September 2003, which is incorporated herein by reference. Optionally, the emotion recognition module 2 comprises a video analysis module that is designed to segment a face section from an image in a video stream in the communication stream and to analyze it. Optionally, the analysis of the face section is based on a method for recognizing emotions in a video stream that is disclosed in Ira Cohen et al., "Emotion Recognition using a Cauchy Naive Bayes Classifier", International Conference on Pattern Recognition (ICPR), 2000, which is incorporated herein by reference.
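To make the front end of this recognition step concrete, the following sketch segments an audio buffer into short frames and extracts MFCC features. It is illustrative only; it assumes the Python libraries numpy and librosa, which the patent does not prescribe, and the frame sizes are typical speech-processing defaults rather than values taken from the application.

```python
# Sketch only: frame segmentation and MFCC extraction for the emotion
# recognition module. Library choice and parameters are assumptions.
import numpy as np
import librosa

def extract_mfcc_frames(audio: np.ndarray, sample_rate: int = 16000,
                        n_mfcc: int = 13) -> np.ndarray:
    """Segment the audio stream into short frames and return one MFCC
    feature vector per frame (shape: number of frames x n_mfcc)."""
    # 25 ms analysis window and 10 ms hop are common speech-processing choices.
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=n_mfcc,
                                n_mels=40,
                                n_fft=int(0.025 * sample_rate),
                                hop_length=int(0.010 * sample_rate))
    return mfcc.T  # one row of coefficients per frame

# A classifier trained on labelled emotional speech (for example, along the
# lines of Kwon et al. cited above) would then map each frame's features to
# intensities of anger, happiness, disgust, sadness, fear, and surprise.
```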

The scene-animating module 3 manages an animated scene with the one or more virtual objects, according to the identified emotional expressions. An example of a simple form of an animated scene is a two-dimensional (2D) image. A more complex form of an animated scene which may be used is a three-dimensional (3D) scene. The managing of the animated scene includes the manipulation of the virtual objects, optionally with a geometrical transform that is based on translation, scaling, and orientation-change, optionally as described below, and/or any known animating method.

Optionally, the scene-animating module stores a motion library of downloadable animated motions of each body part of a virtual object. The scene-animating module 3 is designed to manipulate the virtual objects according to the identified emotional expressions, as described below, and optionally to store the last state of each one of the body parts of each one of the virtual objects until additional identified emotional expressions are received. Optionally, the last state allows the scene-animating module 3 to identify the current orientation and position of each body part and optionally whether it is currently animated or not.

Optionally, the writing of a data image or a sequence of data images that represents the animated scene is managed by the scene-animating module 3. The image data is forwarded to a display memory and displayed on the screen of the mobile communication terminal 1. The scene-animating module 3 manages the changing of the image data that is displayed on the screen. Optionally, the scene-animating module 3 comprises a display interface controller that repeatedly reads the image data from the display memory, optionally at regular short time intervals, and converts the read image data into red, green, and blue video signals, which are then output to the screen.

The mobile communication terminal 1, which is optionally a cellular phone or a WLAN phone, enhances the user experience of a user 10 that participates in a communication session without adding extra bandwidth to the communication between the participants of the call. The emotion recognition module 2 converts the communication stream, which includes a representation of the voice and/or look of the communication session participant 7, to a vector of identified emotional expressions. The vector is forwarded to the scene-animating module 3 that animates the virtual object to express the identified emotional expressions in the vector. The animation is optionally presented on the screen of the mobile communication terminal 1, yielding a vivid communication session that allows the user 10 to receive visual messages, which are indicative of emotional expressions of the communication session participant 7.

Optionally, the scene-animating module 3 is designed to animate a number of virtual objects simultaneously. Optionally, all the virtual objects are animated according to the same communication stream. Optionally, each virtual object is animated according to different communication streams, which are received from different participants of a communication session. In such an embodiment, the scene-animating module 3 may be used to animate the scene according to a conference call. Each virtual object is animated according to a communication stream of one of the participants of the conference call. Optionally, if separate communication streams are not available, the voices of the speakers are algorithmically separated, for example as described in Alvin Martin et al., Speaker Recognition in a Multi-Speaker Environment, Eurospeech 2001, Scandinavia, Volume 2, pages 787-790, which is incorporated herein by reference.
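The conference-call behaviour described above amounts to routing each participant's recognized emotions to the avatar that represents that participant. The sketch below illustrates one possible dispatch structure; the class and method names are hypothetical and not taken from the patent.

```python
# Illustrative sketch: one avatar per conference-call participant, each
# animated only from the emotion vector of its own communication stream.
from typing import Dict


class Avatar:
    """Stand-in for a virtual object managed by the scene-animating module."""

    def __init__(self, name: str):
        self.name = name

    def animate(self, emotion_vector: Dict[str, float]) -> None:
        # The scene-animating module would convert the vector into
        # body-part animation instructions here.
        print(f"{self.name}: animating for {emotion_vector}")


class ConferenceScene:
    def __init__(self) -> None:
        self.avatars: Dict[str, Avatar] = {}

    def add_participant(self, participant_id: str, avatar: Avatar) -> None:
        self.avatars[participant_id] = avatar

    def on_emotions_recognized(self, participant_id: str,
                               emotion_vector: Dict[str, float]) -> None:
        avatar = self.avatars.get(participant_id)
        if avatar is not None:
            avatar.animate(emotion_vector)
```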

Reference is now made to Fig. 2, which is a schematic illustration of the mobile communication terminal 1, according to one embodiment of the present invention. The recognition module 2, the communication interface 6, the network 5, the session participant 7, and the user 10 are as depicted in Fig. 1. However, in Fig. 2 the mobile communication terminal 1 further comprises a viseme detection module 11 that is designed for detecting mouth movements, as further described below.

Optionally, the viseme detection module 11 comprises a voice analysis module that is designed to segment the audio stream of the communication stream into frames, such that each fragment is generated by a single viseme, as further described below.

Optionally, the viseme detection module 11 uses the Mel-frequency Cepstrum coefficients (MFCC) extraction technique for capturing the speech from the audio stream.

The scene-animating module 3 comprises a decision manager sub-module 9, which may be referred to as a decision manager 9, and a directing sub-module 4 that is designed to adjust the presentation of the animated scene according to the identified emotional expressions.

Optionally, the directing sub-module 4 is designed to adjust the point of view (POV) of the animated scene, the illumination of the scene, and/or the animation rate of the scene, according to the identified emotional expressions and/or according to the animation of the one or more virtual objects, as further described below.

The scene-animating module 3 receives the identified emotional expressions, optionally as a weighted vector, and applies sequential animation to the at least one virtual object, according to the identified emotional expressions, as further described below. As described above, the scene-animating module 3 manages a scene having one or more virtual objects, such as avatars. For example, the scene-animating module 3 may manage a scene that includes a background mesh, such as animated scenery of a forest or a street, and a three-dimensional model of a vertebrate, such as a human character or an animal cartoon character that has a plurality of body parts, such as Mickey Mouse™ or Winnie the Pooh™.

Optionally, in order to animate such an avatar, an animation technique such as skeletal animation, which may be referred to as rigging, is used. In such an embodiment, each virtual object is represented in two logical layers: a surface representation that is used to draw the avatar and a set of body parts that is used for animation only and may be referred to as a skeleton. Each body part has a 3D transformation that includes its position, a scale, and an orientation, respectively in relation to a position, a scale, and the axes of the surrounding scene, which is a 3D scene mesh. Such an embodiment allows the animation of each one of the body parts separately, thereby extending the range of possible animations of the avatar and usually allowing continuous animation of the avatar, as further described below. In such an embodiment, the same set of body parts may be used to animate different surface representations, each related to a different avatar. Optionally, each body part has a parent body part. In such an embodiment, the body parts are formed in a hierarchy and the full transform of a child body part is a product of the related parent body part's transform and its own transform. For example, moving a body part that simulates a thighbone moves a body part that simulates a lower leg. Optionally, the virtual object is divided into two or more of the following body parts: a head, a mouth, shoulders, left and right arms, left and right eyes, a spine, buttocks, left and right legs, left and right wings, a feeler, and/or a tail.
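The parent-child transform rule described above (the full transform of a child body part is the product of its parent's full transform and its own) can be sketched as follows. This is a minimal illustration using 4x4 homogeneous matrices and numpy; it is not code from the application.

```python
# Minimal sketch of the skeletal (rigging) layer: each body part stores a
# local transform (position, orientation, scale) and an optional parent;
# its full transform relative to the 3D scene mesh is parent_full @ local.
from typing import Optional
import numpy as np


class BodyPart:
    def __init__(self, name: str, local_transform: np.ndarray,
                 parent: Optional["BodyPart"] = None):
        self.name = name
        self.local_transform = local_transform  # 4x4 homogeneous matrix
        self.parent = parent

    def full_transform(self) -> np.ndarray:
        if self.parent is None:
            return self.local_transform
        return self.parent.full_transform() @ self.local_transform


# Example: animating the thigh joint implicitly moves the lower leg.
thigh = BodyPart("thigh", np.eye(4))
lower_leg = BodyPart("lower_leg", np.eye(4), parent=thigh)
```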

As the animated scene and the surface representations of the virtual objects are stored locally in the memory of the mobile communication terminal 1 and as the emotion recognition module 2 and optionally the viseme detection module 11 are designed to identify the emotional expressions in a standard communication stream, the animation does not require more bandwidth than a standard communication session.

The communication stream may be received via a cellular connection established by a communication interface 6, such as a cellular transceiver and modem which are hosted in the mobile communication terminal 1, or via another wireless connection that is established by another communication interface, such as a wireless local area network (WLAN) interface and/or a wireless personal area network (WPAN) interface. The WLAN interface is optionally a radio transceiver that uses high-frequency radio signals, which are defined according to a WLAN standard, such as the 802.11a, 802.11b, 802.11g, and 802.11n standards, which are herein incorporated by reference. In such an embodiment, the wireless connection is optionally established according to the WiMAX™ IEEE 802.16 standard or the wireless FireWire IEEE 802.15.3 standard, which are incorporated herein by reference. The WPAN interface comprises a short-range radio interface, such as a Bluetooth™ transceiver, which is defined according to the IEEE 802.15.1 specification that is incorporated herein by reference, optionally utilizing a Bluetooth™ enhanced data rate (EDR) chip that is defined according to the Bluetooth™ core specification version 2.0 + EDR of the Bluetooth™ special interest group (SIG), which is incorporated herein by reference, or a Wibree® transceiver. Optionally, the WPAN interface comprises a radio transceiver that uses ultra-wideband (UWB) frequencies. In such an embodiment, the wireless interface may be established according to the WiMedia™ specification or according to the Wireless USB (WUSB) specification, which are incorporated herein by reference.

In order to provide the user 10 of the mobile communication terminal 1 with a realistic experience, the movements of the virtual objects in the scene are sequentially animated. Optionally, the scene-animating module 3 uses the decision manager 9 that converts the identified emotional expressions in the weighted vector to a set of instructions that defines how to animate the virtual object in a manner that does not discontinue the animation that is described in the last status.

The decision manager 9 receives the identified emotional expressions, optionally in a weighted vector of emotion values; each value defines the identified intensity of one emotion. Optionally, the weighted vector defines at least the six basic human emotional expressions: anger, happiness, disgust, sadness, fear, and surprise. Optionally, the intensity of each identified emotion is defined between zero and one, where zero denotes that no signs of the emotion have been identified in the related communication stream and one denotes that strong signs of the emotion have been identified in the related communication stream.
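A minimal sketch of such a weighted vector, assuming a simple dictionary of per-emotion intensities clipped to the range described above (the data structure is illustrative, not specified by the patent):

```python
# Sketch of the weighted emotion vector handed from the recognition module
# to the decision manager: one intensity in [0, 1] per basic emotion.
from dataclasses import dataclass, field
from typing import Dict

BASIC_EMOTIONS = ("anger", "happiness", "disgust", "sadness", "fear", "surprise")


@dataclass
class EmotionVector:
    intensities: Dict[str, float] = field(
        default_factory=lambda: {e: 0.0 for e in BASIC_EMOTIONS})

    def set(self, emotion: str, intensity: float) -> None:
        # 0 = no signs of the emotion identified, 1 = strong signs identified.
        self.intensities[emotion] = min(1.0, max(0.0, intensity))

    def dominant(self) -> str:
        """Return the emotion identified with the strongest intensity."""
        return max(self.intensities, key=self.intensities.get)
```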

The decision manager 9 further receives the last status of the animation and calculates animation instructions to generate a sequential animation of the virtual object, according to the identified emotional expressions, as further described below. As described above, skeletal animation is used for animating the virtual object. As such, the last state of the animation is represented as a vector of values; each defines, at least, the current orientation of a respective body part in relation to the 3D scene mesh.

Optionally, the 3D scene mesh is defined in a file that contains modeling data, optionally converted from a Maya™ file format. The file defines the vertexes of the mesh and one or more attributes that characterize each vertex, such as normalizing values, 2D and/or 3D modeling values, for example UV coordinates, one or more color values, and/or joint references. The file further defines the skeleton structure of one or more virtual objects and/or the surface representation thereof. The skeleton structure defines an initial position and orientation for each one of the body parts, which may be understood as joints. The joints are animated using curves, one per degree of freedom. Optionally, each joint has nine degrees of freedom (DOF): three DOF are for positioning and/or translating, three DOF are for orientating and/or rotating, and three DOF are for scaling.
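The per-DOF animation curves mentioned above can be thought of as keyframed functions of time, one for each of the nine degrees of freedom of a joint. The sketch below shows a simple piecewise-linear curve evaluation; linear interpolation is an assumption, as the patent does not specify the curve type.

```python
# Sketch: one keyframe curve per degree of freedom (3 translation,
# 3 rotation, 3 scale). Linear interpolation between keyframes is assumed.
from bisect import bisect_right
from typing import List, Tuple


class Curve:
    def __init__(self, keyframes: List[Tuple[float, float]]):
        self.keyframes = sorted(keyframes)  # (time, value) pairs

    def value_at(self, t: float) -> float:
        times = [time for time, _ in self.keyframes]
        i = bisect_right(times, t)
        if i == 0:
            return self.keyframes[0][1]
        if i == len(times):
            return self.keyframes[-1][1]
        (t0, v0), (t1, v1) = self.keyframes[i - 1], self.keyframes[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# A joint would hold nine such curves, for example keyed as
# "tx", "ty", "tz", "rx", "ry", "rz", "sx", "sy", "sz".
```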

Reference is now made jointly to Fig. 2 and to Fig. 3, which is a schematic illustration of a method, optionally implemented by the decision manager 9, for converting the weighted vector to a set of animation instructions that defines how to animate each one of the body parts of the virtual object, according to one embodiment of the present invention.

As described above, the decision manager 9 receives a set of identified emotional expressions, as shown at 100. Then, as shown at 101, the decision manager 9 acquires the last status of each body part of one of the virtual objects in the scene. As shown at 114, this process may be repeated for each virtual object. As described above, for each virtual object, the last status is represented by a vector with values that define the current orientation of each one of the body parts of the virtual object. In the following step, as shown at 102, the decision manager 9 defines the order in which body parts should be animated. Optionally, the order is determined randomly in order to increase the number of possible animation sequences of an animated virtual object. Optionally, the order does not take into account body parts which are currently being animated by the graphic engine and cannot be broken. Optionally, these body part animations continue without interference.

In the following step, as shown at 103, the decision manager 9 defines a sound state. Reference is now made jointly to Fig. 2 and to Fig. 4, which is a schematic illustration of a state machine defining the sound modes of the decision manager 9, according to one embodiment of the present invention. As described above, the decision manager 9 is designed to convert a set of identified emotional expressions to a set of animation instructions. Optionally, the decision manager 9 is designed to animate the virtual object in a manner that simulates a character that dances to the sound of music. In such an embodiment, the decision manager 9 is designed to intercept at least a portion of an audio stream from the aforementioned communication stream, an audio stream that is currently played by a music player of the hosting mobile communication device 1, or an audio stream that is locally added to the animation. Optionally, the animation that is applied to each one of the body parts is determined according to the sound mode of the decision manager 9. Optionally, the sound mode is determined automatically according to the state machine that is depicted in Fig. 4. For example, if neither voice nor music is present, the decision manager 9 is in an idle mode 300. However, if voice is intercepted, the decision manager 9 switches to voice mode 301. If music is identified or added, the decision manager 9 switches to music and voice mode 302. If the interception of voice stops, the mode switches to music mode 303.

Reference is now made jointly, once again, to Fig. 2 and Fig. 3. After the sound mode of the decision manager 9 has been defined, as shown at 103, the decision manager 9 verifies whether to animate the virtual object sequentially or to break the sequence of animations. The break may be scheduled upon request of the operating system of the mobile communication terminal 1, upon the identification of predefined expressions in the communication stream, or upon the identification of predefined expressions above and/or below a predefined value. Such identification cuts off the animation and allows the scene-animating module to generate instructions that define how to animate a virtual object in a manner that simulates a character that performs a special effect, such as an unexpected movement, transformation, or the like. For example, if the decision manager 9 identifies emotional expressions of happiness above a certain level, the virtual object is animated to perform a special effect such as a rolling, a blast, or a transformation, optionally modulated, to an animation of a large smiley emoticon.
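The sound-mode state machine of Fig. 4 described above can be sketched as a small transition function over the four modes. The enum values mirror the reference numerals 300-303; the function name is illustrative.

```python
# Sketch of the sound-mode state machine (Fig. 4): idle, voice,
# music and voice, and music modes, numbered as in the description.
from enum import Enum


class SoundMode(Enum):
    IDLE = 300
    VOICE = 301
    MUSIC_AND_VOICE = 302
    MUSIC = 303


def next_sound_mode(voice_present: bool, music_present: bool) -> SoundMode:
    if voice_present and music_present:
        return SoundMode.MUSIC_AND_VOICE
    if voice_present:
        return SoundMode.VOICE
    if music_present:
        return SoundMode.MUSIC
    return SoundMode.IDLE
```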

Optionally, the decision manager 9 decides whether to break the animation of a virtual object according to one or more of the following assessments:

• Are the body parts of the virtual object currently animated or not;

• What is the intensity of one or more of the emotional expressions;

• What is the tempo, which is measured in beats per minute (BPM), of the emotional expressions in the received communication stream;

• What is the ability of the graphic engine to break the current animation of the virtual object;

• What is the sound mode of the decision manager 9; and

• What are the user preferences and/or requests.

Optionally, the decision whether to break the animation is based on a random calculation. If the decision manager 9 decides to break the sequence, the decision manager 9 switches to selecting a POV for the scene, as described below. However, if the decision manager 9 decides not to break the sequence, the decision manager 9 starts to generate an animation cluster for the graphic engine that includes a set of animation instructions that defines how to animate the body parts of the virtual object.
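The decision whether to break the sequence can be sketched as a predicate over the assessments listed above combined with a random element. The threshold and probability below are illustrative assumptions, not values from the application.

```python
# Sketch of the break-the-sequence decision made by the decision manager,
# combining the listed assessments; the numeric values are assumptions.
import random
from typing import Dict


def should_break_sequence(emotions: Dict[str, float],
                          body_parts_busy: bool,
                          engine_can_break: bool,
                          user_requested_break: bool,
                          intensity_threshold: float = 0.8,
                          random_break_probability: float = 0.05) -> bool:
    if user_requested_break:
        return True
    if not engine_can_break or body_parts_busy:
        return False
    # A sufficiently intense emotional expression may trigger a special effect.
    if max(emotions.values(), default=0.0) >= intensity_threshold:
        return True
    # Otherwise the decision may be partly random.
    return random.random() < random_break_probability
```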

As described above, the decision manager 9 is optionally designed to generate animation instructions in a manner that does not break off the continuity of the animation of the virtual objects. In such an embodiment, the decision manager 9 implements 104-108. The decision manager 9 selects animation instructions for the first body part in the order that has been defined at 102. Each set of animation instructions includes a new animation status that defines a new orientation for one or more body parts and optionally a course between the current orientation and the new orientation through which the related body part passes during the related animation. Optionally, a list of possible animation instructions is defined in the motion library for each one of the body parts. The list defines the association between tags, which may be referred to as animation tags, each defining a set of animation instructions that may be referred to as an available body part animation, and values that define possible values of identified emotional expressions. That is, the list defines tags, each associated with a set of instructions for animating an organ and/or a sub-organ of the virtual object and with an intensity, or a range of intensities, of one or more emotional expressions which are defined as sufficient to evoke an animation of the organ and/or the sub-organ according to the associated set of animation instructions.

Reference is now made to 104-107, which are performed, iteratively or recursively, for each one of the body parts of the currently processed virtual object. Optionally, as shown at 104, the decision manager 9 initially filters out irrelevant available body part animations. The decision manager 9 verifies which available body part animations are relevant for animating the body part, optionally according to the identified emotional expressions and their intensity. Optionally, the decision manager 9 filters out the available body part animations which are not associated with one or more emotional expressions that have been identified with the strongest intensities. For example, if the highest emotional expression identified in the weighted vector is disgust and it has been identified with a certain intensity, one or more available body part animations for animating the currently processed body part, which are associated with that range of disgust, are selected. The one or more available body part animations that have not been filtered out may be referred to as potential body part animations. Optionally, if no emotional expression has been identified with an intensity above a certain threshold or if no potential body part animations have been identified, available body part animations which are tagged as neutral body part animations are selected. In such a manner, the virtual object may be animated even if no expression of emotion has been identified in the communication stream. Optionally, the records of the animation lists of one or more of the body parts are divided according to the sound mode of the decision manager 9. In such an embodiment, different instructions for the same identified emotional expression may be chosen when the decision manager is in a different sound mode. In such a manner, the scene-animating module 3 may generate animation instructions that animate the virtual object as if it dances to the sound of music and/or dubs the words of a simultaneously played song.
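The filtering step 104 can be sketched as matching motion-library tags against the dominant identified emotion and its intensity, with a fallback to neutral animations. The AnimationTag fields below are illustrative; the patent does not define a concrete record layout.

```python
# Sketch of step 104: filter the motion library of one body part down to the
# potential body part animations; fall back to neutral animations when no
# expression matches. Field names are assumptions.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AnimationTag:
    name: str
    emotion: Optional[str]                       # None marks a neutral animation
    intensity_range: Tuple[float, float] = (0.0, 1.0)
    weight: float = 1.0
    sound_modes: Tuple[str, ...] = ("idle", "voice", "music", "music_and_voice")


def filter_animations(library: List[AnimationTag],
                      emotions: Dict[str, float],
                      sound_mode: str) -> List[AnimationTag]:
    dominant = max(emotions, key=emotions.get)
    intensity = emotions[dominant]
    candidates = [tag for tag in library
                  if tag.emotion == dominant
                  and tag.intensity_range[0] <= intensity <= tag.intensity_range[1]
                  and sound_mode in tag.sound_modes]
    if not candidates:
        # No matching expression above threshold: use neutral animations.
        candidates = [tag for tag in library if tag.emotion is None]
    return candidates
```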

Optionally, the decision manager 9 verifies which potential body part animations do not violate a set of predefined rules that define one or more limitations. This verification is performed in relation to the last state of the virtual object. The set of rules defines one or more of the following limitations (a minimal sketch of such rule checks follows the list):

• Animation area boundaries — defines the boundaries of an area in the mesh that limits the position of each one of the body parts of the virtual objects. Optionally, the animation area boundaries are dynamic boundaries and are adapted to the last state. For example, if the virtual object is an avatar and the last state defines a bending avatar, the animation area boundaries limit the movement of the hand to the lower area of the mesh.

• Scene area boundaries — optionally, the virtual object can change its relative position in relation to the rest of the animated scene. In such an embodiment, the animated scene area boundaries define the area in which the virtual object may be positioned. Such a rule is used to restrict the movement of the body parts to areas which are depicted in the animated scene and may be presented to the user.

• Delay belts - optionally, one or more body parts may be animated at variable rates. Optionally, the mesh is divided into a number of animation rate areas, each defining a rate at which the body parts of the animation should be animated. Optionally, the mesh is divided into belts; some of them define areas in which the movement of the body part should be animated with a delay. Optionally, body parts that cannot be animated at the rate that is defined in their current area are filtered out.

• Current POV - as described above, the animated scene can be portrayed as if it is captured from different POVs. Optionally, the decision manager 9 dynamically defines the boundaries of the possible POV according to the last state of the body part and the estimated state of the body part and filters out any animation that exceeds these boundaries.

• Animation's global run priority - optionally, body parts may be animated according to different animations. Optionally, each animation has a global run priority that defines its priority in relation to other animations, optionally in light of the current last state. Optionally, the animation with the highest priority is selected. Optionally, if more than one animation has the highest priority, a random algorithm is used for selecting one of them.

• A priority for blending the animation with other animations - as described above, body parts may be animated according to different animations. Optionally, each animation has one or more priorities for being animated with other animations of other body parts, optionally as further described below with relation to blending. In such an embodiment, selecting an animation is based on the priority for blending it with other animations, for example with animations which are currently animated in the last state of the virtual object.

• Animation's amplitude - as described above, animation is based on communication streams of one or more speakers. Such communication streams may capture a user who talks at a rate and/or loudness that may be fixed or variable. These parameters may influence the amplitude of the audio sequence. Optionally, the animation is adapted to the rate and/or loudness of the speech that is recorded in the audio sequence. In such a manner, different rates and/or levels of loudness may be associated with different emotions, such as anger or stress.
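As noted above, a minimal sketch of such rule checks is given below. Each limitation is expressed as a predicate over a candidate body-part animation and the last state; the field names are illustrative assumptions rather than structures defined by the application.

```python
# Sketch: limitation rules as predicates applied to each potential body part
# animation before selection. Dictionary keys are illustrative.
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, Tuple[float, float]]       # e.g. {"end_position": (x, y)}
LastState = Dict[str, Tuple[float, float, float, float]]
Rule = Callable[[Candidate, LastState], bool]    # True if the animation is allowed


def within_animation_area(candidate: Candidate, last_state: LastState) -> bool:
    x, y = candidate["end_position"]
    x0, y0, x1, y1 = last_state["animation_area"]
    return x0 <= x <= x1 and y0 <= y <= y1


def within_pov_boundaries(candidate: Candidate, last_state: LastState) -> bool:
    x, y = candidate["end_position"]
    x0, y0, x1, y1 = last_state["pov_boundaries"]
    return x0 <= x <= x1 and y0 <= y <= y1


def passes_all_rules(candidate: Candidate, last_state: LastState,
                     rules: List[Rule]) -> bool:
    return all(rule(candidate, last_state) for rule in rules)
```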

Now, after the irrelevant animations have been filtered out, one or more potential body part animations, which meet the aforementioned rules and optionally can be applied without breaking off the continuity of the animation, are selected, as shown at 105.

Optionally, more than one possible animation may remain unfiltered after 104. In such an embodiment, the decision manager 9 randomly selects one of the unfiltered potential body part animations. Optionally, each one of the animations in the animation list has a predefined weight. Optionally, the random selection is based on the weights of the potential body part animations.
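A weighted random pick over the surviving candidates might look like the following sketch (it reuses the illustrative AnimationTag record from the earlier sketch and Python's standard library):

```python
# Sketch of step 105: weighted random selection among the remaining potential
# body part animations, biased by each animation's predefined weight.
import random
from typing import List


def select_animation(candidates: List["AnimationTag"]) -> "AnimationTag":
    weights = [tag.weight for tag in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```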

Optionally, the body parts of the virtual objects are divided into area clusters. Each area cluster includes a number of body parts which may be animated together to form a unique body movement. In such an embodiment, the movement of the hands of the virtual object may be correlated to form, for example, an applause hand movement, the movement of the legs may be correlated to form a walking animation, and a hand may be correlated to hold and/or to scratch another body part. Optionally, if the decision manager 9 selects such a correlated animation for a certain body part from a certain area cluster, it filters out the available body part animations of the other members of that area cluster and associates them with the correlated animation.

Optionally, the random selection is based on the ability of the potential body part animations to blend, as further described below, with the last state of the related body part.

Optionally, each one of the animations in the animation list is associated with a certain animation velocity that defines the velocity of the animated movement. Optionally, the random selection is based on the animation velocity. In such an embodiment, the selected animation has, if available, an animation velocity that is similar to the velocity of the current animation of the respective body part. In such a manner, the velocity of the animation of the virtual object stays relatively constant. Optionally, the random selection is based on the current POV of the animated scene. In such a manner, the selected animation, if available, is directed toward the user.

Now, as shown at 106, the decision manager 9 decides whether to instruct the graphic engine to apply the selected potential body part animations iteratively or recursively. Optionally, the decision manager 9 determines the number of iterations, optionally as a function of the intensity of the related identified emotional expression.

Optionally, as shown at 107, the decision manager 9 determines whether the animation of the body part should be blended out or not. Optionally, each blended animation receives a weight-of-influence value. This value is used to determine the influence of the blended animation on the final movement model. Now, after a set of instructions of a certain potential body part animation has been selected and optionally edited to adjust repetitions and blending, other body parts are selected and edited in the aforementioned order. As shown at 108, if the body part is the last body part in the aforementioned order, the decision manager 9 switches to another virtual object, if available, as shown at 109, and implements 101-108 thereon. However, if the body part is not the last body part in the aforementioned order, the decision manager 9 adds the edited set of instructions, selects the following body part in the aforementioned order, and repeats 104-108 for each one of the following body parts. Such a process is performed iteratively or recursively until the animation cluster contains sets of animation instructions for all the body parts of the virtual object.
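Putting steps 102-108 together for a single virtual object could look like the sketch below, which collects one instruction record per body part into an animation cluster. The helper functions refer to the earlier sketches, and the repetition and blending values are illustrative assumptions.

```python
# High-level sketch of building an animation cluster for one virtual object
# (steps 102-108); repetition and blend values are illustrative.
def build_animation_cluster(body_parts, motion_library, emotions, sound_mode):
    cluster = []
    for part in body_parts:                      # order defined at step 102
        candidates = filter_animations(motion_library[part], emotions, sound_mode)
        # (rule-based filtering against the last state, step 104, would go here)
        if not candidates:
            continue
        chosen = select_animation(candidates)    # weighted random pick, step 105
        repetitions = max(1, round(3 * max(emotions.values())))   # step 106
        blend_weight = 0.5                       # step 107: influence on final movement
        cluster.append({"body_part": part,
                        "animation": chosen.name,
                        "repetitions": repetitions,
                        "blend_weight": blend_weight})
    return cluster
```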

After the aforementioned animation cluster is generated, the presentation of the animated scene is adjusted. Optionally, as shown at 111, a POV is selected for the animated scene, optionally according to the body part animations in the animation cluster or according to the weight that is associated with the emotional expressions in the weighted vector. As described above, the scene-animating module 3 comprises a directing sub-module 4 that adjusts the POV of the animated scene according to the identified emotional expressions. Optionally, the directing sub-module 4 is designed to allow the scene-animating module 3 to portray the animated scene from a number of POVs, such as an extreme close-up shot, a close-up shot, a medium shot, a medium-long shot, a long shot, and an extreme long shot. Optionally, the directing sub-module 4 is designed to allow the scene-animating module 3 to portray the animated scene from a dynamic POV, such as a vertical pan, a horizontal pan, a zoom in, a zoom out, etc.

Optionally, the scene-animating module 3 hosts a directing list of different POVs, each associated with one or more human emotional expressions, optionally each in a certain range of intensities. Optionally, one of the POVs is selected according to a match between a value of the directing list and the weighted vector of the identified emotional expressions. Optionally, the POV is selected according to the orientation of the virtual objects and according to the future orientation of the virtual objects that may be inferred from the values in the animation clusters. In such a manner, the animated scene is defined to portray all or most of the animated body parts. Optionally, the POV is selected to emphasize a certain body part that is animated according to an emotional expression that has been captured with relatively high intensity. For example, if the set of instructions in the animation cluster includes a significant animation of the face of the virtual object, the decision manager 9 selects a close-up shot that focuses on the face of the virtual object. However, if the virtual object is about to perform a unique action or a series of movements in a number of different body parts, the decision manager 9 selects a long shot POV.
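A minimal sketch of matching the directing list against the weighted vector of identified emotional expressions is shown below; the list entries, intensity ranges, and shot names are illustrative assumptions rather than values from the disclosure.

```python
# Hypothetical directing list: each entry maps an emotion and an intensity
# range to a point of view.
DIRECTING_LIST = [
    {"emotion": "joy",     "min": 0.6, "max": 1.0, "pov": "close_up"},
    {"emotion": "sadness", "min": 0.3, "max": 1.0, "pov": "medium_shot"},
    {"emotion": "anger",   "min": 0.7, "max": 1.0, "pov": "extreme_close_up"},
]

def select_pov(weighted_vector, default="long_shot"):
    """Pick the POV whose directing-list entry matches the strongest
    identified emotional expression; fall back to a default shot."""
    for emotion, intensity in sorted(weighted_vector.items(),
                                     key=lambda item: item[1], reverse=True):
        for entry in DIRECTING_LIST:
            if entry["emotion"] == emotion and entry["min"] <= intensity <= entry["max"]:
                return entry["pov"]
    return default

print(select_pov({"joy": 0.8, "sadness": 0.2}))  # close_up
```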

Optionally, the switch from one POV to another is performed to simulate a smooth movement of a virtual camera that captures the animated scene. Optionally, the switch from one POV to another is performed as a cut between two animated scenes.

Optionally, the directing list defines a number of different illumination settings, each associated with one or more human emotional expressions, optionally each in a certain range of intensities. Optionally, one of the illumination settings of the animated scene is selected according to a match between a value of the directing list and the weighted vector of the identified emotional expressions.

Optionally, the directing list defines a number of different animation speeds, each associated with one or more human emotional expressions, optionally each in a certain range of intensities. Optionally, one of the animation speeds is selected according to a match between a value of the directing list and the weighted vector of the identified emotional expressions. In such an embodiment, the graphic engine generates the animated scene according to the speed of the selected animation.
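The same matching idea can cover illumination settings and animation speeds, as in the sketch below; the table rows, emotion names, and default values are assumptions made for illustration only.

```python
# Hypothetical directing-list rows for presentation parameters other than POV.
ILLUMINATION_LIST = [
    {"emotion": "joy",     "min": 0.5, "max": 1.0, "setting": "bright_warm"},
    {"emotion": "sadness", "min": 0.5, "max": 1.0, "setting": "dim_cool"},
]
SPEED_LIST = [
    {"emotion": "anger",   "min": 0.5, "max": 1.0, "speed": 1.5},
    {"emotion": "sadness", "min": 0.5, "max": 1.0, "speed": 0.7},
]

def match(rows, weighted_vector, key, default):
    """Return the value of the first row whose emotion/intensity range
    matches an entry of the weighted vector, or a default otherwise."""
    for row in rows:
        intensity = weighted_vector.get(row["emotion"], 0.0)
        if row["min"] <= intensity <= row["max"]:
            return row[key]
    return default

vector = {"sadness": 0.8}
print(match(ILLUMINATION_LIST, vector, "setting", "neutral"))  # dim_cool
print(match(SPEED_LIST, vector, "speed", 1.0))                 # 0.7
```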

Optionally, in order to decrease the memory space that is needed for rendering the animated scene, a number of surface representations of the virtual objects and/or surrounding scenes with different resolutions are stored in the repository of the mobile communication terminal. In such an embodiment, different versions of the scene and/or the virtual object may be selected according to the selected POV. For example, low-resolution versions may be selected when the POV is a long shot and high-resolution versions may be selected when the POV is a close-up shot.
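A sketch of this level-of-detail selection follows, assuming a fixed mapping from shot type to the resolution of the stored surface representation; the mapping itself is an assumption.

```python
# Hypothetical mapping from POV to the resolution level of the stored
# surface representations (a simple level-of-detail scheme).
POV_TO_RESOLUTION = {
    "extreme_close_up": "high",
    "close_up": "high",
    "medium_shot": "medium",
    "long_shot": "low",
    "extreme_long_shot": "low",
}

def surface_version(pov):
    """Choose which stored version of the virtual object to render,
    defaulting to the medium-resolution mesh for unknown shots."""
    return POV_TO_RESOLUTION.get(pov, "medium")

print(surface_version("long_shot"))  # low
```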

It should be noted that the chosen POV has to consider the coordinates of the one or more virtual objects in the animated scene. In order to allow the virtual object to appear on the screen, its coordinates have to comply with the POV. In order to display the virtual object properly, the scene-animating module 3 is able to position the animated virtual object substantially in the center of the screen, optionally by matching between the coordinates of the virtual object and the coordinates of the virtual camera that virtually captures the animated scene. Optionally, the selected, optionally adjusted, POV is added to the animation cluster as a set of instructions to the graphic engine. As shown at 112, after the POV has been selected and optionally adjusted to the animation of the virtual object, the orientation of the body parts of the virtual objects, such as the orientation of the head of the virtual object, is correlated with the POV. Optionally, the head of the virtual object is directed toward the virtual camera that captures the animated scene. In such an embodiment, the animated scene portrays virtual objects which are focused on the virtual camera. Such an animated scene provides the user with a feeling that the virtual objects are addressing her. Optionally, the orientation of the body parts is added to the animation cluster as a set of instructions to the graphic engine.
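The centering and head-orientation behavior could be approximated with a simple look-at computation, as in the sketch below; the coordinate convention, angle representation, and parameter names are assumptions, and a real graphic engine would use its own camera model.

```python
import math

def aim_camera_at(object_position, distance, height):
    """Place a virtual camera so that the object's coordinates fall at the
    centre of the frame: offset the camera from the object and compute the
    yaw/pitch that point its forward vector through the object."""
    ox, oy, oz = object_position
    camera_position = (ox, oy + height, oz + distance)
    dx, dy, dz = (ox - camera_position[0],
                  oy - camera_position[1],
                  oz - camera_position[2])
    yaw = math.degrees(math.atan2(dx, -dz))
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    return camera_position, (yaw, pitch)

def head_orientation_toward_camera(head_position, camera_position):
    """Yaw angle that turns the avatar's head toward the virtual camera."""
    dx = camera_position[0] - head_position[0]
    dz = camera_position[2] - head_position[2]
    return math.degrees(math.atan2(dx, dz))

cam_pos, cam_angles = aim_camera_at((0.0, 1.6, 0.0), distance=3.0, height=0.2)
print(cam_pos, cam_angles)
print(head_orientation_toward_camera((0.0, 1.6, 0.0), cam_pos))
```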

Optionally, as shown at 113, the animation cluster with the selected POV and body parts orientation is forwarded to and/or accessed by the graphic engine. The graphic engine animates the scene according to the animation cluster with the selected POV.

As described above, the same set of body parts may be used to animate different surface representations, each related to a different virtual object, such as a different avatar. Optionally, the graphic engine uses the animation cluster to animate a virtual object which is dynamically selected according to the participant who is associated with the communication stream that has been used for animating the virtual object.

As commonly known, mobile communication terminals 1, such as cellular phones, use modules, such as automatic number identification (ANI) modules, for analyzing incoming calls or messages, thereby identifying the network user ID or the number of the session participant 7.

Optionally, the graphic engine associates, or allows the user to associate, one or more avatars with each one of the contacts which are present on the user's contact list. In such a manner, the graphic engine may be able to present a different animation for different callers, according to the avatars which are associated with a related record in the contact list of the mobile communication terminal 1. For example, the user may associate a virtual character of a dog with a user ID of a certain caller and a virtual character of a pig with a user ID of another. Optionally, the surface representation is graphically portrayed according to a default avatar or a random avatar whenever an incoming call is received from a contact that is not associated with any of the avatars stored in the repository of the mobile communication terminal, for example whenever an unidentified incoming call is received.
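A minimal sketch of the per-contact avatar lookup with a default or random fallback is given below; the caller IDs, avatar names, and fallback policy are hypothetical.

```python
import random

# Hypothetical mapping from a caller ID (e.g. resolved by an ANI module)
# to the avatar the user chose for that contact.
CONTACT_AVATARS = {
    "+972-50-0000001": "dog",
    "+972-50-0000002": "pig",
}
AVAILABLE_AVATARS = ["dog", "pig", "robot", "generic_human"]

def avatar_for_caller(caller_id, use_random_fallback=False):
    """Return the avatar associated with the caller, or a default/random
    avatar for unknown or unidentified callers."""
    if caller_id in CONTACT_AVATARS:
        return CONTACT_AVATARS[caller_id]
    return random.choice(AVAILABLE_AVATARS) if use_random_fallback else "generic_human"

print(avatar_for_caller("+972-50-0000001"))  # dog
print(avatar_for_caller("unknown"))          # generic_human
```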

It should be noted that this process is repeated, iteratively or recursively, as long as new communication streams are received and analyzed by the recognition module.

Reference is now made, once again, to Fig. 1. As described above, the cluster of animation instructions is forwarded to a graphic engine that is designed to generate and render an animated scene accordingly. As described above, in order to maintain the continuity of the animation of the virtual objects, the graphic engine is designed to animate the virtual objects in a sequential manner. Optionally, for every body part, the graphic engine blends a new animation that is generated according to the animation cluster into the last animation that is currently rendered onto the screen of the mobile communication device. In such an embodiment, the graphic engine adds the new animation on top of the previous animation. In blending, two or more animations are blended to form a unified complex animation. Optionally, in order to allow the blending, each joint of the object is defined with nine degrees of freedom: three degrees are defined for scaling, three are defined for translating, and three are defined for rotating.
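The per-joint blending could be sketched as a weighted interpolation over the nine degrees of freedom, as below; the plain Euler-angle treatment of rotation is a simplification chosen for illustration, and the weight corresponds to the influence value mentioned above.

```python
def blend_joint(state_a, state_b, weight_b):
    """Blend two joint states, each holding nine degrees of freedom
    (scale, translation and rotation, three components each), using the
    influence weight of the incoming animation. Rotations are blended here
    as plain Euler angles for simplicity; a production engine would
    typically interpolate quaternions instead."""
    def lerp(a, b):
        return tuple(x + (y - x) * weight_b for x, y in zip(a, b))
    return {
        "scale": lerp(state_a["scale"], state_b["scale"]),
        "translate": lerp(state_a["translate"], state_b["translate"]),
        "rotate": lerp(state_a["rotate"], state_b["rotate"]),
    }

current = {"scale": (1, 1, 1), "translate": (0, 0, 0), "rotate": (0, 0, 0)}
incoming = {"scale": (1, 1, 1), "translate": (0, 0.1, 0), "rotate": (0, 45, 0)}
print(blend_joint(current, incoming, weight_b=0.5))
```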

For example, an animation that emulates a movement of the arm may be blended with an animation that emulates a movement of the palm and with an animation that emulates a movement of the palm's joint to form a unified complex animation of a certain gesture. Optionally, the graphic engine verifies that the blending is possible in light of the current blending and/or the current position of the animation. Such verification is important to avoid unreasonable animations, such as simultaneously moving the palm in self-caressing movements while the arm is not directed toward the object's body.

Optionally, for every body part, the graphic engine replaces the current animation with the new animation. Optionally, the graphic engine delays the new animation until the current animation has been completed or just after a repetitive iteration of the last animation has been completed.

Optionally, the graphic engine, which may be hosted in the output module 8, forwards the animated scene to an integrated display unit such as a screen. Reference is now made to Fig. 5, which is a flowchart of a method for animating virtual objects according to identified emotional expressions, according to one embodiment of the present invention. As described above, the virtual object may be an avatar, such as a human character, a virtual character, and/or an animal cartoon character. In such an embodiment, the avatar may be animated to look as if it announces the audio stream that is included in the communication stream. Fig. 5 depicts actions 100-113, which are depicted in Fig. 3, and a set of additional actions that are implemented in order to animate facial expressions of the avatar to look as if the avatar announces the audio stream that is included in the communication stream.

In an embodiment that implements the depicted method, after the identified emotional expressions are received, as shown at 100, a set of sequential visemes is identified by a viseme detection module 11, as shown at 200. Optionally, the viseme detection module 11 uses a viseme classifier for identifying the visemes. Optionally, the viseme classifier uses neural network (NN) and/or hidden Markov model (HMM) techniques for identifying the visemes. By such computational techniques, the classifier deduces which visemes have been formed alongside the voicing of the audio stream. Optionally, the audio stream is divided into frames, for example using a sliding window algorithm. Then, the aforementioned MFCC extraction technique is used for capturing a number of speech features in the frames and forwarding them to a pre-trained HMM that analyzes the time series and estimates which phoneme is most likely to have produced the given audio signal. Now, a conversion table, such as a look-up table, is used for mapping the phonemes to visemes.
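The pipeline from audio frames to visemes could be sketched as follows; the frame sizes, the stubbed phoneme estimator standing in for the MFCC-plus-HMM step, and the phoneme-to-viseme table are all illustrative assumptions.

```python
# Hypothetical phoneme-to-viseme conversion table; the grouping shown here
# is illustrative and not the table used by the disclosure.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open_vowel", "iy": "spread_vowel",
}

def frames(samples, frame_size, hop):
    """Split an audio sample sequence into overlapping frames
    (a simple sliding window)."""
    return [samples[i:i + frame_size]
            for i in range(0, max(len(samples) - frame_size + 1, 1), hop)]

def phoneme_for_frame(frame):
    """Placeholder for the MFCC extraction + pre-trained HMM step that
    estimates the most likely phoneme for a frame."""
    return "aa"  # stubbed result for illustration

def visemes_for_audio(samples, frame_size=400, hop=160):
    """End-to-end sketch: frame the audio, estimate a phoneme per frame,
    and map each phoneme to its viseme via the conversion table."""
    return [PHONEME_TO_VISEME.get(phoneme_for_frame(f), "neutral")
            for f in frames(samples, frame_size, hop)]

print(visemes_for_audio([0.0] * 1600))
```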

A viseme may be understood as a basic unit of speech in the visual domain that corresponds to one or more phonemes in the acoustic domain. It describes particular facial and oral movements of the mouth of a human speaker that occur alongside the voicing of phonemes. Optionally, each member of the set of sequential visemes is selected to reflect one or more of the identified emotional expressions.

Optionally, a list of possible animation instructions is defined for animating a number of possible visemes, for example as shown at Fig. 6, which is a schematic illustration of a list 151 of possible animations of the mouth of the virtual object, according to one embodiment of the present invention. The list 151, which may be referred to as a viseme list, defines the association between tags that define sets of animation instructions and values that define possible values of identified emotional expressions. The exemplary viseme list, which is depicted in 151, defines tags 153; each tag is associated with a set of instructions for animating the mouth of the virtual object 202. The set of instructions is designed to instruct the graphic engine to animate the mouth of the virtual object to look as if it announces the voice stream in the communication stream. Each tag is further associated with an intensity or a range of intensities of one or more emotional expressions 155. Optionally, one or more viseme tags are selected according to a match between values in the viseme list and values in the weighted vector of the identified emotional expressions. The sets of instructions, which are identified with the selected tags, are adapted to be blended with the last state of the mouth of the related virtual object. The blending is performed in a similar manner to the aforementioned blending that is related to body part animations. Optionally, the aforementioned graphic engine uses a collision prevention mechanism that verifies that the animation of the visemes and the animation of the mouth do not contradict.
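Selecting viseme tags by matching the viseme list against the weighted vector could be sketched as below; the tag names, emotions, and intensity ranges are assumptions for illustration.

```python
# Hypothetical viseme list: each tag names a set of mouth-animation
# instructions and the emotional-expression intensities it fits.
VISEME_LIST = [
    {"tag": "wide_smile_mouth", "emotion": "joy",     "min": 0.6, "max": 1.0},
    {"tag": "tight_lips",       "emotion": "anger",   "min": 0.5, "max": 1.0},
    {"tag": "drooping_mouth",   "emotion": "sadness", "min": 0.4, "max": 1.0},
]

def select_viseme_tags(weighted_vector):
    """Return every viseme tag whose emotion/intensity range matches an
    entry of the weighted vector of identified emotional expressions."""
    selected = []
    for row in VISEME_LIST:
        intensity = weighted_vector.get(row["emotion"], 0.0)
        if row["min"] <= intensity <= row["max"]:
            selected.append(row["tag"])
    return selected

print(select_viseme_tags({"joy": 0.8, "sadness": 0.2}))  # ['wide_smile_mouth']
```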

Optionally, the viseme tags are selected according to the animation cluster in order to present an animation of all or most of the animated body parts and/or to emphasize a certain body part that is animated according to an emotional expression that has been captured with relatively high intensity. Optionally, the mobile communication terminal 1 further includes a user interface (UI). Optionally, the UI allows the user to control the presentation of the animated scene. For example, the UI may be used to allow the user to adjust the POV, the illumination, the background, the music, and/or the animation rate of the animated scene. For example, the user may use the keypad of the mobile communication terminal 1 or any other UI for inputting control instructions that allow the user to change the current POV of the animated scene. Optionally, the scene-animating module comprises a modeler that manages a number of models of the scene, each defining a different POV.

Reference is now made, once again, to Fig. 1. As described above, the animation of the scene is based on a weighted vector that is generated by the emotion recognition module 2 and optionally by the viseme detection module 11. Optionally, the emotion recognition module 2 and/or the viseme detection module 11 host a learning module that allows the user of the mobile communication terminal 1 to be involved in the scaling of the weighted vector. The weighted vector defines a number of intensities of emotional expressions. The emotional expressions are captured from the communication stream that documents a human user, such as a call participant. As commonly known, similar emotions are expressed differently by members of different societies. Therefore, the intensity of a certain emotion and/or the expression thereof in the voice and/or the appearance of the session participant 7 may depend on her culture and on her social relations. Optionally, the learning module allows the user 10 to define an accuracy weight for each emotional expression that has been identified by the emotion recognition module 2. The accuracy weight is associated with the emotional expression by the emotion recognition module 2 and allows the user to teach the emotion recognition module 2 which emotional expressions have been identified correctly and which have not. Optionally, the emotion recognition module 2 applies a classification process that generates and/or updates a statistical model. The statistical model maps the intensity of a number of emotions and/or characteristics of the expression thereof. Optionally, the emotion recognition module 2 includes a user interface that allows the user to configure and/or to update the model. Optionally, the emotion recognition module 2 allows an on-line adaptation of the model by updating the model according to the user instructions and/or reactions to expressions of emotion and/or the intensities of these expressions.

It is expected that during the life of this patent many relevant devices and systems will be developed, and the scope of the terms herein, particularly of the terms a decision manager, a network, a CPU, a WLAN, a WPAN, and a display, is intended to include all such new technologies a priori.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents, and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.