Title:
TIME-ORDERED TEMPLATES FOR TEXT-TO-ANIMATION SYSTEM
Document Type and Number:
WIPO Patent Application WO/2008/148211
Kind Code:
A1
Abstract:
There is described a method for converting an input text into an input for an animation generator, the method comprising: receiving the input text; extracting from the text a first set of data representing information related to actions identified in the input text and completing a semantically annotated action template using the first set of data; extracting from the input text a second set of data representing information related to a description of every participant involved in the actions and completing a semantically annotated description template using the second set of data; and transmitting the semantically annotated action template and the semantically annotated description template to the animation generator.

Inventors:
BHERER HANS (CA)
Application Number:
PCT/CA2008/001088
Publication Date:
December 11, 2008
Filing Date:
June 06, 2008
Assignee:
XTRANORMAL TECHNOLOGIE INC (CA)
BHERER HANS (CA)
International Classes:
G06F17/27; G06F19/00; G06T13/00; G06T15/70
Domestic Patent References:
WO2002099627A1 (2002-12-12)
Attorney, Agent or Firm:
OGILVY RENAULT LLP (1981 McGill College Avenue, Montreal, Québec H3A 2Y3, CA)
Claims:
I/WE CLAIM:

1. A method for converting an input text into an input for an animation generator, the method comprising: receiving said input text; extracting from said text a first set of data representing information related to actions identified in said input text and completing a semantically annotated action template using at least said first set of data; extracting from said input text a second set of data representing information related to a description of every participant involved in the actions and completing a semantically annotated description template using at least said second set of data; and transmitting said semantically annotated action template and said semantically annotated description template to said animation generator.

2. A method as claimed in claim 1, wherein said extracting from said text said first set of data comprises extracting: information related to an action predicate; semantic information for the action predicate; temporal information extracted from fluents/events; and inferred information for commonsense reasoning.

3. A method as claimed in claim 1 or 2, wherein said extracting from said text said second set of data comprises extracting: participant information related to a participant in the action; spatial information related to the position of the participant in a scene; dynamic information that influences at least one of an emotional state, a physical state, and a behavior of the participant; and a link binding the participant to said action predicate.

4. The method as claimed in claim 2, wherein said information related to an action predicate comprises at least one of event/fluent information related to the action predicate and precondition/post-condition information using a semantic structure.

5. The method as claimed in claim 2, wherein said semantic information for the action predicate comprises at least one of a semantic structure for the action predicate and modulator information.

6. The method as claimed in claim 2, wherein said temporal information comprises at least one of a temporal link between the action predicate and another action predicate, and animation termination information.

7. The method as claimed in any one of claims 1 to 6, further comprising receiving user information and using said user information to complete at least one of said semantically annotated action template and said semantically annotated description template.

8. The method as claimed in claim 2, wherein said inferred information comprises information related to interactivity of coexisting events.

9. The method as claimed in claim 3, wherein said participant information comprises information related to an animate/inanimate state of the participant, and at least one of a semantic role, an emotional state and an animation channel of the participant.

10. The method as claimed in claim 3, wherein said spatial information comprises at least one of information related to a spatial position of the participant and information related to spatial constraints applied to the participant.

11. The method as claimed in any one of claims 1 to 10, further comprising: determining a plurality of actions from said input text; extracting said first set of data for each one of said actions; completing said semantically annotated action template for each one of said actions; extracting said second set of data for each participant related to each one of said actions; completing said at least one semantically annotated description template for said each participant; linking each one of said at least one semantically annotated description template to a corresponding semantically annotated action template; and generating a conceptual structure by temporally ordering completed semantically annotated action templates.

12. The method as claimed in claim 1, wherein said extracting from said text a first set of data and said extracting from said text a second set of data are performed using information comprised in at least one of a conceptual background database storing semantic information, a three-dimensional mapping database storing definitions of action predicates and related parameters, and a predicate interaction database storing commonsense knowledge for said action predicates.

13. The method as claimed in claim 1, wherein said completing said semantically annotated action template and said completing said semantically annotated description template are performed using information comprised in at least one of a conceptual background database storing semantic information, a three-dimensional mapping database storing definitions of action predicates and related parameters, and a predicate interaction database storing commonsense knowledge for said action predicates.

14. A system for converting an input text into an input for an animation generator, the system comprising: a natural language processing module receiving said input text and outputting a semantic structure; a conceptual background database storing semantic information; a predicate interaction database storing commonsense knowledge for action predicates; a three-dimensional mapping database storing definitions of action predicates and related parameters; and a template generator adapted to receive said semantic structure from said natural language processing module and automatically complete a first template representing information related to actions and a second template representing information related to a description of every participant involved in the actions using information contained in said databases, and transmit said first template and said second template to said animation generator.

15. The system as claimed in claim 14, wherein said template generator is adapted to generate a conceptual structure by temporally ordering a plurality of first templates, each one of said first templates corresponding to one of said actions identified from said input text.

16. The system as claimed in claim 14, wherein the natural language processing module is adapted to perform at least the steps of tokenization, tagging, parsing, and semantic role labeling.

17. The system as claimed in claim 14, wherein said first template comprises a first field for information related to an action predicate, a second field for semantic information for the action predicate, a third field for temporal information extracted from fluents/events, and a fourth field for inferred information for commonsense reasoning.

18. The system as claimed in claim 14, wherein said second template comprises a first field for participant information related to a participant in a corresponding action, a second field for spatial information related to the position of the participant in a scene, a third field related to dynamic information that influences at least one of an emotional state, a physical state, and a behavior of the participant, and a fourth field related to a link binding the participant to said corresponding action.

19. The system as claimed in claim 15, wherein said conceptual structure comprises a graph temporally ordering said plurality of first templates.

20. The system as claimed in claim 14, wherein said natural language processing module is further adapted to generate a syntactic structure of said text.

21. The system as claimed in claim 20, wherein the template generator is further adapted to use said syntactic structure to complete at least one of said first template and said second template.

22. A method for representing information extracted from a text to be used to create an animation, the method comprising: completing a semantically annotated action template by providing: information related to an action predicate; semantic information for the action predicate; temporal information extracted from fluents/events; and inferred information for commonsense reasoning; completing at least one semantically annotated description template by providing: participant information related to a participant in the action; spatial information related to the position of the participant in a scene; dynamic information that influences at least one of an emotional state, a physical state, and a behavior of the participant; and a link binding the participant to an action; wherein said templates encompass all needed syntactic and semantic parameters to be used for said animation.

23. A system for analyzing natural language text describing actions and creating ordered action structures to be used in creating animation, the system comprising: a natural language processing module receiving said text as input and outputting a semantic structure; a conceptual background database storing semantic information; a predicate interaction database storing commonsense knowledge for action predicates; a three-dimensional mapping database storing definitions of action predicates and related parameters; and an ordered action structure generator for generating a conceptual structure and automatically completing a semantically annotated action template representing information related to actions and a semantically annotated description template representing information related to a description of every participant involved in the actions, wherein said templates are completed using said semantic structure and information contained in said databases.

Description:

TIME-ORDERED TEMPLATES FOR TEXT-TO-ANIMATION SYSTEM

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority under 35 USC § 119(e) of Provisional Patent Application bearing serial number 60/924,945, filed on June 6, 2007, the contents of which are hereby incorporated by reference.

TECHNICAL FIELD

The present invention relates to the field of text-to-animation systems, and more specifically, to the extraction of temporal, semantic, and pragmatic information from a text in order to generate animation sequences.

BACKGROUND OF THE INVENTION

Animation is created for a wide variety of applications, not only because of the rapid evolution of usable tools, but also because of the availability of the expertise to apply them. Part of this capability comes from the ability of animation systems, such as Text-To-Scene (TTS), Text-To-Animation (TTA), or Text-To-Movie (TTM) systems, to understand natural language describing action concepts.

Some early systems attempted representation using only static images. Some systems try to identify key concepts or words in a text and then employ images to represent them, without, however, providing a coherent sequence that correctly captures the meaning of the text. Other systems providing animated translation of text are built from animation sequences edited together, thus requiring an extensive static database of mini-animation clips, which is of limited use.

There exists therefore a need for a system allowing the analysis of natural language text and rendering it in a form that can be automatically animated.

SUMMARY OF THE INVENTION

In accordance with a first broad aspect of the present invention, there is provided a method for converting an input text into an input for an animation generator, the method comprising: receiving the input text; extracting from the text a first set of data representing information related to actions identified in the input text and completing a semantically annotated action template using at least the first set of data; extracting from the input text a second set of data representing information related to a description of every participant involved in the actions and completing a semantically annotated description template using at least the second set of data; and transmitting the semantically annotated action template and the semantically annotated description template to the animation generator.

In accordance with a second broad aspect of the present invention, there is provided a system for converting an input text into an input for an animation generator, the system comprising: a natural language processing module receiving the input text and outputting a semantic structure; a conceptual background database storing semantic information; a predicate interaction database storing commonsense knowledge for action predicates; a three-dimensional mapping database storing definitions of action predicates and related parameters; and a template generator adapted to receive the semantic structure from the natural language processing module and automatically complete a first template representing information related to actions and a second template representing information related to a description of every participant involved in the actions using information contained in the databases, and transmit the first template and the second template to the animation generator.

In accordance with a third broad aspect of the present invention, there is provided a method for representing information extracted from a text to be used to create an animation, the method comprising: completing a semantically annotated action template by providing information related to an action predicate, semantic information for the action predicate, temporal information extracted from fluents/events, and inferred information for commonsense reasoning; and completing at least one semantically annotated description template by providing participant information related to a participant in the action, spatial information related to the position of the participant in a scene, dynamic information that influences at least one of an emotional state, a physical state, and a behavior of the participant, and a link binding the participant to an action; wherein the templates encompass all needed syntactic and semantic parameters to be used for the animation.

In accordance with a fourth broad aspect of the present invention, there is provided a system for analyzing natural language text describing actions and creating ordered action structures to be used in creating animation, the system comprising: a natural language processing module receiving the text as input and outputting a semantic structure; a conceptual background database storing semantic information; a predicate interaction database storing commonsense knowledge for action predicates; a three-dimensional mapping database storing definitions of action predicates and related parameters; and an ordered action structure generator for generating a conceptual structure and automatically completing a semantically annotated action template representing information related to actions and a semantically annotated description template representing information related to a description of every participant involved in the actions, wherein the templates are completed using the semantic structure and information contained in the databases.

It should be understood that some of the information stored in the various databases may be generic information while other information is context-specific.

In this specification, the term "event" is intended to mean an animation unit which occurs at a particular point on a time line. The term "fluent" is intended to mean an animation unit which holds over a period of time; a fluent is a predicate whose truth value can change over time. An event can occur simultaneously with any number of fluents. The event/fluent distinction is a design issue, not an intrinsic property of actions.

In the context of linguistics, a predicate is understood to be a feature of language that can be used to make a statement about something in the animation world. A predicate is an "animation word", which means that a predicate has a meaning understood by the animation generator. For example, the predicate "table" is understood by the animation generator, which will display a graphical representation of a table. The predicate "on" associated with the predicate "table" is understood by the animation generator as being indicative of a location: the animation generator understands that something will be positioned on the graphical representation of the table. The whole set of predicates forms the animation language. An action predicate is understood to be a special kind of predicate, in the context of linguistics, that comes with actant slots and that refers specifically to an action concept. Actants are the parameters (variables) of the predicates. Formally, give(X, Y, Z) is an example of an action predicate, with three actants (X, Y and Z). We sometimes refer to the action predicate give even though give is the name of the action predicate give(X, Y, Z).

Semantic information is understood as information relating to the meaning of words and/or sentences. Syntactic information is understood as information relating to the form of sentences. Modulator information is information that can influence an action predicate: quickly influences walking, for example. Animation termination information is information allowing an ongoing animation to be terminated: killed terminates walking, for example. Time graph information is information that allows time ordering of the action predicates, according to the semantic structure. Existential and constraint information relates to particular preconditions and post-conditions of action predicates. For example, the action predicate take(x) has exists(x) as a precondition, meaning that the object to be taken, namely x, has to exist in the actual environment. Inanimate(table) is a predicate that is true if the concept table is an inanimate concept. Animation channel refers to an entity on which an action can be applied (such as body, eyes, head). Semantic structures are graphs that feature predicate-actant connections. Arcs of these graphs are labelled with semantic relations of the actants relative to their predicates. These relations are often referred to with the term semantic role.
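By way of illustration only, the notions of action predicate, actant, modulator and precondition defined above could be modelled as in the following sketch; the class and field names are hypothetical and do not appear in the disclosure.

    # Illustrative sketch of the definitions above; names are hypothetical,
    # not taken from the disclosed system.
    from dataclasses import dataclass, field

    @dataclass
    class ActionPredicate:
        """An 'animation word' for an action, with actant slots."""
        name: str                     # e.g. "give"
        actants: tuple                # parameter slots filled by participants
        is_fluent: bool = False       # fluent (holds over time) vs. event
        preconditions: list = field(default_factory=list)
        postconditions: list = field(default_factory=list)
        modulators: dict = field(default_factory=dict)

    # give(X, Y, Z): an action predicate with three actants.
    give = ActionPredicate(name="give", actants=("X", "Y", "Z"))

    # take(x) has exists(x) as a precondition: x must exist in the scene.
    take = ActionPredicate(name="take", actants=("x",),
                           preconditions=["exists(x)"])

    # "quickly" influencing "walk": a modulator attached to the predicate.
    walk = ActionPredicate(name="walk", actants=("agent", "destination"),
                           is_fluent=True, modulators={"speedofevent": "quick"})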

BRIEF DESCRIPTION OF THE DRAWINGS

Further features and advantages of the present invention will become apparent from the following detailed description, taken in combination with the appended drawings, in which:

Fig. 1 is a block diagram of an embodiment of the system for analyzing natural language text describing actions and creating ordered action structures;

Fig. 2A is an example of a first ontology describing relations between different concepts of cars;

Fig. 2B is an example of a second ontology connecting color concepts;

Fig. 2C is an example of an ontology connecting the concept "run" to the instance "sprint";

Fig. 3 is an exemplary structure of an entry in an APIR database;

Fig. 4 is an example of a semantic structure extracted from a first sentence;

Fig. 5 is an example SAAT template;

Fig. 6 is an example SADT template;

Fig. 7 is an example of a second semantic structure extracted from a second sentence;

Fig. 8 is an example of a third semantic structure extracted from a third sentence;

Fig. 9 is an example of a fourth semantic structure extracted from a fourth sentence; and

Fig. 10 is an example of a conceptual structure.

It will be noted that throughout the appended drawings, like features are identified by like reference numerals.

DETAILED DESCRIPTION

Figure 1 illustrates a system 10 for analyzing natural language text describing actions, to generate an ordered action structure to be used in creating animations of the actions by an animation generator. A Natural Language Processing (NLP) module 12 takes text 14 as input and outputs a semantic structure 16 using information contained in a conceptual background (CB) database 18, a lexicon 20, and a 3D mapping database 22. The SAAT and SADT generator 24 then creates Semantically Annotated Action Templates (SAAT) 26 and Semantically Annotated Description Templates (SADT) 28 using the semantic structure 16 and the information contained in the CB database 18, the 3D mapping database 22 and an Action Predicates Interaction Relations (APIR) Matrix database 30.

The lexicon 20 is part of the NLP module 12, is language dependent, and contains morphological, syntactic and semantic information for every lexical unit it contains. Each lexical unit is provided with a lexical definition similar to a definition that can be found in a printed dictionary. In the NLP module 12, a lexical unit is assigned to each word of the input text. Other parts of the NLP module 12 may include, but are not limited to, tools to address text segmentation, word-sense disambiguation, syntactic ambiguity, and speech acts. NLP modules as are presently known in the art may be used in the present system and will not be described further. Note that the lexicon may be separate from the NLP module.

The 3D mapping database 22 is the bridge between the NLP module 12 and the SAAT and SADT generator 24. It contains the animation vocabulary that the animation generator recognizes and understands and to which it assigns a graphical representation or a function in the animation. A unit entity or a "word" of the animation vocabulary is called a predicate. This database is language independent in the sense that the predicates are written in a pivot language. The input text 14 is written in a given language, such as English, French, or German, while the predicates contained in the 3D mapping database 22 are written in the pivot language.

In one embodiment, the predicates contained in the 3D mapping database are classified in six classes, namely action predicates (AP), modulator predicates (MLP), modifier predicates (MFP), conceptual assets (CA), spatial relation predicates (SRP), and alteration predicates (ALP). The action predicates are animation words used to describe an action performed in the animation. Usually, the action predicates are verbs. The predicates "walk", "take", and "kiss" are examples of action predicates. The conceptual assets are predicates which take part in an action. Usually, the conceptual assets are nouns. The predicates "table", "car", "box", "human", and "hand" are examples of conceptual assets. A modulator predicate is a parameter associated with an action predicate or a conceptual asset. A modulator predicate can be used to specify the manner in which an action is performed, an emotion of a character, a characteristic of an object, etc. Usually, adverbs and adjectives are modulator predicates. Predicates such as "speed", "emotion", "color", and "side" are examples of modulator predicates. A modifier predicate is used to modify a modulator predicate. A modifier predicate such as "plus" amplifies the modulator predicate to which it applies and a modifier predicate such as "minus" reduces the modulator predicate to which it applies. The spatial relation predicates are used to specify initial, final or invariant spatial relations. For example, in the input text "Paul takes the book from the table", the token "from" implies that the book is initially on the table (initial spatial relation), whereas in the input text "Paul puts the book on the table", the token "on" implies that the book will be on the table after the action (final spatial relation). Examples of spatial relation predicates are "on", "under", "above", etc. An alteration predicate alters an action predicate. Predicates such as "stop", "continue" and "pause" are examples of alteration predicates.

The system 10 also comprises a CB database 18, which is a multiple inheritance conceptual structure intended to cover the semantic universe of the application. While the 3D mapping database 22 contains semantic information about the animation world (animation words or predicates), the CB database contains semantic information about the real world. The CB database contains ontologies of top-level concepts, mid-level concepts and instances, and acts as a representation of the semantic world of the application. The ontologies indicate relations between the elements of the CB database 18. The ontologies of the CB database 18 are language independent as they are written in the pivot language.

An ontology takes the form of an oriented graph in which nodes are connected together, as illustrated in figure 2A. A node can be either a concept or an instance. An instance can be a particular case of a concept or provide additional information about the concept. Referring to figure 2A, the instances "estate car" and "sedan car" are particular cases of the concept "family car". Another type of additional information provided by the CB database 18 concerns the possibility of animating a concept or an instance. For example, the concept "table" can be classified as an animated concept in one application, which means that in the 3D animation the table can walk. In another animation, the concept "table" is considered inanimate and the table is a static object in the 3D animation. The ontologies make it possible to determine whether a concept or an instance is animated or inanimate. The relations connecting the concepts together and the concepts to the instances can be of any type and are universal in the sense that they are always true independently of the language.

Referring to figure 2A, the concept "vehicle" is a top-level concept which includes mid-level concepts such as the concepts "car", "boat" and "motorbike". The mid-level concept "car" also contains mid-level concepts such as "sports car" and "family car". The oriented graph draws up a hierarchy between the concepts and the instances, starting from the concept having the broadest meaning down to the instances having the narrowest definition. All the concepts and instances are expressed in the pivot language so that the CB database is language independent. Figure 2B is another example of an ontology comprised in the CB database 18. A top-level concept "visual attribute" is linked to a mid-level concept "color" connected to several instances such as "blue", "white", "black", etc. Figure 2C illustrates a further example of an ontology comprised in the CB database, in which the instance "sprint" is connected to a concept "run". In this case, the ontology expresses a gradation of the speed of a motion. An ontology can also express a gradation of intensity such as "talk", "shout", and "yell".
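As an informal illustration of the ontologies of figures 2A and 2C, an oriented graph of concepts and instances could be sketched as follows; the node structure and field names are assumptions for the example.

    # Illustrative ontology fragment mirroring figures 2A and 2C; the
    # representation is an assumption, not the disclosed database format.
    class Node:
        def __init__(self, name, kind, animate=None):
            self.name = name          # written in the pivot language
            self.kind = kind          # "concept" or "instance"
            self.animate = animate    # None when unspecified for the application
            self.children = []        # oriented edges toward narrower nodes

        def add(self, child):
            self.children.append(child)
            return child

    # Figure 2A: vehicle -> car -> family car -> {estate car, sedan car}.
    vehicle = Node("vehicle", "concept")
    car = vehicle.add(Node("car", "concept"))
    family_car = car.add(Node("family car", "concept"))
    family_car.add(Node("estate car", "instance"))
    family_car.add(Node("sedan car", "instance"))

    # Figure 2C: "sprint" as an instance expressing a gradation of "run".
    run = Node("run", "concept")
    run.add(Node("sprint", "instance"))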

Each lexical unit of the lexicon 20 is connected to a single concept or instance of the CB database 18, independently of the language of the lexicon 20. For example, the NLP module 12 receives the token "run" and the lexicon 20 comprises two lexical entries, "run 1" and "run 2", the first lexical entry being synonymous with "operate" and the second lexical entry expressing a type of displacement. The lexical entry "run 1" is associated with the concept "operate" while the second lexical entry, namely "run 2", is associated with the concept "run".

In another embodiment, each lexical unit of the lexicon 20 is connected to at least one concept or instance of the CB database 18, independently of the language of the lexicon 20; it should be understood that a lexical unit can be connected to more than one concept or instance. For example, the lexical unit "run" is connected to at least two concepts, "run 1" and "run 2". The concept "run 1" is synonymous with "operate" and the concept "run 2" expresses a type of displacement. For example, if the input text is written in French and the pivot language is English, each lexical unit expressed in French is connected to at least one concept or instance expressed in English, so that no translation is required. In this case, the lexical unit "courir" is only connected to the concept "run 2". It should be understood that the connections between the lexical units and the concepts or instances depend on the language of the lexicon 20.

It should be understood that two or more lexical units can refer to a same concept or instance. For example, the lexical units "kiss" and "give a kiss" are both associated with the concept "kissing".
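A toy illustration of these lexicon-to-concept connections follows; the entries merely restate the examples above, and the dictionary representation is an assumption.

    # Toy lexicons linking language-dependent lexical units to
    # pivot-language concepts; illustrative only.
    LEXICON_EN = {
        "run 1": "operate",        # sense synonymous with "operate"
        "run 2": "run",            # displacement sense
        "kiss": "kissing",
        "give a kiss": "kissing",  # two lexical units, one concept
    }

    LEXICON_FR = {
        # A French lexicon links directly to pivot-language concepts,
        # so no translation step is required.
        "courir": "run",
        "voiture grand tourisme": "grand tourer",
    }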

The APIR Matrix database 30 provides information on how two (or more) action predicates interact when one comes into play with an already existing action. In a sense, it encapsulates commonsense knowledge for action predicates. The underlying structure implemented through this matrix 30 is a graph. The nodes are action predicates supported by the application. The edges (relations) are semantic relations useful in an animation environment. If node A is linked to node B by the semantic relation S, then we say that A S B. For example, let A := KISS, B := SPEAK, S := CAN_CLIPPED. The APIR graph will then contain what is seen in figure 3. Hence, if an action of kissing should be triggered while an action of speaking is occurring, the system knows that it has to clip the speaking while kissing. Information in the APIR matrix 30 addresses synchronicity issues. Some examples of the main semantic relationships implemented in the APIR database are found in table 1 below.

TABLE 1
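The A S B relation of the APIR graph can be pictured with the following sketch, which hard-codes only the KISS/SPEAK/CAN_CLIPPED example from figure 3; the lookup function and dictionary representation are hypothetical.

    # Sketch of the APIR relation "A S B" as a labelled edge set; relation
    # names other than CAN_CLIPPED would come from table 1 (not reproduced).
    APIR = {
        ("kiss", "speak"): "CAN_CLIPPED",   # figure 3: kissing clips speaking
    }

    def interaction(new_action, ongoing_action):
        """Return the semantic relation governing how new_action affects an
        already-executing action, if the APIR graph defines one."""
        return APIR.get((new_action, ongoing_action))

    # If kissing is triggered while speaking is ongoing, clip the speaking.
    assert interaction("kiss", "speak") == "CAN_CLIPPED"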

Referring back to figure 1, a text 14 is entered by a user of the application. This text 14 can be written in any language. The NLP module 12 analyses both the syntax and the semantics of the input text 14 in order to determine its meaning. In order to perform this step, the NLP module 12 uses the information contained in the lexicon 20 and the ontologies of the CB database 18, in addition to semantic and reasoning algorithms. For example, these algorithms are used to perform the steps of metonymy resolution, co-reference resolution, semantic role labelling, word sense disambiguation, implicit action planning, etc. It should be understood that some of the previously described steps performed by the NLP module 12 can be performed by modules separate from the NLP module 12. For example, the implicit action planning can be performed by an action planner module independent of the NLP module 12.

The role of the NLP module 12 is to translate the input text 14 into semantic structures 16 and syntactic structures. The first step is tokenization, which is the process of demarcating sections of a string of input characters: the NLP module 12 separates the text into words or tokens. The next process is tagging, which marks up the words of the input text 14 as corresponding to a particular part of speech. Parts of speech are the different grammatical categories used to categorize words, such as noun, verb, preposition, etc. Then the NLP module 12 performs the parsing process, which consists in analysing a sequence of words in order to determine its grammatical structure with respect to a given grammar. Retrieving the semantic information is the last step, which is performed by a semantic role labelling module comprised in the NLP module 12. This step consists in the labelling of relations between the participants of a given action with their respective semantic roles. The semantic roles are obtained using a combination of static rules and information contained in the ontologies of the CB database 18.
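The four-stage pipeline just described is outlined below as a skeleton; every function here is a placeholder standing in for a real tokenizer, tagger, parser and semantic role labeller, not the module's actual interface.

    # Skeleton of the four NLP stages; all bodies are placeholders.
    def process(text):
        tokens = text.split()                       # 1. tokenization (naive)
        tagged = [(t, pos_tag(t)) for t in tokens]  # 2. part-of-speech tagging
        tree = parse(tagged)                        # 3. parsing against a grammar
        return label_semantic_roles(tree)           # 4. semantic role labelling

    def pos_tag(token):
        # Placeholder: a real tagger assigns noun/verb/preposition/... labels.
        return "X"

    def parse(tagged):
        # Placeholder: a real parser returns a grammatical structure.
        return tagged

    def label_semantic_roles(tree):
        # Placeholder: combines static rules with the CB-database ontologies.
        return tree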

In order to create the semantic structure, the NLP module has to determine the meaning of an input word. For example, the NLP module 12 associates the word "runs" of the input text with the lexical unit "run", which is connected to the concepts "run 1" and "run 2". Then, the NLP module determines which one of the two concepts represents the meaning of the input word associated with the lexical unit "run", by using the ontologies of the CB database and the word sense disambiguation algorithm. For example, the NLP module 12 determines that the concept "run 1" is the appropriate one and the concept "run 1" is associated with the input word "runs". Then, the input word "runs" is mapped to the action predicate "operate" contained in the 3D mapping database. A semantic structure 16 is a graphical interconnection of nodes. Each node comprises a token written in the language of the input text 14, to which a concept written in the pivot language is associated.

Referring to the example illustrated in figure 2A, if the expression "grand tourer" is input, the NLP module 12 associates this expression with the lexical unit "grand tourer", connected to the instance "grand tourer" in the CB database. If a predicate corresponding to a grand tourer exists in the 3D mapping database 22, the NLP module 12 maps the input expression "grand tourer" to the predicate "grand tourer". If the predicate "grand tourer" does not exist but the predicate "sport car" exists in the 3D mapping database 22, then the NLP module 12 maps the expression "grand tourer" to the predicate "sport car". While the present example refers to a text written in English and to a CB database 18 and a 3D mapping database 22 using English as a pivot language, it should be understood that the same process occurs if the text is written in a different language. For example, the input text 14 can be written in French and the expression "voiture grand tourisme" (grand tourer in English) is entered in the NLP module provided with a French lexicon. In this case, the expression "voiture grand tourisme" is linked to the lexical unit "voiture grand tourisme", which is connected to the instance "grand tourer" of the CB database 18. The NLP module 12 maps the expression "voiture grand tourisme" to the predicate "grand tourer" if it exists in the 3D mapping database 22. Otherwise, the expression "voiture grand tourisme" is mapped to the predicate "sport car".

Referring to figure 2C, the term "sprints" is entered in the NLP module 12 by the user. The NLP module 12 connects the term "sprints" to the lexical unit "sprint". If an action predicate corresponding to the instance "sprint" exists in the 3D mapping database 22, then the NLP module maps the term "sprints" to the action predicate "sprint" and also associates the term "sprints" with the instance "sprint". However, if the action predicate "sprint" does not exist, then the NLP module 12 maps the term "sprints" to the action predicate "run". Alternatively, the NLP module can transform the verb "sprint" into the expression "run fast" using the ontology illustrated in figure 2C. In this case, the term "sprints" is mapped to the action predicate "run" and to the modulator predicate "fast".
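This fallback behaviour, mapping an instance to the nearest broader concept for which an animation predicate exists, might look as follows; the toy vocabulary and parent map are assumptions drawn from the examples above.

    # Sketch of the fallback mapping: climb the ontology until a concept
    # with an animation predicate is found. Names are illustrative.
    PREDICATES_3D = {"sport car", "run", "fast"}   # toy animation vocabulary

    PARENT = {"grand tourer": "sport car", "sprint": "run"}

    def map_to_predicate(concept):
        while concept is not None:
            if concept in PREDICATES_3D:
                return concept
            concept = PARENT.get(concept)   # climb toward broader concepts
        return None

    print(map_to_predicate("grand tourer"))  # -> "sport car"
    print(map_to_predicate("sprint"))        # -> "run"
    # Alternatively "sprint" may expand to run + modulator "fast" (figure 2C).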

Once the input words have been associated with a corresponding instance or concept and mapped to a corresponding predicate, the semantic role labelling module of the NLP module 12 creates the semantic structure 16. Figure 4 illustrates an example of a semantic structure 16. The sentence "Fred walks very quickly to the table" is input in the NLP module 12, which assigns a concept to each word of the sentence except the words "the" and "to", which are ignored by the NLP module 12. The words "Fred", "walks", "very", "quickly" and "table" are associated with the lexical units "Fred", "walk", "very", "quickly" and "table", respectively, which are connected to the concepts "human", "walking", "intensifier", "speedofevent" and "table", respectively. The concepts "human", "walking", "intensifier", "speedofevent" and "table" are also mapped to a corresponding predicate of the 3D mapping database 22. Each word and its associated concept occupy a node in the semantic structure. The word-concept association describing the action of the sentence is placed at the top of the semantic structure. The other nodes are placed below the action node as a function of their importance in the sentence. The nodes are also connected by arrows describing the semantic function they play relative to the node on which they depend. For example, the node "Fred: human" depends on the node "walks: walking" and has the role of agent with respect to the node "walks: walking". The node "very: intensifier" is connected to the node "quickly: speedofevent" as the word "very" qualifies the word "quickly", and the arrow indicates the degree by which the node "very: intensifier" modifies the node "quickly: speedofevent". The semantic role of a node, generated by the semantic role labelling algorithm, is attached to the arrow pointing at the node.
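The semantic structure of figure 4 could be written down informally as the following node and arc lists; apart from the agent role, which is stated above, the role labels are illustrative guesses.

    # The figure-4 structure as toy data: nodes pair an input-language token
    # with a pivot-language concept; arcs carry semantic roles.
    nodes = {
        "walks": "walking",        # action node, placed at the top
        "Fred": "human",
        "table": "table",
        "quickly": "speedofevent",
        "very": "intensifier",
    }

    # (dependent, role, head); roles other than "agent" are assumptions.
    arcs = [
        ("Fred", "agent", "walks"),
        ("table", "destination", "walks"),
        ("quickly", "manner", "walks"),
        ("very", "degree", "quickly"),   # "very" modifies "quickly"
    ]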

The above described databases and the NLP module are used by the template generator 24 to fill templates in order to represent the knowledge that has been extracted from the input text. The template generator 24 generates one SAAT per action comprised in the input text and one SADT for each participant involved in each action. The SAAT and SADT generator 24 receives the semantic structure from the NLP module 12 and has access to the CB database 18, the 3D mapping database 22 and the APIR Matrix database 30. The SAAT and SADT generator 24 also uses the syntactic structure of the input text 14, which is generated by the NLP module 12. The SAAT and SADT generator 24 generates a conceptual structure, which is language independent. The conceptual structure is a kind of semantic structure, but labelled with predicates and semantic roles supported by the animation system. The conceptual structure can be represented as a graph of time-ordered SAATs, each SAAT being associated with corresponding SADTs. In a sense, the conceptual structure acts as a logical representation of the input text 14.

In one embodiment, using the semantic structures generated by the NLP module 12, the SAAT and SADT generator 24 identifies all of the actions. These actions are detected by looping through all the semantic tokens that were analyzed and querying the lexical database for links to action predicates in the 3D mapping database 22. Any semantic token found to be linked to an action predicate is considered to introduce an action. Then, the SAAT and SADT generator 24 classifies each node of the semantic structure into two parameter families: actant vs. modulator. The distinction is based on the semantic role of the nodes, and a static classification of all semantic roles is defined in the 3D mapping database 22.

The SAAT and SADT generator 24 generates a temporal graph that temporally positions each action previously identified. The SAAT and SADT generator 24 takes into account the following information to sequence the actions: conceptual background information of each semantic token, the roles declared in the semantic structure 16, grammatical relations, information from the APIR Matrix database 30, and the actants of each action. This temporal graph is used as a stencil in the final step. Each node of this graph contains typed links (previous, next, concurrent) to other nodes, which make up the temporal information of the graph. Finally, the SAAT and SADT generator 24 generates a SAAT per action and positions it in the temporal graph previously generated, according to the action associated with the SAAT. The SAAT and SADT generator 24 also generates SADTs for each SAAT and connects them to their corresponding SAAT. The temporal graph filled with the SAATs and SADTs represents the conceptual structure output by the SAAT and SADT generator 24.
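A node of this temporal graph, with its typed links, might be sketched as follows; the helper functions and the sentence-2 example are illustrative assumptions.

    # Sketch of a temporal-graph node with the typed links named above
    # (previous, next, concurrent); field names are hypothetical.
    class TemporalNode:
        def __init__(self, action):
            self.action = action      # the action placed at this node
            self.previous = []        # actions that must occur earlier
            self.next = []            # actions that occur afterwards
            self.concurrent = []      # actions running at the same time

    def sequence(a, b):
        a.next.append(b)
        b.previous.append(a)

    def simultaneous(a, b):
        a.concurrent.append(b)
        b.concurrent.append(a)

    # Sentence 2, "Sally looks at him and talks to Mary": two concurrent
    # actions, both following the walking action of sentence 1.
    walk, look, talk = TemporalNode("walk"), TemporalNode("look"), TemporalNode("talk")
    sequence(walk, look)
    sequence(walk, talk)
    simultaneous(look, talk)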

In one embodiment, the template generator 24 generates attribute predicates such as precondition, invariant, and post-condition, which are used as a first step in commonsense reasoning about actions. They allow the action planner module to trigger actions which are not explicit but essential to the flow of actions. The domain of the parameters used in those attribute predicates depends on the predicate itself: it is either the semantic roles or the conceptual background. The actual supported predicates, where X and Y are semantic roles, W is of type POSITION, P is of type Action Predicate, and Z is a member of the conceptual background, are illustrated in table 2.

TABLE 2
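A minimal sketch of how such attribute predicates could drive the action planner follows, using the take(x)/exists(x) example given earlier; the planning logic shown is an assumption, not the disclosed algorithm.

    # Illustrative precondition check: the planner inserts implicit actions
    # when a precondition does not yet hold. All names are assumptions.
    scene = {"table"}                  # concepts currently existing in scene

    def preconditions_met(action, actant):
        if action == "take":           # take(x) requires exists(x)
            return actant in scene
        return True

    def plan(action, actant):
        if not preconditions_met(action, actant):
            # A planner could insert implicit actions here, e.g. making the
            # missing object available before 'take' is executed.
            return ["create(%s)" % actant, "%s(%s)" % (action, actant)]
        return ["%s(%s)" % (action, actant)]

    print(plan("take", "table"))   # ['take(table)']
    print(plan("take", "book"))    # ['create(book)', 'take(book)']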

In one embodiment, the SAAT is composed of six parts. Each part contains information extracted either from the semantic structure, from the databases 18, 22, and 30, from the predicates themselves, or from user interaction with the system. The six main sections of a SAAT are illustrated in table 3 below.

TABLE 3

Figure 5 illustrates an exemplary format of a SAAT. In one embodiment of the present invention, the SADT is composed of four parts. Each part contains information extracted either from the semantic structure, from non-action predicates, or from user interaction with the system. The four main sections of a SADT are illustrated in table 4 below.

TABLE 4

Figure 6 illustrates an exemplary format of a SADT. The templates (SAAT and SADT) are filled by the template generator 24 as follows. In the case of the SAAT, the general information, along with the event/fluent information, is obtained directly from information present in the 3D mapping database for a given action predicate. The general information also contains preconditions and post-conditions, which are generated by the template generator 24 by processing the action predicates in the semantic structure for logical and existential information. The general information is also indicative of whether the action predicate is a fluent or an event. The concept associated with the action predicate is also part of the general information.

The verb template section contains the semantic structure output by the NLP module 12 in addition to the modulator information. Modulator information is created by a filter which is applied to some special semantic roles of the semantic structure, like "manner" or others, and the information is digitized for the use of the animation. The modulator information contains all modulator predicates which apply to the action predicate. The time section contains animation termination information, time graph information, and time information extracted by the generator 24 from the event/fluent status of the action predicates. Time graph information contains the links to the previous and next SAATs associated with other action predicates. The time graph gives the execution order of the SAATs and is generated from the text in a sequential fashion. The APIR matrix information section is populated by considering the current action unit together with the pre-existing fluents at that particular point in time. This information is also further used to populate the animation termination information in the time section, in addition to information coming from the semantic structure.

The commonsense reasoning inferences are drawn by resolving the predicates with the APIR Matrix database 30 and the CB database 18. For example, using the CB database 18, the generator 24 determines that the action predicate "kiss" has implicit actions such as "look" and "walk": before kissing Hillary, Barack looks at Hillary, walks to Hillary and finally kisses her. The template generator creates an action predicate and a corresponding template for the "look" action and the "walk" action. Any dynamic information that is obtained from user interaction is stored in the user interaction section, a special place holder for information supplied by the user as a result of a user interface module.

In the case of the SADT, the participant information section contains all static information about the predicate to which the SADT is associated. This section also comprises information such as the semantic role of the non-action predicate to which the SADT is associated, its emotional state, etc. The semantic role attached to the participant, the modulator predicates applied to the participant, and the emotional state (such as happy, sad, and the like) are obtained from the semantic structure 16 output by the NLP module 12. The animate/inanimate information obtained from the CB database 18 or via user interaction is also contained in the participant information section of the SADT, as is any other information obtained from the user interface. The spatial information section comprises information about the position in the scene of the element to which the SADT is associated. This position is extracted from the semantic structure 16. The spatial information also includes spatial constraints applied to the element of the SADT relative to other non-action predicates. The constraints are determined using information contained in the 3D mapping database. The animation channel also comes from the 3D mapping database. An animation channel is the part of a participant that will be animated when an action is applied to the participant. For example, when the action "walking" is applied to a human being, the legs of the human being are the animation channel, as the legs will be animated so that the human being performs the action of walking. If the action "walking" is applied to a snake, the animation channel is the whole body of the snake. The dynamic information section comprises modifier predicates which influence the emotional state, the physical state, the behaviour, and the like. The dynamic information is extracted from the semantic structure 16. Finally, the graph information section indicates the SAAT to which the SADT is connected.
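For illustration, the six SAAT sections and four SADT sections could be grouped into data structures as below; the field names paraphrase tables 3 and 4 and are not the patent's actual format.

    # Hypothetical grouping of the SAAT and SADT sections described above.
    from dataclasses import dataclass, field

    @dataclass
    class SAAT:
        general: dict = field(default_factory=dict)        # event/fluent flag, pre/post-conditions, concept
        verb_template: dict = field(default_factory=dict)  # semantic structure + modulator information
        time: dict = field(default_factory=dict)           # termination info, time-graph links
        apir: dict = field(default_factory=dict)           # interactions with pre-existing fluents
        commonsense: list = field(default_factory=list)    # inferred implicit actions
        user: dict = field(default_factory=dict)           # information from user interaction

    @dataclass
    class SADT:
        participant: dict = field(default_factory=dict)    # role, emotional state, animate flag
        spatial: dict = field(default_factory=dict)        # position and spatial constraints
        dynamic: dict = field(default_factory=dict)        # modifiers of state and behaviour
        graph: "SAAT" = None                                # link to the owning SAAT

    kiss = SAAT(general={"predicate": "kiss", "fluent": False})
    barack = SADT(participant={"role": "agent", "animate": True}, graph=kiss)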

In accordance with an embodiment, the temporal ordering in the system is handled using the following assumptions: 1) an event is executed to termination as it is introduced in the text; and 2) a fluent starts execution when it is introduced in the text and continues until there is explicit termination, either in the text or as a consequence of the introduction of some other fluent/event which marks its termination. The APIR Matrix provides information on how the introduction of an event or a fluent influences the existing fluents. The predicate is tagged as an event or a fluent, and this information comes from the 3D mapping database.
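These two assumptions could be executed by a loop like the following sketch, with the APIR matrix consulted on each new animation unit; the representation is an assumption for illustration.

    # Sketch of the two ordering assumptions: events run to termination as
    # introduced; fluents persist until clipped or explicitly terminated.
    def order(units, apir):
        """units: (name, is_fluent) pairs in textual order.
        apir: {(new, ongoing): "CAN_CLIPPED"} interaction relations."""
        timeline, active_fluents = [], []
        for name, is_fluent in units:
            # Introducing a unit may clip fluents already executing.
            active_fluents = [f for f in active_fluents
                              if apir.get((name, f)) != "CAN_CLIPPED"]
            if is_fluent:
                active_fluents.append(name)   # holds until terminated
            timeline.append((name, list(active_fluents)))
        return timeline

    # Speaking (a fluent) is clipped when kissing is introduced.
    print(order([("speak", True), ("kiss", False)],
                {("kiss", "speak"): "CAN_CLIPPED"}))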

While the present description refers to SAATs comprising six sections and SADTs having four sections, it should be understood that the number of sections may vary as long as all information required to create the animation is present in the templates.

Illustrated below is an example of a conceptual structure generated by the SAAT and SADT generator 24. The following text is input in the NLP module 12:

Sentence 1: "Fred walks very quickly to the table";

Sentence 2: "Sally looks at him and talks to Mary";

Sentence 3: "Mary slowly gives a red book to Paul while he sits on a chair"; and

Sentence 4: "Peter tiptoes happily to the table and runs to the door".

Figures 4, 7, 8, and 9 illustrate the semantic structures output by the NLP module 12 for sentences 1, 2, 3, and 4, respectively. The adjective "red" associated with the word "book" in sentence 3 does not generate a node in the corresponding semantic structure of figure 8, as the template generator 24 has access to the syntactic structure of the sentence. The template generator 24 determines the color of the book using the syntactic structure output by the NLP module 12.

Figure 10 illustrates an embodiment of a conceptual structure 50 output by the template generator 24. A SAAT 52 is generated for sentence 1 and two SADTs 54 and 56 are connected to the SAAT 52. Two SAATs 58 and 60 are generated using the semantic structure of figure 7. The SAATs 58 and 60 are positioned below the SAAT 52 in order to show that the actions described by the SAATs 58 and 60 occur after the action described by the SAAT 52. The SAATs 58 and 60 are also positioned on the same line in order to show that the actions described by the SAATs 58 and 60 occur simultaneously. The template generator 24 generates the SAATs 62 and 64 using the semantic structure illustrated in figure 8. SAATs 66 and 68, which correspond to the semantic structure illustrated in figure 9, are positioned one below the other, as the actions associated with the SAATs occur sequentially.

The conceptual structure 50 contains all information required to generate the animation and is sent to the animation generator, which is adapted to read the conceptual structure and retrieve all information. Using this information, the animation generator generates an animation which corresponds to the input text.

While illustrated in the block diagrams as groups of discrete components communicating with each other via distinct data signal connections, it will be understood by those skilled in the art that the preferred embodiments are provided by a combination of hardware and software components, with some components being implemented by a given function or operation of a hardware or software system, and many of the data paths illustrated being implemented by data communication within a computer application or operating system. The structure illustrated is thus provided for efficiency of teaching the present preferred embodiment.

It should be noted that the present invention can be carried out as a method, can be embodied in a system, a computer readable medium or an electrical or electro-magnetic signal. The embodiments of the invention described above are intended to be exemplary only. The scope of the invention is therefore intended to be limited solely by the scope of the appended claims.