


Title:
APPARATUS AND METHOD FOR INDICATING A PRONUNCIATION INFORMATION
Document Type and Number:
WIPO Patent Application WO/2009/066963
Kind Code:
A2
Abstract:
Apparatus and method for indicating pronunciation information that indicate movement animation of vocal organs to support language learning. The apparatus includes a phonetic value configuration data storage that stores phonetic value configuration data for words; a pronunciation information storage that stores state information of vocal organs for phonetic values; a phonetic value configuration data searcher that retrieves phonetic value configuration data for each word included in an input text from the phonetic value configuration data storage; a phonetic value configuration data selector that uniquely chooses target phonetic value configuration data for each word from the retrieved phonetic value configuration data; a pronunciation information searcher that retrieves from the pronunciation information storage the state information of the vocal organs for each phonetic value included in the chosen phonetic value configuration data; and a pronunciation information generator that produces vocal organ animation by using the retrieved state information of the vocal organs.

Inventors:
PARK BONGLAE (KR)
Application Number:
PCT/KR2008/006891
Publication Date:
May 28, 2009
Filing Date:
November 21, 2008
Assignee:
INTELAB CO LTD (KR)
PARK BONGLAE (KR)
International Classes:
G10L13/02; G10L11/00; G10L15/24
Foreign References:
JP2006126498A
JP2004347786A
KR20070024498A (2007-03-02)
JPH0756494A
Attorney, Agent or Firm:
HANYANG PATENT FIRM (677-25 Yeoksam-dong, Gangnam-gu, Seoul 135-914, KR)

Claims

[1] An apparatus for indicating pronunciation information, comprising: a phonetic value configuration data storage that stores one or more phonetic value configuration data for each of a plurality of words; a pronunciation information storage that stores state information of vocal organs corresponding to each of a plurality of phonetic values; a phonetic value configuration data searcher that retrieves one or more phonetic value configuration data corresponding to each word included in an input text from the phonetic value configuration data storage; a phonetic value configuration data selector that uniquely chooses target phonetic value configuration data for each word for indicating pronunciation information from the one or more phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher; a pronunciation information searcher that retrieves from the pronunciation information storage the state information of the vocal organs corresponding to the phonetic values included in the phonetic value configuration data chosen by the phonetic value configuration data selector; and a pronunciation information generator that produces animation corresponding to the input text on the basis of the state information of the vocal organs retrieved by the pronunciation information searcher.

[2] The apparatus for indicating pronunciation information according to claim 1, wherein the pronunciation information storage further stores voice information corresponding to each of a plurality of phonetic values, the pronunciation information searcher further retrieves from the pronunciation information storage voice information corresponding to each of the phonetic values included in the phonetic value configuration data chosen by the phonetic value configuration data selector, and the pronunciation information generator produces animation and synthesized voices corresponding to the input text on the basis of the retrieved state information of the vocal organs and the retrieved voice information respectively.

[3] An apparatus for indicating pronunciation information, comprising: a phonetic value configuration data storage that stores one or more phonetic value configuration data for each of a plurality of words; a pronunciation information storage that stores voice information for each phonetic value of a plurality of phonetic values; a phonetic value configuration data searcher that retrieves one or more phonetic value configuration data corresponding to each word included in an input text from the phonetic value configuration data storage; a phonetic value configuration data selector that uniquely chooses target phonetic value configuration data for indicating pronunciation information from the phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher; a pronunciation information searcher that retrieves from the pronunciation information storage the voice information corresponding to the phonetic values included in the phonetic value configuration data chosen by the phonetic value configuration data selector; and a pronunciation information generator that synthesizes the voice information retrieved by the pronunciation information searcher into continuous voice.

[4] The apparatus for indicating pronunciation information according to any one of claims 1 to 3, wherein the phonetic value configuration data selector receives a phonetic condition, such as a pronunciation phenomenon, with respect to the input text, and chooses phonetic value configuration data that conforms with the phonetic condition from among the phonetic value configuration data retrieved by the phonetic value configuration data searcher.

[5] The apparatus for indicating pronunciation information according to any one of claims 1 to 3, further comprising: a phonetic value configuration data editor that adjusts or edits the phonetic value configuration data chosen by the phonetic value configuration data selector.

[6] The apparatus for indicating pronunciation information according to any one of claims 1 to 3, wherein the phonetic value configuration data selector receives the voice information corresponding to the input text to choose the phonetic value configuration data that conforms with the voice information for each word among the phonetic value configuration data retrieved by the phonetic value configuration data searcher.

[7] A method for indicating pronunciation information, comprising:

(a) assigning a phonetic value configuration data searcher to retrieve phonetic value configuration data for each word included in an input text from a phonetic value configuration data storage;

(b) assigning a phonetic value configuration data selector to uniquely choose target phonetic value configuration data for indicating pronunciation information from the phonetic value configuration data for each word retrieved in step (a);

(c) assigning a pronunciation information searcher to retrieve state information of vocal organs corresponding to phonetic values included in the phonetic value configuration data chosen in step (b); and

(d) assigning a pronunciation information generator to produce animation corresponding to the input text on the basis of the state information of the vocal organs retrieved in step (c).

[8] The method for indicating pronunciation information according to claim 7, wherein in step (c), the pronunciation information searcher retrieves from the pronunciation information storage the state information of the vocal organs and the voice information corresponding to each of the phonetic values which are included in the phonetic value configuration data chosen in step (b), and in step (d), the pronunciation information generator produces animation and continuous voice corresponding to the input text on the basis of the state information of the vocal organs and voice information retrieved in step (c) respectively.

[9] A method for indicating pronunciation information, comprising:

(a) assigning a phonetic value configuration data searcher to retrieve phonetic value configuration data for each word included in an input text from a phonetic value configuration data storage;

(b) assigning a phonetic value configuration data selector to uniquely choose target phonetic value configuration data for indicating pronunciation information from the phonetic value configuration data for each word retrieved in step (a);

(c) assigning a pronunciation information searcher to retrieve from a pronunciation information storage voice information corresponding to the phonetic values included in the phonetic value configuration data chosen in step (b); and

(d) assigning a pronunciation information generator to synthesize the voice information retrieved in step (c) to produce continuous voice.

[10] The method for indicating pronunciation information according to any one of claims 7 to 9, wherein in step (b), a phonetic condition, such as a pronunciation phenomenon, with respect to the input text is received, and phonetic value configuration data that conforms with the phonetic condition is chosen for each word from the phonetic value configuration data retrieved in step (a).

[11] The method for indicating pronunciation information according to any one of claims 7 to 9, further comprising:

(b-1) assigning a phonetic value information editor to adjust or edit the phonetic value configuration data chosen in step (b).

[12] The method for indicating pronunciation information according to any one of claims 7 to 9, wherein in step (b), the voice information corresponding to the input text is received and the phonetic value configuration data that conforms with the voice information is chosen for each word from the phonetic value configuration data retrieved in step (a).


Description

APPARATUS AND METHOD FOR INDICATING A PRONUNCIATION INFORMATION

Technical Field

[1] The present invention relates to an apparatus and a method for indicating pronunciation information, and more particularly, to an apparatus and a method for indicating pronunciation information that can provide an environment in which a language learner can intuitively appreciate the pronunciation principle of a learning target language and the difference in pronunciation between a native speaker and the learner, and can naturally become accustomed to all pronunciations of the corresponding language by growing familiar with everything from phonemes to diverse texts.

Background Art

[2] With rapid globalization, the necessity of a command of foreign languages has increased. Under such circumstances, familiarity with the pronunciation of a foreign language is, first of all, necessary in order to learn the language rapidly. The reason is that a language learner can understand a native speaker's pronunciation only when the learner is sufficiently familiar with the pronunciation of the corresponding language, and the learner can acquire various phrases and expressions more effectively and efficiently when the learner understands the native speaker's pronunciation. Further, since the learner can then converse with the native speaker in the corresponding language with correct pronunciation, the learner can continue learning the language through conversation.

[3] Many people say that, in the course of learning a language, children become familiar with the phonetic characteristics, particularly the segmentation, of the corresponding language from the time they are in the womb, and then learn the meaning and grammar of the language after being born. The vocal organs begin to settle into vernacular voice patterns at about the age of ten, after which it becomes difficult to acquire foreign languages.

[4] However, in current foreign language education, language learners concentrate their efforts on words, phrases and sentences while still unfamiliar with the vocal characteristics of foreign languages, and thus can hardly differentiate the spoken words of those languages. Therefore, when even familiar expressions are modified slightly, they cannot be easily understood or used. In particular, since the language learner cannot easily differentiate the constituent elements of the language in a text pronounced at high speed, the learner cannot easily understand the text by ear and, furthermore, speaks with a very crude pronunciation.

[5] Therefore, educational institutions and educational businesses have developed various solutions for correcting the pronunciations. Two representative solutions relating to the present invention are introduced as follows.

[6] One is a solution that presents the movement of the vocal organs as they produce basic pronunciations one by one. Such solutions include 'Pronunciation Power', a product made in the U.S.A., 'Tell Me More', made in France, and a solution serviced through the Internet by the University of Iowa in the U.S.A. These solutions present the process of pronouncing basic English phonemes in isolation through the change of the mouth shape seen from the front of the face and the in-mouth shape seen from the side of the face, to help the learner understand how each phonetic value (phoneme) is pronounced.

[7] The second solution presents the spoken voice as a voice wave image and compares similarity. Such solutions include 'Pronunciation Power', made in the U.S.A., 'Tell Me More', made in France, and 'Root English' of Language Technology Co., Ltd. in Korea. These show the voice wave spoken by the native speaker, the voice wave spoken by the language learner, and the similarity between them, to induce the learner to speak sentences and the like similarly to the native speaker.

[8] The above-mentioned two solutions are useful in providing a means that helps the language learner to understand the principle of the pronunciation and judge whether the learner's own pronunciation is correct. However, there is much room for improvement in that the solutions are too simple or are difficult to understand.

[9] In the solutions presenting the movement of the vocal organs, the pronunciation of the basic phonemes (the consonants and vowels of the corresponding language) is merely shown individually through two-dimensional animation produced in advance. As a result, the language learner cannot understand that even the same phoneme may be pronounced in various ways depending on speaking accent, speaking speed, phonetic phenomena, etc. Moreover, the course of becoming familiar with the pronunciations is separated from the course of learning words, phrases, and sentences, so the learner cannot continuously correct pronunciation throughout the whole course of learning the language.

[10] The voice wave comparison solutions, for their part, cannot help general learners easily understand the voice waves themselves and cannot provide an intuitive method for becoming familiar with the pronunciation principle. Moreover, in comparing the voice waves of the learner and the native speaker, even when the learner pronounces an expression correctly, the learner's pronunciation may differ from the native speaker's, so a negative evaluation may be presented and the credibility of the solution may decrease.

Disclosure of Invention

Technical Problem

[11] The present invention is contrived to solve the above-mentioned problems. An object of the present invention is to provide an apparatus and a method for indicating pronunciation information that produce and indicate movement animation of the vocal organs in order to effectively support the correction of pronunciation in language learning.

Technical Solution

[12] In order to achieve the above-described object, an apparatus for indicating pronunciation information according to an embodiment of the present invention includes a phonetic value configuration data storage that stores one or more phonetic value configuration data for each of a plurality of words; a pronunciation information storage that stores state information of vocal organs corresponding to each of a plurality of phonetic values; a phonetic value configuration data searcher that retrieves one or more phonetic value configuration data corresponding to each word included in an input text from the phonetic value configuration data storage; a phonetic value configuration data selector that uniquely chooses target phonetic value configuration data for each word for indicating pronunciation information from the one or more phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher; a pronunciation information searcher that retrieves from the pronunciation information storage the state information of the vocal organs corresponding to the phonetic values included in the phonetic value configuration data chosen by the phonetic value configuration data selector; and a pronunciation information generator that produces animation corresponding to the input text on the basis of the state information of the vocal organs retrieved by the pronunciation information searcher.

[13] Preferably, the pronunciation information storage further stores voice information corresponding to each of a plurality of phonetic values, the pronunciation information searcher further retrieves from the pronunciation information storage voice information corresponding to each of the phonetic values included in the phonetic value configuration data chosen by the phonetic value configuration data selector, and the pronunciation information generator produces animation and synthesized voices corresponding to the input text on the basis of the retrieved state information of the vocal organs and the retrieved voice information respectively.

[14] An apparatus for indicating pronunciation information according to another embodiment of the present invention includes a phonetic value configuration data storage that stores one or more phonetic value configuration data for each of a plurality of words; a pronunciation information storage that stores voice information for each phonetic value of a plurality of phonetic values; a phonetic value configuration data searcher that retrieves one or more phonetic value configuration data corresponding to each word included in an input text from the phonetic value configuration data storage; a phonetic value configuration data selector that uniquely chooses target phonetic value configuration data for indicating pronunciation information from the phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher; a pronunciation information searcher that retrieves from the pronunciation information storage the voice information corresponding to the phonetic values included in the phonetic value configuration data chosen by the phonetic value configuration data selector; and a pronunciation information generator that synthesizes the voice information retrieved by the pronunciation information searcher into continuous voice.

[15] Preferably, the phonetic value configuration data selector receives a phonetic condition, such as a pronunciation phenomenon, with respect to the input text, and chooses phonetic value configuration data that conforms with the phonetic condition from among the phonetic value configuration data retrieved by the phonetic value configuration data searcher.

[16] Preferably, the apparatus for indicating pronunciation information further includes a phonetic value configuration data editor that adjusts or edits the phonetic value configuration data chosen by the phonetic value configuration data selector.

[17] Preferably, the phonetic value configuration data selector receives the voice information corresponding to the input text to choose the phonetic value configuration data that conforms with the voice information for each word among the phonetic value configuration data retrieved by the phonetic value configuration data searcher.

[18] In order to achieve the above-described object, a method for indicating pronunciation information according to an embodiment of the present invention includes (a) assigning a phonetic value configuration data searcher to retrieve phonetic value configuration data for each word included in an input text from a phonetic value configuration data storage; (b) assigning a phonetic value configuration data selector to uniquely choose target phonetic value configuration data for indicating pronunciation information from the phonetic value configuration data for each word retrieved in step (a); (c) assigning a pronunciation information searcher to retrieve state information of vocal organs corresponding to phonetic values included in the phonetic value configuration data chosen in step (b); and (d) assigning a pronunciation information generator to produce animation corresponding to the input text on the basis of the state information of the vocal organs retrieved in step (c).
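Steps (a) through (d) above can be sketched as a minimal pipeline. All storages, example entries, and function names below are hypothetical illustrations assuming simple in-memory dictionaries; they are not structures defined by the specification.

```python
# Hypothetical sketch of steps (a)-(d); the storages and example entries
# are illustrative only and not defined by the specification.

# (storage) one or more phonetic value configuration data per word
PHONETIC_CONFIGS = {
    "have": [["h", "ae", "v"], ["h", "ah", "v"]],
}

# (storage) representative vocal-organ state per phonetic value
VOCAL_ORGAN_STATES = {
    "h": {"lips": "open", "tongue": "low"},
    "ae": {"lips": "wide", "tongue": "low-front"},
    "ah": {"lips": "neutral", "tongue": "mid"},
    "v": {"lips": "labiodental", "tongue": "neutral"},
}

def search_configs(text):
    # step (a): retrieve every candidate configuration for each word
    return {w: PHONETIC_CONFIGS.get(w, []) for w in text.lower().split()}

def select_config(candidates):
    # step (b): uniquely choose one target configuration per word;
    # here simply the first candidate, though the description allows
    # choosing by phonetic condition or by matching recorded voice
    return {w: cfgs[0] for w, cfgs in candidates.items() if cfgs}

def search_states(selected):
    # step (c): retrieve vocal-organ state info for each phonetic value
    return [VOCAL_ORGAN_STATES[p] for cfg in selected.values() for p in cfg]

def generate_animation(states):
    # step (d): stand-in for animation production; one frame per state
    return len(states)

frames = generate_animation(search_states(select_config(search_configs("have"))))
```

Selection by "first candidate" is a placeholder; the later paragraphs describe richer selection criteria (phonetic conditions, matching against recorded voice).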

[19] Preferably, in step (c), the pronunciation information searcher retrieves from the pronunciation information storage the state information of the vocal organs and the voice information corresponding to each of the phonetic values included in the phonetic value configuration data chosen in step (b), and in step (d), the pronunciation information generator produces animation and continuous voice corresponding to the input text on the basis of the state information of the vocal organs and the voice information retrieved in step (c), respectively.

[20] A method for indicating pronunciation information according to another embodiment of the present invention includes (a) assigning a phonetic value configuration data searcher to retrieve phonetic value configuration data for each word included in an input text from a phonetic value configuration data storage; (b) assigning a phonetic value configuration data selector to uniquely choose target phonetic value configuration data for indicating pronunciation information from the phonetic value configuration data retrieved in step (a) for each word; (c) assigning a pronunciation information searcher to retrieve from a pronunciation information storage voice information corresponding to the phonetic values included in the phonetic value configuration data chosen in step (b); and (d) assigning a pronunciation information generator to synthesize the voice information retrieved in step (c) to produce continuous voice.

[21] Preferably, in step (b), a phonetic condition, such as a pronunciation phenomenon, with respect to the input text is received, and phonetic value configuration data that conforms with the phonetic condition is chosen for each word from the phonetic value configuration data retrieved in step (a).

[22] Preferably, the method for indicating pronunciation information further includes (b-1) assigning a phonetic value information editor to adjust or edit the phonetic value configuration data chosen in step (b).

[23] Preferably, in step (b), the voice information corresponding to the input text is received and the phonetic value configuration data that conforms with the voice information is chosen for each word from the phonetic value configuration data retrieved in step (a).

Advantageous Effects

[24] According to the present invention, the apparatus and method for indicating pronunciation information can provide an environment in which a language learner can intuitively appreciate the pronunciation principle of a learning target language and the difference in pronunciation between a native speaker and the learner through animation of the movement of the vocal organs, and can naturally become accustomed to all pronunciations of the corresponding language by growing familiar with everything from phonemes to diverse texts.

Brief Description of Drawings

[25] FIG. 1 is a block diagram for illustrating a configuration of an apparatus for indicating pronunciation information according to an embodiment of the present invention;

[26] FIG. 2 is a diagram for illustrating a pronunciation information storage of FIG. 1;

[27] FIG. 3 is a block diagram for illustrating another configuration of an apparatus for indicating pronunciation information;

[28] FIG. 4 is a flowchart for illustrating a method for indicating pronunciation information according to an embodiment of the present invention;

[29] FIG. 5 is a flowchart for illustrating steps of choosing phonetic value configuration data;

[30] FIG. 6 is a diagram for illustrating steps of selecting phonetic value configuration data; and

[31] FIG. 7 is a diagram for illustrating steps of outputting pronunciation information.

Best Mode for Carrying out the Invention

[32] Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the spirit of the present invention. Before the detailed description, it should be noted that like reference numerals refer to like elements, even where those elements are shown in different drawings. Further, in describing the present invention, well-known functions or constructions will not be described in detail where they would unnecessarily obscure the understanding of the present invention.

[33] First, terms used for detailed description of an apparatus and a method for indicating pronunciation information according to an embodiment of the present invention are defined as follows.

[34] Phonetic value configuration data for a word is the information on arrangement of one or more phonetic values corresponding to the word. Herein, the phonetic values correspond to pronunciations of phonemes constituting each word and mean a vocal sound phenomenon caused by actions of the vocal organs.

[35] State information of the vocal organs is the state information on a movement of each vocal organ in pronouncing the phonetic values. Herein, the vocal organs include lips, tongue, nose, uvula, palate, teeth, gums, etc. which are parts of a human body used for generating a voice.

[36] Voice information for a phonetic value is the voice information corresponding to the phonetic value. That is, the voice information for a phonetic value means voice data in which vocal sound of the native speaker is sampled with respect to the phonetic value.

[37] Hereinafter, an apparatus for indicating pronunciation information according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 is a block diagram for illustrating a configuration of an apparatus for indicating pronunciation information according to an embodiment of the present invention. FIG. 2 is a diagram for illustrating a pronunciation information storage of FIG. 1. FIG. 3 is a block diagram for illustrating another configuration of an apparatus for indicating pronunciation information.

[38] As shown in FIG. 1, the apparatus for indicating pronunciation information includes a phonetic value configuration data storage 5, a pronunciation information storage 15, an input unit 20, a phonetic value configuration data searcher 25, a phonetic value configuration data selector 35, a pronunciation information searcher 45, a pronunciation information generator 55, and an indication unit 60.

[39] The phonetic value configuration data storage 5 stores one or more phonetic value configuration data which can be pronounced for each word of a plurality of words. That is, since different phonetic value configuration data for a word may exist depending on the part of speech of the word, and additional phonetic value configuration data may exist owing to language phenomena such as weakening, omission, assimilation, shortening, etc. of a sound in accordance with speaking speed or accent, the phonetic value configuration data storage 5 stores all available phonetic value configuration data for each word, as lists such as sequences of phonetic symbols, as shown in Table 1. Herein, the phonetic value configuration data storage 5 further stores wrong phonetic value configuration data, such as [h aa b], used for discriminating between a correct pronunciation and an incorrect pronunciation.

[40] Table 1

[42] The phonetic value configuration data storage 5 stores the phonetic value configuration data for each word including one or more phonetic values. The phonetic value configuration data storage 5 may store phonetic value configuration data for each of sequences of two or more words when a boundary between morphemes (or words) is obscure, such as an Eojeol (a word sequence without spaces) in Korean.
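A minimal sketch of such a storage, assuming a plain dictionary keyed by word (or word sequence); the wrong entry [h aa b] is taken from the description above, while every other value is an illustrative assumption:

```python
# Sketch of a phonetic value configuration data storage. Keys may be
# single words or multi-word sequences (e.g. a Korean Eojeol). Entries
# marked wrong=True support discriminating incorrect pronunciations.
storage = {
    "have": [
        {"phones": ["h", "ae", "v"], "wrong": False},
        {"phones": ["h", "ah", "v"], "wrong": False},  # weakened form
        {"phones": ["h", "aa", "b"], "wrong": True},   # known mispronunciation
    ],
}

def lookup(word):
    # return only the configurations valid for indicating pronunciation
    return [c["phones"] for c in storage.get(word, []) if not c["wrong"]]
```

Keeping wrong configurations alongside correct ones lets the same lookup double as a correctness check against a learner's attempt.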

[43] The phonetic value configuration data storage 5 stores feature information of the phonetic values together with the phonetic value configuration data for each word. That is, the phonetic value configuration data storage 5 stores feature information including the length of the pronunciation, the accent, the pronunciation transition time to the next phonetic value, etc. Herein, a phonetic value configuration data is a list of pronunciations, and the pronunciations are represented by symbols such as the phonetic symbols found in general language dictionaries. The pronunciations include vowels and consonants, which have different lengths in speaking. In general, the vowels are pronounced about four times as long as the consonants, or more. The length of pronunciation may also differ among vowels or among consonants. Further, the length of each pronunciation may differ remarkably depending on the speaking state, such as the speaking speed. It is preferable that the phonetic value configuration data storage 5 stores such speaking feature information as the length of each pronunciation. Of course, the speaking feature information may be stored for each basic pronunciation, or processed by a rule, rather than stored for every word. Additional information such as the length of each pronunciation is useful later in producing smooth animation of the vocal organs. Besides, the feature information for each pronunciation may further include the accent and the pronunciation transition time in addition to the length of pronunciation. Even the same pronunciation may have relatively different accents depending on the word, and the time taken for one pronunciation to transition to the subsequent pronunciation may differ depending on the pair of pronunciations. Since this information is also an important element for producing smooth animation of the vocal organs, it is preferable that it is stored in the phonetic value configuration data storage 5.
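The feature information described above (length of pronunciation, accent, transition time) could be modeled as follows; the field names, the vowel list, and the exact 4:1 vowel-to-consonant length ratio are illustrative assumptions, not values fixed by the specification:

```python
from dataclasses import dataclass

@dataclass
class PhoneFeatures:
    # illustrative feature record for one phonetic value: relative
    # length, accent level, and transition time to the next phonetic
    # value (units are arbitrary in this sketch)
    length: float
    accent: int = 0
    transition: float = 0.0

# the description notes vowels run roughly four times as long as consonants
CONSONANT_LEN, VOWEL_LEN = 1.0, 4.0

def default_features(phone, vowels=("ae", "ah", "iy", "uw")):
    # assumed vowel inventory; a real system would cover the full set
    base = VOWEL_LEN if phone in vowels else CONSONANT_LEN
    return PhoneFeatures(length=base)

def total_duration(phones, speed=1.0):
    # total speaking time of a configuration, scaled by speaking speed
    return sum(default_features(p).length for p in phones) / speed
```

Storing defaults per basic pronunciation and overriding per word, as the paragraph suggests, would keep the storage compact while still allowing word-specific accents and transition times.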

[44] The pronunciation information storage 15 stores information on the state of the vocal organs corresponding to the phonetic values. That is, the pronunciation information storage 15 stores the state information of the vocal organs, which is information on the state, such as the shape, of each vocal organ while each phonetic value is spoken. Herein, the vocal organs, as organs of the human body, include the lips, the tongue, the jaw, the in-mouth part, the soft palate, the nose, etc. At this time, the pronunciation information storage 15 may store images of the vocal organs, or store features for generating the images, as the state information of the vocal organs. Herein, in the pronunciation information storage 15, it is preferable that the state information of the vocal organs exists separately for each vocal organ when the state information of the vocal organs is indicated by a two-dimensional image of the vocal organs. The separately stored images of the vocal organs are combined with each other so as to produce various images of the vocal organs (for example, an in-mouth state information image is produced by combining state information images of the tongue and the uvula with each other, as shown in FIG. 2). It is preferable that the forms of the vocal organs existing within the mouth, such as the tongue, are viewed from the side cross-section of the face. The pronunciation information storage 15 includes core coordinates, etc. required for generating images of the vocal organs when the pronunciation information storage 15 stores features as the state information of the vocal organs. For example, in the case of the mouth, a lip shape can be generated from the end coordinates of the upper and lower lips, the left and right end coordinates of the lips, a transverse coordinate giving the projection degree of the upper and lower lips, and the relative coordinates of the upper and lower teeth and the upper and lower lips, by using existing graphic techniques. In the case of the tongue, when the tip of the tongue, the root of the tongue, curve points in the middle of the tongue, and a curvature degree are used as the features, images of the tongue suitable for the features can be drawn automatically. In the case of a three-dimensional image, more features may be required, but the shapes and physical movement modes of the vocal organs may be included in a three-dimensional graphic engine in advance, thereby reducing the required features. Herein, the state information of the vocal organs is representative state information, that is, information on representative states of the vocal organs. A representative state of the vocal organs means one characteristic state in the movement of the vocal organs which must be attained in order to differentiate each pronunciation in speaking.
For example, in order to pronounce an English consonant [t], the tip of the tongue is sure to reach and then separate from the roof of the mouth. At this time, a state in which the tip of the mouth reaches the roof of the tongue is a representative state of the tongue with respect to the consonant [t] and information indicating the state is the representative state information of the vocal organ with respect to a phoneme [t]. Of course, as a general English phoneme 't' may be pronounced in approximately 10 different types of pronunciation modes, one or more data of representative state information of each vocal organ may exist with respect to the same general phoneme. So it is preferable that different and divided phonetic symbols such as 'tl' and 't2' are used corresponding to the same general phoneme and representative state information of vocal organs is assigned to each divided phonetic symbol. Further, even in the case of a diphthong, for example, in the case of Ow', Ow' is processed by being divided into constituent element pronunciations such as 'o_ow' and 'w_ow'. [45] It is preferable that the pronunciation information storage 15 stores information on states of the vocal organs before and after speaking the corresponding pronunciation in

order to help understand the movement of the vocal organs between pronunciations. That is, when the animation is produced only with the representative state information of the vocal organs, it is difficult to produce realistic smooth animation. Therefore, it is preferable that the state information of the vocal organs during the transition between pronunciations is stored altogether and is used in producing the animation.
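The feature-based storage described above can be sketched in Python as follows. This is a minimal illustration only; the class name, the field names, and the stored coordinate values are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class VocalOrganState:
    """Representative state of the vocal organs for one divided phonetic symbol."""
    lip_open: float            # vertical gap between the upper and lower lips
    lip_width: float           # distance between the left and right lip corners
    tongue_tip: tuple          # (x, y) coordinate of the tongue tip
    tongue_curve: float        # curvature degree along the tongue body
    soft_palate_raised: bool   # whether the nasal passage is closed

# Storage keyed by divided phonetic symbols ('t1', 't2', ...) so that one
# general phoneme may map to several representative states.
pronunciation_storage = {
    "t1": VocalOrganState(lip_open=0.3, lip_width=1.0,
                          tongue_tip=(0.0, 1.0), tongue_curve=0.2,
                          soft_palate_raised=True),
    "t2": VocalOrganState(lip_open=0.3, lip_width=1.0,
                          tongue_tip=(0.1, 0.9), tongue_curve=0.4,
                          soft_palate_raised=True),
}

state = pronunciation_storage["t1"]
```

Separate entries per divided phonetic symbol make it straightforward to attach one representative state (or a short sequence of transition states) to each pronunciation variant.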

[46] The pronunciation information storage 15 further stores voice information for each phonetic value. That is, the pronunciation information storage 15 stores voice information corresponding to the phonetic values included in the phonetic value configuration data.

[47] The pronunciation information storage 15 may store diverse voice information in accordance with the feature information of each phonetic value. That is, the pronunciation information storage 15 stores voice information that is pronounced differently in accordance with feature information including the speaking length of the phonetic value, the accent, the inter-pronunciation transition time, etc.
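The per-feature voice storage of paragraph [47] might be sketched as a mapping keyed by the phonetic value together with its feature information. The keys and file names below are hypothetical placeholders, not part of the disclosure.

```python
# Voice information stored per phonetic value and per feature combination,
# so that differently pronounced variants (length, accent, etc.) coexist.
voice_storage = {
    ("aa", "long", "stressed"): "aa_long_stressed.wav",
    ("aa", "short", "unstressed"): "aa_short_unstressed.wav",
}

def get_voice(phone, length, accent):
    """Look up the voice data for one phonetic value under given features."""
    return voice_storage.get((phone, length, accent))
```

A missing feature combination simply yields no entry, which a caller could treat as a fallback to a default recording.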

[48] The input unit 20 receives an input text. That is, the input unit 20 receives a text constituted by one or more words as the input text. Herein, the input unit 20 may receive from a user an input text such as a phoneme, a syllable, a word, a phrase, or a sentence for which animation is to be produced.

[49] The phonetic value configuration data searcher 25 retrieves phonetic value configuration data for each word included in the input text from the phonetic value configuration data storage 5. That is, the phonetic value configuration data searcher 25 retrieves all phonetic value configuration data for each word included in the input text. For example, when the phonetic value configuration data for each word shown in Table 1 is stored in the phonetic value configuration data storage 5, the phonetic value configuration data searcher 25 retrieves [d r i: m], [dz r i: m], and [d l i: m] as the phonetic value configuration data for the word 'dream'.
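The retrieval step described above amounts to a lookup that returns every stored candidate for a word. A minimal sketch, assuming a hypothetical in-memory storage shaped like Table 1:

```python
# Hypothetical phonetic value configuration data storage (Table 1 style):
# each word maps to all of its candidate phonetic value sequences.
pvc_storage = {
    "dream": [["d", "r", "i:", "m"], ["dz", "r", "i:", "m"], ["d", "l", "i:", "m"]],
    "have":  [["h", "aa", "v"], ["h", "ah", "v"], ["ah", "v"], ["h", "aa", "b"]],
}

def search_pvc(word):
    """Retrieve all phonetic value configuration data stored for a word."""
    return pvc_storage.get(word.lower(), [])

candidates = search_pvc("dream")
```

Returning an empty list for an unknown word leaves it to a later stage (such as the generation rules of paragraph [50]) to supply candidates.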

[50] When the phonetic values which are available in accordance with language phenomena such as the accent, the speed, etc. are not stored in the phonetic value configuration data storage 5, the phonetic value configuration data searcher 25 may produce the phonetic values on the basis of an additional phonetic value generation rule to acquire all possible phonetic value configuration data for each word. Herein, the phonetic value generation rule is a rule for generating new phonetic value configuration data for a word by modifying the basic phonetic value configuration data for the word extracted from the phonetic value configuration data storage 5, reflecting that the same phonetic value (or phoneme) may be transformed into various pronunciations or deleted depending on the accent, the speed, etc. For example, the phonetic value generation rule may include a rule that when a consonant 'n' comes just before an initial consonant 't' of a syllable without an accent, the consonant 't' may not be pronounced.
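The 't'-after-'n' rule just mentioned can be sketched as a function that, given a candidate pronunciation, produces an additional variant. This is an illustrative simplification: real syllable-boundary and stress handling is omitted, and the function name and its inputs are hypothetical.

```python
def apply_t_deletion(phones, stressed):
    """Sketch of one phonetic value generation rule: when 'n' immediately
    precedes 't' in an unstressed context, 't' may be dropped, yielding
    an extra candidate alongside the original pronunciation."""
    if stressed:
        return [phones]
    variant = []
    dropped = False
    for i, p in enumerate(phones):
        if p == "t" and i > 0 and phones[i - 1] == "n":
            dropped = True
            continue  # the 't' may not be pronounced
        variant.append(p)
    return [phones, variant] if dropped else [phones]

# e.g. 'winter' without accent on the second syllable:
variants = apply_t_deletion(["w", "ih", "n", "t", "er"], stressed=False)
```

Running every stored candidate through such rules yields the full set of phonetic value configuration data the searcher works with.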

[51] The phonetic value configuration data selector 35 chooses the target phonetic value configuration data for indicating the pronunciation information from the phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher 25. As shown in FIG. 3, the phonetic value configuration data selector 35 may receive a phonetic condition, such as the pronunciation phenomena, etc., to be applied to the input text through the phonetic condition input unit 36, or receive the voice information corresponding to the input text through the voice input unit 37.

[52] The phonetic condition is the speaking condition for the input text: information on the speaking speed or the accents of the words included in the input text, or information on pronunciation phenomena such as omission, contraction, etc. The voice information is acquired by receiving the pronunciation of the native speaker who speaks the input text and converting it into vocal sound data.

[53] When the phonetic value configuration data selector 35 receives the phonetic condition from the phonetic condition input unit 36, the phonetic value configuration data selector 35 chooses phonetic value configuration data which conforms with the phonetic condition from the phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher 25. At this time, information on conformity degree for each phonetic condition may be stored in the phonetic value configuration data in advance.

[54] When the phonetic value configuration data selector 35 receives the voice information from the voice input unit 37, the phonetic value configuration data selector 35 chooses the phonetic value configuration data most similar to the voice information by comparing the voice information with the phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher 25. That is, the phonetic value configuration data selector 35 chooses, from the one or more phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher 25, the phonetic value configuration data for each word most similar to the voice information inputted through the speaking of the native speaker. Herein, in the method of choosing the phonetic value configuration data for each word, the phonetic value configuration data selector 35 chooses the most similar phonetic value configuration data for each word by comparing the phonetic characteristics of the voice information with the phonetic characteristics of the phonetic value configuration data for each word. Since a method by which the phonetic value configuration data selector 35 chooses the phonetic value configuration data most similar to the inputted voice information can be appreciated by those skilled in the art, a detailed description thereof will be omitted.
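The similarity-based choice in paragraph [54] reduces to selecting the candidate with the highest similarity score. Since the comparison method itself is left to those skilled in the art, the similarity function and the scores below are hypothetical placeholders:

```python
def choose_most_similar(voice_features, candidates, similarity):
    """Pick the candidate phonetic value configuration whose phonetic
    characteristics are most similar to the inputted voice information."""
    return max(candidates, key=lambda c: similarity(voice_features, c))

# Hypothetical precomputed similarity scores for the word 'have'.
scores = {("h", "aa", "v"): 0.9, ("h", "ah", "v"): 0.6, ("ah", "v"): 0.2}
best = choose_most_similar(None, list(scores), lambda _, c: scores[c])
```

Any real comparison (e.g. of waveform-derived features) would plug in as the `similarity` argument without changing the selection logic.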

[55] The phonetic value configuration data selector 35 chooses the phonetic value configuration data which best conforms with both the phonetic condition and the voice information when both are inputted. When neither the phonetic condition nor the voice information is inputted, or when two or more phonetic value configuration data conform with the phonetic condition and/or the voice information for a word, the phonetic value configuration data selector 35 chooses the most general phonetic value configuration data for the word or allows the user to choose the phonetic value configuration data for the word.

[56] Meanwhile, as shown in FIG. 3, the constituent phonetic values and the characteristics of the constituent phonetic values in the phonetic value configuration data uniquely chosen for each word by the phonetic value configuration data selector 35 may be adjusted by a phonetic value configuration data editor 38. That is, the phonetic value configuration data editor 38 adjusts the phonetic value configuration data chosen by the phonetic value configuration data selector 35 so that the phonetic value configuration data further conforms with the phonetic condition or the voice information. Herein, the adjustment of the characteristic of the phonetic value means adjustment of the speaking length of the phonetic value, etc.

[57] The pronunciation information searcher 45 retrieves, from the pronunciation information storage 15, the state information of the vocal organs corresponding to each of the phonetic values included in the phonetic value configuration data for each word chosen by the phonetic value configuration data selector 35. That is, the pronunciation information searcher 45 retrieves the state information of the vocal organs such as the mouth, the tongue, the jaw, the in-mouth part, the soft palate, the nose, etc., which are the human body's organs that participate in pronouncing the voice corresponding to the phonetic value configuration data for each word chosen by the phonetic value configuration data selector 35. Herein, the pronunciation information searcher 45 retrieves the images or the features of the vocal organs corresponding to the phonetic value configuration data for each word.

[58] As explained in describing the pronunciation information storage 15, the pronunciation information searcher 45 extracts the representative state information of the vocal organs for each pronunciation and may further extract the state information of the vocal organs in transition between pronunciations.

[59] The pronunciation information searcher 45 retrieves, from the pronunciation information storage 15, the voice information corresponding to the phonetic value configuration data for each word chosen by the phonetic value configuration data selector 35. That is, the pronunciation information searcher 45 retrieves the voice information for each of the phonetic values included in the phonetic value configuration data for each word chosen by the phonetic value configuration data selector 35 from the voice information for each of a plurality of phonetic values stored in the pronunciation information storage 15.

[60] The pronunciation information generator 55 produces animation corresponding to the input text on the basis of the state information of the vocal organs corresponding to the phonetic values included in the phonetic value configuration data chosen by the phonetic value configuration data selector 35. That is, the pronunciation information generator 55 produces movement animation of the vocal organs that participate in pronouncing the input text on the basis of the state information of the vocal organs retrieved by the pronunciation information searcher 45.

[61] When the accent and the pronouncing time (pronouncing time length) are available for each phonetic value included in the input text, the pronunciation information generator 55 indicates the accent by means of an accent bar or an effect image, or displays an image corresponding to the phonetic value on a screen for the duration of the pronouncing time.

[62] When the state information of the vocal organs corresponding to the transition progress between the phonetic values is extracted together with the time of the phonetic value transition, the pronunciation information generator 55 displays the images for the transition progress on the screen for the duration of the transition time by positioning the images for the transition progress between the two images corresponding to the representative state information of the vocal organs for the phonetic values. At this time, when two or more images for the transition progress between the phonetic values are displayed, the pronunciation information generator 55 sets the total time in which all the images for the transition progress are displayed in accordance with the transition time between the corresponding phonetic values.
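The timing rule for transition images can be sketched as a simple scheduler: the transition time is divided among the transition frames so that their total display time equals the transition time. The frame identifiers below are hypothetical.

```python
def schedule_transition(frames, transition_time):
    """Assign equal display durations to the transition images so that
    their total equals the transition time between two representative
    vocal organ states."""
    if not frames:
        return []
    per_frame = transition_time / len(frames)
    return [(frame, per_frame) for frame in frames]

# Two intermediate images shown during a 0.1 sec transition, e.g. [t] -> [r].
schedule = schedule_transition(["t_to_r_1", "t_to_r_2"], 0.1)
```

A non-uniform split (e.g. easing) would only change how `per_frame` is computed, not the overall constraint that the durations sum to the transition time.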

[63] When the state information of the vocal organs consists of individual images for each vocal organ, the pronunciation information generator 55 produces the ultimate images of the animation through image synthesis, and when the state information of the vocal organs consists of the features, the pronunciation information generator 55 draws the corresponding vocal organs in one image in accordance with the features for each vocal organ. Of course, when the state information of the vocal organs is not individual images per vocal organ but a single mixed image, the pronunciation information generator 55 links the images as they are to produce the animation.
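The image-synthesis case can be sketched as combining the separately stored per-organ images in a fixed back-to-front layer order. The layer order and file names are hypothetical; a real implementation would blit actual image data rather than collect names.

```python
def compose_frame(organ_images,
                  layer_order=("jaw", "tongue", "soft_palate", "teeth", "lips")):
    """Combine separately stored vocal organ images into one animation
    frame by drawing them in a fixed layer order (back to front).
    Organs with no image for this frame are simply skipped."""
    return [organ_images[organ] for organ in layer_order if organ in organ_images]

frame = compose_frame({"tongue": "tongue_t1.png",
                       "lips": "lips_t1.png",
                       "jaw": "jaw_mid.png"})
```

Keeping the organs in separate layers is what lets one tongue image be reused with many lip images, as described for the two-dimensional case in paragraph [44].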

[64] Further, the pronunciation information generator 55 synthesizes the voice information retrieved by the pronunciation information searcher 45 to produce continuous voice and synchronizes the voice with the animation. That is, the pronunciation information generator 55 produces the animation to which continuous voice is added by synchronizing the voice information, retrieved and synthesized on the basis of the phonetic value configuration data for each word, with the animation produced by the pronunciation information generator 55. However, when the corresponding voice information of the native speaker is inputted through the voice input unit 37, the pronunciation information generator 55 may synchronize the voice information of the native speaker with the animation.

[65] The indication unit 60 indicates at least one among the phonetic value configuration data for each word retrieved by the phonetic value configuration data searcher 25, the phonetic value configuration data uniquely chosen for each word by the phonetic value configuration data selector 35, and the animation and the voice information produced by the pronunciation information generator 55. That is, the indication unit 60 displays the phonetic value configuration data for each word included in the input text and the animation corresponding to the input text on the screen. Further, the indication unit 60 outputs the voice information corresponding to the input text in the form of the vocal sound.

[66] Meanwhile, the apparatus for indicating the pronunciation information according to the embodiment of the present invention may further include a word information editor and a pronunciation information editor which are not shown in the drawings.

[67] The word information editor updates the phonetic value configuration data for the corresponding words on the basis of the inputted update information. That is, the word information editor updates the phonetic value configuration data stored in the phonetic value configuration data storage 5 with the update information inputted through the input unit 20.

[68] The pronunciation information editor updates the state information of the vocal organs on the basis of the inputted change information of the vocal organs. That is, the pronunciation information editor updates the state information of the vocal organs stored in the pronunciation information storage 15 with the change information of the vocal organs inputted through the input unit 20. Herein, the pronunciation information editor additionally provides a graphic tool, etc. for modifying the state information of the vocal organs stored in the pronunciation information storage 15 and updating it with the modified state information of the vocal organs.

[69] As described above, although it has been illustrated that the apparatus for indicating pronunciation information operates only when it includes the phonetic value configuration data storage 5, the pronunciation information storage 15, the input unit 20, the phonetic value configuration data searcher 25, the phonetic value configuration data selector 35, the pronunciation information searcher 45, the pronunciation information generator 55, and the indication unit 60, the apparatus for indicating pronunciation information is not limited thereto; the apparatus may be configured as shown in FIG. 3 and may process the corresponding functions. The functions of the units are divided for convenience of description, but are not necessarily limited to the above-mentioned division.

[70] Hereinafter, a method for indicating pronunciation information according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. FIG. 4 is a flowchart for illustrating a method for indicating pronunciation information according to an embodiment of the present invention. FIG. 5 is a flowchart for illustrating the steps for choosing phonetic value configuration data. FIG. 6 is a diagram for illustrating the steps for choosing phonetic value configuration data. FIG. 7 is a diagram for illustrating the steps for outputting phonetic value configuration data.

[71] First, the input unit 20 receives an input text to be displayed in the form of the animation from the user (S10). Herein, the input unit 20 receives from a user an input text including at least one among a phoneme, a syllable, a word, a phrase, and a sentence for which animation is to be generated.

[72] The phonetic value configuration data searcher 25 retrieves the phonetic value configuration data for each word included in the input text inputted through the input unit 20 from the phonetic value configuration data storage 5 (S20). For example, when the sentence "I have a dream." is inputted as the input text, the phonetic value configuration data searcher 25 retrieves all available phonetic value configuration data corresponding to each of the words (i.e., 'I', 'have', 'a', and 'dream') from the phonetic value configuration data storage 5. That is, the phonetic value configuration data searcher 25 retrieves [ay] and [a], which are the phonetic value configuration data for the word 'I'; [h aa v], [h ah v], [ah v], and [h aa b], which are the phonetic value configuration data for the word 'have'; [ah] and [ax], which are the phonetic value configuration data for the word 'a'; and [d r i: m], [dz r i: m], and [d l i: m], which are the phonetic value configuration data for the word 'dream'.

[73] Next, the phonetic value configuration data selector 35 chooses the target phonetic value configuration data for each word included in the example sentence from the phonetic value configuration data retrieved in step S20. As shown in FIG. 5, when the phonetic condition is inputted (S32; YES), the phonetic value configuration data selector 35 chooses the phonetic value configuration data for each word which conform with the inputted phonetic condition among the different phonetic value configuration data for each word retrieved in step S20 (S33). For example, after [h aa v], [h ah v], and [ah v] are retrieved as the phonetic value configuration data for the word 'have', when 'vowel reduction' is inputted as the phonetic condition on the word 'have', the phonetic value configuration data selector 35 chooses [h ah v], which reflects a reduction phenomenon, among all the phonetic value configuration data for the word 'have'. Further, the phonetic value configuration data selector 35 may also set the speaking lengths of the constituent phonetic values on the basis of the general speaking lengths of the corresponding consonants and vowels.

[74] Further, when the voice information is inputted (S34; YES), the phonetic value configuration data selector 35 chooses the phonetic value configuration data for each word most similar to the inputted voice information among the phonetic value configuration data for each word retrieved in step S20 (S35). For example, when the voice information of the native speaker for 'have' (i.e., the voice information corresponding to 'have' in the voice of 'I have a dream' which the native speaker pronounces) is inputted, the phonetic value configuration data selector 35 compares the characteristics of the inputted voice information (i.e., the voice waveform of 'have') with the phonetic characteristics of the retrieved phonetic value configuration data (i.e., [h aa v], [h ah v], and [ah v]) for the word 'have'. The phonetic value configuration data selector 35 chooses [h aa v], which has the highest similarity, when the similarity of [h aa v] is 90%, the similarity of [h ah v] is 60%, and the similarity of [ah v] is 20%. Further, the speaking lengths of the constituent phonetic values are also set together; for example, the speaking length of [aa] in [h aa v] is set to 0.2 sec. in accordance with the voice information.

[75] Further, the phonetic value configuration data selector 35 displays the phonetic value configuration data to allow the user to choose the target phonetic value configuration data when two or more phonetic value configuration data for a word remain even after step S33 or S35 (S36). For example, when [d r i: m], [dz r i: m], and [d l i: m] are retrieved as the phonetic value configuration data for the word 'dream' in step S20, the phonetic value configuration data selector 35 presents all the remaining phonetic value configuration data to the user after step S33 or S35. Herein, as shown in FIG. 6, the phonetic value configuration data selector 35 configures and displays the screen so that only one among the indicated phonetic value configuration data for each word can be chosen.

[76] The phonetic value configuration data editor 38 adjusts the phonetic values and the characteristics of the phonetic values included in the phonetic value configuration data uniquely chosen for each word in step S33 or S35 (S37). For example, when [h aa b] is chosen for 'have' of the example but the target to be indicated as the pronunciation information is not [h aa b] but [h aa v], the phonetic value 'v' is substituted for 'b' and the speaking length is adjusted.

[77] When the phonetic value configuration data is chosen in step S30, the pronunciation information searcher 45 retrieves the pronunciation information corresponding to the chosen phonetic value configuration data (S40). For example, when [d r i: m] is chosen by the user, the pronunciation information searcher 45 retrieves the state information of the vocal organs in accordance with a pronunciation of [d r i: m] from the pronunciation information storage 15.

[78] The pronunciation information searcher 45 retrieves the pronunciation information corresponding to the target phonetic value configuration data for each word chosen in step S30 from the pronunciation information storage 15 (S40). For example, the pronunciation information searcher 45 retrieves the state information of the vocal organs corresponding to the phonetic values 'h', 'aa', and 'v' included in the phonetic value configuration data [h aa v] chosen in step S30 from the pronunciation information storage 15. At this time, the state information of the vocal organs (i.e., mouth, tongue, jaw, in-mouth part, soft palate, nose, etc.) retrieved by the pronunciation information searcher 45 includes one among a two-dimensional image, a three-dimensional image, and the features (i.e., coordinate values, etc.) of the vocal organs.

[79] The pronunciation information searcher 45 retrieves the voice information corresponding to each phonetic value included in the phonetic value configuration data for each word chosen in step S30 from the pronunciation information storage 15 (S40). For example, when [h aa v] is chosen in step S30, the pronunciation information searcher 45 retrieves the voice information corresponding to the phonetic values (i.e., 'h', 'aa', and 'v').

[80] Next, the pronunciation information generator 55 produces the animation corresponding to the input text by using the pronunciation information retrieved in step S40 (S50). That is, the pronunciation information generator 55 animates the state images of the vocal organs (i.e., mouth, tongue, jaw, in-mouth part, soft palate, nose, etc.) being changed in accordance with the pronunciation of the input text in the form shown in FIG. 7. Herein, the pronunciation information generator 55 may produce the animation by linking two-dimensional images or produce the animation by using a mapping technique on the basis of the features of the vocal organs corresponding to pronunciation.

[81] Further, the pronunciation information generator 55 synthesizes the voice information retrieved in step S40 to produce continuous voice and synchronizes the voice with the animation. However, when the voice information corresponding to the input text is inputted in step S34, the inputted voice information is synchronized with the animation.

[82] The indication unit 60 outputs the animation synchronized with the voice information (S60).

[83] As described above, an apparatus and a method for indicating pronunciation information can provide an environment in which a language learner can intuitively appreciate the pronunciation principles of a learning target language and the differences in pronunciation between a native speaker and the language learner, and can naturally become accustomed to all the pronunciations of the corresponding language, from the phoneme level up to full texts.

[84] Although preferred embodiments of the present invention have been described, it will be appreciated by those skilled in the art that various modifications and changes may be made without departing from the scope of the appended claims of the present invention. For example, the voice information for each word or each phonetic value configuration data may be further stored in the pronunciation information storage, the pronunciation information searcher may retrieve the voice information, and the pronunciation information generator may produce the continuous voice on the basis of the voice information.

Industrial Applicability

[85] Since an apparatus and a method for indicating pronunciation information according to the present invention can generate and indicate a movement animation of the vocal organs in accordance with an input text and its corresponding pronunciation, the apparatus and method may be used in the field of language education with respect to various languages. Therefore, it is expected that the apparatus and method for indicating pronunciation information will contribute to the revitalization of the educational industry.