Title:
METHOD AND APPARATUS FOR EDITING SPEECH RECOGNIZED TEXT
Document Type and Number:
WIPO Patent Application WO/2011/075890
Kind Code:
A1
Abstract:
The present invention relates to a method and an apparatus for editing a speech recognized text. The present invention provides a method comprising the steps of: selecting at least one character to be edited in a speech recognized text; recognizing at least one new character input by a user; and updating said character to be edited with said new character. The method and apparatus according to the present invention provide an improved way of accurately and quickly editing a speech recognized text and of conveniently fusing speech input and handwriting input, particularly finger input, so that the operations by which the user amends the speech recognized text become very simple.

Inventors:
LIU, Huanglingzi (1-12G, Shi Yu Yuan Lan Dian Chang Mid-Road, Beijing 7, 100097, CN)
KYYRA, Juha-Matti Kalevi (Peltotie 8, Somero, Somero, FI)
GUO, Yongguang (#2 Building, No.5 Dong Huan Mid-Road Economic-Technical Development Area, Beijing 6, 100176, CN)
Application Number:
CN2009/075873
Publication Date:
June 30, 2011
Filing Date:
December 23, 2009
Assignee:
NOKIA CORPORATION (Keilalahdentie 4, FI- Espoo, Espoo, FI)
LIU, Huanglingzi (1-12G, Shi Yu Yuan Lan Dian Chang Mid-Road, Beijing 7, 100097, CN)
KYYRA, Juha-Matti Kalevi (Peltotie 8, Somero, Somero, FI)
GUO, Yongguang (#2 Building, No.5 Dong Huan Mid-Road Economic-Technical Development Area, Beijing 6, 100176, CN)
International Classes:
G10L15/04
Attorney, Agent or Firm:
KING & WOOD PRC LAWYERS (31/F, Office Tower A 39 Dongsanhuan Zhonglu, Chaoyang District, Beijing 2, 100022, CN)
Claims:
WHAT IS CLAIMED IS:

1. A method, comprising the steps of:

selecting at least one character to be edited in a speech recognized text;

recognizing at least one new character input by a user; and

updating said character to be edited with said new character.

2. The method according to claim 1, wherein said character to be edited is a single character in the speech recognized text.

3. The method according to claim 1 or 2, wherein said new character input by a user is a single character.

4. The method according to any of claims 1-3, wherein said new character input by the user is recognized by speech recognition or handwriting recognition.

5. The method according to any of claims 1-4, wherein said character to be edited in the speech recognized text is selected after the speech recognized text is enlarged.

6. The method according to any of claims 1-5, wherein said character to be edited in the speech recognized text is selected in an enlarged area different from a text editing area.

7. The method according to any of claims 1-6, wherein said new character is input in a blank screen.

8. An apparatus, comprising:

selecting means for selecting at least one character to be edited in a speech recognized text;

recognition means for recognizing new characters input by the user; and

updating means for updating said character to be edited with said new character.

9. The apparatus according to claim 8, wherein said character to be edited is a single character in the speech recognized text.

10. The apparatus according to claim 8 or 9, wherein said new character input by a user is a single character.

11. The apparatus according to any of claims 8-10, wherein said new character input by the user is recognized by speech recognition or handwriting recognition.

12. The apparatus according to any of claims 8-11, wherein said character to be edited in the speech recognized text is selected after the speech recognized text is enlarged.

13. The apparatus according to any of claims 8-12, wherein said character to be edited in the speech recognized text is selected in an enlarged area different from a text editing area.

14. The apparatus according to any of claims 8-13, wherein said new character is input in a blank screen.

15. An apparatus, comprising:

at least one processor and at least one memory including computer program code,

the memory and the computer program code configured to, with the processor, cause the apparatus at least to perform:

selecting at least one character to be edited in a speech recognized text;

recognizing at least one new character input by a user; and

updating said character to be edited with said new character.

16. A computer program product, comprising at least one computer readable storage medium having a computer readable program code portion stored thereon, the computer readable program code portion comprising:

program code instructions for selecting at least one character to be edited in a speech recognized text;

program code instructions for recognizing at least one new character input by a user; and

program code instructions for updating said character to be edited with said new character.

Description:
METHOD AND APPARATUS FOR EDITING SPEECH RECOGNIZED TEXT

FIELD OF THE INVENTION

[01] The present invention relates to a method and apparatus for inputting text, and particularly to a method and an apparatus for editing a speech recognized text.

BACKGROUND OF THE INVENTION

[02] Speech recognition technology allows a user to input large amounts of text rapidly. When it is not convenient for a user to operate a keyboard, for example for a handicapped user or a user who is driving a car, the user equipment receives the user's speech input and a speech recognition engine recognizes the speech as text, which is a considerable convenience for the user. The speech recognition engine can be provided on the user equipment, which, after receiving the audio input by the user, directly recognizes text from it. Alternatively, a speech recognition server can be provided: the audio input by the user is transferred via a network to the speech recognition server, which carries out the speech recognition.

[03] There are some limitations for the speech recognition technology. For example, due to limitations of languages themselves and influences of audio quality and ambient noise, the recognition rate of the speech recognition engine might not be high. Much research is carried out in an attempt to improve the recognition rate of speech recognition.

[04] In addition, some equipment, such as a personal computer or hand-held equipment provided with a writing pad, a mouse or a touch screen, can employ handwriting input. The handwriting recognition engine receives the trajectory of a stylus or mouse, or of a finger on a touch screen, and recognizes the text written by the user by recognizing letters and the strokes of Chinese characters. The recognition rate of the handwriting recognition engine might not be high, being subject to factors such as the speed of the user's movement and the identifiability of the handwritten characters.

[05] At present, the processing ability of handheld equipment is becoming more and more powerful, and speech recognition can be carried out on handheld equipment. A touch screen on handheld equipment makes handwriting input convenient for the user. Therefore, better combining the two input modes will improve the user's experience in using the handheld equipment.

[06] When the speech recognition engine cannot correctly recognize the text, the user wishes to amend the text. Usually, the user's amendments to the recognized text all focus on words; for example, the speech recognition engine provides a candidate list of words for the user's selection. If the candidate list does not contain the correct word, the user says the correct word again and the speech recognition engine recognizes it again so that the wrong word can be updated. In addition, the user can employ another input mode, such as handwriting input, to update the wrong words in the speech recognized text.

[07] The US patent application US2009/0228273A1 discloses a method for correction by combining speech recognition input with handwriting input, wherein, when the speech recognized text is incorrect, handwriting input is employed to indicate the positions to be amended and the amendment modes. A word or phrase is circled to indicate that the circled word or phrase needs to be substituted. The correct word or phrase is then input to amend the speech recognized text.

[08] There are some problems with the above modes for editing and updating the speech recognized text. Whether speech recognition or handwriting recognition is used, incorrect recognition is possible, and the user needs to repeat the input many times or change the input mode. After many repetitions, the user might abandon speech input or handwriting input and have to use keyboard input. Such operations are not convenient enough and need to be improved.

SUMMARY OF THE INVENTION

[09] If the text provided by a speech recognition engine is incorrect and only one or several characters in a word are wrong, the user can edit only those characters, which simplifies the user's operations. Besides, the precision rate of a speech recognition engine recognizing an individual character is much higher than its precision rate for a word or sentence, and the precision rate of a handwriting recognition engine recognizing an individual character is much higher than its precision rate for a word. When the user designates individual characters to be edited, the recognition engine only needs to recognize the newly input individual characters. As such, the user can conveniently use correct characters in place of the original wrongly recognized characters.

[10] One of the objects of the present invention is to provide a method and an apparatus for accurately and quickly editing a speech recognized text.

[11] Another object of the present invention is to provide a method for conveniently fusing speech input and handwriting input, particularly finger input, so that the operations by which the user amends the speech recognized text become very simple.

[12] According to one aspect of the present invention, a method is provided, comprising: selecting at least one character to be edited in a speech recognized text; recognizing at least one new character input by a user; and updating said character to be edited with said new character.

[13] According to a preferred embodiment of the present invention, the character to be edited is a single character in the speech recognized text.

[14] According to a preferred embodiment of the present invention, the new characters input by the user are individual characters.

[15] According to a preferred embodiment of the present invention, said new character input by the user is recognized by speech recognition or handwriting recognition.

[16] According to a preferred embodiment of the present invention, a blank screen is opened to input new characters.

[17] According to a preferred embodiment of the present invention, said character to be edited in the speech recognized text is selected in an enlarged area.

[18] According to a preferred embodiment of the present invention, said character to be edited in the speech recognized text is selected in an enlarged area different from a text editing area.

[19] According to another aspect of the present invention, an apparatus is provided, comprising: selecting means for selecting at least one character to be edited in a speech recognized text; recognition means for recognizing new characters input by the user; and updating means for updating the selected characters with new characters.

[20] According to another aspect of the present invention, an apparatus is provided, comprising: at least one processor and at least one memory including computer program code, the memory and the computer program code configured to, with the processor, cause the apparatus at least to perform: selecting at least one character to be edited in a speech recognized text; recognizing at least one new character input by a user; and updating said character to be edited with said new character.

[21] According to another aspect of the present invention, a computer program product is provided, comprising at least one computer readable storage medium having a computer readable program code portion stored thereon, the computer readable program code portion comprising: program code instructions for selecting at least one character to be edited in a speech recognized text; program code instructions for recognizing at least one new character input by a user; and program code instructions for updating said character to be edited with said new character.

BRIEF DESCRIPTION OF THE DRAWINGS

[22] The aforesaid and other features of the present invention will be made more apparent by describing the present invention in detail with reference to embodiments illustrated in the accompanying drawings, wherein the same reference number designates the same or similar component. In the drawings:

[23] Fig. 1 illustrates a flow chart of a method for editing speech recognized text according to the present invention;

[24] Fig. 2 illustrates a block diagram of an apparatus for editing speech recognized text according to the present invention;

[25] Fig. 3 illustrates a main user interface for editing speech recognized text according to one embodiment of the present invention;

[26] Fig. 4 illustrates a user interface for selecting the characters to be edited according to one embodiment of the present invention;

[27] Fig. 5 illustrates a user interface for recognizing a new character input by the user according to one embodiment of the present invention;

[28] Fig. 6 illustrates a user interface for updating the character to be edited according to one embodiment of the present invention;

[29] Fig. 7 is a planar front view of an apparatus according to one embodiment; and

[30] Fig. 8 is a block diagram of the main framework of the apparatus according to one embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[31] The present invention will be described hereunder by embodiments with reference to the drawings.

[32] A method for editing speech recognized text according to the present invention is described with reference to Fig. 1.

[33] As shown in Step S102, when a user needs to edit the speech recognized text, said character to be edited in the speech recognized text is selected. It should be appreciated that selecting a single character to be edited in a speech recognized word is particularly advantageous in the present invention. For example, if the single character "c" in the recognized word should be amended to "k", then the user selects the single character "c" and re-speaks the character "k" or handwrites the character "k". The recognition engine will easily recognize the correct character "k", and the wrong character "c" is replaced by the correct "k".

[34] It is appreciated that one or more wrong characters in a speech recognized word may need to be replaced by one or more correct characters. For example, a plurality of characters in the speech recognized word may be wrong: "ph" should be amended to "f". In this case, the user selects the wrong characters "ph" and re-inputs the character "f". Conversely, a single character in the recognized text may be wrong: "f" should be amended to "ph", whereupon the user selects the wrong character "f" and re-inputs the characters "ph" via speech input or handwriting input. It is also possible that a plurality of characters in the speech recognized text are wrong and need to be replaced by a plurality of characters.
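The substitutions described in paragraphs [33] and [34] amount to replacing a selected span of the recognized text with a newly recognized string of possibly different length. A minimal Python sketch follows; the function name `replace_span` and the sample words are illustrative assumptions and do not appear in the application.

```python
def replace_span(text: str, start: int, end: int, new_chars: str) -> str:
    """Replace text[start:end] (the selected characters) with new_chars.

    Hypothetical helper: the application does not name such a function.
    """
    if not (0 <= start < end <= len(text)):
        raise ValueError("selection must be a non-empty span inside the text")
    if not new_chars:
        raise ValueError("at least one new character is required")
    return text[:start] + new_chars + text[end:]


# One wrong character replaced by one character ("c" -> "k").
one_to_one = replace_span("cite", 0, 1, "k")     # hypothetical word
# Two wrong characters replaced by one ("ph" -> "f").
many_to_one = replace_span("phish", 0, 2, "f")   # hypothetical word
# One wrong character replaced by two ("f" -> "ph").
one_to_many = replace_span("foto", 0, 1, "ph")   # hypothetical word
```

Because the selection is a span rather than a fixed-length slot, the same operation covers the one-to-one, many-to-one and one-to-many cases.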

[35] In addition, the user can also select characters to be edited in many modes. For example, the user can use a mouse or direction keys on a keyboard to move the cursor to select characters. If a touch screen is used, the user can use a stylus or finger to touch the displayed characters for selection.

[36] Since the pointing resolution of a finger on a touch screen is relatively low, it might be inconvenient to use a finger to select a single character. In this scenario, the speech recognized text is first enlarged, and then said character to be edited in the speech recognized text is selected. For example, when the user's finger touches the screen, a magnifying glass is provided on the user interface to help the user to select the character to be edited. Besides, the spacing between characters can be increased, or the character pattern or font may be changed, to make it easier for the user to select the characters.
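The benefit of enlarging the text before selection can be illustrated with a simple hit test. The sketch below assumes fixed-width characters laid out on one line and models the finger as a horizontal interval around the touch point; the function name, the pixel values and the layout model are all assumptions made for illustration.

```python
def chars_under_touch(text, touch_x, touch_radius, char_width):
    """Return indices of characters whose cell overlaps the touched interval.

    Assumes character i occupies [i * char_width, (i + 1) * char_width).
    """
    left = touch_x - touch_radius
    right = touch_x + touch_radius
    hits = []
    for i in range(len(text)):
        cell_left = i * char_width
        cell_right = cell_left + char_width
        if cell_left < right and cell_right > left:
            hits.append(i)
    return hits


word = "practise"
# Normal size: 8-pixel-wide characters, finger centred on "s" (index 6).
normal = chars_under_touch(word, touch_x=52, touch_radius=10, char_width=8)
# Enlarged four times: 32-pixel-wide characters, same finger size.
enlarged = chars_under_touch(word, touch_x=208, touch_radius=10, char_width=32)
```

Under these assumed numbers the touch is ambiguous among three characters at normal size and identifies a single character once the text is enlarged.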

[37] Fig. 3 illustrates an improved user interface for the fusion of speech recognition and finger input according to one embodiment of the present invention. This fusion of speech recognition and finger input facilitates selection of the character to be edited. In the user interface of Fig. 3, the textbox in the upper portion of the screen displays the speech recognized text. An editor tape is provided in the lower portion of the screen and is dedicated to enlarging the corresponding characters in the speech recognized text. The user can click on the text in the textbox or move a selection line in the textbox so that the characters near the clicked area or the corresponding characters on the selection line are displayed on the large buttons in the editor tape. The user can click on one or more large buttons on the editor tape to select the corresponding character to be edited in the speech recognized text, as shown in Fig. 4.
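One possible way to model the editor tape of paragraph [37] is as a small window of characters around the clicked position, each shown on a large button that remembers its index in the original text. The sketch below is an assumption-laden illustration: the names `editor_tape` and `select_from_tape`, the tape size of five buttons and the sample sentence are not taken from the application.

```python
def editor_tape(text, clicked_index, tape_size=5):
    """Return (text_index, character) pairs for the large buttons.

    The window is centred on the clicked character where possible and
    clamped so that it never runs past either end of the text.
    """
    half = tape_size // 2
    start = max(0, min(clicked_index - half, len(text) - tape_size))
    end = min(len(text), start + tape_size)
    return [(i, text[i]) for i in range(start, end)]


def select_from_tape(tape, pressed_buttons):
    """Map pressed button positions back to indices in the recognized text."""
    return [tape[button][0] for button in pressed_buttons]


recognized = "I practise daily"           # hypothetical recognized sentence
tape = editor_tape(recognized, clicked_index=8)
selected = select_from_tape(tape, [2])    # user presses the middle button
```

Keeping the text index on each button means a press on the tape translates directly into a selection in the textbox.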

[38] Then, in Step S104, one or more new characters input by the user are recognized. It is appreciated that the newly input characters can be recognized by conventional speech recognition or handwriting recognition. For example, after the user selects the character "c" to be edited, he speaks or handwrites a new character "k". The speech recognition engine or handwriting recognition engine will recognize the character "k".

[39] Fig. 5 illustrates a user interface for recognizing new characters input by the user according to one embodiment of the present invention, wherein a blank screen is opened to input a new character. It is appreciated that the input of handwriting can be done in the editing area or the full screen of the touch screen.

[40] Then, in Step S106, the new character is used to update the selected character so as to obtain the correct text.

[41] Fig. 2 illustrates a block diagram of a construction of an apparatus for editing speech recognized text according to the present invention.

[42] An apparatus 200 for editing the speech recognized text of the present invention comprises selecting means 202 for selecting at least one character to be edited in the speech recognized text; recognition means 204 for recognizing at least one new character input by the user; and updating means 206 for updating said character to be edited with said new character.
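Steps S102, S104 and S106 can be read as a three-stage flow in which the recognition engine is interchangeable. The following Python sketch assumes the raw user input is an opaque string and uses a stub in place of a real speech or handwriting engine; all names are hypothetical.

```python
from typing import Callable, Tuple

# A recognizer turns raw user input (audio or pen strokes) into characters.
# Here the raw input is modelled as an opaque string; real engines differ.
Recognizer = Callable[[str], str]


def edit_recognized_text(
    text: str,
    selection: Tuple[int, int],
    raw_input: str,
    recognize: Recognizer,
) -> str:
    start, end = selection                 # S102: span selected by the user
    new_chars = recognize(raw_input)       # S104: recognize the new character(s)
    return text[:start] + new_chars + text[end:]  # S106: update the text


def stub_handwriting_recognizer(raw_input: str) -> str:
    """Stand-in for a real engine: looks the input up in a fixed table."""
    return {"strokes-for-c": "c"}[raw_input]


corrected = edit_recognized_text(
    "practise", (6, 7), "strokes-for-c", stub_handwriting_recognizer
)
```

Passing the recognizer as a parameter reflects the statement in paragraph [38] that either speech recognition or handwriting recognition may be used for the new characters.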

[43] The method for editing speech recognized text according to the present invention conveniently fuses speech input and handwriting input, particularly finger input. The operations by which the user amends the speech recognized text are very simple. When only a few characters in the speech recognized text are wrong, for example when amending the recognized "practise" to "practice", the editing operations are very accurate and quick.
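To see how small the correction in the "practise" example is, the character-level difference between the recognized and the intended word can be computed with Python's standard `difflib` module. This is only an illustration of the size of the edit; in the described method the user selects the characters manually, and no automatic comparison is part of the application.

```python
from difflib import SequenceMatcher


def character_edits(recognized: str, intended: str):
    """List (start, end, replacement) spans that turn recognized into intended."""
    matcher = SequenceMatcher(None, recognized, intended, autojunk=False)
    return [
        (i1, i2, intended[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


edits = character_edits("practise", "practice")
```

For this pair the result is a single one-character span, which matches the single character the user would select and re-input.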

[44] It should be appreciated that the characters defined in the application documents are not limited to 26 English letters, and other recognizable characters or symbols can be contained in the scope of the present invention.

[45] Fig. 7 illustrates in more detail a mobile terminal 700 that can be used as the apparatus 200 for editing the speech recognized text. The mobile terminal 700 comprises a speaker or earphone 702, a microphone 706, a touch display 703 and a set of keys 704 which may include virtual keys 704a, soft keys 704b, 704c and a joystick 705 or other type of navigational input device.

[46] The internal component, software and protocol structure of the mobile terminal 700 will now be described with reference to Fig. 8. The mobile terminal has a controller 800 which is responsible for the overall operation of the mobile terminal and may be implemented by any commercially available CPU ("Central Processing Unit"), DSP ("Digital Signal Processor") or any other electronic programmable logic device. The controller 800 has associated electronic memory 802 such as RAM memory, ROM memory, EEPROM memory, flash memory, or any combination thereof. The memory 802 is used for various purposes by the controller 800, one of them being for storing data used by and program instructions for various software in the mobile terminal. The software includes a real-time operating system 820, drivers for a man-machine interface (MMI) 834, an application handler 832 as well as various applications. The applications can include a message text editor 850, a hand writing recognition (HWR) application 860, as well as various other applications 870, such as applications for voice calling, video calling, sending and receiving Short Message Service (SMS) messages, Multimedia Message Service (MMS) messages or email, web browsing, an instant messaging application, a phone book application, a calendar application, a control panel application, a camera application, one or more video games, a notepad application, etc. It should be noted that two or more of the applications listed above may be executed as the same application.

[47] The MMI 834 also includes one or more hardware controllers, which together with the MMI drivers cooperate with the first display 836/703, and the keypad 838/704 as well as various other I/O devices such as microphone, speaker, vibrator, ringtone generator, LED indicator, etc. As is commonly known, the user may operate the mobile terminal through the man-machine interface thus formed.

[48] The software can also include various modules, protocol stacks, drivers, etc., which are commonly designated as 830 and which provide communication services (such as transport, network and connectivity) for an RF interface 806, and optionally a Bluetooth interface 808 and/or an IrDA interface 810 for local connectivity. The RF interface 806 comprises an internal or external antenna as well as appropriate radio circuitry for establishing and maintaining a wireless link to a base station. As is well known to a man skilled in the art, the radio circuitry comprises a series of analogue and digital electronic components, together forming a radio receiver and transmitter. These components include band pass filters, amplifiers, mixers, local oscillators, low pass filters, AD/DA converters, etc.

[49] The mobile terminal also has a SIM card 804 and an associated reader. As is commonly known, the SIM card 804 comprises a processor as well as local work and data memory.

[50] The various aspects of what is described above can be used alone or in various combinations. The teaching of this application may be implemented by a combination of hardware and software, but can also be implemented in hardware or software alone. The teaching of this application can also be embodied as computer readable code on a computer readable medium. It should be noted that the teaching of this application is not limited to use in mobile communication terminals such as mobile phones, but can be equally well applied in Personal Digital Assistants (PDAs), laptops, drawing pads, personal organizers or any other device designed for accepting touch or pen-based user input.

[51] It should be noted that some more specific technological details which are publicly known for those skilled in the art and are requisite for implementation of the present invention are omitted in the above description to make the present invention more easily understood.

[52] The description of the present invention is provided for purposes of illustration and description, not to list all the embodiments or to limit the present invention to the disclosed form. It is apparent to those skilled in the art that many modifications and variations are possible. Those skilled in the art should appreciate that the method and device in the embodiments of the present invention can be achieved by means of software, hardware, firmware or a combination thereof. The hardware portion can be realized by using special logic; the software portion can be stored in a memory and executed by an appropriate instruction execution system such as a microprocessor, a PC or a mainframe computer.

[53] Therefore, it should be appreciated that the above preferred embodiments are selected and described to better illustrate principles of the present invention and actual applications thereof, and to enable those having ordinary skill in the art to understand that without departure from the essence of the present invention, all the modifications and variations fall within the scope of protection of the present invention as defined by the appended claims.