

Title:
METHOD FOR ERROR RECOVERY IN METHOD AND DEVICE FOR RECOGNISING A USER PRESENTATION THROUGH ASSESSING THE RELIABILITY OF A LIMITED SET OF HYPOTHESES
Document Type and Number:
WIPO Patent Application WO/2000/016311
Kind Code:
A1
Abstract:
For recognizing a user presentation that is represented as a plurality of physical components through assessing the feasibility of a provisional recognition, various presentation components thereof are recognized in successive steps. In particular, along with the successive steps of the recognizing, a limited set of hypotheses is built for the presentation. At predetermined instants along the steps, these hypotheses are matched with an associated reliability level. Upon non-meeting of the reliability level with respect to one or more components, which signals an impending dead-end situation, the feasibility is improved by driving the method, for the latter component or components, to an error recovery procedure that improves the reliability of one or more elements of the limited set.

Inventors:
RUEBER BERNHARD J (NL)
KELLNER ANDREAS (NL)
SCHRAMM HAUKE (NL)
Application Number:
PCT/EP1999/006669
Publication Date:
March 23, 2000
Filing Date:
September 09, 1999
Assignee:
KONINKL PHILIPS ELECTRONICS NV (NL)
PHILIPS CORP INTELLECTUAL PTY (DE)
RUEBER BERNHARD J (NL)
KELLNER ANDREAS (NL)
SCHRAMM HAUKE (NL)
International Classes:
G06F3/16; G10L15/00; G10L15/08; G10L15/10; G10L15/193; G10L15/22; G10L15/26; (IPC1-7): G10L15/10; G10L15/26
Foreign References:
EP0651372A2 1995-05-03
US5712957A 1998-01-27
Other References:
WEINTRAUB M: "LVCSR LOG-LIKELIHOOD RATIO SCORING FOR KEYWORD SPOTTING", PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP '95), DETROIT, USA, vol. 1, 9 May 1995 (1995-05-09) - 12 May 1995 (1995-05-12), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 297 - 300, XP000657989, ISBN: 0-7803-2432-3
KELLNER A ET AL: "Strategies for name recognition in automatic directory assistance systems", PROCEEDINGS IEEE 4TH WORKSHOP INTERACTIVE VOICE TECHNOLOGY FOR TELECOMMUNICATIONS APPLICATIONS (IVTTA '98), TORINO, ITALY, IEEE, New York, NY, USA, pages 21 - 26, XP002127629, ISBN: 0-7803-5028-6
Attorney, Agent or Firm:
Gössmann, Klemens (Internationaal Octrooibureau B.V. Prof. Holstlaan 6 AA Eindhoven, NL)
Claims:
1. A method for recognizing a user presentation that is represented as a plurality of physical components through assessing a feasibility of a provisional recognition, in successive steps recognizing various presentation components thereof, characterized in that along with the successive steps of said recognizing a limited set of hypotheses is built as regarding said presentation, which hypotheses pertain to a combination of components, and which hypotheses at predetermined points along said steps are matched with an associated reliability level, and upon nonmeeting of said reliability level with respect to one or more of said components as signalling an impending dead-end situation, improving such feasibility through, for the latter such one or more components, driving the method to an error recovery procedure to improve the reliability of one or more elements of said limited set.
2. A method as claimed in Claim 1, wherein said user presentation represents a user request that is received in the form of speech.
3. A method as claimed in Claim 1, wherein such reliability level is codetermined by one or more interference parameters that indicate an actual adversity for such recognizing.
4. A method as claimed in Claim 1, wherein said reliability level is assessed for a single one of said hypotheses.
5. A method as claimed in Claim 1, wherein said reliability is based on a renormalized probability of the "set of the best".
6. A method as claimed in Claim 1, wherein said reliability is based upon restricting said "set of the best" to a predetermined number of such hypotheses.
7. A method as claimed in Claim 1, wherein said reliability is based upon restricting said "set of the best" to hypotheses that have at most a predetermined differential score with respect to an actually most reliable one of said hypotheses.
8. A device being arranged to implement a method as recited in Claim 1, said device having recognizing means for receiving and recognizing a user presentation that is represented as a plurality of physical components through assessing a feasibility of a provisional recognition, by in successive steps recognizing various presentation components, characterized by comprising hypothesis building means fed by said recognizing means for, along with the successive steps of said recognizing, building a limited set of hypotheses as regarding said presentation, which hypotheses pertain to a combination of components, and which hypotheses at predetermined points along said steps are matched with an associated reliability level, and having a signaling output fed to said recognizing means for, upon nonmeeting of said reliability level with respect to one or more of said components as signaling an impending dead-end situation, improving such feasibility through, for the latter such one or more components, driving the recognizing means to an error recovery procedure to improve the reliability of one or more elements of said limited set.
Description:
BACKGROUND OF THE INVENTION
The invention relates to a method as recited in the preamble of Claim 1. A prime application of the method is for recognizing speech that can be a basis for searching in various static or dynamic environments, such as telephone directories, train timetables, and others.

Generally, a human user will request an entry derived from a database and defined through a string of request components that must be successively recognized. A prime difficulty is that for one or more of these components the spectrum of choice may be orders of magnitude larger than for other components. For example, in a train timetable environment there may be a very large number of destination stations, whereas the departure time can usually be expressed in only a few tens of items, such as the ten digits and combinations thereof, and furthermore certain terms like "early", "before", and a few others. Other types of databases feature similarly varying levels of search diversity. The inventors have appreciated that the different levels of complexity among the various subtasks associated with recognizing respective speech items warrant a higher level of organization than recognizing on the level of the respective single words or request components alone. In particular, when the recognizing process has reached or threatens to reach a dead-end situation, a remedy should be undertaken as early as reasonably possible. Such a dead-end situation could occur if the incomplete recognition process still allows many possible outcomes, but none of these outcomes can be assessed as possibly representing the correct one: the set of outcomes would thereby be unfeasible.

The method may in a similar manner be used for recognizing other types of user presentation that are represented as a plurality of components, such as a facial image of the user that may be accompanied by a bar code or another type of code entry, and wherein the face must be matched to a photograph or a set of photographs that is stored in memory. The error remedy may again be of various kinds, such as scanning the image at another resolution or in another colour. A still further type of presentation may be a combination of image and speech.

SUMMARY OF THE INVENTION
In consequence, amongst other things, it is an object of the present invention to let the procedure detect the evolving of any situation that could not lead to success, and upon such detecting, to transfer early to an error recovery procedure. Now therefore, according to one of its aspects, the invention is characterized according to the characterizing part of Claim 1.

Such error recovery may be on the level of a single word or component, such as by having the user person repeat or spell it. On a higher level, the system may pose an inquisitive or exhortatory statement to elicit from the user an utterance that would be better adapted to the actual recognition facilities; for example, the user could then choose another formulation for the request component in question. The invention differs from procedures that test the reliable recognition of only the most recent component; instead, it looks to an actually reliable recognition of a combination of components.

The invention also relates to a device being arranged to implement a method as recited supra. Further advantageous aspects of the invention are recited in dependent Claims.

BRIEF DESCRIPTION OF THE DRAWING
These and further aspects and advantages of the invention will be discussed in more detail hereinafter with reference to the disclosure of preferred embodiments, and in particular with reference to the appended Figures that show: Figure 1, an exemplary system architecture; Figure 2, a letter graph for recognizing the word "Miller"; Figure 3, a flow chart of the procedure of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figure 1 shows an exemplary system architecture with an interface 20 to a telephone system (not shown). The system comprises a network interface and I/O unit 22, a speech recognizer 24, a special spelling filter or postprocessor 26, a speech understanding component 28, a speech output unit 30, and a database 32. Further subsystems are: knowledge processing 33, dialog control module 35, reliability assessment module 29, and error recovery module 31. In this architecture, most modules read directly from database 32 to avoid unnecessary copying of large data sets; by themselves, most blocks may correspond to elements known from other systems. The gist of the invention is in particular centered in blocks 29 and 31, which control the overall policy. Spelling filter 26 should output one or more sequences consisting exclusively of words, with associated probability scores. Subsequent to spelling filter 26, the recognizing of single words is finished. In particular for implementing the invention, block 29 assesses the actual reliability. If an unreliability is detected, an activation signal is sent to block 31 for effecting error recovery. This block decides in particular which utterances have to be recovered. Next, dialog control block 35 decides exactly how the error recovery will have to proceed. For example, the dialog control 35 may ask the user to repeat a particular request component or to spell it, or the recovery can be effected by steering the user person to provide alternative information. The error can also be recovered by combining with earlier data that had been derived by blocks 24, 26 from components received before.

As the interface between speech recognition, spelling, and speech understanding, word graphs are often used. A word graph is a compact representation of various plausible sentence or sequence hypotheses. Every individual path through the graph is a hypothesis. At every dialog turn, the speech recognizer processes the caller's utterance and produces a word graph. In the spelling mode, a detailed graph is used that consists of letters and qualifiers, the latter being used for giving extra information on a particular letter, such as the terms "double" or "accented". The spelling mode may be controlled either by the system or by the user person. Otherwise, the recognizer is configured to recognize single-word utterances only, so that a word graph then represents one or more candidate strings built from single words. To achieve better recognition and real-time operation, the recognizer's vocabulary may be switched over to enable only those words expected in the actual dialog state.

In this respect, Figure 2 gives a letter graph for recognizing the word "Miller". The links between various successive states may get scores for recognizing the letters as shown, wherein the third letter may be recognized as a straight alphabetic element ("L") or as an element of an international alphabet, of which the representations for "T" and "H" have been shown.

Finally, a qualifier "double" can be used to indicate that the next letter "L" occurs twice.

Similar graphs apply to word sequences in a sentence.
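Since every path through a word or letter graph is one hypothesis, a graph can be unfolded into a ranked list of hypotheses. The following minimal Python sketch assumes a graph stored as a list of scored arcs, with scores behaving like negative log probabilities (lower is better). It enumerates all paths of a small, hypothetical letter graph for "Miller", applies the qualifier "double", and ranks the results. The state numbers, arc scores and the helper names `enumerate_paths` and `expand_qualifiers` are illustrative only and are not taken from Figure 2.

```python
from collections import defaultdict

# Hypothetical letter graph for "Miller": arcs (from_state, to_state, symbol, score).
# Scores are illustrative negative log probabilities; lower is better.
ARCS = [
    (0, 1, "M", 0.1), (0, 1, "N", 1.3),
    (1, 2, "I", 0.2),
    (2, 3, "double", 0.4), (3, 5, "L", 0.1),  # spelled as "double L"
    (2, 4, "L", 0.6), (4, 5, "L", 0.5),       # spelled as "L, L"
    (5, 6, "E", 0.2),
    (6, 7, "R", 0.1),
]


def enumerate_paths(arcs, start, end):
    """Yield (symbols, total_score) for every path from start to end."""
    successors = defaultdict(list)
    for src, dst, symbol, score in arcs:
        successors[src].append((dst, symbol, score))

    def walk(state, symbols, total):
        if state == end:
            yield symbols, total
            return
        for dst, symbol, score in successors[state]:
            yield from walk(dst, symbols + [symbol], total + score)

    yield from walk(start, [], 0.0)


def expand_qualifiers(symbols):
    """Apply the qualifier "double": the following letter is written twice."""
    letters, repeat = [], 1
    for symbol in symbols:
        if symbol == "double":
            repeat = 2
        else:
            letters.append(symbol * repeat)
            repeat = 1
    return "".join(letters)


# Each path is one hypothesis; sorting by total score yields an N-best list.
n_best = sorted(
    ((expand_qualifiers(syms), total) for syms, total in enumerate_paths(ARCS, 0, 7)),
    key=lambda pair: pair[1],
)
print(n_best[0][0])  # MILLER
```

Note that two different paths ("double L" and "L, L") expand to the same string; the sketch keeps both, since each path counts as a separate hypothesis.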

Hereinafter, various concrete recognition procedures will be discussed. The hypotheses used for each component thereof carry an indication as to the partial reliability of matching the component in question with the hypothesized example. In particular, a score is formed as the negative logarithm of the a posteriori probability that is generated by the recognizer. As an exemplary vehicle, an "N-best list" may be used as an alternative to a word graph.

Hereinafter, the feasibility is determined for the "set of the best hypotheses", built from the applicable vehicle such as the exemplary N-best list or word graph, supra, the elements of which set are selected from all extant hypotheses. In consequence, there is a two-tier selection process, the "set of the best" being selected in the second stage. Now, the elements of an N-best list are $\{(h_1, b_1), \ldots, (h_N, b_N)\}$, wherein each hypothesized recognition result $h_i$ is associated with a respective score $b_i$, the lowest score representing the highest probability. The list elements can be ordered according to increasing scores, $b_1 \le b_2 \le \cdots \le b_N$, and the number $N$ need not be uniform over the recognition procedure. Furthermore, by way of example, the "set of the best" may consist of only the first $F$ hypotheses, such as $\{(h_1, b_1), (h_2, b_2), (h_3, b_3)\}$, wherein $F = 3$. In the following, the N-best list is assumed present and the "set of the best" is assumed to consist of the first $F < N$ thereof. The setting of $F$ will be discussed hereinafter.
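The two-tier selection can be sketched in a few lines of Python. This is a minimal sketch, assuming the score $b_i$ is the negative natural logarithm of the recognizer's a posteriori probability; the helper names and the example posteriors are hypothetical.

```python
import math


def n_best_from_posteriors(posteriors):
    """Return [(h_i, b_i), ...] sorted by increasing score b_i.

    Assumption: b_i = -ln(p_i), where p_i is the recognizer's a posteriori
    probability for hypothesis h_i, so the lowest score is the most probable.
    """
    scored = [(h, -math.log(p)) for h, p in posteriors.items()]
    return sorted(scored, key=lambda pair: pair[1])


def set_of_the_best(n_best, f):
    """Second tier: keep only the first F entries of the ordered list (F < N)."""
    if not 0 < f < len(n_best):
        raise ValueError("F must satisfy 0 < F < N")
    return n_best[:f]


# Hypothetical posteriors for a spelled family name.
n_best = n_best_from_posteriors(
    {"Miller": 0.50, "Millar": 0.20, "Muller": 0.15, "Mahler": 0.10, "Miner": 0.05}
)
best = set_of_the_best(n_best, 3)
print([h for h, _ in best])  # ['Miller', 'Millar', 'Muller']
```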

VARIOUS RELIABILITY MEASURES
A. Various reliability measures $R$ may be used, of which a first one is the renormalized probability of the "set of the best":

$$R = P_F = \frac{\sum_{i=1}^{F} f(b_i)}{\sum_{i=1}^{N} f(b_i)} \qquad (1)$$

In the case of the above scores this is simply $f(x) = \exp(-a x)$, wherein for the general case the value of $a$ may differ from 1. The "set of the best" may be defined according to one of the following:
1. setting a uniform single value for $F$;
2. empirically setting a score difference $D_s$, and including all pairs $(h_i, b_i)$ with $b_i - b_1 < D_s$ in the set.

B. A next possible reliability measure is $R = 1 - F/N$, the relative fraction of the hypotheses not in the "set of the best", which measure is generally combined with the score-distance criterion (2).
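Measures A and B can be sketched as follows. This minimal Python sketch assumes the renormalized probability is the sum of $f(b_i)$ over the $F$ best entries divided by the sum over all $N$ entries, with $f(x) = \exp(-a x)$; the function names and the example scores are hypothetical.

```python
import math


def renormalized_probability(scores, f, a=1.0):
    """P_F = sum_{i<=F} exp(-a*b_i) / sum_{i<=N} exp(-a*b_i).

    `scores` must be sorted in increasing order (best first). Subtracting b_1
    before exponentiating leaves the ratio unchanged and avoids underflow.
    """
    b1 = scores[0]
    weights = [math.exp(-a * (b - b1)) for b in scores]
    return sum(weights[:f]) / sum(weights)


def size_by_score_difference(scores, d_s):
    """Criterion 2: F = number of hypotheses with b_i - b_1 < D_s."""
    return sum(1 for b in scores if b - scores[0] < d_s)


def fraction_not_in_best(scores, d_s):
    """Measure B: R = 1 - F/N, with F set by the score-distance criterion."""
    return 1.0 - size_by_score_difference(scores, d_s) / len(scores)


scores = [0.0, 1.0, 1.5, 4.0]  # hypothetical scores, already sorted
print(round(renormalized_probability(scores, 2), 3))  # 0.85
print(fraction_not_in_best(scores, d_s=2.0))          # 0.25
```

The shift by $b_1$ is a numerical convenience only: it cancels between numerator and denominator, so the result equals the plain ratio.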

C. A next possible reliability measure $R$ is the absolute number of hypotheses in the "set of the best", generally combined with the score-distance criterion.

D. A next possible reliability measure $R$ is the differential score of the first element not in the "set of the best": $R = b_{F+1} - b_1$. Here, the "set of the best" may be determined by the straight setting of $F$, or by the above renormalized probability (1): after setting an empirical accumulated probability $0 < P_C < 1$, $F$ is given by the relations $P_F < P_C$ but $P_{F+1} \ge P_C$.

Herein, $P_F$ is the above renormalized probability of the "set of the best" with exactly $F$ elements, as given by expression (1).
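Measures C and D, together with the choice of $F$ from an accumulated probability $P_C$, might look as follows. This is a minimal sketch assuming $F$ is chosen so that $P_F < P_C \le P_{F+1}$ and that at least one hypothesis is always kept; `choose_f`, `differential_score` and the example values are hypothetical.

```python
import math


def renormalized_probability(scores, f, a=1.0):
    """P_F as in expression (1), with f(x) = exp(-a*x); scores sorted ascending."""
    b1 = scores[0]
    weights = [math.exp(-a * (b - b1)) for b in scores]
    return sum(weights[:f]) / sum(weights)


def choose_f(scores, p_c, a=1.0):
    """Return F with P_F < P_C <= P_{F+1}, for 0 < P_C < 1.

    Assumption: F is never set below 1, even if P_1 >= P_C already.
    F < N holds because the search stops at F = N - 1.
    """
    f = 0
    while f + 1 < len(scores) and renormalized_probability(scores, f + 1, a) < p_c:
        f += 1
    return max(f, 1)


def differential_score(scores, f):
    """Measure D: R = b_{F+1} - b_1 (0-based indexing: scores[F] - scores[0])."""
    return scores[f] - scores[0]


scores = [0.0, 1.0, 1.5, 4.0]  # hypothetical scores, sorted ascending
f = choose_f(scores, p_c=0.9)
print(f)                              # 2  (measure C: size of the set)
print(differential_score(scores, f))  # 1.5
```

A large value of measure D means the first excluded hypothesis lies far behind the best one, which suggests the "set of the best" is well separated from the rest.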

All described reliability measures share the property that the set of hypotheses for the combination is more likely to be feasible for recognition when the elements of the set have a higher reliability. The actual level of the reliability may trigger various system responses. In particular, thresholding may be applied. That is, if the actual reliability is lower than a threshold, the hypothesis will be discarded as useless, and a special error recovery procedure will be initiated. For example, the user person will be asked to repeat information that had already been requested earlier by the system. Otherwise, the hypothesis in question will be considered useful. The threshold may be set at different levels that are appropriate for the actually present data.

A particular application for the reliability indicator is to calculate it separately for the hypothesis set of each respective user indication. The feasibility of each of these hypothesis sets will be calculated separately, such as according to the above threshold method.
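Thresholding each user indication separately can be sketched as below. This is a minimal Python sketch; the component names and threshold values are hypothetical, since the text leaves the levels to the actual data.

```python
# Hypothetical per-component thresholds; the text gives no concrete values.
THRESHOLDS = {"family_name": 0.8, "departure_time": 0.6}


def components_to_recover(reliabilities, thresholds=THRESHOLDS):
    """Return the components whose reliability falls below their threshold.

    A component at or above its threshold is considered useful; one below it
    is discarded and handed to error recovery (e.g. the user is asked to
    repeat or spell it). Each component is assessed separately.
    """
    return [name for name, r in reliabilities.items() if r < thresholds[name]]


print(components_to_recover({"family_name": 0.55, "departure_time": 0.9}))
# ['family_name']
```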

TWO VARIANTS FOR SIMULTANEOUS USAGE OF PLURAL SETS OF HYPOTHESES
A. In the first place, the various sets of hypotheses for the individual user indications can be combined to produce a new set for the combined indications. In the case of a personal name consisting of a given name and a family name, an N-best list of the complete name will appear, such as {"Christoph Kramm", "Alexander Klamm"}, the two family names being particularly difficult to discriminate from each other. Often, such a family name is the first item the recognition system will look to. The procedure discussed above can be applied to the combined set of hypotheses in a similar manner, and thereby yields an assessment of its reliability. If the set is unfeasible for use, this means that at least one of the respective single hypotheses was not useful. Then, and possibly subject to assessing the various reliabilities of the respective single hypotheses, an appropriate action may be started.
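Variant A can be illustrated by crossing the N-best lists of the given name and the family name into one list for the complete name. This minimal Python sketch assumes that scores add (i.e. the two components are treated as independent) and that an optional directory restricts the result to existing full names; all names, scores and directory entries are hypothetical.

```python
import math
from itertools import product


def combine(n_best_a, n_best_b, directory=None):
    """Cross two N-best lists into one list for the combined indication.

    Scores are added, i.e. the components are treated as independent
    (an assumption). If `directory` is given, only full names it contains
    are kept.
    """
    combined = [
        (f"{ha} {hb}", ba + bb)
        for (ha, ba), (hb, bb) in product(n_best_a, n_best_b)
        if directory is None or f"{ha} {hb}" in directory
    ]
    return sorted(combined, key=lambda pair: pair[1])


def renormalized_probability(n_best, f, a=1.0):
    """P_F of an ordered N-best list, with f(x) = exp(-a*x)."""
    b1 = n_best[0][1]
    weights = [math.exp(-a * (b - b1)) for _, b in n_best]
    return sum(weights[:f]) / sum(weights)


given = [("Christoph", 0.2), ("Alexander", 0.4)]    # hypothetical scores
family = [("Kramm", 0.6), ("Klamm", 0.7)]
directory = {"Christoph Kramm", "Alexander Klamm"}  # hypothetical entries

full = combine(given, family, directory)
print([h for h, _ in full])                          # ['Christoph Kramm', 'Alexander Klamm']
print(round(renormalized_probability(full, 1), 2))   # 0.57
```

A value this close to 0.5 for the single best full name reflects the near-tie between "Kramm" and "Klamm"; a threshold on it would flag the combined set as unreliable.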

B. In the scenario of so-called hierarchical recognition, often only the first recognition step is critical. For example, in searching for the correct extension number, the system must recognize the spelling of a single name out of a set of many personal names, in certain cases up to 200,000 or even more. Subsequent recognition steps will often have lexica of no more than the order of ten entries. If the procedure according to the invention is now applied to the sets of hypotheses pertaining to the later, non-critical recognition steps, their lack of feasibility will lead to the earlier hypotheses that pertain to the critical step being considered useless from the start. This procedure should therefore only be applied in combination with the immediate reliability of the first recognition step. If the latter is nevertheless of high reliability, this leads immediately to the conclusion that the later step is itself unreliable, which prompts appropriate measures with respect to this later step.
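The decision logic of variant B can be summarized in a small function. This is a minimal Python sketch; the function name, return labels and threshold are hypothetical, and the feasibility of the later step is assumed to come from one of the set assessments above.

```python
def locate_unreliable_step(first_step_reliability, later_set_feasible, threshold):
    """Decide which recognition step to recover in hierarchical recognition.

    first_step_reliability: immediate reliability of the critical first step
        (e.g. spelling a name out of a very large directory).
    later_set_feasible: result of the set assessment for a later,
        non-critical step (e.g. a small lexicon of about ten entries).
    threshold: hypothetical cut-off for calling the first step reliable.
    """
    if later_set_feasible:
        return None              # nothing to recover
    if first_step_reliability >= threshold:
        return "later_step"      # first step trusted, so the later step is at fault
    return "first_step"          # otherwise the critical step is treated as useless


print(locate_unreliable_step(0.95, later_set_feasible=False, threshold=0.8))  # later_step
```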

FURTHER PROCEDURES FOR CALCULATING RELIABILITY CRITERIA
In addition to the procedure discussed above, all other standard methods for calculating single reliabilities may be used for the new task. The simplest manner to do so is to use these methods for calculating the (single) reliability level of the best hypothesis $h_1$, and to use this as a general indication of the overall reliability of the whole set of hypotheses.

Various ones of these procedures may also be applied in a corresponding manner to the "set of the best", as has been done here for the renormalized probability, which had already been introduced as a single reliability measure as described in German Patent Application 19740147.3, corresponding US Application Serial No. 08/...,..., PHD97127, to the present assignee, and herein incorporated by reference.

Among these, several criteria can provide indications of the recognizing conditions, even independently of the acquired set of hypotheses, and therefore also independently of the respective single assessments therein. For example, the adaptation quality of noise interference models may indicate whether a speech recognition system is being used in a noisy environment, which presents an actual adversity that leads to recognition difficulties. Other such adverse conditions can be due to very loud or very soft speaking, etcetera. Such measures will also provide information for use in hierarchical recognizing as discussed above, because they can detect problems with subsequent non-critical recognition steps that follow a useful first recognition step.

Figure 3 shows a flow chart of the procedure of the invention. In block 40, the procedure is started with the assignment of the necessary hardware and software facilities. In block 42, the system detects whether the procedure has effectively become complete for the actual user requirement. If yes, in block 56 the procedure is terminated by effecting the appropriate action vis-à-vis the user, such as by outputting the required information in speech or hardcopy, or by driving a service authority into rendering the appropriate service. Otherwise, the system in block 44 receives the next element of the user utterance or utterances. If appropriate, this may be effected in a dialog organization; the system-outputted speech has been omitted for brevity. In block 46, the next element is processed by the system for recognition and, if appropriate, also for its understanding. As recited earlier, this will produce a set of hypotheses for the element in question. In block 48, these hypotheses are joined with such earlier hypotheses as remain "in the race" from the recognition process as executed up to then, so that a set of joined hypotheses is formed, each having its own associated reliability quantity. In block 50, the set of hypotheses now formed is assessed with respect to the outlook for a possibly successful recognition or, conversely, for the occurrence of a dead-end situation. If no such dead-end situation is expected, the system returns to block 42. If, however, the outlook for success is bleak, the system proceeds to block 52, wherein one of the partial hypotheses now present is selected for necessary improvement, and in block 54, corrective action is undertaken. Subsequently, the upgraded hypothesis resulting from the corrective action is fed back into block 48. The above upgrading may again be effected in a dialog structure and, if the case may be, effectively in a smaller version of the overall flow chart as shown, which for brevity has been symbolized by block 54 only.
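The control loop of the flow chart might be sketched as follows, with block numbers in the comments. This is a minimal Python sketch under several assumptions not stated in the text: the reliability of a set is the renormalized probability of its best entry ($F = 1$), partial sets are joined by adding scores, the threshold is 0.5, and recovery is retried at most twice. The callables `recognize` and `recover` stand in for the recognizer and the dialog; all names are hypothetical.

```python
import math
from functools import reduce
from typing import Callable, Iterable, List, Tuple

NBest = List[Tuple[str, float]]  # (hypothesis, score); lower score is better


def top1_reliability(n_best: NBest) -> float:
    """Renormalized probability of the best entry (F = 1), f(x) = exp(-x)."""
    b1 = min(b for _, b in n_best)
    return 1.0 / sum(math.exp(-(b - b1)) for _, b in n_best)


def join(a: NBest, b: NBest) -> NBest:
    """Block 48: cross two hypothesis sets, adding scores (independence assumed)."""
    crossed = [(f"{ha} {hb}", ba + bb) for ha, ba in a for hb, bb in b]
    return sorted(crossed, key=lambda pair: pair[1])


def run_request(elements: Iterable[str],
                recognize: Callable[[str], NBest],
                recover: Callable[[int], NBest],
                threshold: float = 0.5,
                max_retries: int = 2) -> NBest:
    partials: List[NBest] = []
    joined: NBest = []
    for element in elements:                   # blocks 42/44: next element until complete
        partials.append(recognize(element))    # block 46: hypotheses for this element
        joined = reduce(join, partials)        # block 48: join with earlier hypotheses
        retries = 0
        while top1_reliability(joined) < threshold and retries < max_retries:  # block 50
            # Block 52: select the least reliable partial set for improvement.
            k = min(range(len(partials)), key=lambda i: top1_reliability(partials[i]))
            partials[k] = recover(k)           # block 54: corrective action
            joined = reduce(join, partials)    # fed back into block 48
            retries += 1
    return joined                              # block 56: act on the result


# Hypothetical first-pass recognition results and a simulated user who spells the family name.
first_pass = {"given": [("Christoph", 0.1), ("Kristof", 3.0)],
              "family": [("Kramm", 0.6), ("Klamm", 0.7)]}
recovered = []


def recover(k):
    recovered.append(k)
    return [("Kramm", 0.1), ("Klamm", 2.5)]


result = run_request(["given", "family"], first_pass.__getitem__, recover)
print(result[0][0], recovered)  # Christoph Kramm [1]
```

In this run the given name alone is reliable, but joining it with the near-tied family names drops the combined reliability below the threshold, so the family-name set (the least reliable partial set) is the one sent to recovery.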