Title:
ECHO CANCELING IN A SPEECH PROCESSING SYSTEM
Document Type and Number:
WIPO Patent Application WO/2004/014055
Kind Code:
A1
Abstract:
A speech processing system includes an echo canceller 100 with a first input for receiving a first input signal from a microphone and a second input for receiving a second signal to be cancelled from the first input signal. The echo canceller subtracts the second input signal from the first input signal under control of a set of at least one reverberation parameter. A memory 200 stores a plurality of sets of reverberation parameters. Each of the sets is associated with a predetermined type of room. The system includes means 210 for selecting one of the sets of reverberation parameters for use by the echo canceller.

Inventors:
YAP CHEE KIAN (DE)
Application Number:
PCT/IB2003/003911
Publication Date:
February 12, 2004
Filing Date:
July 24, 2003
Assignee:
KONINKL PHILIPS ELECTRONICS NV (NL)
YAP CHEE KIAN (DE)
International Classes:
G10K11/178; H04M9/08; (IPC1-7): H04M9/08; G10K11/178
Foreign References:
US20010055985A1 (2001-12-27)
US6266408B1 (2001-07-24)
EP0719028A2 (1996-06-26)
US6137881A (2000-10-24)
Attorney, Agent or Firm:
Volmer, Georg (Weisshausstr. 2, Aachen, DE)
Claims:
1. A speech processing system including: an echo canceller including a first input for receiving a first input signal from a microphone; a second input for receiving a second signal to be cancelled from the first input signal; a signal processing unit for subtracting the second input signal from the first input signal under control of a set of at least one reverberation parameter; and an output for outputting the signal produced by the processing unit; a memory for storing a plurality of sets of reverberation parameters, each associated with a respective predetermined type of room; and means for selecting one of the sets of reverberation parameters for use by the signal processing unit.
2. A speech processing system as claimed in claim 1, wherein the means for selecting includes a user interface for presenting a user with a choice of sets and for enabling the user to select one of the presented sets.
3. A speech processing system as claimed in claim 2, wherein each predetermined type of room is associated with a description characterizing a reverberation aspect of the room; the description being stored in the memory; and the user interface being operative to present the description to the user to support the user in choosing one of the predetermined room types best matching a room in which the system is used.
4. A speech processing system as claimed in claim 3, wherein the description includes at least one of the following: shape of the room; size of the room; and acoustic damping characteristic of the room.
5. A system as claimed in claim 3, wherein the system is connected to at least one freely placeable loudspeaker for rendering the second signal; and the description includes a position of the at least one loudspeaker in the room.
6. A system as claimed in claim 1, wherein the system is operative to automatically select one of the sets of reverberation parameters for use by the signal processing unit.
7. A system as claimed in claim 6, wherein the system is connected to a microphone for receiving the first signal and to a loudspeaker for rendering the second signal; the system being operative to perform the automatic selection by causing the loudspeaker to generate an acoustic test signal, sequentially operate the signal processing unit under control of each of the sets of reverberation parameters, and select the set that caused the signal processing unit to output a best cancelled signal.
Description:
Echo canceling in a speech processing system

The invention relates to a speech processing system with an echo canceller.

Echo canceling has been used extensively for telephony applications, for example in telephone conferencing, and hands-free phones. A particular need for echo canceling exists for voice control of consumer electronic devices, like a television, that produce sound that can negatively influence the recognition rate. Without special measures, a voice reproduced via loudspeakers of the device could actually control the device.

In itself, echo canceling using adaptive filters is known for telephony applications. US 5,636,272 describes a typical echo canceller for telephony, wherein a sequence of delayed signals and a sequence of corresponding weights are produced. Each delayed signal is multiplied by the corresponding weight. The sum of all weighted delayed signals is an estimate of a reverberated version of the signal being output by the system via a loudspeaker. This estimated signal is subtracted from a signal received via the microphone, to give the 'clean' input signal. The weights may be adaptively chosen.
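The weighted-delay structure described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions, not the method of the cited patent: the function names `estimate_echo` and `cancel_echo`, the fixed (non-adaptive) weights and the sample values are all hypothetical and chosen for illustration.

```python
def estimate_echo(speaker, weights):
    """Estimate the reverberated loudspeaker signal as a sum of
    weighted, delayed copies of the loudspeaker signal."""
    echo = []
    for n in range(len(speaker)):
        acc = 0.0
        for delay, weight in enumerate(weights):
            if n - delay >= 0:
                acc += weight * speaker[n - delay]
        echo.append(acc)
    return echo


def cancel_echo(mic, speaker, weights):
    """Subtract the estimated echo from the microphone signal."""
    echo = estimate_echo(speaker, weights)
    return [m - e for m, e in zip(mic, echo)]


# Assumed toy room: the echo is the speaker signal delayed by one
# sample at half amplitude (weights[0] = direct tap, weights[1] = 1-sample delay).
room_weights = [0.0, 0.5]
speaker = [1.0, 0.0, 0.0, 2.0, 0.0]
speech = [0.0, 0.5, 0.0, 0.0, 0.25]

# The microphone picks up the speech plus the echo of the speaker signal.
mic = [s + e for s, e in zip(speech, estimate_echo(speaker, room_weights))]

# With weights that match the room, the speech is recovered.
clean = cancel_echo(mic, speaker, room_weights)
```

In an adaptive canceller the weights would be updated continuously; in this sketch they are simply given, which corresponds more closely to a canceller operated with predetermined parameters.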

Adaptive echo cancellers suffer from stability problems and are computationally expensive, particularly if high quality canceling is required, for example for voice control applications. Therefore, conventional echo cancellers in voice control applications have been designed for canceling echoes for one 'average' room.

The reverberation parameters that control the echo canceller may, for example, have been empirically determined for such an average room, or may be based on simulations.

Performance of such cancellers suffers if the actual room in which the canceller is used deviates substantially from the average room.

It is an object of the invention to provide an improved echo canceller suitable for use in consumer electronic devices.

To meet the object of the invention, the speech processing system includes: an echo canceller including a first input for receiving a first input signal from a microphone; a second input for receiving a second signal to be cancelled from the first input signal; a signal processing unit for subtracting the second input signal from the first input signal under control of a set of at least one reverberation parameter; and an output for outputting the signal produced by the processing unit; a memory for storing a plurality of sets of reverberation parameters, each associated with a respective predetermined type of room; and means for selecting one of the sets of reverberation parameters for use by the signal processing unit.

According to the invention, for several different rooms the relevant reverberation parameters that control the echo canceller have been determined (for example, empirically). The actual parameters and their values depend on the canceller being used (and the canceling algorithm employed by the canceller). A typical reverberation parameter is the reverberation delay time. For the actual room in which the system is used a choice is made from the available sets of parameters. Since the parameters can be optimally chosen during design of the system, advanced mechanisms, like simulations and real-life tests can be used for choosing the optimal settings for a particular exemplary employment of the system. In this way, for the rooms for which the sets of parameters have been made an optimal cancellation can be achieved that can be of a higher quality than can be achieved by conventional real-time adaptive systems.

Moreover, the actual implementation of the system itself can be kept simple (and cost-effective). By choosing rooms that are highly representative of actual rooms commonly used, for a high percentage of actual employments of the system a high quality of canceling can be achieved.

As described in the dependent claim 2, the user can select the set of parameters that optimally suits his actual room and room arrangement.

As described in the dependent claim 3, the system includes for each set of parameters (and thus for each type of room supported by the system) a characterizing description of a reverberation aspect of the room. This simplifies selection by the user of a sample room/room arrangement that best matches his room/room arrangement.

Preferably, at least one of the following reverberation aspects is covered: - shape of the room; - size of the room; and - acoustic damping characteristic of the room.

If the user can place the loudspeaker freely in the room, preferably, the position of the loudspeaker(s) in the room (and thus also the distance to the microphone that is usually built into the system) is used as a reverberation aspect.

As an alternative to the user selecting the set of parameters, the system is operative to automatically select one of the sets of reverberation parameters for use by the signal processing unit, as described in the dependent claim 6. Because the sets of parameters are predetermined, the automatic selection can be simple compared to fully adaptive systems.

As described in the dependent claim 7, the system is connected to a microphone for receiving the first signal and to a loudspeaker for rendering the second signal. The automatic selection is performed by causing the loudspeaker to generate an acoustic test signal, sequentially operate the signal processing unit under control of each of the sets of reverberation parameters, and by selecting the set that caused the signal processing unit to output a best cancelled signal. As a measure, for example, the energy level of the cancelled signal may be used (or a digital strength of the signal).

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.

In the drawings: Fig. 1 shows a block diagram of a preferred embodiment of a voice control module with the echo canceller according to the invention; and Fig. 2 shows a block diagram of a system with the echo canceller.

The invention will be described in more detail for echo canceling used in voice control applications for control of audio producing devices, such as a television. It will be appreciated that the echo canceller may be used in other speech processing systems, such as teleconferencing systems. Fig. 1 shows a block diagram of a voice control module 100 for use in a TV set. The voice control application is executed by a dedicated IC 110. It will be appreciated that this function may also be executed by a general purpose processor, suitably programmed for this task. In the preferred embodiment for consumer electronic applications, as shown in Fig. 1, the acoustic echo canceller (AEC) function is also performed by the same IC 110. If so desired, this function may also be performed by a suitably programmed processor or using dedicated hardware, collectively referred to as the signal processing unit. The AEC 110 receives input from a microphone 120. Preferably an integrated directional microphone is used that is specially engineered to improve speech quality by reducing pick-up of noise and reverberation. An example of such a microphone is DM1000-Ml 18HC of Philips Electronics. The microphone is preferably fixedly mounted in the TV cabinet. The signal of the microphone may be amplified by an amplifier 130. Shown is the monaural microphone amplifier NJM21 lOM of Philips Electronics. The amplified signal is converted to the digital domain using an A/D converter 140. Shown is a stereo A/D converter UDA1360TS of Philips Electronics of which one input is connected to the microphone amplifier. The signal supplied to the speakers 150 (shown are a left and right speaker) is also supplied to the AEC 110. In Fig. 1 block 160 combines the speaker signals into a mono signal and may adjust the signal strength. The combined signal is supplied as the second input of the A/D converter 140.
Both digital output signals (microphone signal and speaker signal) are supplied as input to the AEC 110, in the preferred embodiment via an I2S digital signal connection. The AEC cancels the speaker signal from the microphone signal in order to give a 'clean' speech signal. Echo cancellers and echo canceling algorithms are generally known and will not be described further. The echo canceller/voice controller 110 is controlled by a processor of the device in which the module is located. In this example, the entire module 100 can be controlled by processor 170 of a TV. The interaction between the processor 170 and the module 100 is preferably via the I2C digital control bus.

Fig. 2 gives an overview of a block diagram of a system incorporating the module 100 of Fig. 1. The system includes a processor 170 for controlling the operation of the module and interaction with a user. As an example, the module 100 may recognize a voice command spoken by the user. The recognized command is given to the processor 170 for execution. The command may be in any suitable form, like a digital code identifying the recognized command but, alternatively, it may also be a textual transcription of the recognized command. The system also includes a memory 200 for storing a plurality of sets of reverberation parameters, each associated with a predetermined type of room. The memory is preferably of a permanent type, such as a ROM. Each set of parameters may include one or more parameters depending on the operation of the echo canceller being used. For example, the set of parameters may simply be an average reverberation delay. The set of parameters may also be much more complex, for example with respective weights for a range of reverberation times. If so desired, the parameters may also be frequency specific (for example, different parameters for different frequency bands or even different sequences of weights for respective reverberation times for each of the frequency bands). The parameters themselves are not the subject of the invention. According to the invention, for a range of predetermined types of room corresponding sets of parameters have been determined and stored in the memory 200. The processor 170 has been programmed to select one of the sets of reverberation parameters for use by the signal processing unit 100 (in particular for use by the signal processing performed by the AEC 110). The different sets of parameters are preferably empirically determined by testing the system in typical rooms used by users, and varying the controllable parameters of the AEC until an optimal result has been achieved.
Preferably, the result is judged based on the results of the speech processing application, such as voice control (for example, the relative number of correct recognitions, of insertions and of deletions). The result may also be judged on the outcome of the AEC, for example the best result is achieved if the output of the AEC has least energy.
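The three forms of parameter set mentioned above (a single average delay, weights for a range of reverberation times, and frequency-specific weights) could be held in one record type. The following Python sketch is purely illustrative: the class name `ReverbParameterSet`, its field names, the band labels and all numeric values are assumptions, since the source states that the parameters themselves depend on the canceller being used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReverbParameterSet:
    """One stored set of reverberation parameters for one room type."""
    room_type: str
    # Simplest form: a single average reverberation delay (seconds).
    average_delay_s: Optional[float] = None
    # More complex form: weights for a range of reverberation times.
    delay_weights: List[float] = field(default_factory=list)
    # Frequency-specific form: one sequence of weights per frequency band.
    band_weights: Dict[str, List[float]] = field(default_factory=dict)

    def is_frequency_specific(self) -> bool:
        return bool(self.band_weights)


# Hypothetical contents of the parameter memory; all values are placeholders.
PARAMETER_MEMORY = [
    ReverbParameterSet("small room", average_delay_s=0.3),
    ReverbParameterSet("average room", delay_weights=[0.5, 0.25, 0.125]),
    ReverbParameterSet(
        "large room",
        band_weights={"low": [0.6, 0.3], "high": [0.4, 0.1]},
    ),
]
```

Keeping all forms in one record lets the selecting logic treat every stored set uniformly, whichever form the canceller in use actually needs.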

In a preferred embodiment, the system includes a user interface 210 for interaction with the user. The output to the user may, for example, be visibly displayed on a television screen, e.g. using menus and other output options. The output may also be audible. The input may, for example, be via a remote control or voice input. The processor 170 is programmed to present the user with a choice of sets of parameters.

This may mean that, for example, the user is given a choice between typical reverberation times of the rooms supported by the system. Using reverberation times within the range of 0.3 to 0.7 seconds gives good results. The processor 170 is also programmed to receive the choice from the user. The dialogue with the user may take place via menus. Preferably, each predetermined type of room is associated with a description characterizing a reverberation aspect of the room. Such a description is also stored in the memory 200. The description may be textual (for example: room 1: length 6 meters, width 3.5 meters; room 2: ...), graphical (for example: graphical drawing of sizes and/or shapes of the rooms and/or location of the system in the room and/or locations of the speakers in the room), etc. The processor 170 is programmed to cause the user interface 210 to present the description to the user to support the user in choosing one of the predetermined room types best matching a room in which the system is used.
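A menu-based dialogue of this kind might look like the following Python sketch. Only the description of room 1 is taken from the text; the second and third entries, the `set_id` references and the function names `build_menu` and `choose_room_type` are hypothetical and serve only to show how stored descriptions could be presented and a choice mapped back to a parameter set.

```python
# Stored room types with their textual descriptions (entries other than
# "room 1" are invented placeholders).
ROOM_TYPES = [
    {"name": "room 1", "description": "length 6 meters, width 3.5 meters", "set_id": 0},
    {"name": "room 2", "description": "placeholder description", "set_id": 1},
    {"name": "room 3", "description": "placeholder description", "set_id": 2},
]


def build_menu(room_types):
    """Return the menu lines the user interface would present."""
    return [
        f"{number}. {room['name']}: {room['description']}"
        for number, room in enumerate(room_types, start=1)
    ]


def choose_room_type(room_types, user_choice):
    """Map the user's menu number to the selected room type."""
    if not 1 <= user_choice <= len(room_types):
        raise ValueError("choice out of range")
    return room_types[user_choice - 1]


menu = build_menu(ROOM_TYPES)
selected = choose_room_type(ROOM_TYPES, 2)
```

The selected entry's `set_id` would then identify which stored parameter set to load into the echo canceller.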

Preferably, the predetermined types of rooms (and the corresponding descriptions) are chosen to deal with at least one of the following variations:
- shape of the room;
- size of the room;
- acoustic damping characteristic of the room;
- location of the system in the room;
- position of the speakers in the room;
- position of the microphone in the room.

As an example, the system may allow a user to select between three room shapes: a square room, a longitudinal room with the system along one of the long walls, and a longitudinal room with the system along one of the narrow walls. For the size of the room also a choice may be given, for example small (< 20 m²), average (40 m²), and large (> 60 m²). For the location of the system a choice may be given of, for example, near the corner or along the wall. For the acoustic damping a choice may be given of low, medium and high damping. For the position of the speakers a choice may be given of, for example, using the speakers integrated in the system's cabinet, using external stereo speakers near or far removed from the cabinet, using surround speakers in the corners of the room, etc. For the position of the microphone a choice may be given of, for example, using the built-in microphone, using a hand-held microphone or using a head-mounted microphone. The system may but need not give the user many options. If the user is given a choice between two categories (e.g. shape and size of the room) and a choice of three settings within each category, this implies that in fact nine different room types are supported. For each of these room types an optimal set of parameters is stored.
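The count of nine room types follows from combining three settings in each of two categories (3 × 3 = 9). A small Python sketch makes the combination and the resulting lookup explicit; the labels are paraphrased from the example above, while the `set_id` values and the function name `lookup_parameter_set` are assumed placeholders for the stored parameter sets.

```python
from itertools import product

# Two categories with three settings each, as in the example above.
SHAPES = ("square", "long, system on long wall", "long, system on narrow wall")
SIZES = ("small", "average", "large")

# Every (shape, size) combination is one supported room type.
ROOM_TYPE_KEYS = list(product(SHAPES, SIZES))

# One stored parameter set per room type (placeholder contents).
PARAMETER_TABLE = {key: {"set_id": i} for i, key in enumerate(ROOM_TYPE_KEYS)}


def lookup_parameter_set(shape, size):
    """Return the stored parameter set for the chosen shape and size."""
    return PARAMETER_TABLE[(shape, size)]


number_of_room_types = len(ROOM_TYPE_KEYS)
```

Adding a third category with three settings would raise the number of stored sets to 27, which is one reason the system may prefer to offer only a few options.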

The system may also support use of an external amplifier. In this case, it is desired that an amplified output of that amplifier is provided as input to the echo canceller 110 instead of the output of the speakers integrated in the system to cancel the speaker signal in dependence on the level of amplification of the external amplifier.

Similarly, the system may support use of an external microphone. In this case, the signal of that microphone must be supplied as input to the echo canceller 110 instead of or in addition to the signal from the built-in microphone.

In an alternative embodiment, the system automatically selects one of the sets of reverberation parameters for use by the signal processing unit. The selection may, for example, be based on input available to the system, such as the country in which the system is used. In a country with a high nominal income, the system may automatically choose a larger room and corresponding longer reverberation time.

Similarly, in a country with mainly houses built of stone, brick or concrete walls a low damping factor may be chosen as default. In a country with mainly wooden walls a higher damping factor may be used as default. Preferably, the system performs the automatic selection based on an automatic test. To this end, the processor 170 may cause the loudspeaker(s) to generate an acoustic test signal. The signal may be generated by a signal generator. The signal may cover the entire relevant spectrum but, if so desired, a separate test may also be performed for several respective frequency bands.

The processor 170 sequentially loads the parameter sets stored in the memory 200 into the AEC 110. It also ensures that the outcome of the performance of each set is tested, for example using dedicated hardware (or software in a software operated echo canceller) to determine whether the canceling has been successful (e.g. by measuring the residual energy). The processor then ensures that the best set of parameters is loaded for subsequent use by the system. This set (or a reference to this set) may be stored in a permanent memory such as a flash memory.
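The automatic test described above (play a test signal, try each stored set, keep the set with the lowest residual energy) can be sketched in Python as follows. This is a simplified illustration under assumptions: the single-tap canceller `simple_canceller`, the parameter names `delay` and `gain`, the stored values and the simulated room are all hypothetical and stand in for whatever canceller and parameters an actual system uses.

```python
def residual_energy(signal):
    """Energy of the cancelled signal: lower means better cancellation."""
    return sum(x * x for x in signal)


def simple_canceller(mic, speaker, params):
    """Toy canceller: subtract one delayed, scaled copy of the speaker signal."""
    delay, gain = params["delay"], params["gain"]
    out = []
    for n, m in enumerate(mic):
        echo = gain * speaker[n - delay] if n - delay >= 0 else 0.0
        out.append(m - echo)
    return out


def select_parameter_set(parameter_sets, test_signal, mic_signal, canceller):
    """Operate the canceller with each stored set in turn and return the
    name of the set that leaves the least residual energy."""
    best_name, best_energy = None, float("inf")
    for name, params in parameter_sets.items():
        energy = residual_energy(canceller(mic_signal, test_signal, params))
        if energy < best_energy:
            best_name, best_energy = name, energy
    return best_name


# Hypothetical stored sets, one per predetermined room type.
STORED_SETS = {
    "small room": {"delay": 1, "gain": 0.6},
    "average room": {"delay": 2, "gain": 0.4},
    "large room": {"delay": 3, "gain": 0.3},
}

# Simulated measurement: the actual room echoes the test signal with a
# delay of 2 samples and a gain of 0.4; nobody speaks during the test.
test_signal = [1.0, -1.0, 0.5, 0.0, 0.0, 0.0]
mic_signal = [0.0, 0.0, 0.4, -0.4, 0.2, 0.0]

best = select_parameter_set(STORED_SETS, test_signal, mic_signal, simple_canceller)
```

Because only a handful of stored sets are compared, the selection needs one pass per set rather than continuous adaptation, which is the simplification the text contrasts with fully adaptive systems.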

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The words "comprising" and "including" do not exclude the presence of other elements or steps than those listed in a claim. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. Where the system/device/apparatus claims enumerate several means, several of these means can be embodied by one and the same item of hardware. The computer program product may be stored/distributed on a suitable medium, such as optical storage, but may also be distributed in other forms, such as being distributed via the Internet or wireless telecommunication systems.