Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
METHODS AND SYSTEMS FOR REDUCING ACOUSTIC ECHOES IN MULTICHANNEL AUDIO-COMMUNICATION SYSTEMS
Document Type and Number:
WIPO Patent Application WO/2008/137059
Kind Code:
A2
Abstract:
Various embodiments of the present invention are directed to adaptive real-time, acoustic echo cancellation methods and systems. One method embodiment of the present invention is directed to reducing acoustic echoes in microphone-digital signals transmitted from a first location (102) to a second location (104). The first location includes a plurality of loudspeakers (604) and microphones (606) each microphone produces one of the microphone-digital signals, including sounds produced at the first location and acoustic echoes produced by the loudspeakers. The method includes determining approximate impluse responses, each of which corresponds to an echo path (702) between the microphones and the loudspeakers. The method includes determining a plurality of approximate acoustic echoes, each approximate acoustic echo corresponds to convolving a digital signal played by one of the loudspeakers with a number of the approximate impluse responses. The acoustic echo in at least one of the microphone-digital signals is reduced based on the corresponding approximate acoustic echo.

Inventors:
FOZUNBAL MAJID (US)
Application Number:
PCT/US2008/005662
Publication Date:
November 13, 2008
Filing Date:
April 30, 2008
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HEWLETT PACKARD DEVELOPMENT CO (US)
FOZUNBAL MAJID (US)
International Classes:
H03H17/02; H04B3/23
Foreign References:
KR20000065243A
KR19990080327A1999-11-05
Other References:
See references of EP 2130298A4
Attorney, Agent or Firm:
LEE, Denise A, (Intellectual Property AdministrationP O Box 272400,M/S 3, Fort Collins Colorado, US)
Download PDF:
Claims:

CLAIMS

1. A method for reducing acoustic echoes in a plurality of microphone-digital signals transmitted from a first location (102) to a second location (104), the method comprising: providing a plurality of loudspeakers (604) and a plurality of microphones (606) located at the first location, each of the plurality of microphones produces one of the plurality of microphone-digital signals that includes sounds produced at the first location and acoustic echoes produced by the plurality of loudspeakers; determining approximate impulse responses, each approximate impulse response corresponds to an echo path (702) between each of the plurality of microphones and each of the plurality of loudspeakers; determining a plurality of approximate acoustic echoes, each approximate acoustic echo corresponds to convolving a digital signal played by one of the plurality of loudspeakers with a number of the approximate impulse responses; and reducing the acoustic echo in at least one of the microphone-digital signals based on the corresponding approximate acoustic echo.

2. The method of claim 1 wherein determining an approximate impulse response further comprises identifying a type of audio transmission associated with each echo path as one of the following: audio signals produced at the first location only; audio signals produced at the second location only; audio signals simultaneously produced at the first location and the second location; and no audio signals produced at either the first location or the second location.

3. The method of claim 1 wherein reducing the acoustic echo in each of the microphone-digital signals further comprises determining a controlled digital signal as follows:

where j ε {l,K J} is a microphone index, i ε {l,K ,/} is a loudspeaker index, Tr N is a truncation operator of a decision period length Nd,

IFFT is inverse Fast-Fourier transform,

"o" represents component- wise multiplication of two vectors, is a frequency domain microphone-digital signal, , is a frequency domain digital signal, is a frequency domain, approximate-impulse-response vector, and is frequency domain, approximate acoustic echo.

4. The method of claim 3 wherein determining the frequency domain, approximate-impulse-response vector further comprises computing

where FFT is the Fast-Fourier transform, and } is an approximate impulse response vector depending the type of audio transmission.

5. The method of claim 4 wherein computing the approximate-impulse-response vector further comprises: when no signals are transmitted between the first location and the second location or when signals are transmitted from first location to the second location, asigning:

where h}™ ~ ' is an approximate-impulse-response vector associated with a previous decision period identified by m - 1 ,

when signals are simultaneously transmitted between the first location and the second location, selecting the approximate-impulse-response vector form an impulse-response data structure with the K most recent approximate impulse responses with the highest signal to noise ratio when signals are transmitted from second location to the first location only employing a recursive formula given by:

where is a shadow-impulse-response vector associated with a decision period identified by m,

is an approximate-impulse-response vector associated with a previous decision period identified by m — 1 , and γ ιj m is an impulse response step size.

6. The method of claim 5 wherein the impulse response step size γ ιj m further comprises computing:

where

for M decision periods and

otherwise

7. The method of claim 6 wherein the impulse response step size γ l]jm further comprises setting

when is greater than " 1," where

is the Hth element of the shadow-impulse vector , and is the nth element of a trust region vector λ .

8. The method of claim 5 wherein recursively computing the shadow-impulse- response vector further comprises employing a recursive formula given by:

where εf is a shadow-mismatch vector,

is a shadow-impulse-response vector associated with a previous decision period identified by m - 1 , and μ tJ m is a shadow update step size.

9. The method of claim 8 wherein computing the shadow-mismatch vector further comprises employing a recursive formula given by:

where m is an integer associated with a decision period, L is a number of vector components, IFFT is the inverse Fourier transform operator, Tr 1 is a truncation operator of size L,

X is a frequency domain vector corresponding to the audio signal,

is a frequency domain, shadow error vector, and

such that , ^ [ ] is a component of an average spectrum of the frequency domain vector , given by , is a component of an average spectrum of the frequency domain

shadow error vector λ is a stabilizing constant,

7

is a maximum energy associated with the digital signal, and

' is a maximum energy associated with the microphone-digital signal,

10. The method of claim 8 wherein the shadow update step size μ l]jm further further comprises setting

/V m = 0-2 when , otherwise

where

is the nth element of the shadow-mismatch vector , and is the

«th element of a trust region vector , and

Description:

METHODS AND SYSTEMS FOR REDUCING ACOUSTIC ECHOES IN MULTICHANNEL AUDIO-COMMUNICATION SYSTEMS

TECHNICAL FIELD The present invention is related to acoustic echo cancellation, and, in particular, to methods and systems for reducing acoustic echoes in multichannel audio-communication systems.

BACKGROUND OF THE INVENTION Increasing interest in communication media, such as the Internet, electronic presentations, voice mail, and audio-conference communication systems, is increasing the demand for high-fidelity audio and communication technologies. Currently, individuals and businesses are using these communication media to increase efficiency and productivity, while decreasing cost and complexity. For example, audio-conference communication systems allow one or more individuals at a first location to simultaneously converse with one or more individuals at other locations through full-duplex communication lines in nearly real time, without wearing headsets or using handheld communication devices.

In many audio-conference communication systems, audio signals carry a large amount of data, and employ a broad range of frequencies. Modern audio-conference communication systems attempt to provide clear transmission of audio signals over a single channel, also called a "monochannel," free from perceivable distortion, background noise, and other undesired audio artifacts. One common type of undesired audio artifact is an acoustic echo. Acoustic echoes can occur when a transmitted audio signal loops through an audio-conference communication system due to the coupling of a microphone and a speaker at a location. Figure 1 shows a schematic diagram of an exemplary, two-location, monochannel audio-conference communication system 100. The audio-conference communication system 100 includes a near room 102 and a far room 104. Sounds, such as voices, produced in the near room 102 are detected by a microphone 106, and sounds produced in the far room 104 are detected by a microphone 108. Microphones 106 and 108 convert

sounds into signals represented by x(t) and y(t) , respectively, where t represents time.

The microphone 106 can detect many different sounds produced in the near room 102, including sounds output by a loudspeaker 1 14. An analog signal produced by the microphone 106 is represented by: y(t) = s(t) + ∫(x(t)) + ν(t) where s(t) is an analog signal representing sounds produced in the near room 102, ν(t) is an analog signal representing noise, or extraneous signals created by disturbances in the microphone or communication channel 1 10, that, for example, may produce an annoying buzzing sound output from the loudspeaker 1 16, and f (x(t)) is an analog signal representing an acoustic echo.

The acoustic echo ∫(x(t)) is due to both acoustic propagation delay in the near room 102 and a round-trip transmission delay of the analog signal x(f) over the communication channels 1 10 and 1 12. Sounds generated by the analog signal y{t) are output from a loudspeaker 1 16 in the far room 104. Depending on the amplification, or gain, in the amplitude of the signal y(t) and the magnitude of the acoustic echo ∫(x(t)), a person speaking into the microphone 108 in the far room

104 may hear, in addition to the sounds carried by the signal s (t) , an echo or an annoying, high-pitched, howling sound emanating from the loudspeaker 1 16 as a result of the sound generated by the acoustic echo ∫(x(t)). Designers and manufacturers of audio-conference communication systems have attempted to compensate for acoustic echoes in various ways. One compensation technique employs a filtering system that reduces the acoustic echo. Typically, filtering systems employ adaptive filters that adapt to changing conditions at an audio-signal- receiving location.

In recent years there has been an increasing interest in developing multichannel audio communication systems in an effort to enhance the audio-

conference experience. These systems employ a plurality of microphones and loudspeakers at the first and second locations. However, employing a plurality of microphones and loudspeakers at each location creates acoustic echoes that are separated by several hundred milliseconds of communication delay, which is an obstacle to deploying multichannel audio-conference communication systems. For example, designers and manufacturers have developed methods that employ nonlinear functions to uncorrelated excitation signals prior to exciting the loudspeakers. However, this nonlinearity ultimately diminishes the spatial audio experience. Designers, manufacturers, and users of audio-conference communication systems have recognized a need for multichannel audio-conference communication methods and systems that can reliably remove an acoustic echo from audio signals in real-time, and can rapidly adapt to the changing conditions at audio-signal-receiving locations.

SUMMARY

Various embodiments of the present invention are directed to adaptive realtime, acoustic echo cancellation methods and systems. One method embodiment of the present invention is directed to reducing acoustic echoes in a plurality of microphone-digital signals transmitted from a first location to a second location. The first location includes a plurality of loudspeakers and a plurality of microphones, each of the plurality of microphones produces one of the plurality of microphone-digital signals that includes sounds produced at the first location and acoustic echoes produced by the plurality of loudspeakers. The method includes determining approximate impulse responses, each approximate impulse response corresponds to an echo path between each of the plurality of microphones and each of the plurality of loudspeakers. The method also includes determining a plurality of approximate acoustic echoes, each approximate acoustic echo corresponds to convolving a digital signal played by one of the plurality of loudspeakers with a number of the approximate impulse responses. The acoustic echo in at least one of the microphone- digital signals is reduced based on the corresponding approximate acoustic echo.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows a schematic diagram of an exemplary, two-location, audio- conference communication system. Figures 2A-2C illustrate conversion of an analog signal to a digital signal.

Figure 3 is a plot of an impulse excitation of a loudspeaker and a plot of an overall room impulse response resulting at the output of the microphone in response to the impulse excitation.

Figures 4A-4E illustrate an example of determining a digital signal output from a microphone by convolving an input digital signal with the microphone impulse response.

Figure 5 illustrates convolving a digital signal with a five-component impulse response in order to obtain a convolved digital signal.

Figure 6 shows a block diagram of a mixed multichannel audio- communication system that represents an embodiment of the present invention.

Figure 7 shows acoustic coupling between the loudspeakers and the microphones, shown in Figure 6, in accordance with an embodiment of the present invention.

Figure 8 shows a block diagram of acoustic echo cancellation carried out by a multichannel echo control unit in accordance with an embodiment of the present invention.

Figure 9 shows a plot of decision periods and decision epochs associated with approximate-impulse-response vectors in accordance with an embodiment of the present invention. Figure 10 shows a control-flow diagram that represents an embodiment of the present invention for reducing acoustic echoes in a plurality of audio signals transmitted from a first location to a second location.

Figures 1 IA-11C show a control-flow diagram for the routine "determine control state" called in step 1008 in Figure 10 and represents an embodiment of the present invention.

Figure 12 is a control-flow diagram for the routine "residual echo suppression" called in step 1009 in Figure 10 and represents an embodiment of the present invention.

Figure 13A is a control-flow diagram for the routine "determine called in step 1015 in Figure 10 and represents an embodiment of the present invention.

Figures 13B-13C shows two plots of amplification energy versus time for the four types of control states that represents an embodiment of the present invention.

Figure 14A is a control-flow diagram for the routine "compute called in step 1304 in Figure 14 and represents an embodiment of the present invention.

Figure 14B shows a two-dimensional illustration of a hyper-elliptical region located within a search space that represents an embodiment of the present invention.

Figure 15 is a control-flow diagram for the routine "determine " called in step 1404 in Figure 14 and represents an embodiment of the present invention. Figure 16 is a control-flow diagram for the routine "determine called in step 1406 in Figure 14 that represents an embodiment of the present invention.

Figure 17 is a control-flow diagram for the routine "determine called in step 1407 in Figure 14 that represents an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Various embodiments of the present invention are directed to real-time, adaptive acoustic echo cancellation methods in multichannel audio-communication systems. In particular, these methods reduce acoustic echoes in a plurality of audio signals transmitted between a first location and a second over multichannel audio- communication systems. The communication system embodiments of the present invention can be electronic presentations, voice mail, audio-conference communication systems or any other type of communication system capable of transmitting audio signals between a first location and a second location. A plurality of microphones and loudspeakers are employed at the first location and the second location. Method embodiments of the present invention compute a control state for

each acoustic coupling between microphones and loudspeakers. The control state characterizes one of the following four types of communications: (1 ) sound transmitted from the first location only; (2) sound transmitted from the second location only; (3) sounds transmitted simultaneously between the first and second locations; and (4) no sound transmitted between the first and second locations. For each of the signals detected by the microphones located at the first location, the methods then compute approximate acoustic echoes based on the control state. The methods subtracts the corresponding computed, approximate acoustic echoes from each of the digital signals that are transmitted from the second location to the first location and adjusts these signals for gain before the signals are output at the first location.

Embodiments of the present invention are mathematical in nature and, for this reason, are described below with reference to numerous equations and graphical illustrations. Although mathematical expressions, alone, may be sufficient to fully describe and characterize embodiments of the present invention to those skilled in the art of acoustic echo cancellation, the more graphical, problem oriented examples, and control-flow-diagram approaches included in the following discussion are intended to illustrate just one of many embodiments of the present invention so that the present invention may be accessible to readers with various backgrounds. In order to assist in understanding descriptions of various embodiments of the present invention, an overview of digital signals, impulse responses, and convolution is provided in a first subsection. Embodiments of the present invention are provided in a second subsection.

An Overview of Digital Signals. Impulse Responses, and Convolution

This subsection is intended to provide a general description of digital signals, impulse responses, and convolution in a monochannel audio transmission system.

The concepts introduced in this subsection are then used to describe each of the audio channels in a multichannel acoustic echo embodiments described below in the next subsection. The term "microphone" refers to a transducer or any suitable device that converts sounds into signals. The "loudspeaker" refers to any suitable device capable of converting a signal into sound.

Sounds received by a microphone are transformed into an analog signal comprising a time-dependent, continuously varying voltage. In order to process an analog signal using a digital computer, the analog signal is first converted into a digital signal with minimal alteration of the essential information contained in the analog signal. Digital signals can be stored electronically, magnetically, or optically and can be processed using logical operations encoded in computer programs.

Figures 2A-2C illustrate conversion of an analog signal into a digital signal.

In Figures 2A-2B, horizontal axes, such as horizontal axis 202, represent time, and vertical axes, such as vertical axis 204, represent analog signal amplitudes in volts. Figure 2A is a plot of a time-dependent, continuously-varying analog signal x(t)

206. The analog signal x(t) 206 is first sampled by measuring the amplitude of the analog signal x(t) at discrete sampling times. In order to prevent loss of essential information contained in the analog signal, the duration between sampling times is generally selected to be sufficiently short so that the analog signal varies little between consecutive sampling times. Figure 2B is a plot of a sampled signal 208 obtained by sampling the analog signal 206 in Figure 2A. The sampling times are in tenths of a second, and the sampled signal 208 is approximated as a step function by assuming a constant-signal amplitude between sampling times. For example, a constant-amplitude region 210 represents a constant value of -1.9 volts between sampling times 0.2 and 0.3 seconds.

For efficient and convenient digital signal processing, it is desirable for both time and magnitude values to be integers. Therefore, an integer-encoded, digital signal is produced by multiplying the value of each constant-amplitude region by a first constant and by multiplying the sampling times by a second constant in order to produce integer values that represent the amplitude and sampling times of each step in the step function. An integer-valued sampling time is called a "time sample," and an integer-valued amplitude is called a "digital amplitude." The resulting digital signal can be functionally represented by *[«] , where n, an independent variable, represents a time sample domain. Figure 2C is a plot of a digital signal obtained from the sampled signal 208 in Figure 2B. In Figure 2C, horizontal axis 212 is a time sample domain, and vertical axis 214 is a digital signal axis. Each point in the graph

represents a quantized value representing the scaled amplitude of the digital signal at a scaled sampling time. For example, point x[2] 216 with coordinates (2,-19) represents step 210 in Figure 2B.

A digital signal x[n] can, in general, be thought of as a series of impulses, each impulse corresponding to a unique component. The notation *[«] , where n represents a particular time sample, can also be used to represent a single impulse of a digital signal which is called a "component" of a digital signal. Each component is a signal comprising all zero sample values except for a single value representing the amplitude at a single time sample, which is mathematically represented by: x[n] = dδ[n- p] where d is an integer scale factor that represents the amplitude, or strength, of the impulse, p is a time sample, and δ is the delta function defined by:

For example, in Figure 2C, the component x[0] 218 is equal to 10δ[p], and the component x[2] 216 is equal to -19δ[2 -p] . In other words,p in the delta function δ[n - p] represents a time sample shift, and n -p represents a time sample relative to time sample n.

A digital impulse response, h[n] , is a digital signal that is output from a microphone when the input to the microphone is a unit impulse δ[n] , where p is "0" and d equals "1." The impulse response of a microphone can be determined by applying an impulse of sound with a very short duration to the microphone and measuring the signal output by the microphone. The impulse response of the microphone can also be represented by a vector as follows:

where h n [•] is an impulse response component, and

L is the number of components comprising the impulse response.

Figure 3 shows a plot of an impulse x[n] and a plot of an overall room impulse response h[n] produced in response to the impulse x[n] . The impulse response h[n] includes the sound impulse produced by the loudspeaker and the sound of the impulse reflected in the room and received by microphone. In Figure 3, impulse plot 302 represents an impulse x[n\ input to a hypothetical microphone 304.

The impulse x[n] 302 comprises all zeros except for a single nonzero point 306 at n equal to 0 which is represented by dδ[n- p] . In this case, d equals "1" andp equals "0" so the impulse can be represented as δ[n] . In response to the impulse 302, the microphone 304 outputs an impulse response h[n] that is represented by an impulse- response plot 308. The impulse response 308 comprises all zeros except for the three nonzero digital signals represented by points 310-312. An actual digital impulse response to an impulse typically comprises a greater number of nonzero components than contained in the impulse, as shown in Figure 3. Impulse response 308 can be represented by the 3 -component vector:

Typically, the impulse used to determine an impulse response is output from a loudspeaker into a room and is detected by a microphone. The loudspeaker, room, and microphone are referred to as a "system," and an associated impulse response can be referred to as a "system impulse response." A digital signal produced by the loudspeaker of the system is determined by convolving a digital signal x[n]

representing the sound produced by the loudspeaker with the impulse response h[n] of the system. The convolved digital signal is represented by x c [n] . Figures 4A-4D provide a graphical example of convolving a three-component digital signal x[n] produced by a hypothetical loudspeaker with an impulse response h[n] in order to produce a digital signal x c [n] output from the system. Figure 4A is a plot of an example, two-component, impulse response h[n] 402 that is produced by the hypothetical system. The impulse response h[n] is assumed to be invariant with time.

Figure 4B is a plot of a first component of the digital signal x[n] at a time sample "0." The first component is represented by a scaled impulse 2δ[0] 404. In response to the impulse 2δ[n] 404, the system outputs an impulse-response A[«] comprising a first component 406 at the time sample "0," and outputs a second component 408 at a later time sample "1." The impulse response to the impulse 2δ\n\ 404 is essentially the impulse response in Figure 4A with the components multiplied by the impulse scale factor "2."

Figure 4C is a plot of a second component of the digital signal x[n] that is input to the system at the later time sample "1." The second component is represented by an impulse o[n - 1] 410. In response to the impulse δ[« - l] 410, the system outputs an impulse-response comprising a third component 412 at the time sample "1," and outputs a fourth component 414 at a later time sample "2." The impulse response to the impulse δ[n - 1] 410 is essentially the impulse response in

Figure 4A with the component time samples shifted by a factor of "1." Because the second and the third components 408 and 412 occur at the same time sample "1," the amplitudes of the components 408 and 412 are summed in order to obtain a fifth component 416, which is the output at the time sample " 1."

Figure 4D is a plot of a third component of the digital signal x[n] that is input to the system at the time sample "2." The second component is represented by an

impulse -2δ[n-2] 418. In response to the impulse -2δ[n-2], the system outputs an impulse response comprising a sixth component 420 at the time sample "2," and a seventh component 422 at a later time sample "3." The impulse response to the impulse -2δ[n-2] 418 is essentially the impulse response in Figure 4A with the components multiplied by the scale factor "-2," and the component time samples shifted by a factor of "2." Because the fifth and the sixth components 414 and 420 occur at the same time sample "2," the amplitudes of components 414 and 420 are summed to give an eighth component 424, which is the output at the time sample "2."

Note that convolving the three-component, input digital signal x[n] with the two-component impulse response h[n] outputs the four-component digital signal x c [n] . In general, convolving an N component input digital signal x[n] with an L component impulse response h[n] gives an N + L - 1 component convolved digital signal x c [n] .

Components of the convolved digital signal x c [n] 426, in Figure 4D, can also be obtained by calculating the scalar or dot product of a two-component vector representation of the impulse response and two-component vectors corresponding to each component of the digital signal x[n] that are given by:

The order of the components in the column vectors A and x[n] are opposite in time ordering with respect to one another. For example, in the column vector A , the first component h[0] appears earlier in time than the second component A[I] , while, in the column vector x[n], the first component x[n] appears later in time than the second component x[n- 1] . Note that components of the vector x[n] corresponding

to negative valued time samples are assigned the value "0." For example, the first component 406, in Figure 4D, is calculated by:

where the component x[-1] is assigned the value "0," and the superscript T identifies the matrix transpose operation. The second, third, and fourth components 416, 424, and 422 are calculated as follows:

In general, a component of a convolved digital signal x c is mathematically determined by:

where

"*" is a symbol representing convolution, and

In order to compute a convolved signal component x c [n] , the L previously obtain digital signal components of the digital signal x[n] are used, and the components of the vector that correspond to negative valued time samples are assigned the value "0." Convolution may also be denoted by

Figure 5 illustrates convolving a digital signal displayed in a plot 502 with a five-component impulse response in order to obtain a convolved digital signal displayed in a plot 504. In plots 502 and 504, horizontal axes, such as horizontal axis 506, are time sample axes, and vertical axes, such as vertical axis 508, are digital number axes. The convolved digital signal sample 510 in the plot 504 is obtained as shown in Figure 5 and is mathematically represented by:

In the examples of convolution described above, the impulse response is assumed to remain constant at each time sample in the time domain. However, in practice, the impulse response of a system often depends on the conditions of the room. In other words, the impulse response of the system may change over time as conditions in the room change. For example, an impulse response of a system with no sound produced in the room is different from an impulse response of the same system at a later time when sounds are produced in the room.

Embodiments of the Present Invention

Various embodiments of the present invention are directed to adaptive realtime, acoustic echo cancellation methods and systems for reducing acoustic echoes in a plurality of audio signals transmitted between a first location and a second location. An overview of acoustic echo cancellation method and system embodiments is described first followed by a description of one of many implementation embodiments of the present invention. Embodiments of the present invention are described with reference to numerous block diagrams, control-flow diagrams, mathematical equations, and graphical illustrations.

1. Overview of Acoustic Echo Cancellation in Mixed Multichannel Audio- Communication Systems

Figure 6 shows a block diagram of a mixed multichannel audio- communication system 600 that represents an embodiment of the present invention.

The mixed multichannel audio-communication system 600 includes a multichannel echo control unit ("MECU") 602, and / loudspeakers 604 and J microphones 606 located in the near room 102, where / and J are natural numbers that represent the total number of loudspeakers and the total number of microphones, respectively. Digital signals x 1 [n],..., x 1 [n] generated in the far room 104 are transmitted to

MECU 602 and are played simultaneously through loudspeakers 604 located in near room 102. Microphones 606 detect sounds produced by people, audio devices, and other noise generating sources located in near room 102 and detect reverberated sounds or echoes produced by sounds originating from loudspeakers 604. Acoustic coupling between loudspeakers 604 and microphones 606 is described below with reference to Figure 7. The sounds detected by microphones 606 are transmitted in the form of J microphone-digital signals y 1 [n],... ,y j [n] to MECU 602. MECU 602, described in greater detail below with reference to Figure 8, processes the microphone-digital signals y 1 [n],..., y j [n] in order to obtain J processed digital signals r 1 [n],...,r j [n], which are substantially free of acoustic echoes and background noise and are transmitted to far room 104.

In the following description of the present invention, the notation, {•} , where N is the number of elements in the set, is introduced as a compact way of representing a set of N digital signals. For example, the digital signals x 1 [n],..., x I [n] can instead be represented by { x i [n]} .

At any given time sample n, acoustic coupling may exist between each loudspeaker and each microphone located in near room 102. This coupling is called an "echo path" and there exists a time varying, real-valued impulse-response vector for each acoustic coupling of an /th loudspeaker and a jth microphone, where

i ε {1, ...,/} is a loudspeaker index, and j e {1,... J} is a microphone index. Figure 7 shows acoustic coupling between loudspeakers 604 and microphones 606, shown in Figure 6, that represents an embodiment of the present invention. As shown in Figure 7, sixteen of the echo paths coupling loudspeakers and microphones are represented by lines, and four of these sixteen echo paths are labeled with a corresponding impulse-response vector For example, at the time sample n, the impulse- response vector associated with echo path 702 between the 2 nd loudspeaker 704 and the 1 st microphone 706 is . A reverberated version or echo of a particular digital signal x, [n] played through the /th loudspeaker and detected by theyth microphone is denoted by h y n * x i [n] . The total echo associated with the digital signals {*, [«]}, detected by the jth microphone is given by:

As shown in Figures 6 and 7, microphones 606 transmit J microphone-digital signals {y j [n]} J to MECU 602, where each microphone-digital signal is characterized by:

and S j [n] represents a local source signal generated by people, audio devices, and other noise generating sources located in near room 102.

For each microphone-digital signal, method and system embodiments of the present invention are directed to substantially cancelling the acoustic echoes {e j [n]} from the corresponding microphone-digital signals {y j [n]} J without causing substantial distortion to the local source signal {s j [n]} . Figure 8 shows a block diagram of acoustic echo cancellation carried out by MECU 602, shown in

Figure 6, that represents an embodiment of the present invention. MECU 602 includes an estimation and tracking block 802 that receives the digital signals {x, [n]} I and the microphone-digital signals {y j [n]} J , as indicated by directional arrows connecting input channels 804-806 and the microphone channels 810-808 to the estimation and tracking block 802. The estimation and tracking block 802

generates approximate impulse response vectors to produce {ê j [n]} J , which are correspondingly subtracted from the microphone-digital signals {y j [n]} J at J adaptive filters, three of which are represented by dashed-line boxes 812-814. However, rather than generating approximate-impulse-response vectors for each time sample n, method embodiments of the present invention are based on the assumption that the impulse-response vector exhibits little variation over a number of time samples, N c , called the "coherence time." In other words, for each impulse-response vector, the maximum deviation over N c time samples is bounded by:

where E represents the maximum amplification energy, or gain, produced by a microphone in response to an impulse. As a result, instead of determining a set of approximate-impulse-response vectors for each time sample n, methods of the present invention compute a set of approximate-impulse-response vectors, denoted by , at the beginning of every Nd time samples, where N « N c , and m is a positive integer. An interval with Nj time samples is called a "decision period," and the beginning of a decision period is called a "decision epoch."

Figure 9 shows a plot of decision periods and decision epochs associated with approximate-impulse-response vectors that represent embodiments of the present invention. In Figure 9, horizontal axis 902 represents a time axis and structures 904- 907 represent digital signals and approximate impulse responses for an echo path. Each vertical line segment represents a digital signal or an impulse response associated with a time sample. In particular, the vertical line segments in the structure 904 represent n + 1 consecutive impulse-response vectors the vertical line segments in the structure 905 represent n + 1 microphone-digital signals y j [n], and the vertical line segment in the structure 906 represent n + 1 digital signals *, [n] . In Figure 8, decision periods are comprised of 6 time samples, the

beginning of each is identified by one or the dashed lines 908-910. In accordance with method embodiments of the present invention, estimation and tracking block

802, produces a new set of approximate-impulse-response vectors at the beginning of each decision period. For example, suppose that during decision period 912, estimation and tracking block 802 produces an approximate-impulse-response vector ^ which is used to determine a reverberated digital signal for the decision period 912. At the beginning 908 of the next decision period 914, estimation and tracking block 802 produces a new approximate-impulse-response vector which is then used to determine a reverberated digital signal for the next N d time samples of decision period 914. In general, at the beginning of a decision period τnN d , a set of approximate-impulse-response vectors is computed and as the approximate-impulse-response vectors for each time sample n in the interval [mN d ,(m + 1)N d - 1].

Returning to Figure 8, adaptive filters 812-814 represent the first two and last of J adaptive filters, each of which represents convolving the digital signals {*, [«]]

with corresponding approximate-impulse-response vectors generated by estimation and tracking block 802 in order to obtain approximate reverberated digital signals . For example, adaptive filter 812 represents convolving the digital signals { x 1 [n]} with the approximate-impulse-response vectors

respectively, which gives approximate reverberated digital signals . In general, theyth adaptive filter represents convolving the digital signals {x, [n]} I with

the approximate-impulse-response vectors respectively, in order to obtain approximate reverberated digital signals:

Summing the reverberated digital signals associated with each of the J adaptive filters produces a set of J approximate acoustic echoes Ie 7 [«]} , where each vector is given by:

The J approximate acoustic echoes are correspondingly subtracted from each of the microphone-digital signals iy> j [n]| at summing junctions, such as summing junctions 816-818, resulting in J controlled digital signals each element of which is given by:

The difference is called the "residual echo." Method embodiments of the present invention carried out by estimation and tracking block 802 are directed to reducing the residual echo.

Estimation and tracking block 802 also controls nonlinear processing of each of the controlled digital signals For example, nonlinear processing blocks

("NPBs") 820-822 are located in microphone channels 808-810, which represent three of the J nonlinear processes carried out by estimation and tracking block 802. The NPBs attenuate background noise and any residual echo carried by a corresponding controlled digital signal in order to produce J processed digital signals

As described above with reference to Figure 6, MECU 602 produces the J processed digital signals (^ [«]} which are transmitted to the far room 104 substantially free of acoustic echoes and background noise.

As shown in Figure 6, MECU 602 can be located outside the near room 102. In fact, in certain embodiments of the present invention, MECU 602 can be located in an adjacent room, a room in the same building, or a room located tens or even thousands of miles away from near room 102. In other embodiments of the present invention, MECU 602 can be located inside near room 102. In other embodiments of the present invention, a second MECU can be included to cancel acoustic echoes in the digital signals Ix 1 [«]} transmitted from the far room 104 to the near room 102.

II. Implementation The estimation and tracking 802 may also include direct current ("DC") offset removal for signals transmitted between near room 102 and estimation and tracking 802. DC offset is a low-frequency distortion often caused by electrical interference. This electrical interference creates a constant voltage that can cause clicks and pops in the sound output from a loudspeaker. DC offset removal corrects the DC offset in each of the digital signals |x, [«]] produced in the far room 104 as follows:

where a is a constant ranging from about 0.9 to about 0.9999, and corrects the DC offset in each of the microphone-digital signals Jj^ [«]} produced in the near room 102 as follows:

Control-flow diagrams shown in Figures 10-16 and the following discussion provide a description of one of many method embodiments for reducing an acoustic echo in microphone-digital signals |.y, [«]} and generating associated processed digital signals {>) [«]} y - Figure 10 shows a control-flow diagram that represents an embodiment of the present invention for reducing acoustic echoes in a plurality of audio signals that are transmitted from near room 102 to far room 104. In step 1001 of Figure 10, parameters used in the equations described below are initialized. Tables 1-3 display these parameters and associated example initial values that may be used in certain

applications. Note that values displayed in Tables 1-3 depend on the room setup, and therefore, are subject to change.

Table 1 displays constants that typically remain unchanged during operation of the method embodiments of the present invention. Table 1 also includes example values associated with each of the constants.

Note that the values associated with the parameters displayed in Table 1 can be adjusted based on different near and far room conditions and room configurations. The parameter P is the number of digital signals in the digital signal vector described below with reference to step 1004 in Figure 10. The parameter N is the number of digital signals in the frequency domain vectors , and The parameters β, η, λ, K u and AT 2 are values that are used to assign relative importance or weight to terms in equations described below with reference to Figures 10-14. The parameters G m0x and G min are the maximum and minimum gain associated

with each of the J microphones. The parameter M is used during double talk described below with reference to Figure 10.

Table 2 shows initial values for variable parameters that change during iterations of the methods described below with reference to Figures 1 1 -12:

The parameters and are short-term energy variances associated with the digital signals [ ] and , respectively. The parameter ^ is the short-term energy variance associated with the energy in the controlled digital signal . The parameters σ and are long-term energy variances associated with the digital signals *' [«] and yf [n] , respectively. The parameter σ* n is a long-term energy variance associated with noise. The parameters M and M are maximum square energies associated with the digital signals and

The parameter G^ 0 ' is an initial gain adaptation value described below with reference to Figure 12.

Table 3 shows initial values for components of vectors £ hf , and h^ 0 ' that change during iterations of the method described below with reference to Figures 13-14:

Table 3

The vector S is an initial average spectrum associated with the vector , and the vector is an initial average spectrum associated with a vector ' , which are described below with reference to step 1402 in Figure 14A. The vector defines an evolving trust region described below with reference to step 1405 in Figure 14A. The vector is an initial shadow-impulse-response vector described below with reference to step 1408 in Figure 14A. The vector is an initial approximate-impulse-response vector described below with reference to step 1409 in Figure 14A.

In the^oMoop beginning in step 1002, steps 1003-1015 are repeated for each decision epoch m. In the jϊv-loop beginning in step 1003, steps 1004-1012 are repeated for each time sample n. In step 1004, the estimation and tracking unit 802

receives / digital signals output from the far room 104, and J digital signals output from the near room 102. In step 1005, a set of digital signal vectors is formed, where each digital signal vector is an (Z, + P) - component digital signal vector:

formed from the digital signal and L + P - 1 previous digital signals > anc * a set °f m i cro ph° ne "d'gital signal vectors is formed, where each microphone-digital signal vector is a P - component microphone-digital signal vector

formed from the microphone-digital signal yy' [n] and P - 1 previous digital signals In step 1006, a fast Fourier transformation ("FFT") is applied to the each vector in the sets , and in order to obtain corresponding sets of domain frequency vectors , where vector elements are given by:

An FFT and a corresponding inverse fast Fourier transform ("IFFT") are types of Fourier transformations that are often employed to avoid carrying out convolution in the time sample domain. Using the FFT and the IFFT can be hundreds or even thousands of times faster than convolving digital signals in the time sample domain. A number of different FFT and IFFT methods are described in the book "Discrete- Time Signal Processing (2 nd Edition)," by A. Oppenhiemer, R. Schafer, and J. Buck, Prentice Hall, Inc., (1999-2000), which is just one of many references for the field of digital signal processing. Additional details can be obtained from the above- referenced book, or from many other textbooks, papers, and journal articles in this field. In step 1007, a set of controlled digital signal vectors are computed, where each controlled digital signal vector is computed as follows:

where

Tr N^ is a truncation operator of length N d ,

"° " represents component-wise multiplication of two vectors,

, and

represents the frequency domain acoustic echo in theyth microphone-digital signal.

Component-wise multiplication of the 3-tuples (1,2, 3) and (3,1,2) is represented by: (1,2, 3)o (3, 1,2) = (l - 3,2 1,3- 2) = (3,2,6)

In step 1008, the routine "determine control state" is called, which identifies four types of audio signal transmissions that can exist between the near room 102 and the far room 104. The four types of audio signal transmissions are called "control states" ("CS") and are identified as follows: (1) an audio signal is output from the near end room 102 only and is represented by S NE O', (2) an audio signal is output from the far end room 104 only and is represented by SFEO', (3) audio signals are simultaneously output from both the near end room 102 and the far end room 104, which is called "double-talk," is represented by Sof, and (4) no audio signals output from the near end room 102 and the far end room 104 is represented by SVs- In step 1009, the routine "residual echo suppression" is called in order to compute J gain-corrected processed digital signals In step 1010, the processed digital signals are transm ' tte d to the far room 104. In step 101 1, when n is less than or equal to (m + Ϋ) N d - 1 , control passes to step 1012, otherwise control passes to step 1013. In step 1012, the time sample n is incremented by "1," and steps 1004-1011 are repeated. In other words, the approximate-impulse-response vectors are not updated during the N < / decision period. In step 1013, when another decision epoch m is available, control passes to step 1014, otherwise the routine "determine >s terminated. In step 1014, the decision epoch m is incremented by "1."

In step 1015, the routine "determine is called. In other words, a new decision period with N d decision epochs is initiated and the approximate impulse response vectors are updated.

Figures 1 1 A- 1 1 C show a control-flow diagram for the routine "determine control state" called in step 1008 in Figure 10 and represents an embodiment of the present invention. In \hefor-\oop beginning in step 1 101 of Figure 1 IA, steps 1 102-

1 1 14 are repeated for each index / e {l,...,/} . In step 1 102, an average square energy associated with the vecto ; [ ] is computed as follows:

and a long-term energy variances is recursively determined for the average square energies by:

In steps 1 103-1 107, a maximum square energy M? J-" 1 ' associated with the digital signal vector is determined. In step 1 103, when the maximum square energy ' is less than the average square energy E ' control passes to step 1 104, otherwise control passes to step 1 105. In step 1 104, the maximum square energy is computed by:

and control passes to step 1 108. In step 1 105, when the average square energy Ef} m ϊ is greater than 0.1λf,^ w" ^ control passes to step 1106, otherwise control passes to step 1 107. In step 1 106, the maximum square energy is computed by:

and control passes to step 1 108. In step 1 107, the maximum square energy is assigned the value of ' from the previous decision period. In steps 1 108- 1 1 13, Boolean logic values "TRUE" and "FALSE" are assigned to Boolean variables

Tl,, T2,. In step 1108, when the long-term variance is greater than or equal to \ control passes to step 1109 and Tl, is set to "TRUE," otherwise control passes to step 1110 and Tl, is set to "FALSE." In step 1111, when the average square energy is greater than or equal to , control passes to step 1112 and 72, is set to 'TRUE," otherwise control passes to step 1114 and T2, is set to "FALSE." In step 1114, when i is greater than / control passes to step 1116, in Figure 1 IB, otherwise control passes to step 1115. In step 1115, /is incremented by "1" and steps 1102-1114 are repeated.

In the for-loop beginning in step 1116 of Figure HB, steps 1117-1131 are repeated for each index j e{l,...j) . In step 1116, average energies are computed for and a shadow-error vector as follows:

and long-term energy variances are recursively determined for the average square energies by:

In steps 1118-1122, a maximum square energy associated with the digital signal vector is determined. In step 1118, when the maximum square energy is less than the average square energy control passes to step 1119, otherwise control passes to step 1120. In step 1119, the maximum square energy is computed by:

and control passes to step 1 123. In step 1120, when the average square energy is greater than control passes to step 1 121, otherwise control passes to step 1 121. In step 1121, the maximum square energy is computed by: and control passes to step 1 123. In steps 1 123-1 128, Boolean logic values "TRUE" and "FALSE" are assigned to Boolean variables T3 P T4 } . In step 1 123, when the average square energy is greater than or equal to control passes to step 1 124 and T3 j is set to "TRUE," otherwise control passes to step 1 125 and TS 1 is set to "FALSE." In step 1 126, when the long-term variance is greater than or equal to ' , control passes to step 1 127 and T4 } is set to

"TRUE," otherwise control passes to step 1 128 and T4 } is set to "FALSE." In step 1 129, an echo return loss enhancement value ("ERLE") is calculated according to:

where and

are short-term variances associated with the average square energies E and , respectively. The ERLE can be used to gauge the performance of acoustic echo cancellation when the system is in the state SF EO and determine approximate- impulse-response vectors when the system is in double talk described below in step

1309 of Figure 13 A. In step 1 130, when j is greater than J control passes to step

1 132, in Figure HC, otherwise control passes to step 1 131. In step 1 131, j is incremented by "1" and steps 1 1 17-1 130 are repeated.

In the for-loop beginning in step 1 132 of Figure HC, steps 1 133-1 145 are repeated for each j e {\,... j} . In the for-loop beginning in step 1 133, steps 1 134-

1 143 are repeated for each ι ε {l,...,/} . In steps 1 134-1 141, the Boolean logic

values determined in steps 1108-1 113 in Figure HA and steps 1 123-1 128 in Figure 1 IB are used to determine a control state CS. j for each echo path. In step 1 134, when Tli and T3 j are "TRUE," and 72, and 7V, are "FALSE," control passes to step 1135 and CS, j is assigned S NEO , otherwise control passes to step 1 136. In step 1 136, when Tl, and T4, are "TRUE," and 72, is "FALSE," control passes to step 1137, and CS 0 is assigned S^s, otherwise control passes to step 1138. In step 1 138, when 72, is "TRUE" and Tl 1 , T3 j , and 7V, are "FALSE," control passes to step 1 139, and CS tJ is assigned SFEO, otherwise control passes to step 1 140. In step 1 140, when 72, and 73, are "TRUE," and Tl, and 7V, are "FALSE," control passes to step 1 141, and CS tJ is assigned S DT , otherwise control passes to step 1142. In step 1 142, when i is greater than / control passes to step 1 144, otherwise control passes to step 1143. In step 1 143, / is incremented by "1" and steps 1 134-1142 are repeated. In step 1 144, when/ is greater than J, and are returned, otherwise, in step 1 145, j is incremented by "1" and steps 1 133-1144 are repeated. Figure 12 is a control-flow diagram for the routine "residual echo suppression" called in step 1009 in Figure 10 and represents an embodiment of the present invention. In the for-loop beginning in step 1201, steps 1202-1207 are repeated for each index/. The control states of the audio transmission are used to correct for gain in each of the processed digital signals For example, during double-talk, or when an audio signal is produced in the near room 102 only, the gain acquired by the controlled digital signals is larger than when an audio signal is produced only in the far room 104 or when there is no audio signal transmitted at all. The maximum and minimum amounts of gain that the controlled digital signal can acquire are represented by the constants G max and G mm , respectively. Steps 1202-1205 adjust the gain in each of the processed digital signals

In step 1202, when there is double talk or sound produced in the near room 102 only, control passes to step 1203, otherwise control passes to step 1204. In step 1203, the gain is computed as follows:

d m) = K 2 G^ 1 K(I - K 2 )G 11n where AT 2 is a weight assigning a lower relative importance to G (m~1) than to the maximum gain G n ^. In step 1204, the gain is computed as follows:

G H = K ^ m - ] K(\ - K 1 )G^ n where K \ is a weight assigning a relatively higher importance to G (m~1) than to the minimum gain G mn . In step 1205, theyth processed digital signal is determined by: r j ; [»J = G V 'η ' [n\

In step 1206, when y is less than or equal to J, control passes to step 1207 and j is incremented by "1," otherwise processed digital signals are returned. Figure 13A is a control-flow diagram for the routine "determine called in step 1015 in Figure 10 and represents an embodiment of the present invention. In the^or-loop beginning in step 1302, steps 1302-1313 are repeated for each i ε {l,..., /} . In the yόr-loop beginning in step 1302, steps 1303-131 1 are repeated for each j ε {l,...J} . In step 1303, when the CS y is SFEO, control passes to step 1304, otherwise control passes to step 1305. In step 1304, the routine "compute hjj m ' " is called and an approximate-impulse-response vector h^"' is computed. In step 1305, when the CS ij S NEO or S NS , control passes to step 1306, otherwise control passes to step 1307. In step 1306, the impulse-response vector is updated by assigning elements in the vector to corresponding values in the previous impulse response and the shadow-impulse-response vector is updated by assigning elements in the shadow-impulse-response vector to corresponding elements in the shadow-impulse-response vector from the previous decision period. Steps 1307, 1308, and 1309 are used to guard against misinterpreting the presence of double talk as described below with reference to Figures 13B-13C. In step 1307, when "count" does not equal "0," control passes to step 1307, otherwise

control passes to step 1309. In step 1308, the "count" is decremented by "1" and control passes to step 1306. In step 1309, the approximate-impulse-response vector for double talk is selected form an impulse-response data structure represented by a set . In step 1310, the "count" is assigned the value M. The value Ncan be 10, 12, 15, 18, or any other suitable value. In step 1310, wheny is greater than J, control passes to step 1312, otherwise control passes to step 1311. In step 131 1 , i is incremented by "1" and steps 1302-1310 are repeated. In step 1312, when / is greater than /, the set of approximate impulse response vectors is returned, otherwise control passes to step 1313 andy is incremented by "1." Note that there are Ix J impulse-response data structures, one impulse- response data structure for each different echo path. Each data structure includes the K most recent approximate impulse responses with the highest signal to noise ratio. The subscripts on the decision epochs m correspond to the order in which the impulse response vectors have been added to the data structure. For example, in the data structure in step 1309, the impulse response vector has been in the data structure the longest, and the impulse response vector is the most recent addition to the data structure. Note that the approximate-impulse-response vector is not updated during double talk because during double talk the signal-to-noise ratio decreases rapidly. The increased noise distorts the approximate-impulse-response vector. In order to avoid this distortion, the approximate-impulse-response vector is selected from the corresponding impulse-response data structure. In various embodiments of the present invention, the approximate-impulse-response vector can be selected based on any number of different criteria. For example, in one embodiment of the present invention, an approximate-impulse-response vector can be selected in step 1309 according to which impulse-response vector has the largest corresponding In another embodiment of the present invention, in step 1309, the approximate-impulse-response vector can be selected from the

corresponding data structure based on which approximate-impulse-response vector has been in the data structure the longest, which is

In the control-flow diagram of Figure 13 A, the steps 1307, 1308, and 1306 are repeated for M iterations in order avoid misinterpreting noises transmitted in a corresponding echo paths as SD T - Figures 13B-13C shows two plots of amplification energy versus time for the four types of control states that represents an embodiment of the present invention. In Figures 13B-13C, vertical axes, such as vertical axis 1314, represent amplification energy associated with signals transmitted between the near room 102 and the far room 104, horizontal axes, such as horizontal axis 1315, represent time, and horizontal dashed lines, such as dashed line 1316, correspond to a double talk threshold energy, E,/,. Curves 1318 and 1320 represent the amplification energies associated with signals transmitted between the near room 102 and the far room 104. Amplification energies below the double talk threshold Ea, correspond to an S F EO, S NE O, or SVs control state. Peak 1322 corresponds to amplification energy resulting from double talk, which exceeds the double talk threshold £, λ . However, peak 1324 corresponds to an echo path noise produced in the near room 102 or the far room 104. This noise initially creates the appearance of double talk because the amplification energy exceeds the double talk threshold energy E th even though double talk is not actually taking place. In order to avoid misinterpreting short duration noises as double talk, at time 1326, a countdown begins with the variable "count," described with reference to steps 1306, 1307, and 1305, which avoids selecting an approximate-impulse-response vector from the data structure in step

1308 until double talk has been confirmed for M iterations. In other words, the method of the present invention continues operating as if double talk has not occurred for M decision periods. If after M iterations the amplification energy has decreased, as indicated at time 1328 in Figure 13B, inappropriate selection of an approximate- impulse-response vector for double talk has been avoided. On the other hand, if after M iterations the amplification energy has increased, as indicated the curve 1320 at time 1330 in Figure 13C, an approximate impulse-response for double talk is selected in step 1309.

The methods now described with reference to Figures 14-17 are repeated for each echo path. Figure 14A is a control-flow diagram for the routine "compute

/JiJ" 1 ) " called in step 1304 in Figure 14 and represents an embodiment of the present invention. In step 1401, the FFT is applied to the shadow impulse-response vector /jj" 7"1 ^ in order to obtain a frequency domain dependent vector:

In step 1402, a frequency domain, shadow-error vector is computed as follows:

where

Tr P is a truncation operator of length P, and

In step 1403, an IFFT is applied to the frequency domain, shadow-error vector in order to obtain a shadow-error vector:

In step 1404, the routine "determine ε;™'" is called to calculate a shadow mismatch vector ε}™' . In step 1405, an evolving trust-region vector is determined in accordance with the following recursive formula:

The parameter 7 is a weighting factor. The trust-region vector is used to compute the shadow update step size μ l} m in the routine "determine μ, J m " called in step 1406. The trust-region vector 1 ^ is also used to compute and the adaptation step size γ tj m in the routine "determine y υ>m " called in step 1407. In step 1408, the shadow mismatch vector , determined in step 1404, and the shadow update step size μ lj m , determined in step 1406, are used to recursively compute a shadow- impulse-response vector:

In step 1409, the adaptation step size χ lj m , determined in step 1407, is used to recursively compute the approximate-impulse-response vector as follows:

The parameter γ tJ m is used to weighting factor. In step 1410, when the ERLEy' is greater than a threshold value C, control passes to step 141 1. The threshold value C can be 10, 12, 15, or any other suitable value for weighting the approximate-impulse- response vector hjf"' . In step 1411, the impulse-response data structure described above with reference to step 1309, in Figure 13A:

is updated.

The impulse-response vectors are arranged in the impulse-response data structure in order of increasing decision epoch values as follows /W 1 > m 2 > ... > m κ _ χ > m κ , where the decision epoch m κ corresponds to an approximate-impulse-response vector that has been in the data structure for the longest period of time, and the decision epoch /M 1 corresponds to the most recent approximate-impulse-response vector added to the data structure. In one embodiment of the present invention, the data structure can be updated in step 1411 by removing the impulse-response vector from the data structure and adding the most

recently computed impulse-response vector h, ™' to the data structure computed in step 1409, which gives the impulse response data structure:

In other embodiments of the present invention, the data structure can be updated in step 141 1 according to the magnitude of the ERLE values associated with each approximate-impulse-response vector. For example, the approximate-impulse- response vector with the smallest associated ERLE value is removed from the data structure in order to accommodate addition of the most recently computed approximate-impulse-response vector computed in step 1409 and have an ERLE satisfying the threshold condition in steps 1410.

The shadow update step size μ l] m determined by the routine "determine μ v „ " called in the step 1406 substantially ensures that the shadow-impulse-response vector determined in the step 1408 lies within an evolving "trust region" that lies within an Z-dimensional search space U. For every shadow-impulse-response vector there is a separate trust region that lies within the search space U. The trust region can be have many different hyper-dimensional shapes, such as a hyper- spherical region, a hyper-elliptical region, a hyper-cubic region, or any other hyper- dimension region that lies within the search space U.

In one embodiment of the present invention described with reference to Figure 14B, the trust region is assumed to be a hyper-elliptical region that lies within the search space U. Figure 14B provides a two-dimensional illustration of a hyper- elliptical region 1420 located within the search space U 1422 that represents an embodiment of the present invention. Although in Figure 14B the regions 1420 and 1422 are represented in two-dimensions, in practice these regions are actually L- dimensional vector subspaces of D L and are centered at the origin of D L . The shape of the hyper-elliptical region 1420 is determined by the trust-region vector AJ" 1"1 ) computed in step 1405, which in this example is a hyper-elliptical vector defining

shape of the hyper-elliptical region 1420. In other words, as shown in Figure 14B, six previously determined shadow-impulse-response vectors represented by dots lie within the hyper-elliptical region 1420 which is associated with the vector A) j . The hyper-elliptical region 1420 is referred to as an "evolving" hyper-elliptical region because when a subsequent hyper-ellipsoid-specification vector ' is determined, the associated region changes orientation within the search space U 1422 but this newly oriented elliptical region retains a substantially elliptical shape. A newly computed shadow-impulse-response vector computed according to the recursive formula:

falls within a newly oriented hyper-elliptical region because the shadow update step size μ tJ m is a function of the hyper-ellipsoid-specification vector Ay , as described below with reference to Figure 15. As shown in Figure 14B, when the magnitude and direction of the shadow mismatch vector έ* m ' is large, as indicated by vector 1424, the shadow update step size μ y m reduces the magnitude and changes the direction of the shadow mismatch vector , as indicated by vector 1426. As a result, the shadow-impulse-response vector lies within a newly oriented hyper-elliptical region 1428.

Note that in other embodiments of the present invention, rather than computing a single shadow update step size μ lj m to update the shadow impulse- response vector, the magnitude of the shadow impulse-response vector can be changed to lie within the trust region by computing a second vector that is added to the shadow impulse-response vector n so that the resulting shadow impulse- response vector lies within the trust region. Figure 15 is a control-flow diagram for the routine "determine y " called in step 1404 in Figure 14 and represents an embodiment of the present invention. In

step 1501 , an average spectrum of the frequency domain vector Xy' is computed as follows:

and an average spectrum of the frequency domain shadow-error vector Ry' is computed as follows: where

and

The parameter β is a weighting factor. In step 1502, the maximum energies computed in Figure 1 1 are used to determine:

In steps 1503-1506, elements of an N-component, frequency domain, preconditioning vector are determined. In the for-loop beginning in step 1503, steps 1504-1506 are repeated for each frequency domain index k. In step 1504, the elements of the preconditioning vector are calculated as follows:

where k ranges from 0 to N - 1. In step 1505, when k is less than N-I , control passes to step 1506, otherwise control passes to step 1507. In step 1506, the index k is incremented by the value "1." In step 1507, the vector ε;J"' is computed as follows:

where

Tr 1 is a truncation operator of size L, and

Figure 16 is a control-flow diagram for the routine "determine μ y m " called in step 1406 in Figure 14 that represents an embodiment of the present invention. In step 1601 , a parameter is computed as follows:

In step 1602, when is greater than the value "1," control- passes to step 1603, otherwise control passes to step 1604. In step 1603, μ tJ m is assigned the value

. In step 1604, μ IJ m is assigned the value "0.2."

Figure 17 is a control-flow diagram for the routine "determine γ m " called in step 1407 in Figure 14 that represents an embodiment of the present invention. In step 1701 , a parameter γ _ scale] "' is computed as follows:

In step 1702, when - and after M time samples, control passes to step 1703, otherwise, control passes to step 1704. In step 1703, a parameter c lj m is computed recursively as follows:

In step 1704, the parameter c lj m is computed recursively as follows:

In step 1705, γ is assigned a value according to:

where v lj m is computed in step 1502 of Figure 15. In step 1706, when y _scale)™' is greater than "1," control passes to step 1707. In step 1707, γ lj m is assigned the value:

Although the present invention has been described with respect to one embodiment, methods of the present invention are not limited to this embodiment. For example, in other embodiments of the present invention, acoustic echo cancellation may only be applied to a portion of the microphone digital signals. In particular, acoustic echo cancellation methods of the present invention may be applied to only those microphone-digital signals with amplitudes greater than some predetermined threshold rather than applying the method to all of the microphone- digital signals.

The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. The foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive of or to limit the invention to the precise forms

disclosed. Obviously, many modifications and variations are possible in view of the above teachings. The embodiments are shown and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents: