Title:
METHOD AND APPARATUS FOR ACOUSTIC ECHO CANCELLATION
Document Type and Number:
WIPO Patent Application WO/2019/112467
Kind Code:
A1
Abstract:
Embodiments of the present application provide a method and an apparatus for acoustic echo cancellation for improving detection accuracy of acoustic echo cancellation and reducing echo residual. The method includes: the first terminal picks up a signal including an echo signal caused by playing a reference signal by the first terminal, where the reference signal is a speech signal received by the first terminal from the second terminal; the first terminal performs an adaptive filtering to the reference signal and the picked-up signal by using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, to obtain a first residual signal and a second residual signal, respectively; the first terminal performs a hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal; the first terminal performs a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal; the first terminal performs a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression. Embodiments of the present invention provide a hybrid adaptive filter based on Kalman filtering and variable-step NLMS filtering, and design a method for multiple passes of nonlinear processing (NLP) and an implicit single-talking/double-talking determining method, to perform more accurate residual echo suppression gain calculation without the need for delay estimation. The solution may notably improve the robustness of the acoustic echo cancellation algorithm, and achieve a smooth full-duplex call effect.

Inventors:
VASILYEV VLADISLAV IGOREVICH (CN)
FAN FAN (CN)
SARANA DMITRY VLADIMIROVICH (CN)
Application Number:
PCT/RU2017/000925
Publication Date:
June 13, 2019
Filing Date:
December 08, 2017
Assignee:
HUAWEI TECH CO LTD (CN)
VASILYEV VLADISLAV IGOREVICH (CN)
International Classes:
H04M9/08
Domestic Patent References:
WO2011133075A1 (2011-10-27)
Foreign References:
US20150256929A1 (2015-09-10)
Other References:
None
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD. et al. (RU)
Claims:
CLAIMS

What is claimed is:

1. A method for acoustic echo cancellation, at a first terminal, where the first terminal conducts speech communications with a second terminal, comprising:

picking up (201), by the first terminal, a signal comprising an echo signal caused by playing a reference signal by the first terminal, where the reference signal is a speech signal received by the first terminal from the second terminal;

performing (202), by the first terminal, an adaptive filtering to the reference signal and the picked-up signal by using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, to obtain a first residual signal and a second residual signal, respectively;

performing (203), by the first terminal, a hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal;

performing (204), by the first terminal, a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal;

performing (205), by the first terminal, a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression.

2. The method according to claim 1, the performing (203), by the first terminal, hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal, comprises:

determining (30), by the first terminal, an energy of the first residual signal and an energy of the second residual signal, on multiple frequency bins, respectively; and

selecting (31), by the first terminal, on each of the multiple frequency bins, a residual signal whose energy is smaller between the first residual signal and the second residual signal, to obtain the target residual signal.

3. The method according to claim 1, the performing (204), by the first terminal, a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal, comprises:

determining (40), by the first terminal, the estimate echo signal using the Kalman filter and the variable-step NLMS filter together;

determining (41), by the first terminal, a first echo power spectrum of the estimate echo signal;

performing (42), by the first terminal, a harmonic generation processing to the estimate echo signal, to obtain a power spectrum after harmonic generation;

performing (43), by the first terminal, a frequency spectrum splicing to the power spectrum after harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum;

performing (44), by the first terminal, a smoothing processing to the second echo power spectrum, to obtain a third echo power spectrum;

selecting (45), by the first terminal, on each of the multiple frequency bins, an echo power spectrum whose energy of frequency bin is bigger between the third echo power spectrum and the second echo power spectrum, to obtain the estimate residual echo signal.

4. The method according to claim 1, the performing (205), by the first terminal, a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression, comprises:

determining (50), by the first terminal, an energy of the target residual signal and an energy of the estimate residual echo signal;

performing (51), by the first terminal, an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression;

performing (52), by the first terminal, a pitch signal detection to the signal after initial residual echo suppression;

performing (53), by the first terminal, a harmonic enhancement to the signal after initial residual echo suppression to obtain a signal after harmonic enhancement, when the pitch signal is detected in the signal after initial residual echo suppression;

performing (54), by the first terminal, a secondary residual echo suppression to the signal after harmonic enhancement through a secondary gain calculation, to obtain a signal after secondary residual echo suppression;

performing (55), by the first terminal, a cepstral smoothing processing to the signal after secondary residual echo suppression, to obtain a signal after cepstral smoothing;

performing (56), by the first terminal, a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.

5. The method according to claim 4, after the performing (52), by the first terminal, a pitch signal detection to the signal after initial residual echo suppression, comprises:

performing (60), by the first terminal, a cepstral smoothing processing to the signal after initial residual echo suppression, to obtain a signal after cepstral smoothing, when a pitch signal is not detected in the signal after initial residual echo suppression;

performing (61), by the first terminal, a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.

6. The method according to claim 4, the performing (51), by the first terminal, an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression, comprises:

determining (71), by the first terminal, a prior signal-to-echo ratio according to the target residual signal and the estimate residual echo signal;

performing (72), by the first terminal, the initial gain calculation according to the prior signal-to-echo ratio.

7. The method according to claim 1, after the performing (204), by the first terminal, a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal, further comprises:

generating (81), by the first terminal, scenario identification information according to the reference signal, the picked-up signal and the target residual signal, where the scenario identification information comprises at least one of a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path;

performing (82), by the first terminal, a dynamical adjustment to the estimate residual echo signal according to the scenario identification information.

8. An apparatus for acoustic echo cancellation, at a first terminal, where the first terminal conducts speech communications with a second terminal, comprising:

a signal acquiring module (901), configured to pick up a signal comprising an echo signal caused by playing a reference signal by the first terminal, where the reference signal is a speech signal received by the first terminal from the second terminal;

an adaptive filtering module (902), configured to perform an adaptive filtering to the reference signal and the picked-up signal by using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, to obtain a first residual signal and a second residual signal, respectively;

a hybrid filtering module (903), configured to perform a hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal;

a residual echo estimation module (904), configured to perform a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal;

a residual echo suppression module (905), configured to perform a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression.

9. The apparatus according to claim 8, the hybrid filtering module (903) further comprises:

a first energy determining module (9031), configured to determine an energy of the first residual signal and an energy of the second residual signal, on multiple frequency bins, respectively; and

a first signal selecting module (9032), configured to select, on each of the multiple frequency bins, a residual signal whose energy is smaller between the first residual signal and the second residual signal, to obtain the target residual signal.

10. The apparatus according to claim 8, the residual echo estimation module (904) further comprises:

an estimate echo signal determining module (9041), configured to determine the estimate echo signal using the Kalman filter and the variable-step NLMS filter together;

a power spectrum determining module (9042), configured to determine a first echo power spectrum of the estimate echo signal;

a harmonic generation processing module (9043), configured to perform a harmonic generation processing to the estimate echo signal, to obtain a power spectrum after harmonic generation;

a frequency spectrum splicing module (9044), configured to perform a frequency spectrum splicing to the power spectrum after harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum;

a smoothing processing module (9045), configured to perform a smoothing processing to the second echo power spectrum, to obtain a third echo power spectrum;

a second signal selecting module (9046), configured to select, on each of the multiple frequency bins, an echo power spectrum whose energy of frequency bin is bigger between the third echo power spectrum and the second echo power spectrum, to obtain the estimate residual echo signal.

11. The apparatus according to claim 8, the residual echo suppression module (905) further comprises:

a second energy determining module (9051), configured to determine an energy of the target residual signal and an energy of the estimate residual echo signal;

an initial residual echo suppression module (9052), configured to perform an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression;

a pitch signal detecting module (9053), configured to perform a pitch signal detection to the signal after initial residual echo suppression;

a harmonic enhancement module (9054), configured to perform a harmonic enhancement to the signal after initial residual echo suppression to obtain a signal after harmonic enhancement, when the pitch signal is detected in the signal after initial residual echo suppression;

a secondary residual echo suppression module (9055), configured to perform a secondary residual echo suppression to the signal after harmonic enhancement through a secondary gain calculation, to obtain a signal after secondary residual echo suppression;

a cepstral smoothing module (9056), configured to perform a cepstral smoothing processing to the signal after secondary residual echo suppression, to obtain a signal after cepstral smoothing;

a final residual echo suppression module (9057), configured to perform a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.

12. The apparatus according to claim 11, where:

the cepstral smoothing module (9056), further configured to perform a cepstral smoothing processing to the signal after initial residual echo suppression, to obtain a signal after cepstral smoothing, when a pitch signal is not detected in the signal after initial residual echo suppression.

13. The apparatus according to claim 11, the initial residual echo suppression module (9052) further comprises:

a signal-to-echo ratio determining module (90521), configured to determine a prior signal-to-echo ratio according to the target residual signal and the estimate residual echo signal;

an initial gain calculating module (90522), configured to perform the initial gain calculation according to the prior signal-to-echo ratio.

14. The apparatus according to claim 8, the first terminal further comprises:

a scenario identification module (906), configured to generate a scenario identification information according to the reference signal, the picked-up signal and the target residual signal, where the scenario identification information comprises at least one of a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path;

the residual echo estimation module (904) is further configured to, after the scenario identification module generates the scenario identification information, perform a dynamical adjustment to the estimate residual echo signal according to the scenario identification information.

15. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to execute the method of any one of claims 1-7.

Description:
METHOD AND APPARATUS FOR ACOUSTIC ECHO CANCELLATION

TECHNICAL FIELD

[0001] The present application relates to the field of communications technology and, in particular, to a method and an apparatus for acoustic echo cancellation.

BACKGROUND

[0002] An acoustic echo is generated by an acoustic feedback process in which after a sound transmitted by a far-end is played by a near-end speaker, a near-end microphone retrieves the sound and transmits the sound to the far-end. Generation of the acoustic echo mainly includes the following several paths:

1) Echo transfer inside a call device: A microphone directly receives a sound signal played by a speaker;

2) Direct sound transfer outside the call device: The microphone receives, by using an acoustic transfer path outside the device, a direct sound of the signal played by the speaker;

3) Single time/multiple times of reflected sound transfer outside the call device: The microphone receives, by using an acoustic transfer path outside the device, single/multiple reflected sounds of the signal played by the speaker.

[0003] The acoustic echo mainly arises in scenarios in which a speaker is used as the sound output device in a call application, for example, the handheld/hands-free mode of a mobile phone, a notebook computer call, a vehicle-mounted call or a video conference. If the echo is not processed, the delay introduced by the communications network (GSM/WCDMA/VoLTE, VoIP) severely degrades the subjective call experience, and a howling sound may even be generated.

[0004] With large-scale use of speech communications systems, acoustic echo cancellation (AEC), an indispensable key algorithm module of a speech enhancement system, is being widely used, where AEC is the process of eliminating an acoustic echo. A common AEC algorithm generally includes two parts: adaptive filtering (AF) and nonlinear processing (NLP). The adaptive filtering part of a conventional acoustic echo cancellation technology mainly uses a normalized least mean square (NLMS) method. The NLP part mainly uses a gain calculation method based on a coherence function. However, in order to accurately calculate the coherence function, the delay between the reference signal and the picked-up signal needs to be estimated. Moreover, in order to ensure a smooth double-ended talking effect, double-talking detection further needs to be performed.
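For orientation, the conventional NLMS adaptive filter mentioned above can be sketched as follows. This is a minimal time-domain illustration and not the variable-step or Kalman filter of the present application. The function name `nlms_echo_canceller`, the parameters `taps`, `step` and `eps`, and the two-tap toy echo path are assumptions introduced only for this example.

```python
import random


def nlms_echo_canceller(reference, picked_up, taps=4, step=0.5, eps=1e-8):
    """Run a fixed-step, time-domain NLMS filter; return (residual, weights)."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, picked_up):
        history = [x] + history[:-1]
        # Estimated echo is the reference filtered by the current weights.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalize the update by the reference energy in the filter window.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Toy usage: a made-up two-tap echo path and a white-noise reference signal.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3]
picked_up = [
    sum(echo_path[k] * reference[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(reference))
]
residual, weights = nlms_echo_canceller(reference, picked_up)
```

In this noise-free toy case the weights approach the assumed echo path and the residual decays toward zero. With near-end speech present, a fixed step size adapts poorly, which is one reason variable-step variants are used.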

[0005] The convergence speed of the adaptive filter in the prior art is low, NLP gain calculation is not accurate, the delay estimation and the double-talking detection are prone to being affected by various factors, and acoustic echo cancellation performance cannot be optimal.

SUMMARY

[0006] Embodiments of the present application provide a method and an apparatus for acoustic echo cancellation for improving detection accuracy of acoustic echo cancellation and reducing echo residual.

[0007] In order to solve the foregoing problems, the embodiments of the present application provide the following technical means.

[0008] In one aspect, embodiments of the present application provide a method for acoustic echo cancellation, at a first terminal, where the first terminal conducts speech communications with a second terminal, including:

[0009] picking up, by the first terminal, a signal including an echo signal caused by playing a reference signal by the first terminal, where the reference signal is a speech signal received by the first terminal from the second terminal;

[0010] performing, by the first terminal, an adaptive filtering to the reference signal and the picked-up signal by using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, to obtain a first residual signal and a second residual signal, respectively;

[0011] performing, by the first terminal, a hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal;

[0012] performing, by the first terminal, a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal;

[0013] performing, by the first terminal, a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression.

[0014] In a first implementation of the first aspect of the present application, the performing, by the first terminal, hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal, includes:

[0015] determining, by the first terminal, energy of the first residual signal and energy of the second residual signal, on multiple frequency bins, respectively; and

[0016] selecting, by the first terminal, on each of the multiple frequency bins, a residual signal whose energy is smaller between the first residual signal and the second residual signal, to obtain the target residual signal.
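The per-bin selection in [0015] and [0016] can be sketched as below, assuming each residual signal is available as a list of complex frequency-bin values for one frame. The function name `select_min_energy` and the tie-breaking rule (keep the first residual when energies are equal) are assumptions made for this illustration.

```python
def select_min_energy(first_residual, second_residual):
    """Per frequency bin, keep whichever residual has the smaller energy."""
    target = []
    for e1, e2 in zip(first_residual, second_residual):
        # Energy of a complex bin value is its squared magnitude.
        target.append(e1 if abs(e1) ** 2 <= abs(e2) ** 2 else e2)
    return target


# Toy frame with three bins (values are made up).
kalman_residual = [1 + 1j, 0.1j, 3 + 0j]
nlms_residual = [0.5 + 0j, 2 + 0j, -3j]
target_residual = select_min_energy(kalman_residual, nlms_residual)
```

Selecting per bin rather than per frame lets each filter contribute in the frequency regions where it currently cancels the echo better.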

[0017] In a second implementation of the first aspect of the present application, the performing, by the first terminal, a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal, includes:

[0018] determining, by the first terminal, the estimate echo signal using the Kalman filter and the variable-step NLMS filter together;

[0019] determining, by the first terminal, a first echo power spectrum of the estimate echo signal;

[0020] performing, by the first terminal, a harmonic generation processing to the estimate echo signal, to obtain a power spectrum after harmonic generation;

[0021] performing, by the first terminal, a frequency spectrum splicing to the power spectrum after harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum;

[0022] performing, by the first terminal, a smoothing processing to the second echo power spectrum, to obtain a third echo power spectrum;

[0023] selecting, by the first terminal, on each of the multiple frequency bins, an echo power spectrum whose energy of frequency bin is bigger between the third echo power spectrum and the second echo power spectrum, to obtain the estimate residual echo signal.
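The last two steps, [0022] and [0023], can be sketched as below. The text does not specify the smoothing method, so first-order recursive smoothing across frames is assumed here. The function name `estimate_residual_echo`, the parameter `alpha` and the sample values are likewise assumptions for this example.

```python
def estimate_residual_echo(second_spectrum, previous_smoothed, alpha=0.8):
    """Smooth the second echo power spectrum, then take the per-bin maximum.

    Returns (estimate, third_spectrum), where third_spectrum is the smoothed
    spectrum to carry over to the next frame.
    """
    # Assumed smoothing: recursive average with the previous frame's result.
    third_spectrum = [
        alpha * prev + (1.0 - alpha) * cur
        for prev, cur in zip(previous_smoothed, second_spectrum)
    ]
    # Per bin, keep the larger of the smoothed and unsmoothed power.
    estimate = [max(t, s) for t, s in zip(third_spectrum, second_spectrum)]
    return estimate, third_spectrum


# Toy frame with two bins (values are made up).
estimate, third = estimate_residual_echo([1.0, 4.0], [2.0, 0.0], alpha=0.5)
```

Taking the per-bin maximum means the estimate follows a sudden rise in echo power immediately and decays only as fast as the smoothed spectrum does.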

[0024] In a third implementation of the first aspect of the present application, the performing, by the first terminal, a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression, includes:

[0025] determining, by the first terminal, an energy of the target residual signal and an energy of the estimate residual echo signal;

[0026] performing, by the first terminal, an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression;

[0027] performing, by the first terminal, a pitch signal detection to the signal after initial residual echo suppression;

[0028] performing, by the first terminal, a harmonic enhancement to the signal after initial residual echo suppression to obtain a signal after harmonic enhancement, when the pitch signal is detected in the signal after initial residual echo suppression;

[0029] performing, by the first terminal, a secondary residual echo suppression to the signal after harmonic enhancement through a secondary gain calculation, to obtain a signal after secondary residual echo suppression;

[0030] performing, by the first terminal, a cepstral smoothing processing to the signal after secondary residual echo suppression, to obtain a signal after cepstral smoothing;

[0031] performing, by the first terminal, a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.

[0032] In a fourth implementation of the first aspect of the present application, after the performing, by the first terminal, a pitch signal detection to the signal after initial residual echo suppression, the method includes:

[0033] performing, by the first terminal, a cepstral smoothing processing to the signal after initial residual echo suppression, to obtain a signal after cepstral smoothing, when a pitch signal is not detected in the signal after initial residual echo suppression.

[0034] performing, by the first terminal, a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.
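The branching between the third and fourth implementations ([0025] to [0034]) can be sketched as control flow. The individual stages are passed in as callables, because their internals are not given at this point in the text. The function name `residual_echo_suppression`, the dictionary keys and the tracing helper are hypothetical placeholders.

```python
def residual_echo_suppression(signal, stages, pitch_detected):
    """Apply the suppression stages in order, branching on pitch detection."""
    out = stages["initial_suppression"](signal)
    if pitch_detected(out):
        # Pitch present: enhance harmonics, then suppress a second time.
        out = stages["harmonic_enhancement"](out)
        out = stages["secondary_suppression"](out)
    # Both branches end with cepstral smoothing and the final suppression.
    out = stages["cepstral_smoothing"](out)
    return stages["final_suppression"](out)


def make_tracing_stages(trace):
    """Build pass-through stages that only record the order they ran in."""
    names = ("initial_suppression", "harmonic_enhancement",
             "secondary_suppression", "cepstral_smoothing", "final_suppression")

    def make(name):
        def stage(x):
            trace.append(name)
            return x
        return stage

    return {name: make(name) for name in names}


voiced_trace, unvoiced_trace = [], []
residual_echo_suppression([0.0], make_tracing_stages(voiced_trace), lambda s: True)
residual_echo_suppression([0.0], make_tracing_stages(unvoiced_trace), lambda s: False)
```

With the pass-through stages, `voiced_trace` records all five stages, while `unvoiced_trace` skips harmonic enhancement and the secondary suppression.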

[0035] In a fifth implementation of the first aspect of the present application, the performing, by the first terminal, an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression, includes:

[0036] determining, by the first terminal, a prior signal-to-echo ratio according to the target residual signal and the estimate residual echo signal;

[0037] performing, by the first terminal, the initial gain calculation according to the prior signal-to-echo ratio.
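One possible form of the gain calculation in [0036] and [0037] is sketched below. The text does not give the formula, so a Wiener-style gain G = SER / (1 + SER) with a crude power-ratio estimate of the prior signal-to-echo ratio is assumed. The function name `initial_gain` and the parameters `gain_floor` and `eps` are also assumptions and are not taken from the application.

```python
def initial_gain(target_power, echo_power, gain_floor=0.05, eps=1e-12):
    """Per-bin suppression gain from an assumed prior signal-to-echo ratio."""
    gains = []
    for p_target, p_echo in zip(target_power, echo_power):
        # Assumed SER estimate: power ratio minus one, clipped at zero.
        prior_ser = max(p_target / (p_echo + eps) - 1.0, 0.0)
        # Assumed Wiener-style gain; the floor limits musical-noise artifacts.
        gain = prior_ser / (1.0 + prior_ser)
        gains.append(max(gain, gain_floor))
    return gains


# Toy frame: bin 0 is dominated by near-end speech, bin 1 by residual echo.
gains = initial_gain([10.0, 1.0], [1.0, 1.0])
```

Under these assumptions a bin whose power is well above the estimated residual echo keeps a gain near one, while a bin explained entirely by residual echo is attenuated to the floor.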

[0038] In a sixth implementation of the first aspect of the present application, after the performing, by the first terminal, a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal, further includes:

[0039] generating, by the first terminal, a scenario identification information according to the reference signal, the picked-up signal and the target residual signal, where the scenario identification information comprises at least one of a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path;

[0040] performing, by the first terminal, a dynamical adjustment to the estimate residual echo signal according to the scenario identification information.

[0041] In a second aspect, embodiments of the present application provide an apparatus for acoustic echo cancellation, at a first terminal, where the first terminal conducts speech communications with a second terminal, including:

[0042] a signal acquiring module, configured to pick up a signal including an echo signal caused by playing a reference signal by the first terminal, where the reference signal is a speech signal received by the first terminal from the second terminal;

[0043] an adaptive filtering module, configured to perform an adaptive filtering to the reference signal and the picked-up signal by using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, to obtain a first residual signal and a second residual signal, respectively;

[0044] a hybrid filtering module, configured to perform a hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal;

[0045] a residual echo estimation module, configured to perform a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal;

[0046] a residual echo suppression module, configured to perform a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression.

[0047] In a first implementation of the second aspect of the present application, the hybrid filtering module further includes:

[0048] a first energy determining module, configured to determine an energy of the first residual signal and an energy of the second residual signal, on multiple frequency bins, respectively; and

[0049] a first signal selecting module, configured to select, on each of the multiple frequency bins, a residual signal whose energy is smaller between the first residual signal and the second residual signal, to obtain the target residual signal.

[0050] In a second implementation of the second aspect of the present application, the residual echo estimation module further includes:

[0051] an estimate echo signal determining module, configured to determine the estimate echo signal using the Kalman filter and the variable-step NLMS filter together;

[0052] a power spectrum determining module, configured to determine a first echo power spectrum of the estimate echo signal;

[0053] a harmonic generation processing module, configured to perform a harmonic generation processing to the estimate echo signal, to obtain a power spectrum after harmonic generation;

[0054] a frequency spectrum splicing module, configured to perform a frequency spectrum splicing to the power spectrum after harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum;

[0055] a smoothing processing module, configured to perform a smoothing processing to the second echo power spectrum, to obtain a third echo power spectrum;

[0056] a second signal selecting module, configured to select, on each of the multiple frequency bins, an echo power spectrum whose energy of frequency bin is bigger between the third echo power spectrum and the second echo power spectrum, to obtain the estimate residual echo signal.

[0057] In a third implementation of the second aspect of the present application, the residual echo suppression module further includes:

[0058] a second energy determining module, configured to determine an energy of the target residual signal and an energy of the estimate residual echo signal;

[0059] an initial residual echo suppression module, configured to perform an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression;

[0060] a pitch signal detecting module, configured to perform a pitch signal detection to the signal after initial residual echo suppression;

[0061] a harmonic enhancement module, configured to perform a harmonic enhancement to the signal after initial residual echo suppression to obtain a signal after harmonic enhancement, when the pitch signal is detected in the signal after initial residual echo suppression;

[0062] a secondary residual echo suppression module, configured to perform a secondary residual echo suppression to the signal after harmonic enhancement through a secondary gain calculation, to obtain a signal after secondary residual echo suppression;

[0063] a cepstral smoothing module, configured to perform a cepstral smoothing processing to the signal after secondary residual echo suppression, to obtain a signal after cepstral smoothing;

[0064] a final residual echo suppression module, configured to perform a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.

[0065] In a fourth implementation of the second aspect of the present application,

[0066] the cepstral smoothing module is further configured to perform a cepstral smoothing processing to the signal after initial residual echo suppression, to obtain a signal after cepstral smoothing, when a pitch signal is not detected in the signal after initial residual echo suppression.

[0067] In a fifth implementation of the second aspect of the present application, the initial residual echo suppression module further includes:

[0068] a signal-to-echo ratio determining module, configured to determine a prior signal-to-echo ratio according to the target residual signal and the estimate residual echo signal;

[0069] an initial gain calculating module, configured to perform the initial gain calculation according to the prior signal-to-echo ratio.

[0070] In a sixth implementation of the second aspect of the present application, the first terminal further includes:

[0071] a scenario identification module, configured to generate a scenario identification information according to the reference signal, the picked-up signal and the target residual signal, where the scenario identification information comprises at least one of a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path;

[0072] the residual echo estimation module is further configured to, after the scenario identification module generates the scenario identification information, perform a dynamical adjustment to the estimate residual echo signal according to the scenario identification information.

[0073] In a third aspect, embodiments of the present application provide a computer-readable storage medium including instructions that, when run on a computer, cause the computer to execute the method of the first aspect or of any implementation of the first aspect.

[0074] The present application provides a hybrid adaptive filter based on Kalman filtering and variable-step NLMS filtering, and designs a method for multiple times of NLP and an implicit single-talking/double-talking determining method, to perform more accurate residual echo suppression gain calculation without the need of performing delay estimation. The solution may notably improve robustness of the acoustic echo cancellation algorithm, and achieve a smooth full duplex call effect.

BRIEF DESCRIPTION OF DRAWINGS

[0075] To illustrate the technical solutions in embodiments of the present application more clearly, accompanying drawings needed in the embodiments or the prior art are illustrated briefly in the following. Apparently, the accompanying drawings show certain embodiments of the present application, and persons skilled in the art can derive other drawings from them without creative efforts.

[0076] FIG. 1 is a schematic diagram of an acoustic echo cancellation system applying a method for acoustic echo cancellation according to an embodiment of the present application;

[0077] FIG. 2 is a flow chart of a method for acoustic echo cancellation according to an embodiment of the present application;

[0078] FIG. 3 is a flow chart of a method for acoustic echo cancellation in a scenario according to an embodiment of the present application;

[0079] FIG. 4 is a flow chart of a residual echo suppression gain calculation in a scenario according to an embodiment of the present application;

[0080] FIG. 5 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application;

[0081] FIG. 6 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application;

[0082] FIG. 7 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application;

[0083] FIG. 8 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application;

[0084] FIG. 9A is a schematic diagram of a first terminal according to an embodiment of the present application;

[0085] FIG. 9B is a schematic diagram of an adaptive filtering module according to an embodiment of the present application;

[0086] FIG. 9C is a schematic diagram of a hybrid filtering module according to an embodiment of the present application;

[0087] FIG. 9D is a schematic diagram of a residual echo estimation module according to an embodiment of the present application;

[0088] FIG. 9E is a schematic diagram of a residual echo suppression module according to an embodiment of the present application;

[0089] FIG. 9F is a schematic diagram of an initial echo suppression module according to an embodiment of the present application;

[0090] FIG. 9G is a schematic diagram of another first terminal according to an embodiment of the present application;

[0091] FIG. 10 is a schematic diagram of an acoustic echo cancellation apparatus applying a method for acoustic echo cancellation according to an embodiment of the present application.

[0092] These drawings depict aspects of example embodiments for illustrative purposes, and variations, alternative configurations, alternative components and modifications may be made to these example embodiments.

DESCRIPTION OF EMBODIMENTS

[0093] Embodiments of the present application provide a method and an apparatus for acoustic echo cancellation for improving detection accuracy of acoustic echo cancellation and reducing echo residual.

[0094] The technical solutions in the embodiments of the present application are hereinafter described clearly and completely with reference to the accompanying drawings in the embodiments of the present application. Obviously, the embodiments described here are part of the embodiments of the invention and not all of the embodiments. All other embodiments obtained by persons skilled in the art on the basis of the embodiments of the present application without any creative efforts all fall within the scope of the invention.

[0095] FIG. 1 is a schematic diagram of an acoustic echo cancellation system applying a method for acoustic echo cancellation according to an embodiment of the present application. An embodiment of the method for acoustic echo cancellation according to the present application is applied to double-end speech devices that communicate through a microphone and a speaker. For example, as shown in FIG. 1, the acoustic echo cancellation system includes a first terminal 10 and a second terminal 11. In the embodiments of the present application, the first terminal 10 conducts speech communication with the second terminal 11, for example, through a wireless network. Moreover, the first terminal 10 may also conduct speech communication with multiple terminals. The speech communication method between the first terminal 10 and multiple terminals is similar to that between the first terminal 10 and the second terminal 11, for which reference is made to the multiple scenarios illustrated in the following embodiments of the present application.

[0096] FIG. 2 is a flow chart of a method for acoustic echo cancellation according to an embodiment of the present application, as shown in FIG. 2, the following steps are included.

[0097] S201, the first terminal picks up a signal comprising an echo signal caused by playing a reference signal by the first terminal, where the reference signal is a speech signal received by the first terminal from the second terminal.

[0098] In this embodiment, the second terminal 11, as the far end, transmits a speech signal to the first terminal 10, as the near end, and the first terminal 10 receives the speech signal. The speech signal received by the first terminal 10 is defined as the reference signal, which is treated as the reference in the acoustic echo cancellation of the following embodiments. The first terminal 10 is equipped with a speaker and a microphone, where the speaker of the first terminal 10 plays the speech signal received by the first terminal 10, and the microphone of the first terminal 10 picks up the speech signal played by the speaker of the first terminal 10. The signal picked up by the microphone of the first terminal 10 is defined as the picked-up signal, where the picked-up signal may be composed of, for example, an actual echo signal, and/or near-end speech, and/or background noise. The actual echo signal should be suppressed; otherwise it is transmitted back to the second terminal 11 and the user of the second terminal 11 hears their own speech.

[0099] S202, the first terminal performs an adaptive filtering to the reference signal and the picked-up signal through a Kalman filter and a variable-step normalized least mean square (NLMS) filter, simultaneously, to obtain a first residual signal and a second residual signal, respectively.

[0100] In this embodiment, the first terminal simultaneously uses a Kalman filter and a variable-step NLMS filter in the frequency domain so as to obtain a robust adaptive filtering capability. The Kalman filter is also referred to as a Kalman adaptive filter. The convergence rate of the Kalman filter is relatively quick, and its convergence performance in double-talking is equivalent to that in single-talking, but its absolute filtering performance is relatively weak. In contrast, the absolute filtering performance of the variable-step NLMS filter after complete convergence is relatively strong, but its convergence rate is relatively slow and its frequency bins diverge easily. Using the variable-step NLMS filter and the Kalman filter in combination may make up for the respective disadvantages of each. Specifically, the convergence performance of the Kalman filter in double-talking is equivalent to that in single-talking, which makes up for the disadvantage of the conventional variable-step NLMS filter, because the Kalman filter changes the step by directly using a covariance matrix of the filter coefficient error; this amounts to a fully automatic step change, without the step being changed artificially. Meanwhile, the hybrid adaptive filter based on the variable-step NLMS filter and the Kalman filter may have fewer divergent frequency bins and a more powerful filtering performance.

[0101] The main processes of the Kalman filter and the variable-step NLMS filter in the hybrid adaptive filter are respectively illustrated hereafter.

[0102] S202, the first terminal performs an adaptive filtering to the reference signal and the picked-up signal using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, simultaneously, to obtain a first residual signal and a second residual signal, respectively, specifically includes following steps.

[0103] S10, the first terminal performs adaptive filtering to the reference signal and the picked-up signal by using a Kalman filter coefficient updated in a previous frame, so as to output a first residual signal.

[0104] In this embodiment, the first residual signal may be composed of, for example, a residual echo signal of the Kalman filter, and/or a near-end speech, and/or a background noise. The reference signal has multiple frames and the picked-up signal has one frame. Assuming that there are N frames and M frequency bins, the Kalman filter has N*M coefficients, where N and M are positive integers.

[0105] S11, the first terminal calculates a covariance matrix of a residual signal of the Kalman filter by using a covariance matrix of the filter coefficient error.

[0106] In this embodiment, the residual signal of the Kalman filter is a difference between the actual echo signal in the picked-up signal and an echo signal estimated by the Kalman filter. The covariance matrix of the residual signal of the Kalman filter is calculated by:

$$S_k = H_k P_{k-1|k-1} H_k^{T} + R_k,$$

where $S_k$ is the covariance matrix of the residual signal of the Kalman filter, $P_{k-1|k-1}$ is the covariance matrix of the filter coefficient error, $H_k$ is the reference signal, $R_k$ is the noise signal, and $k$ is the current frame.

[0107] S12, the first terminal calculates a Kalman gain by using the covariance matrix of the residual signal of the Kalman filter.

[0108] In this embodiment, the Kalman gain is calculated by:

$$K_k = P_{k-1|k-1} H_k^{T} S_k^{-1},$$

where $K_k$ is the Kalman gain, $S_k$ is the covariance matrix of the residual signal of the Kalman filter, $P_{k-1|k-1}$ is the covariance matrix of the filter coefficient error, and $H_k$ is the reference signal.

[0109] S13, the first terminal updates the Kalman filter coefficient by using the Kalman gain.

[0110] In this embodiment, the Kalman filter coefficient is calculated by:

$$X_{k|k} = X_{k-1|k-1} + K_k Y_k,$$

where $X_{k|k}$ is the Kalman filter coefficient after updating, $X_{k-1|k-1}$ is the Kalman filter coefficient before updating, $K_k$ is the Kalman gain, and $Y_k$ is the residual signal after filtering.

[0111] S14, the first terminal updates the covariance matrix of the Kalman filter coefficient error for using in a next frame.

[0112] In this embodiment, the covariance matrix of the Kalman filter coefficient error after updating is calculated by:

[0114] where $P_{k|k}$ is the covariance matrix of the Kalman filter coefficient error after updating, $P_{k-1|k-1}$ is the covariance matrix of the Kalman filter coefficient error before updating, $K_k$ is the Kalman gain, $H_k$ is the reference signal, and $Q_k$ is the expected variance of $K_k Y_k$.
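The recursion of S10 to S14 can be sketched for a single frequency bin as follows. This is a minimal illustration, not the claimed implementation: the function name `kalman_bin_update` and its arguments are hypothetical, a conjugate transpose is assumed in place of the plain transpose because the spectra are complex, and the covariance update in the last step is assumed to take the standard form $(I - K_k H_k) P_{k-1|k-1} + Q_k$, since the text above defines its symbols but does not reproduce the formula.

```python
import numpy as np

def kalman_bin_update(x, P, h, d, R, Q):
    """One frame of the S10-S14 recursion for a single frequency bin.

    x : (N,) complex filter coefficients X_{k-1|k-1}
    P : (N, N) coefficient-error covariance P_{k-1|k-1}
    h : (N,) reference spectrum values of the last N frames (H_k as a row)
    d : complex picked-up spectrum value for this bin
    R : scalar noise power R_k
    Q : (N, N) term Q_k added in the covariance update
    """
    h = np.asarray(h).reshape(1, -1)             # 1 x N row vector H_k
    y = d - (h @ x)[0]                           # S10: residual Y_k
    S = (h @ P @ h.conj().T)[0, 0].real + R      # S11: S_k = H P H^H + R
    K = (P @ h.conj().T)[:, 0] / S               # S12: K_k = P H^H S^-1
    x_new = x + K * y                            # S13: X_{k|k}
    # S14: ASSUMED standard form, the source formula is not shown in the text.
    P_new = (np.eye(len(x)) - np.outer(K, h[0])) @ P + Q
    return x_new, P_new, y
```

With a noise-free picked-up value `d = h @ w_true`, repeated calls drive `x` towards `w_true`, which is one way to sanity-check the update.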

[0115] S202, the first terminal performs an adaptive filtering to the reference signal and the picked-up signal using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, simultaneously, to obtain a first residual signal and a second residual signal, respectively, specifically includes following steps.

[0116] S20, the first terminal performs the adaptive filtering to the reference signal and the picked-up signal according to a first variable-step NLMS filter coefficient, to obtain the second residual signal.

[0117] In this embodiment, the first terminal performs adaptive filtering to the reference signal and the picked-up signal according to a variable-step NLMS filter coefficient updated in a previous frame, to obtain the second residual signal, where the second residual signal may be composed of, for example, a residual echo signal of the variable-step NLMS filter, and/or a near-end speech, and/or a background noise. In general, a speech frame is a time segment of 10-30 ms; for example, for a signal sampled at 8 kHz, a 10 ms frame includes 80 sample points.

[0118] S21, the first terminal determines a smoothing energy of the reference signal and a smoothing energy of the second residual signal; determines a low-speed smoothing energy of the second residual signal; determines a frequency bin at which the smoothing energy of the second residual signal is greater than the low-speed smoothing energy of the second residual signal; and performs the frequency bin constraint on the determined frequency bin to generate a third residual signal.

[0119] In this embodiment, the first terminal calculates a smoothing energy of the reference signal and a smoothing energy of the second residual signal, wherein a smoothing coefficient of the smoothing energy may be, for example, 0.9, and the first terminal calculates a low-speed smoothing energy of the second residual signal, wherein a smoothing coefficient of the low-speed smoothing energy may be, for example, 0.98, so as to determine the frequency bin at which the smoothing energy of the second residual signal is greater than the low-speed smoothing energy of the second residual signal; then the first terminal performs the frequency bin constraint on the determined frequency bin to generate a third residual signal.

[0120] S22, the first terminal adjusts a filter step according to the third residual signal and a preset threshold, to obtain an adjusted filter step.

[0121] In this embodiment, the first terminal uses the third residual signal and a preset threshold to adjust a filter step, so as to obtain an adjusted filter step, wherein the preset threshold may be, for example, 2e-6.

[0122] S23, the first terminal determines a second variable-step NLMS filter coefficient according to the smoothing energy of the reference signal, the smoothing energy of the second residual signal, and the adjusted filter step; and updates the variable-step NLMS filter according to the second variable-step NLMS filter coefficient.

[0123] In this embodiment, the first terminal calculates a new variable-step NLMS filter coefficient through using the smoothing energy of the reference signal, the smoothing energy of the second residual signal, and the adjusted filter step, and updates the variable-step NLMS filter through using the new variable-step NLMS filter coefficient.
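Steps S20 to S23 can be sketched as below for a one-tap-per-bin filter. Only the smoothing coefficients 0.9 and 0.98 and the threshold 2e-6 come from the text; everything else is an assumption made for illustration. In particular, the text does not define the frequency bin constraint, the step adjustment rule, or the exact normalisation, so this sketch assumes that the constraint caps the residual magnitude at the low-speed average, that the step is set to zero where the constrained residual energy is at or below the threshold, and that the update is normalised by the sum of the two smoothing energies. The names `VssNlmsState` and `vss_nlms_frame` are hypothetical.

```python
import numpy as np

FAST, SLOW = 0.9, 0.98   # smoothing coefficients quoted in the text
THRESHOLD = 2e-6         # preset threshold quoted in the text

class VssNlmsState:
    """Per-bin state of a one-tap variable-step NLMS filter (illustrative)."""
    def __init__(self, n_bins):
        self.w = np.zeros(n_bins, dtype=complex)
        self.ref_energy = None
        self.res_energy = None
        self.res_energy_slow = None

def _smooth(prev, new, coeff):
    # First frame: start the running average at the instantaneous value.
    return new if prev is None else coeff * prev + (1.0 - coeff) * new

def vss_nlms_frame(state, ref, mic, base_step=0.5, eps=1e-12):
    # S20: filter with the coefficient updated in the previous frame.
    res2 = mic - state.w * ref
    # S21: smoothing energies and the low-speed smoothing energy.
    state.ref_energy = _smooth(state.ref_energy, np.abs(ref) ** 2, FAST)
    state.res_energy = _smooth(state.res_energy, np.abs(res2) ** 2, FAST)
    state.res_energy_slow = _smooth(state.res_energy_slow, np.abs(res2) ** 2, SLOW)
    flagged = state.res_energy > state.res_energy_slow
    # ASSUMED constraint: cap the magnitude on flagged bins at the slow average.
    limit = np.sqrt(state.res_energy_slow)
    mag = np.abs(res2)
    scale = np.where(flagged & (mag > limit), limit / np.maximum(mag, eps), 1.0)
    res3 = res2 * scale
    # S22: ASSUMED rule, stop adapting where the third residual is negligible.
    step = np.where(np.abs(res3) ** 2 > THRESHOLD, base_step, 0.0)
    # S23: ASSUMED normalisation by the two smoothing energies.
    norm = state.ref_energy + state.res_energy + eps
    state.w = state.w + step * np.conj(ref) * res3 / norm
    return res2
```

Each call processes one frame and returns the second residual signal; the updated coefficient is kept in `state.w` for the next frame.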

[0124] S203, the first terminal performs hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal.

[0125] In this embodiment, after the first terminal performs the adaptive filtering by using the hybrid adaptive filter based on the Kalman filter and the variable-step NLMS filter, the first terminal may determine the target residual signal of the hybrid adaptive filter according to the first residual signal and the second residual signal, where the target residual signal may be composed of, for example, an actual residual echo signal of the hybrid adaptive filter, and/or a near-end speech, and/or a background noise. The target residual signal may represent the difference between the picked-up signal and an estimate echo signal, where the estimate echo signal is estimated by the hybrid adaptive filter that is a combination of the Kalman filter and the variable-step NLMS filter.

[0126] S203, the first terminal performs hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal, specifically includes the following steps.

[0127] S30, the first terminal calculates an energy of the first residual signal and an energy of the second residual signal, on multiple frequency bins, respectively; and

[0128] S31, the first terminal selects, on each of the multiple frequency bins, a residual signal whose energy is smaller between the first residual signal and the second residual signal, so as to obtain the target residual signal.
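The per-bin selection of S30 and S31 can be sketched in a few lines. The function name `hybrid_select` is hypothetical, and resolving ties in favour of the Kalman residual is an arbitrary choice that the text does not specify.

```python
import numpy as np

def hybrid_select(res_kalman, res_nlms):
    """Pick, per frequency bin, the residual with the smaller energy."""
    e_kalman = np.abs(res_kalman) ** 2   # S30: energy of the first residual
    e_nlms = np.abs(res_nlms) ** 2       # S30: energy of the second residual
    # S31: keep the lower-energy residual (ties go to the Kalman branch).
    return np.where(e_kalman <= e_nlms, res_kalman, res_nlms)
```

Because the choice is made independently on every bin, a bin on which one filter has diverged does not contaminate the target residual signal as long as the other filter behaves on that bin.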

[0129] S204, the first terminal performs a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal.

[0130] In this embodiment, when the first terminal obtains the target residual signal, the first terminal may also obtain an estimate echo signal estimated by the hybrid adaptive filter that is a combination of the Kalman filter and the variable-step NLMS filter. Based on the obtained estimate echo signal, the first terminal estimates the actual residual echo signal in the target residual signal, where the estimated residual echo is defined as the estimate residual echo signal.

[0131] S204, the first terminal performs a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal, specifically includes the following steps.

[0132] S40, determining, by the first terminal, the estimate echo signal using the Kalman filter and the variable-step NLMS filter together.

[0133] In this embodiment, the estimate echo signal may be obtained by calculating the difference between the picked-up signal and the target residual signal. The estimate echo signal may also be obtained by combining a first estimate echo signal estimated by the Kalman filter and a second estimate echo signal estimated by the variable-step NLMS filter.

[0134] S41, the first terminal determines a first echo power spectrum of the estimate echo signal.

[0135] S42, the first terminal performs a harmonic generation processing to the estimate echo signal, to obtain a power spectrum after harmonic generation.

[0136] S43, the first terminal performs a frequency spectrum splicing to the power spectrum after harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum.

[0137] S44, the first terminal performs smoothing processing to the second echo power spectrum, to obtain a third echo power spectrum.

[0138] S45, the first terminal selects, on each of the multiple frequency bins, an echo power spectrum whose energy of frequency bin is bigger between the third echo power spectrum and the second echo power spectrum, to obtain the estimate residual echo signal.

[0139] In this embodiment, the first terminal performs S41 and S42, which may be performed in either order. The first terminal performs the harmonic generation processing to the estimate echo signal, for example, as follows: an energy of the estimate echo signal is calculated and an inverse Fourier transform is performed on it; an absolute value of the real part of the result is taken; the result is transformed back to the frequency domain; and the real part thereof is taken, which completes the harmonic generation processing. The use of harmonic generation may make up for, to an extent, the frequency spectrum inconsistency between the reference signal and the picked-up signal caused by nonlinear distortion, so that the nonlinear echo is suppressed. Then the frequency spectrum splicing is performed to the power spectrum after the harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum. Finally, smoothing processing is performed to the second echo power spectrum, to obtain a third echo power spectrum, and, on each of the multiple frequency bins, the maximum of the third echo power spectrum and the second echo power spectrum is taken, so as to determine the power spectrum of the estimate residual echo signal. The use of smoothing allows the echo hangover of a reverberation scenario to be estimated in the energy spectrum of the picked-up signal, so that this type of echo reverberation signal may be suppressed in the multiple times of nonlinear suppression.
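A possible reading of S40 to S45 is sketched below. Several details are assumptions, since the text does not fix them: the splicing is assumed to take the bins below a hypothetical `split_bin` from the first echo power spectrum and the remaining bins from the harmonic spectrum, the smoothing is assumed to be a first-order recursion over frames with a hypothetical coefficient of 0.9, and negative values after harmonic generation are clamped to zero. All function names are illustrative.

```python
import numpy as np

def estimate_echo(mic_spec, target_res):
    # S40: estimate echo as picked-up signal minus target residual signal.
    return mic_spec - target_res

def harmonic_generation(echo_spec, n_fft):
    power = np.abs(echo_spec) ** 2               # energy of the estimate echo
    t = np.fft.irfft(power, n=n_fft)             # inverse Fourier transform
    rectified = np.abs(t)                        # absolute value of the real part
    # Back to the frequency domain, real part; clamp is an ASSUMPTION.
    return np.maximum(np.fft.rfft(rectified).real, 0.0)

def estimate_residual_echo(echo_spec, prev_third, n_fft, split_bin, coeff=0.9):
    first = np.abs(echo_spec) ** 2                       # S41
    harm = harmonic_generation(echo_spec, n_fft)         # S42
    # S43: ASSUMED splice, low bins from `first`, high bins from `harm`.
    second = np.concatenate([first[:split_bin], harm[split_bin:]])
    # S44: ASSUMED recursive smoothing over frames.
    third = coeff * prev_third + (1.0 - coeff) * second
    # S45: per-bin maximum of the third and second echo power spectra.
    return np.maximum(third, second), third
```

The second return value is the third echo power spectrum, which the caller would pass back as `prev_third` on the next frame. Taking the per-bin maximum means the estimate decays no faster than the smoothed spectrum after the echo stops, which is how a reverberation tail can remain covered.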

[0140] S205, the first terminal performs a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression.

[0141] In this embodiment, after the first terminal estimates the residual echo signal, the first terminal performs a residual echo suppression to the target residual signal according to the estimate residual echo signal and, through the gain calculation, outputs a signal after echo suppression. The residual echo suppression may be performed one time or multiple times, and the gain calculation may be completed by one gain calculation or by multiple gain calculations, as illustrated hereafter. The gain calculation may be performed by, for example, a Maximum A Posteriori (MAP) method, wherein the input parameters are a prior signal-to-echo ratio, a posterior signal-to-echo ratio, a gain adjustment constant and an inhibition intensity limit parameter. The gain calculation may also be performed by another method, for example, a Wiener filter.

[0142] S205, the first terminal performs a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression, specifically includes the following steps.

[0143] S50, the first terminal determines an energy of the target residual signal and an energy of the estimate residual echo signal.

[0144] S51, the first terminal performs an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression.

[0145] In this embodiment, the first terminal tries to use the estimate residual echo signal to cancel the actual residual echo signal of the hybrid adaptive filter in the target residual signal.

[0146] S52, the first terminal performs a pitch signal detection to the signal after initial residual echo suppression.

[0147] In this embodiment, whether a pitch part exists in signal after initial residual echo suppression is determined by using a pitch signal detection method. This step actually is the implicit single-talking/double-talking detection method. If a pitch is detected, it indicates that a near-end speech exists.

[0148] S53, the first terminal performs a harmonic enhancement to the signal after initial residual echo suppression to obtain a signal after harmonic enhancement, when the pitch signal is detected in the signal after initial residual echo suppression.

[0149] In this embodiment, the pitch signal detection can be performed by comparing the energy of the signal after initial residual echo suppression with a preset pitch signal detection threshold. If the energy of the signal after initial residual echo suppression is larger than the preset pitch signal detection threshold, the pitch signal is detected. If the energy of the signal after initial residual echo suppression is less than the preset pitch signal detection threshold, the pitch signal is not detected. The pitch signal detection actually is the implicit single-talking/double-talking detection method.

[0150] S54, the first terminal performs a secondary residual echo suppression to the signal after harmonic enhancement through a secondary gain calculation, to obtain a signal after secondary residual echo suppression.

[0151] In this embodiment, the harmonic enhancement is performed in double-talking, so that an impaired near-end speech may be partially restored, so as to improve a subjective audition effect in double-talking.

[0152] S55, the first terminal performs a cepstral smoothing processing to the signal after secondary residual echo suppression, to obtain a signal after cepstral smoothing.

[0153] In this embodiment, the cepstral smoothing may further suppress a residual echo in target residual signal. The cepstral smoothing processing to the signal after secondary residual echo suppression, actually, is a smoothing to a prior signal-to-echo ratio.

[0154] S56, the first terminal performs a final residual echo suppression to the signal after smoothing through a final gain calculation, to obtain a signal after echo suppression.

[0155] In this embodiment, the foregoing S50 to S56 specifically illustrate a process of the multiple times of nonlinear suppression, whereby a more accurate residual echo suppression gain calculation is performed without the need of performing delay estimation. Moreover, the foregoing S50 to S56 also specifically illustrate an implicit single-talking/double-talking determining method. Therefore, the embodiments of the present application notably improve robustness of the acoustic echo cancellation algorithm, and achieve a smooth full duplex call effect.
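The cepstral smoothing of S55 is described above only as a smoothing of the prior signal-to-echo ratio. One common way to realise such a smoothing, shown here as an assumption rather than as the method of the application, is temporal cepstrum smoothing: the logarithm of the ratio is transformed to the cepstral domain, each cepstral coefficient is recursively averaged over frames, with less smoothing at low quefrencies so that the spectral envelope can follow speech onsets, and the result is transformed back. The function name and the values of `n_low`, `beta_low` and `beta_high` are hypothetical.

```python
import numpy as np

def cepstral_smooth_ser(ser, prev_cep, n_fft, n_low=4,
                        beta_low=0.2, beta_high=0.8, eps=1e-12):
    """Smooth a per-bin prior signal-to-echo ratio (linear scale) over frames.

    ser      : (n_fft // 2 + 1,) positive prior signal-to-echo ratio
    prev_cep : (n_fft,) smoothed cepstrum of the previous frame, or None
    """
    cep = np.fft.irfft(np.log(ser + eps), n=n_fft)    # real cepstrum
    if prev_cep is not None:
        idx = np.arange(n_fft)
        quefrency = np.minimum(idx, n_fft - idx)      # symmetric quefrency index
        # Light smoothing for the envelope, heavy smoothing for fine structure.
        beta = np.where(quefrency < n_low, beta_low, beta_high)
        cep = beta * prev_cep + (1.0 - beta) * cep
    smoothed = np.exp(np.fft.rfft(cep).real)          # back to the ratio domain
    return smoothed, cep
```

The returned cepstrum is fed back as `prev_cep` on the next frame; on the first frame `None` can be passed, in which case the ratio is returned essentially unchanged.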

[0156] In this embodiment, after the first terminal performs a pitch signal detection to the signal after initial residual echo suppression, the following steps are included.

[0157] S60, the first terminal performs a cepstral smoothing processing to the signal after initial residual echo suppression, to obtain a signal after cepstral smoothing, when a pitch signal is not detected in the signal after initial residual echo suppression.

[0158] In this embodiment, the cepstral smoothing processing to the signal after initial residual echo suppression, actually, is a smoothing to a prior signal-to-echo ratio. The pitch signal detection can be performed by comparing energy of the signal after initial residual echo suppression with a preset pitch signal detection threshold. If the energy of the signal after initial residual echo suppression is larger than a preset pitch signal detection threshold, the pitch signal is detected. If the energy of the signal after initial residual echo suppression is less than a preset pitch signal detection threshold, the pitch signal is not detected. The pitch signal detection actually is the implicit single-talking/double-talking detection method.

[0159] S61, the first terminal performs a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.

[0160] In this embodiment, the cepstral smoothing may further suppress a residual echo in target residual signal.

[0161] For S52, if the pitch signal is detected, S53 is triggered, and if the pitch signal is not detected, S60 is triggered. If S60 is triggered, after the initial residual echo suppression is completed, a cepstral smoothing processing is directly performed to the signal after initial residual echo suppression to obtain a signal after cepstral smoothing. Finally, a final gain calculation is performed to the signal after smoothing for implementing a final residual echo suppression, to obtain a signal after echo suppression.
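The branching between S53 to S56 and S60 to S61 can be sketched as follows. The individual processing stages are passed in as callables because their internals are described elsewhere; the dictionary keys, the function names and the `pitch_threshold` parameter are hypothetical. The energy comparison follows the detection rule stated above; the case of energy exactly equal to the threshold is not covered by the text and is treated here as no pitch detected.

```python
import numpy as np

def pitch_detected(spec, threshold):
    # Energy of the signal after initial suppression against a preset threshold.
    return float(np.sum(np.abs(spec) ** 2)) > threshold

def suppress_residual_echo(target_res, stages, pitch_threshold):
    """Run the multi-pass suppression and report which steps were taken."""
    s = stages["initial_suppression"](target_res)        # S51
    path = ["S51", "S52"]
    if pitch_detected(s, pitch_threshold):               # S52: near-end speech
        s = stages["harmonic_enhancement"](s)            # S53
        s = stages["secondary_suppression"](s)           # S54
        s = stages["cepstral_smoothing"](s)              # S55
        s = stages["final_suppression"](s)               # S56
        path += ["S53", "S54", "S55", "S56"]
    else:                                                # no pitch detected
        s = stages["cepstral_smoothing"](s)              # S60
        s = stages["final_suppression"](s)               # S61
        path += ["S60", "S61"]
    return s, path
```

No explicit double-talking detector appears in this flow: the pitch test on the already-suppressed signal decides whether the harmonic enhancement and the secondary suppression are run.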

[0162] S51, the first terminal performs an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression, further includes the following steps.

[0163] S71, the first terminal determines a prior signal-to-echo ratio according to the target residual signal and the estimate residual echo signal.

[0164] S72, the first terminal performs the initial gain calculation according to the prior signal-to-echo ratio.

[0165] In this embodiment, after the first terminal determines an energy of the target residual signal and an energy of the estimate residual echo signal, the first terminal calculates a prior signal-to-echo ratio by using a decision directed (DD) method, where the prior signal-to-echo ratio is the ratio, expressed in dB, between the energy of the calculated signal and the energy of the echo, that is, 10 times the base-10 logarithm of the energy ratio. For example, based on the target residual signal and the estimate residual echo signal, the first terminal may estimate a near-end speech or a combination of near-end speech and background noise, so that the prior signal-to-echo ratio may be determined based on the estimate residual echo signal in combination with the estimated near-end speech or the estimated combination of near-end speech and background noise. The concept of the signal-to-echo ratio is similar to that of the signal-to-noise ratio. After the prior signal-to-echo ratio is calculated, an initial gain calculation is performed, and the gain is applied by using a Wiener filter to perform an initial residual echo suppression to the target residual signal.
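S71 and S72 can be sketched with the usual decision-directed recursion and a Wiener gain. The weighting factor `alpha`, the gain floor `floor` and the function name are assumptions; the text names the decision directed method and the Wiener filter but gives no parameter values.

```python
import numpy as np

def initial_gain(target_res, echo_power, prev_clean_power,
                 alpha=0.98, floor=1e-3, eps=1e-12):
    """Decision-directed prior signal-to-echo ratio and Wiener gain per bin.

    target_res       : complex target residual spectrum
    echo_power       : power spectrum of the estimate residual echo signal
    prev_clean_power : |output|^2 of the previous frame (zeros on the first frame)
    """
    res_power = np.abs(target_res) ** 2
    posterior = res_power / (echo_power + eps)           # posterior ratio
    # S71: mix last frame's estimate with the current instantaneous estimate.
    prior = (alpha * prev_clean_power / (echo_power + eps)
             + (1.0 - alpha) * np.maximum(posterior - 1.0, 0.0))
    # S72: Wiener gain, limited from below (floor is an ASSUMPTION).
    gain = np.maximum(prior / (1.0 + prior), floor)
    out = gain * target_res
    prior_db = 10.0 * np.log10(prior + eps)              # ratio expressed in dB
    return out, gain, prior_db, np.abs(out) ** 2
```

The last return value is the output power, which the caller would pass back as `prev_clean_power` on the next frame. Bins where the estimated echo is negligible keep a gain close to 1, while bins dominated by echo are pushed down to the floor.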

[0166] In this embodiment, after the first terminal performs a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal, the following steps are included.

[0167] S81, the first terminal generates a scenario identification information according to the reference signal, the picked-up signal and the target residual signal, where the scenario identification information comprises at least one of a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path.

[0168] In this embodiment, the echo reverberation is mainly a hangover of an echo direct sound. The echo direct sound is an element contained in the picked-up signal. In contrast, the hangover is an element not contained in the picked-up signal, and is purely a reverberation generated by reflection in the near-end acoustic environment. The distortion extent is used to judge the difference between a curve of an echo amplitude frequency response and a curve of the picked-up signal. If there is no distortion, the two curves are parallel to each other. The change of an acoustic path refers to a change in the acoustic path from the picked-up signal to the microphone due to blocking of the microphone, blocking of the speaker, or speech reflection.

[0169] S82, the first terminal performs a dynamic adjustment to the estimate residual echo signal according to the scenario identification information.

[0170] In this embodiment, a scenario identification processing method is added. The scenario identification is used to determine at least one of a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path according to a reference signal, a picked-up signal, and a hybrid filter output result, and the estimate residual echo signal is dynamically adjusted according to these pieces of information, to implement intelligent characteristics of ensuring smooth double-talking in an ideal scenario and ensuring acoustic echo cancellation in a severe scenario.

[0171] The present application provides a hybrid adaptive filter based on Kalman filtering and variable-step NLMS filtering for providing robust adaptive filtering capability. The convergence performance of the Kalman adaptive filter in double-talking is equivalent to that in single-talking, to make up for the disadvantage of the conventional variable-step NLMS filter, and use of a combination of the variable-step NLMS filter and the Kalman filter to output a filter output result may bring respective advantages of the two filters into play, so that divergent frequency bins are less, and the filtering capability is more powerful. Moreover, the present application provides an echo estimation method based on harmonic generation and echo smoothing for suppressing nonlinear echo and an echo reverberation hangover. Use of harmonic generation may make up for, to an extent, frequency spectrum inconsistency between a reference signal and a picked-up signal caused by nonlinear distortion, so that the nonlinear echo is suppressed. Moreover, use of smoothing may make an echo hangover of a reverberation scenario be estimated in the energy spectrum of the picked-up signal, and multiple times of nonlinear suppression may suppress this type of echo reverberation signal.

[0172] Further, the present application provides a method for multiple times of nonlinear suppression comprising a double-talking harmonic enhancement method based on pitch signal detection and a cepstral smoothing gain calculation method, so that the near-end speech protection capability in double-talking is powerful and fewer echo residuals remain. Pitch signal detection on a residual signal is in effect an implicit single-talking/double-talking determination: harmonic enhancement is performed in double-talking, and an impaired near-end speech may be partially restored, to improve the subjective listening effect in double-talking. Cepstral smoothing may further suppress a residual echo, to further reduce an echo residual.

[0173] FIG. 3 is a flow chart of a method for acoustic echo cancellation in a scenario according to an embodiment of the present application. As shown in FIG. 3, the method comprises:

[0174] S301, the first terminal acquires a reference signal and a picked-up signal, where the reference signal is a speech signal transmitted from the second terminal to the first terminal and the picked-up signal is obtained by the first terminal through conducting speech pick up from the speech signal played by the first terminal;

[0175] S302, the first terminal performs an adaptive filtering to the reference signal and the picked-up signal through Kalman filter and variable-step normalized least mean square (NLMS) filter, simultaneously, to obtain a first residual signal and a second residual signal, respectively;

[0176] S303, the first terminal performs a hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal;

[0177] S304, the first terminal performs a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal, where the S304 specifically comprises:

[0178] The first terminal determines the estimate echo signal estimated by the Kalman filter and the variable-step NLMS filter together; the first terminal calculates a first echo power spectrum of the estimate echo signal; the first terminal performs a harmonic generation processing to the estimate echo signal, to obtain a power spectrum after harmonic generation; the first terminal performs a frequency spectrum splicing to the power spectrum after harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum; the first terminal performs a smoothing processing to the second echo power spectrum, to obtain a third echo power spectrum; the first terminal selects, on each of the multiple frequency bins, one whose energy of frequency bin is bigger between the third echo power spectrum and the second echo power spectrum, to obtain the estimate residual echo signal;

[0179] S305, the first terminal performs multiple residual echo suppressions to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression.
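As an illustration only, the per-frame processing of S304 can be sketched in Python as follows. The harmonic generation model (the power of bin k is copied to bin 2k), the splicing rule (bins below split_bin come from the first echo power spectrum, the remaining bins from the power spectrum after harmonic generation), the first-order recursive smoothing over frames, and the parameters split_bin and alpha are assumptions made for this sketch and are not specified by the present application. Only the per-bin selection of the larger value follows the description directly.

```python
def estimate_residual_echo(echo_spectrum, prev_smoothed, split_bin, alpha=0.8):
    """Sketch of S304 for one frame.

    echo_spectrum: complex frequency bins of the estimate echo signal.
    prev_smoothed: third echo power spectrum of the previous frame.
    split_bin, alpha: hypothetical parameters, not taken from the source.
    """
    n = len(echo_spectrum)
    # First echo power spectrum: squared magnitude per bin.
    first = [abs(x) ** 2 for x in echo_spectrum]
    # Harmonic generation (assumed model): bin k contributes its power to bin 2k.
    harmonic = [0.0] * n
    for k in range(n):
        if 2 * k < n:
            harmonic[2 * k] = first[k]
    # Splicing (assumed rule): low band from the first spectrum, high band from harmonics.
    second = [first[k] if k < split_bin else harmonic[k] for k in range(n)]
    # Smoothing (assumed form): first-order recursion over frames.
    third = [alpha * p + (1.0 - alpha) * s for p, s in zip(prev_smoothed, second)]
    # Per bin, keep the larger of the third and the second echo power spectrum.
    residual_echo = [max(t, s) for t, s in zip(third, second)]
    return residual_echo, third


residual_echo, third = estimate_residual_echo(
    [1 + 0j, 2 + 0j, 0j, 0j], prev_smoothed=[4.0, 0.0, 0.0, 0.0],
    split_bin=2, alpha=0.5)
```

In this example the smoothed spectrum still carries energy from the previous frame in bin 0, so the estimate keeps an echo hangover there, while bins 1 and 2 take the larger current values.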

[0180] In this embodiment, the foregoing method for acoustic echo cancellation combines a Kalman adaptive filter and a variable-step NLMS filter to achieve a robust adaptive filtering capability. Specifically, the convergence performance of the Kalman adaptive filter in double-talking is equivalent to that in single-talking, which makes up for the disadvantage of the conventional variable-step NLMS filter. Combining the variable-step NLMS filter and the Kalman filter to produce the filter output result brings the respective advantages of the two filters into play, so that fewer frequency bins diverge and the filtering capability is more powerful. Moreover, the foregoing method for acoustic echo cancellation uses harmonic generation, which may compensate, to an extent, for the frequency spectrum inconsistency between a reference signal and a picked-up signal caused by nonlinear distortion, so that the nonlinear echo is suppressed.

[0181] FIG. 4 is a flow chart of a residual echo suppression gain calculation in a scenario according to an embodiment of the present application. As shown in FIG. 4, the residual echo suppression gain calculation comprises:

[0182] S401, the first terminal determines an energy of the target residual signal and an energy of the estimate residual echo signal;

[0183] S402, the first terminal performs an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression, where the initial gain calculation is performed according to a prior signal-to-echo ratio and the prior signal-to-echo ratio is determined according to the target residual signal and the estimate residual echo signal;

[0184] S403, the first terminal performs a pitch signal detection to the signal after initial residual echo suppression, where the pitch signal detection can be performed by comparing energy of the signal after initial residual echo suppression with a preset threshold, and if the energy of the signal after initial residual echo suppression is larger than a preset pitch signal detection threshold, the pitch signal is detected and if the energy of the signal after initial residual echo suppression is less than a preset pitch signal detection threshold, the pitch signal is not detected;

[0185] S404, the first terminal performs a harmonic enhancement to the signal after initial residual echo suppression, when the pitch signal is detected in the signal after initial residual echo suppression;

[0186] S405, the first terminal performs a secondary residual echo suppression to the signal after harmonic enhancement through a secondary gain calculation, to obtain a signal after secondary residual echo suppression;

[0187] S406, the first terminal performs a cepstral smoothing processing to the signal after secondary residual echo suppression, to obtain a signal after cepstral smoothing, when the pitch signal is not detected in the signal after secondary residual echo suppression;

[0188] S407, the first terminal performs a final residual echo suppression to the signal after smoothing through a final gain calculation, to obtain a signal after echo suppression.
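The order of the steps S402 to S407 can be illustrated by the following Python sketch. The stage functions passed to run_fig4_flow are hypothetical placeholders that only scale the spectrum; the actual gain calculations, harmonic enhancement and cepstral smoothing are not reproduced here. The sketch also assumes that, when no pitch signal is detected after the initial suppression, the cepstral smoothing is applied directly to the signal after initial residual echo suppression.

```python
def frame_energy(spectrum):
    """Total energy of one frame given as complex frequency bins."""
    return sum(abs(x) ** 2 for x in spectrum)


def pitch_detected(spectrum, threshold):
    """S403: the pitch signal is detected when the energy exceeds the threshold."""
    return frame_energy(spectrum) > threshold


def run_fig4_flow(target_residual, threshold, initial, enhance, secondary,
                  cepstral, final):
    signal = initial(target_residual)          # S402
    if pitch_detected(signal, threshold):      # S403
        signal = enhance(signal)               # S404
        signal = secondary(signal)             # S405
    if not pitch_detected(signal, threshold):
        signal = cepstral(signal)              # S406
    return final(signal)                       # S407


def make_stage(name, scale, trace):
    """Hypothetical stand-in for a stage: scales every bin and logs its name."""
    def run(spectrum):
        trace.append(name)
        return [scale * x for x in spectrum]
    return run


def trace_flow(frame, threshold=1.0):
    """Run the flow with placeholder stages and return the executed stage names."""
    trace = []
    stages = [make_stage(name, scale, trace) for name, scale in
              [("initial", 0.5), ("enhance", 1.0), ("secondary", 0.9),
               ("cepstral", 1.0), ("final", 1.0)]]
    run_fig4_flow(frame, threshold, *stages)
    return trace


double_talk_trace = trace_flow([2 + 0j, 2 + 0j])      # high residual energy
single_talk_trace = trace_flow([0.2 + 0j, 0.2 + 0j])  # low residual energy
```

With these placeholder values, the high-energy frame passes through harmonic enhancement and the secondary suppression, whereas the low-energy frame goes from the initial suppression to cepstral smoothing and the final suppression.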

[0189] In this embodiment, the foregoing method for multiple times of nonlinear suppression may comprise a double-talking harmonic enhancement method based on pitch signal detection and a cepstral smoothing gain calculation method, such that the near-end speech protection capability in double-talking is powerful and fewer echo residuals remain. Pitch signal detection on a residual signal is in effect an implicit single-talking/double-talking determination: harmonic enhancement is performed in double-talking, and an impaired near-end speech may be partially restored, to improve the subjective listening effect in double-talking. The cepstral smoothing may further suppress a residual echo, to further reduce an echo residual.

[0190] FIG. 5 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application. The embodiment shown by FIG. 5 mainly differs from the embodiments shown by FIG. 3 and FIG. 4 in that a scenario identification module is added. The scenario identification module may identify whether the echo is generated in an ideal scenario or in a severe scenario. The first terminal determines a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path according to a reference signal, a picked-up signal, and a hybrid filter output result, and an estimated value of a residual echo is dynamically adjusted according to these pieces of information, to implement intelligent characteristics of ensuring smooth double-talking in an ideal scenario and ensuring acoustic echo cancellation in a severe scenario.

[0191] FIG. 6 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application. The embodiment shown by FIG. 6 mainly differs from the embodiments shown by FIG. 3 and FIG. 4 in that the echo suppression processing is performed with only a single gain calculation and enabling. This embodiment is the most simplified processing in the present application. It is mainly used to reduce calculation overheads, and is applicable to a scenario in which an echo is relatively fixed, for example, a headset mode or handheld mode of a mobile phone.

[0192] FIG. 7 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application. The embodiment shown by FIG. 7 mainly differs from the embodiments shown by FIG. 3 and FIG. 4 in that the echo suppression processing is performed with only a single gain calculation and enabling, and a scenario identification module is added. The embodiment is applicable to a scenario in which the distortion of an acoustic device is small but the echo path changes considerably.

[0193] FIG. 8 is a flow chart of a method for acoustic echo cancellation in another scenario according to an embodiment of the present application. The embodiment shown by FIG. 8 mainly differs from the embodiments shown by FIG. 3 and FIG. 4 in that an additional Kalman filter is added after the hybrid filtering processing. The embodiment is applicable to a scenario in which the front-end enhancement of the speech recognition is needed.

[0194] In the technical solution provided by the embodiments of the present application, hybrid adaptive filtering has an extremely high stability, a high convergence speed, and accurate residual echo estimation. Therefore, the present application may be used in both a device and an application that need acoustic echo cancellation, for example, products and applications such as a notebook computer, a tablet computer call, a video conference system, speech recognition and front-end enhancement.

[0195] FIG. 9A is a schematic diagram of a first terminal according to an embodiment of the present application. As shown in FIG. 9A, a first terminal 900 conducts speech communications with a second terminal and includes: a signal acquiring module 901, an adaptive filtering module 902, a hybrid filtering module 903, a residual echo estimation module 904 and a residual echo suppression module 905.

[0196] The signal acquiring module 901 is configured to pick up a signal comprising an echo signal caused by playing a reference signal by the first terminal, where the reference signal is a speech signal received by the first terminal from the second terminal.

[0197] The adaptive filtering module 902 is configured to perform an adaptive filtering to the reference signal and the picked-up signal by using a Kalman filter and a variable-step normalized least mean square (NLMS) filter, to obtain a first residual signal and a second residual signal, respectively.

[0198] The hybrid filtering module 903 is configured to perform a hybrid filtering processing to the first residual signal and the second residual signal, to obtain a target residual signal.

[0199] The residual echo estimation module 904 is configured to perform a residual echo estimation according to an estimate echo signal, to obtain an estimate residual echo signal.

[0200] The residual echo suppression module 905 is configured to perform a residual echo suppression to the target residual signal according to the estimate residual echo signal, to output a signal after echo suppression.

[0201] FIG. 9B is a schematic diagram of an adaptive filtering module according to an embodiment of the present application. As shown in FIG. 9B, the adaptive filtering module 902 further includes a variable-step NLMS filtering module 9021, a residual signal generation module 9022, a step adjusting module 9023 and a coefficient updating module 9024.

[0202] The variable-step NLMS filtering module 9021 is configured to perform the adaptive filtering to the reference signal and the picked-up signal according to a first variable-step NLMS filter coefficient, to obtain the second residual signal.

[0203] The residual signal generation module 9022 is configured to determine a smoothing energy of the reference signal, a smoothing energy of the second residual signal and a low-speed smoothing energy of the second residual signal, determine a frequency bin at which the smoothing energy of the second residual signal is greater than the low-speed smoothing energy of the second residual signal, and perform a frequency bin constraint on the determined frequency bin to generate a third residual signal.

[0204] The step adjusting module 9023 is configured to adjust a filter step according to the third residual signal and a preset threshold, to obtain an adjusted filter step.

[0205] The coefficient updating module 9024 is configured to determine a second variable-step NLMS filter coefficient according to the smoothing energy of the reference signal, the smoothing energy of the second residual signal, and the adjusted filter step, and update the variable-step NLMS filter according to the second variable-step NLMS filter coefficient.
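The cooperation of the modules 9021 to 9024 can be illustrated, for a single frequency bin and a single filter tap, by the following Python sketch. The exact formulas are not given in this part of the description, so the form of the frequency bin constraint, the step adjustment rule, the normalization by the smoothed reference energy, and all default parameter values are assumptions made for illustration. In particular, the smoothed residual energies are used here only in the constraint.

```python
import cmath


def vs_nlms_bin_update(w, x, d, state, mu_max=0.5, fast=0.5, slow=0.95,
                       threshold=1.0, eps=1e-12):
    """One update of a single-tap variable-step NLMS filter on one frequency bin.

    w: current coefficient, x: reference bin, d: picked-up bin.
    state: dict holding the smoothed energies "ref", "res_fast" and "res_slow".
    All default parameter values are hypothetical.
    """
    e = d - w * x  # second residual signal on this bin
    state["ref"] = fast * state["ref"] + (1 - fast) * abs(x) ** 2
    state["res_fast"] = fast * state["res_fast"] + (1 - fast) * abs(e) ** 2
    state["res_slow"] = slow * state["res_slow"] + (1 - slow) * abs(e) ** 2
    # Frequency bin constraint (assumed form): if the smoothed residual energy
    # exceeds the slowly smoothed one, scale the residual to the slow level.
    e3 = e
    if state["res_fast"] > state["res_slow"]:
        e3 = e * (state["res_slow"] / (state["res_fast"] + eps)) ** 0.5
    # Step adjustment (assumed rule): reduce the step when the energy of the
    # constrained (third) residual exceeds the preset threshold.
    p3 = abs(e3) ** 2
    mu = mu_max if p3 <= threshold else mu_max * threshold / p3
    # Normalized coefficient update using the smoothed reference energy.
    w_new = w + mu * x.conjugate() * e3 / (state["ref"] + eps)
    return w_new, e


true_path = 0.5 + 0.2j  # hypothetical echo path gain on this bin
w = 0j
state = {"ref": 0.0, "res_fast": 0.0, "res_slow": 0.0}
for n in range(200):
    x = cmath.exp(0.3j * n)  # unit-magnitude reference bin
    d = true_path * x        # echo only, no near-end speech
    w, e = vs_nlms_bin_update(w, x, d, state)
```

In this echo-only toy case the coefficient w approaches the hypothetical echo path gain. The constraint limits the update only during the first iterations, while the fast-smoothed residual energy is above the slow-smoothed one.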

[0206] FIG. 9C is a schematic diagram of a hybrid filtering module according to an embodiment of the present application. As shown in FIG. 9C, the hybrid filtering module 903 further includes: a first energy determining module 9031 and a first signal selecting module 9032.

[0207] The first energy determining module 9031 is configured to determine an energy of the first residual signal and an energy of the second residual signal, on multiple frequency bins, respectively.

[0208] The first signal selecting module 9032 is configured to select, on each of the multiple frequency bins, a residual signal whose energy is smaller between the first residual signal and the second residual signal, to obtain the target residual signal.
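A minimal Python sketch of this per-bin selection is given below. Representing each residual signal as a list of complex frequency bins, and resolving a tie in favor of the first residual signal, are assumptions of the sketch.

```python
def hybrid_select(first_residual, second_residual):
    """Per frequency bin, keep the residual with the smaller energy."""
    if len(first_residual) != len(second_residual):
        raise ValueError("both residual spectra must have the same number of bins")
    target = []
    for e_kalman, e_nlms in zip(first_residual, second_residual):
        # The energy of a complex bin is its squared magnitude.
        if abs(e_kalman) ** 2 <= abs(e_nlms) ** 2:
            target.append(e_kalman)
        else:
            target.append(e_nlms)
    return target


kalman_residual = [0.5 + 0.0j, 0.1 + 0.2j, 1.0 - 1.0j]  # first residual signal
nlms_residual = [0.2 + 0.1j, 0.3 + 0.3j, 0.5 + 0.5j]    # second residual signal
target = hybrid_select(kalman_residual, nlms_residual)
```

Here bins 0 and 2 are taken from the variable-step NLMS residual and bin 1 from the Kalman residual, so a filter that diverges on a given bin does not affect the target residual signal on that bin as long as the other filter has the smaller residual energy there.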

[0209] FIG. 9D is a schematic diagram of a residual echo estimation module according to an embodiment of the present application. As shown in FIG. 9D, the residual echo estimation module 904 further includes: an estimate echo signal determining module 9041, a power spectrum calculating module 9042, a harmonic generation processing module 9043, a frequency spectrum splicing module 9044, a smoothing processing module 9045 and a second signal selecting module 9046.

[0210] The estimate echo signal determining module 9041 is configured to determine the estimate echo signal using the Kalman filter and the variable-step NLMS filter together.

[0211] The power spectrum calculating module 9042 is configured to determine a first echo power spectrum of the estimate echo signal.

[0212] The harmonic generation processing module 9043 is configured to perform a harmonic generation processing to the estimate echo signal, to obtain a power spectrum after harmonic generation.

[0213] The frequency spectrum splicing module 9044 is configured to perform a frequency spectrum splicing to the power spectrum after harmonic generation and the first echo power spectrum, to obtain a second echo power spectrum.

[0214] The smoothing processing module 9045 is configured to perform a smoothing processing to the second echo power spectrum, to obtain a third echo power spectrum.

[0215] The second signal selecting module 9046 is configured to select, on each of the multiple frequency bins, an echo power spectrum whose energy of frequency bin is bigger between the third echo power spectrum and the second echo power spectrum, to obtain the estimate residual echo signal.

[0216] FIG. 9E is a schematic diagram of a residual echo suppression module according to an embodiment of the present application. As shown in FIG. 9E, the residual echo suppression module 905 further includes: a second energy determining module 9051, an initial residual echo suppression module 9052, a pitch signal detecting module 9053, a harmonic enhancement module 9054, a secondary residual echo suppression module 9055, a cepstral smoothing module 9056 and a final residual echo suppression module 9057.

[0217] The second energy determining module 9051 is configured to determine an energy of the target residual signal and an energy of the estimate residual echo signal.

[0218] The initial residual echo suppression module 9052 is configured to perform an initial residual echo suppression to the target residual signal according to the estimate residual echo signal through an initial gain calculation, to obtain a signal after initial residual echo suppression.

[0219] The pitch signal detecting module 9053 is configured to perform a pitch signal detection to the signal after initial residual echo suppression.

[0220] The harmonic enhancement module 9054 is configured to perform a harmonic enhancement to the signal after initial residual echo suppression to obtain a signal after harmonic enhancement, when the pitch signal is detected in the signal after initial residual echo suppression.

[0221] The secondary residual echo suppression module 9055 is configured to perform a secondary residual echo suppression to the signal after harmonic enhancement through a secondary gain calculation, to obtain a signal after secondary residual echo suppression.

[0222] The cepstral smoothing module 9056 is configured to perform a cepstral smoothing processing to the signal after secondary residual echo suppression, to obtain a signal after cepstral smoothing, when the pitch signal is not detected in the signal after secondary residual echo suppression. The cepstral smoothing module 9056 is further configured to perform a cepstral smoothing processing to the signal after initial residual echo suppression, to obtain a signal after cepstral smoothing.

[0223] The final residual echo suppression module 9057 is configured to perform a final residual echo suppression to the signal after cepstral smoothing through a final gain calculation, to obtain a signal after echo suppression.

[0224] FIG. 9F is a schematic diagram of an initial residual echo suppression module according to an embodiment of the present application. As shown in FIG. 9F, the initial residual echo suppression module 9052 includes: a signal-to-echo ratio determining module 90521 and an initial gain calculating module 90522.

[0225] The signal-to-echo ratio determining module 90521 is configured to determine a prior signal-to-echo ratio according to the target residual signal and the estimate residual echo signal.

[0226] The initial gain calculating module 90522 is configured to perform the initial gain calculation according to the prior signal-to-echo ratio.
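One possible form of this calculation is sketched below in Python. The estimator of the prior signal-to-echo ratio (the residual power minus the estimated echo power, divided by the estimated echo power), the Wiener-style gain rule and the parameter gain_floor are assumptions chosen for illustration; the present application does not fix them in this passage.

```python
def initial_gain(residual_power, echo_power, gain_floor=0.05, eps=1e-12):
    """Per-bin suppression gain from a prior signal-to-echo ratio (SER).

    residual_power: per-bin energy of the target residual signal.
    echo_power: per-bin energy of the estimate residual echo signal.
    gain_floor, eps: hypothetical parameters.
    """
    gains = []
    for p_res, p_echo in zip(residual_power, echo_power):
        # Prior SER (assumed estimator): near-end power over echo power.
        ser = max(p_res - p_echo, 0.0) / (p_echo + eps)
        # Wiener-style gain (assumed rule), limited from below by gain_floor.
        gains.append(max(ser / (1.0 + ser), gain_floor))
    return gains


gains = initial_gain([4.0, 1.0], [1.0, 1.0])
```

In this example the bin whose residual energy is four times the estimated echo energy receives a gain of about 0.75, whereas the bin that consists of echo only is attenuated to the gain floor.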

[0227] FIG. 9G is a schematic diagram of another first terminal according to an embodiment of the present application. As shown in FIG. 9G, the first terminal 900 further comprises a scenario identification module 906.

[0228] The scenario identification module 906 is configured to generate a scenario identification information according to the reference signal, the picked-up signal and the target residual signal, where the scenario identification information comprises at least one of a magnitude of an echo reverberation, a distortion extent of an acoustic device, and a change of an acoustic path.

[0229] The residual echo estimation module 904 is further configured to, after the scenario identification module 906 generates the scenario identification information, perform a dynamic adjustment to the estimate residual echo signal according to the scenario identification information.

[0230] FIG. 10 is a schematic diagram of an acoustic echo cancellation apparatus applying a method for acoustic echo cancellation according to an embodiment of the present application.

[0231] The acoustic echo cancellation apparatus includes various components connected via a bus 10400, such as a processor 10300, a memory 10100 and a transceiver 10200. The memory 10100 may store data 10110 and instructions 10120. The processor 10300 may implement the method disclosed in the present invention by executing the instructions 10120 and using the data 10110. The transceiver 10200 includes a transmitter 10210 and a receiver 10220, so that signals can be transmitted from and received by the acoustic echo cancellation apparatus.

[0232] The steps of the method described herein can be embodied directly in hardware, in software executed by a processor, or in a combination of the two, and the software can be located in a computer-readable storage medium. Therefore, the essence of the technical solution of the present invention, or its contribution to the prior art, or all or a part of the technical solutions, may be embodied in a software product. The computer software product may be stored in a computer-readable storage medium and incorporates several instructions for instructing a computer device (for example, personal computer, server, or network device) to execute all or part of the steps of the method specified in any embodiment of the present invention. Examples of the computer-readable storage medium include various media capable of storing program codes, such as a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.

[0233] The previous description of the specific embodiments is provided to enable any person skilled in the art to implement or use the present invention. However, various modifications within a general principle of the present invention to embodiments of the present invention also fall within the protection scope of the present invention.