Title:
MULTI-MICROPHONE NOISE REDUCTION METHOD, APPARATUS AND TERMINAL DEVICE
Document Type and Number:
WIPO Patent Application WO/2019/112468
Kind Code:
A1
Abstract:
Disclosed are a multi-microphone noise reduction method, an apparatus and a terminal device. The method includes: performing harmonic detection on a primary microphone signal to obtain frequency bin VAD flag information; controlling, according to the frequency bin VAD flag information, a Kalman filter to filter out a target speech signal from a secondary microphone signal, to obtain a secondary microphone noise signal; mapping the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal; and calculating a noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and outputting a noise-reduced primary speech signal. The method has good robustness to location changes, various noises and application scenarios, and can be applied to both the handheld and hands-free modes.

Inventors:
SARANA, Dmitry Vladimirovich (Huawei Administration Building, Bantian, Longgang District, Shenzhen, Guangdong 518129, CN)
FAN, Fan (Huawei Administration Building, Bantian, Longgang District, Shenzhen, Guangdong 518129, CN)
VASILYEV, Vladislav Igorevich (Huawei Administration Building, Bantian, Longgang District, Shenzhen, Guangdong 518129, CN)
Application Number:
RU2017/000926
Publication Date:
June 13, 2019
Filing Date:
December 08, 2017
Assignee:
HUAWEI TECHNOLOGIES CO., LTD. (Huawei Administration Building, Bantian, Longgang District, Shenzhen, Guangdong 518129, CN)
SARANA, Dmitry Vladimirovich (Huawei Administration Building, Bantian, Longgang District, Shenzhen, Guangdong 518129, CN)
International Classes:
G10L21/0208; G10L21/0232; G10L21/0216
Foreign References:
US20130332156A1 (2013-12-12)
Other References:
VANDEN BERGHE JEFF ET AL: "An adaptive noise canceller for hearing aids using two nearby microphones", THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, AMERICAN INSTITUTE OF PHYSICS FOR THE ACOUSTICAL SOCIETY OF AMERICA, NEW YORK, NY, US, vol. 103, no. 6, 1 June 1998 (1998-06-01), pages 3621 - 3626, XP012000334, ISSN: 0001-4966, DOI: 10.1121/1.423066
TANAN SUBHASH C ET AL: "Acoustic echo and noise cancellation using Kalman filter in a modified GSC framework", 2014 48TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, IEEE, 2 November 2014 (2014-11-02), pages 477 - 481, XP032769350, DOI: 10.1109/ACSSC.2014.7094489
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD. et al. (MITS Alexander Vladimirovich, POPOVA Elizaveta Vitalievna et al., B. Spasskaya str., 25, bldg., Moscow, 129090, RU)
Claims:
CLAIMS

1. A multi-microphone noise reduction method, comprising:

performing (201) harmonic detection on a primary microphone signal to obtain frequency bin voice activity detection, VAD, flag information;

controlling (202), according to the frequency bin VAD flag information, a Kalman filter to filter out a target speech signal from a secondary microphone signal, to obtain a secondary microphone noise signal;

mapping (203) the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal; and

calculating (204) a noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and outputting a noise-reduced primary speech signal.

2. The method according to claim 1, wherein the performing (201) harmonic detection on the primary microphone signal to obtain the frequency bin VAD flag information comprises:

training a harmonic model according to a speech database, wherein the harmonic model is used for detecting a speech harmonic characteristic in a cepstrum domain;

obtaining speech state information of the primary microphone signal using the harmonic model and a state transfer probability matrix, wherein the speech state information comprises a voiced state, an unvoiced state or a speechless state corresponding to each frequency bin;

calculating a cepstral excitation vector according to the speech state information; and

performing harmonic selection on the primary microphone signal according to the cepstral excitation vector and the harmonic model to determine whether a speech harmonic wave exists in the primary microphone signal, and outputting the frequency bin VAD flag information, wherein the frequency bin VAD flag information is a Boolean value for indicating whether a speech harmonic wave exists in the primary microphone signal.

3. The method according to claim 1 or 2, wherein the controlling (202), according to the frequency bin VAD flag information, the Kalman filter to filter out the target speech signal from the secondary microphone signal, to obtain the secondary microphone noise signal comprises:

obtaining a residual signal by using the primary microphone signal as a reference signal to adaptively remove the target speech signal in the secondary microphone signal using the Kalman filter, wherein the residual signal is the secondary microphone noise signal;

calculating a covariance matrix of the residual signal according to a covariance matrix of a filter coefficient error;

calculating a Kalman gain according to the covariance matrix of the residual signal;

determining, according to the frequency bin VAD flag information, whether the Kalman filter needs to be updated;

updating a filter coefficient according to the Kalman gain when the Kalman filter needs to be updated; and

updating the covariance matrix of the filter coefficient error according to the updated filter coefficient.

4. The method according to any one of claims 1 to 3, wherein after the controlling (202), according to the frequency bin VAD flag information, the Kalman filter to filter out the target speech signal from the secondary microphone signal, to obtain the secondary microphone noise signal, the method further comprises:

performing harmonic detection on the secondary microphone noise signal; and

accelerating updating of the Kalman filter when a speech harmonic wave exists in the secondary microphone noise signal.

5. The method according to any one of claims 1 to 4, wherein the mapping (203) the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain the primary microphone noise spectrum of the primary microphone signal comprises:

calculating a prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal;

calculating a dynamic compensation coefficient of the primary microphone signal according to the primary microphone signal, the prior global speechless probability of the primary microphone signal and the secondary microphone noise signal; and

calculating the primary microphone noise spectrum of the primary microphone signal according to the dynamic compensation coefficient of the primary microphone signal and the secondary microphone noise signal.

6. The method according to claim 5, wherein the calculating the prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal comprises:

calculating a coherence function of a noise of a scattered field according to a distance between a primary microphone and a secondary microphone;

calculating a complex coherence function of the primary microphone signal and the secondary microphone signal;

calculating an incident angle parameter of the primary microphone signal according to the coherence function of the noise of the scattered field and the complex coherence function of the primary microphone signal;

calculating a complex coherence coefficient according to the incident angle parameter;

calculating a prior speechless probability according to the incident angle parameter and the complex coherence coefficient; and

performing smoothing on the prior speechless probability in a time-frequency domain, to obtain the prior global speechless probability of the primary microphone signal.

7. The method according to any one of claims 1 to 6, wherein the calculating (204) the noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal comprises:

obtaining a single-microphone noise spectrum of the primary microphone signal;

obtaining a total noise spectrum of the primary microphone signal according to the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal; and

calculating the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal.

8. The method according to claim 7, wherein the calculating the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal comprises:

calculating a prior signal-to-noise ratio, SNR, of the primary microphone signal according to the primary microphone signal and the total noise spectrum of the primary microphone signal;

calculating an initial gain for the primary microphone signal according to the prior SNR of the primary microphone signal to obtain an initial gain result;

performing harmonic enhancement on the primary microphone signal according to the initial gain result to obtain a harmonic enhanced primary microphone signal;

calculating a secondary gain for the harmonic enhanced primary microphone signal to obtain a secondary gain result;

performing cepstrum smoothing on the secondary gain result to obtain a cepstrum smoothed primary microphone signal;

performing harmonic replacement on the cepstrum smoothed primary microphone signal when an amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold, to obtain a harmonic replaced primary microphone signal;

performing inverse transformation to a frequency domain on the harmonic replaced primary microphone signal, to obtain a smoothed SNR; and

calculating the noise reduction gain for the primary microphone signal according to the smoothed SNR, and outputting the noise-reduced primary speech signal.

9. The method according to claim 8, wherein before the performing cepstrum smoothing on the secondary gain result to obtain the cepstrum smoothed primary microphone signal, the method further comprises:

performing harmonic selection according to the primary microphone noise spectrum of the primary microphone signal and pitch information of the primary microphone signal, to obtain a harmonic selection result;

determining, according to the harmonic selection result, whether a speech harmonic wave exists in the secondary gain result; and

setting the pitch information which needs to be detected during the cepstrum smoothing to 0 when there is no speech harmonic wave in the secondary gain result.

10. The method according to any one of claims 7 to 9, wherein the obtaining the single-microphone noise spectrum of the primary microphone signal comprises:

calculating a posterior global SNR of the primary microphone signal using global smoothing, and a posterior local SNR of the primary microphone signal using local smoothing;

calculating a speech occurrence probability according to the posterior global SNR, the posterior local SNR and pitch information of the primary microphone signal; and

estimating the single-microphone noise spectrum of the primary microphone signal according to the speech occurrence probability.

11. A multi-microphone noise reduction apparatus (1100), comprising:

a first harmonic detection module (1101), configured to perform harmonic detection on a primary microphone signal to obtain frequency bin voice activity detection, VAD, flag information;

a filter control module (1102), configured to control, according to the frequency bin VAD flag information, a Kalman filter to filter out a target speech signal from a secondary microphone signal, to obtain a secondary microphone noise signal;

a mapping module (1103), configured to map the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal; and

a gain calculating module (1104), configured to calculate a noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and

an outputting module (1105), configured to output a noise-reduced primary speech signal.

12. The apparatus according to claim 11, wherein the first harmonic detection module (1101) comprises:

a training unit (11011), configured to train the harmonic model according to a speech database, wherein the harmonic model is used for detecting a speech harmonic characteristic in a cepstrum domain;

a first obtaining unit (11012), configured to obtain speech state information of the primary microphone signal using the harmonic model and the state transfer probability matrix, wherein the speech state information comprises a voiced state, an unvoiced state or a speechless state corresponding to each frequency bin;

a first calculating unit (11013), configured to calculate a cepstral excitation vector according to the speech state information; and

a harmonic selection unit (11014), configured to perform harmonic selection on the primary microphone signal according to the cepstral excitation vector and the harmonic model to determine whether a speech harmonic wave exists in the primary microphone signal, and output the frequency bin VAD flag information, wherein the frequency bin VAD flag information is a Boolean value for indicating whether a speech harmonic wave exists in the primary microphone signal.

13. The apparatus according to claim 11 or 12, wherein the filter control module (1102) comprises:

a filtering unit (11021), configured to obtain a residual signal by using the primary microphone signal as a reference signal to adaptively remove the target speech signal in the secondary microphone signal using the Kalman filter, wherein the residual signal is the secondary microphone noise signal;

a second calculating unit (11022), configured to calculate a covariance matrix of the residual signal according to a covariance matrix of a filter coefficient error; and calculate a Kalman gain according to the covariance matrix of the residual signal;

a determining unit (11023), configured to determine, according to the frequency bin VAD flag information, whether the Kalman filter needs to be updated; and

an updating unit (11024), configured to update a filter coefficient according to the Kalman gain when the Kalman filter needs to be updated; and update the covariance matrix of the filter coefficient error according to the updated filter coefficient.

14. The apparatus according to any one of claims 11 to 13, further comprising:

a second harmonic detection module (1106), configured to perform harmonic detection on the secondary microphone noise signal; and

an accelerating module (1107), configured to accelerate updating of the Kalman filter when a speech harmonic wave exists in the secondary microphone noise signal.

15. The apparatus according to any one of claims 11 to 14, wherein the mapping module (1103) comprises:

a third calculating unit (11031), configured to calculate a prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal; calculate a dynamic compensation coefficient of the primary microphone signal according to the primary microphone signal, the prior global speechless probability of the primary microphone signal and the secondary microphone noise signal; and calculate the primary microphone noise spectrum of the primary microphone signal according to the dynamic compensation coefficient of the primary microphone signal and the secondary microphone noise signal.

16. The apparatus according to claim 15, wherein the third calculating unit (11031) is specifically configured to:

calculate a coherence function of a noise of a scattered field according to a distance between a primary microphone and a secondary microphone;

calculate a complex coherence function of the primary microphone signal and the secondary microphone signal;

calculate an incident angle parameter of the primary microphone signal according to the coherence function of the noise of the scattered field and the complex coherence function of the primary microphone signal;

calculate a complex coherence coefficient according to the incident angle parameter;

calculate a prior speechless probability according to the incident angle parameter and the complex coherence coefficient; and

perform smoothing on the prior speechless probability in a time-frequency domain, to obtain the prior global speechless probability of the primary microphone signal.

17. The apparatus according to any one of claims 11 to 16, wherein the gain calculating module (1104) comprises:

a second obtaining unit (11041), configured to obtain a single-microphone noise spectrum of the primary microphone signal; and obtain a total noise spectrum of the primary microphone signal according to the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal; and

a fourth calculating unit (11042), configured to calculate the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal.

18. The apparatus according to claim 17, wherein the fourth calculating unit (11042) is specifically configured to:

calculate a prior signal-to-noise ratio, SNR, of the primary microphone signal according to the primary microphone signal and the total noise spectrum of the primary microphone signal;

calculate an initial gain for the primary microphone signal according to the prior SNR of the primary microphone signal to obtain an initial gain result;

perform harmonic enhancement on the primary microphone signal according to the initial gain result to obtain a harmonic enhanced primary microphone signal;

calculate a secondary gain for the harmonic enhanced primary microphone signal to obtain a secondary gain result;

perform cepstrum smoothing on the secondary gain result to obtain a cepstrum smoothed primary microphone signal;

perform harmonic replacement on the cepstrum smoothed primary microphone signal when an amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold, to obtain a harmonic replaced primary microphone signal;

perform inverse transformation to a frequency domain on the harmonic replaced primary microphone signal, to obtain a smoothed SNR; and

calculate the noise reduction gain for the primary microphone signal according to the smoothed SNR.

19. The apparatus according to claim 18, wherein the fourth calculating unit (11042) is further configured to:

perform harmonic selection according to the primary microphone noise spectrum of the primary microphone signal and pitch information of the primary microphone signal, to obtain a harmonic selection result;

determine, according to the harmonic selection result, whether a speech harmonic wave exists in the secondary gain result; and

set the pitch information which needs to be detected during the cepstrum smoothing to 0 when there is no speech harmonic wave in the secondary gain result.

20. The apparatus according to any one of claims 17 to 19, wherein the second obtaining unit (11041) is specifically configured to:

calculate a posterior global SNR of the primary microphone signal using global smoothing, and a posterior local SNR of the primary microphone signal using local smoothing;

calculate a speech occurrence probability according to the posterior global SNR, the posterior local SNR and pitch information of the primary microphone signal; and

estimate the single-microphone noise spectrum of the primary microphone signal according to the speech occurrence probability.

21. A terminal device (1700), comprising: a transmitter (1701), a receiver (1702), a processor (1703), a memory (1704), a primary microphone (1705) and a secondary microphone (1706), wherein the memory (1704) stores program instructions that, when executed by the processor (1703), cause the processor (1703) to perform the method according to any one of claims 1-10.

Description:
MULTI-MICROPHONE NOISE REDUCTION METHOD, APPARATUS

AND TERMINAL DEVICE

TECHNICAL FIELD

[0001] The present application relates to the field of communication technologies, and particularly to a multi-microphone noise reduction method, an apparatus and a terminal device.

BACKGROUND

[0002] When a mobile phone operates in a handheld or hands-free mode, the uplink speech is inevitably interfered with by various noises because of the complexity of the environment. From the perspective of sound field distribution, common noises are divided into scattered noise and coherent noise; from the perspective of noise stationarity, they are divided into stationary noise, non-stationary noise and transient noise. These noises and interferences corrupt the target signal and severely reduce the aural comfort and speech intelligibility of the collected speech. Therefore, noise suppression processing needs to be performed on the uplink speech.

[0003] Conventional noise suppression algorithms are generally divided into single-microphone noise reduction algorithms and multi-microphone noise reduction algorithms according to the quantity of microphones in the device. Because single-microphone noise reduction algorithms cannot obtain spatial information of signals, their capability of suppressing non-stationary noises and transient noises is quite limited. Multi-microphone noise reduction algorithms perform noise reduction mainly by using the spatial characteristic and time-frequency domain characteristic of signals, and are therefore superior to single-microphone noise reduction algorithms with regard to suppression of non-stationary noises.

[0004] Currently, in smartphones, dual-microphone noise reduction methods are adopted for the handheld mode, while for the hands-free mode the overwhelming majority of mobile phones still use single-microphone noise reduction methods as the noise suppression solution. As a result, the speech quality and the comfort of the background noise experienced by a peer user differ considerably between the handheld mode and the hands-free mode.

[0005] In an existing dual-microphone noise reduction method used when a mobile phone is in the handheld mode, dual-microphone noise reduction is performed by using the energy difference between the speech signals collected by a bottom microphone and a top microphone of the mobile phone, which is also referred to as the interaural level difference (ILD).

[0006] In another existing multi-microphone noise reduction method, a microphone array beamforming technology is used. When a device includes two or more microphones, a beam pointing in the direction of a target speech is formed by using the spatial characteristic of the signals, filter calculation is performed by using a particular noise field model or an actual noise field model, and a beamformed signal is obtained as the filtering output. If the noise needs to be further suppressed, single-microphone noise reduction processing may be performed after the beamforming.

[0007] However, both the ILD-based and the beamforming-based noise reduction algorithms suffer from relatively poor robustness and limited applicability to various application scenarios.

SUMMARY

[0008] Accordingly, embodiments of the present application provide a multi-microphone noise reduction method, an apparatus and a terminal device, so as to solve the prior-art problem that existing noise reduction algorithms have relatively poor robustness and applicability to various application scenarios.

[0009] A first aspect of the present application provides a multi-microphone noise reduction method, including: performing harmonic detection on a primary microphone signal to obtain frequency bin voice activity detection (VAD) flag information; controlling, according to the frequency bin VAD flag information, a Kalman filter to filter out a target speech signal from a secondary microphone signal, to obtain a secondary microphone noise signal; mapping the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal; and calculating a noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and outputting a noise-reduced primary speech signal.
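
The four-step pipeline of the first aspect can be sketched per STFT frame as follows. This is a minimal illustrative sketch, not the claimed algorithm: all function, key and parameter names are hypothetical, and the harmonic detection, Kalman filtering and dynamic mapping steps are replaced by simplified placeholders.

```python
import numpy as np

def noise_reduce_frame(primary_spec, secondary_spec, state):
    """Process one STFT frame through the four steps (illustrative sketch)."""
    # Step 1: per-bin VAD flags from harmonic detection (placeholder: a
    # simple energy threshold instead of the cepstral harmonic model).
    vad = np.abs(primary_spec) ** 2 > state["vad_threshold"]

    # Step 2: adaptively remove target speech from the secondary channel,
    # leaving a noise-only residual (placeholder: a one-tap NLMS-style
    # update gated by the VAD flags, standing in for the Kalman filter).
    residual = secondary_spec - state["w"] * primary_spec
    step = residual * np.conj(primary_spec) / (np.abs(primary_spec) ** 2 + 1e-12)
    state["w"] = state["w"] + np.where(vad, 0.05 * step, 0.0)

    # Step 3: map the secondary-channel noise power onto the primary channel
    # with a fixed compensation coefficient (dynamic in the patent).
    noise_psd = state["comp"] * np.abs(residual) ** 2

    # Step 4: floored Wiener-style gain from the mapped noise spectrum.
    snr = np.maximum(np.abs(primary_spec) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)
    gain = np.maximum(snr / (snr + 1.0), 0.05)
    return gain * primary_spec, state
```

The `state` dictionary carries the adaptive filter coefficients and compensation coefficient across frames, which is what lets the sketch track slowly varying noise and channel conditions.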

[0010] In a first possible implementation by combining the first aspect, the performing harmonic detection on the primary microphone signal to obtain the frequency bin VAD flag information includes: obtaining the frequency bin VAD flag information using a harmonic model and a state transfer probability matrix, where the harmonic model is used for detecting a speech harmonic characteristic in a cepstrum domain, and the frequency bin VAD flag information is a Boolean value for indicating whether a speech harmonic wave exists in the primary microphone signal.

[0011] In a second possible implementation by combining the first possible implementation of the first aspect, the obtaining the frequency bin VAD flag information using the harmonic model and the state transfer probability matrix includes: training the harmonic model according to a speech database; obtaining speech state information of the primary microphone signal using the harmonic model and the state transfer probability matrix, where the speech state information includes a voiced state, an unvoiced state or a speechless state corresponding to each frequency bin; calculating a cepstral excitation vector according to the speech state information; and performing harmonic selection on the primary microphone signal according to the cepstral excitation vector and the harmonic model to determine whether the speech harmonic wave exists in the primary microphone signal, and outputting the frequency bin VAD flag information.

[0012] In the above harmonic detection process, the time-frequency distribution characteristic of speech and the state transfer probability matrix are used jointly to determine whether speech exists at a frequency bin, so that the decision draws on more dimensions and the presence of speech at each frequency bin can be detected more accurately.
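
A much-simplified stand-in for this harmonic detection can illustrate the cepstral idea: a strong peak in the real cepstrum inside the admissible pitch range indicates voiced speech, and the bins near harmonics of the detected pitch are flagged. The trained harmonic model and the state transfer probability matrix are omitted here; the threshold and all names are assumptions.

```python
import numpy as np

def cepstral_harmonic_vad(frame, fs=16000, f0_range=(70.0, 400.0)):
    """Toy per-bin VAD via cepstral pitch detection (illustrative only)."""
    n = len(frame)
    spec = np.fft.rfft(frame * np.hanning(n))
    cep = np.fft.irfft(np.log(np.abs(spec) + 1e-12))  # real cepstrum

    # Quefrency indices corresponding to the admissible pitch range.
    q_lo = int(fs / f0_range[1])
    q_hi = min(int(fs / f0_range[0]), len(cep) // 2)
    q_peak = q_lo + int(np.argmax(cep[q_lo:q_hi]))

    vad = np.zeros(len(spec), dtype=bool)
    if cep[q_peak] > 0.1:                 # crude harmonicity decision
        f0_bin = n / q_peak               # fundamental spacing in FFT bins
        k = np.arange(len(spec))
        dist = np.abs(k - f0_bin * np.round(k / f0_bin))
        vad = (dist < 1.0) & (k > 0)      # bins near a harmonic of f0
    return vad, q_peak
```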

[0013] In a third possible implementation by combining the first aspect or the first possible implementation of the first aspect, the controlling, according to the frequency bin VAD flag information, the Kalman filter to filter out the target speech signal from the secondary microphone signal, to obtain the secondary microphone noise signal includes: obtaining a residual signal by using the primary microphone signal as a reference signal to adaptively remove the target speech signal in the secondary microphone signal using the Kalman filter, where the residual signal is the secondary microphone noise signal; calculating a covariance matrix of the residual signal according to a covariance matrix of a filter coefficient error; calculating a Kalman gain according to the covariance matrix of the residual signal; determining, according to the frequency bin VAD flag information, whether the Kalman filter needs to be updated; updating a filter coefficient according to the Kalman gain when the Kalman filter needs to be updated; and updating the covariance matrix of the filter coefficient error according to the updated filter coefficient.

[0014] In the above Kalman adaptive filtering process, only the target speech signal in the secondary microphone signal is removed by filtering, while the secondary microphone noise signal is retained, so that the noise spectrum estimation after dynamic mapping is more accurate. Moreover, because the Kalman filter is updated according to the frequency bin Boolean value obtained through the harmonic detection, the capability of filtering out the target speech is enhanced, and a location change of the target speech source can be rapidly tracked, so as to achieve sound pickup in any orientation.

[0015] In a fourth possible implementation by combining the third possible implementation of the first aspect, the determining, according to the frequency bin VAD flag information, whether the Kalman filter needs to be updated includes: determining that the Kalman filter needs to be updated when a value of the frequency bin VAD flag information is 1; and/or determining that updating of the Kalman filter needs to be suspended when a value of the frequency bin VAD flag information is 0. It should be noted that the value of the frequency bin VAD flag information used for indicating the presence of a speech signal is not limited in the present application.
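
The VAD-gated Kalman update described above can be sketched with a scalar coefficient per frequency bin. The process-noise and observation-noise variances `q` and `r` are assumed tuning constants, not values from the patent, and the scalar-state formulation is a simplification of the claimed filter.

```python
import numpy as np

def kalman_step(x, d, w, p, vad, q=1e-4, r=1e-2):
    """One VAD-gated Kalman update per frequency bin (scalar-state sketch).

    x:   primary-channel spectrum (reference, carries the target speech)
    d:   secondary-channel spectrum (observation)
    w:   current filter coefficient per bin (state estimate)
    p:   covariance of the filter coefficient error per bin
    vad: per-bin Boolean flags; the filter adapts only where speech is present
    """
    e = d - w * x                    # residual = secondary-channel noise
    p = p + q                        # a-priori coefficient-error covariance
    s = p * np.abs(x) ** 2 + r      # residual (innovation) covariance
    k = p * np.conj(x) / s          # Kalman gain
    # Update coefficients only in bins flagged as speech, as the per-bin
    # VAD gating prescribes; elsewhere the filter is frozen.
    w = np.where(vad, w + k * e, w)
    p = np.where(vad, (1.0 - k * x).real * p, p)
    return e, w, p
```

Freezing the update in speechless bins prevents the filter from adapting to noise-only frames, which is what preserves the noise component in the residual.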

[0016] In a fifth possible implementation by combining the first aspect or any one of the foregoing possible implementations of the first aspect, after the controlling, according to the frequency bin VAD flag information, the Kalman filter to filter out the target speech signal from the secondary microphone signal, to obtain the secondary microphone noise signal, the method further includes: performing harmonic detection on the secondary microphone noise signal; and accelerating updating of the Kalman filter when a speech harmonic wave exists in the secondary microphone noise signal.

[0017] By using the above filter updating process, the capability of filtering out the target speech signal is further strengthened, and good robustness to location changes is achieved.

[0018] In a sixth possible implementation by combining the first aspect or any one of the foregoing possible implementations of the first aspect, the mapping the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain the primary microphone noise spectrum of the primary microphone signal includes: calculating a prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal; calculating a dynamic compensation coefficient of the primary microphone signal according to the primary microphone signal, the prior global speechless probability of the primary microphone signal and the secondary microphone noise signal; and calculating the primary microphone noise spectrum of the primary microphone signal according to the dynamic compensation coefficient of the primary microphone signal and the secondary microphone noise signal.

[0019] By using the above dynamic noise spectrum mapping process, the acoustic transfer function and the frequency-response difference between the primary and secondary microphones can be dynamically calculated, making the noise spectrum estimation more accurate.
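
The dynamic compensation coefficient described above can be illustrated as a recursive power-ratio tracker whose update strength is scaled by the prior global speechless probability. The smoothing constant and the exact update rule here are assumptions, not the patent's formulas.

```python
def map_noise_spectrum(primary_psd, sec_noise_psd, p_speechless,
                       comp_prev, alpha=0.9):
    """Dynamic noise-spectrum mapping (illustrative sketch).

    The compensation coefficient tracks the ratio of primary-channel power
    to secondary-channel noise power, but its update strength scales with
    the prior global speechless probability, so that target speech does
    not leak into the mapped noise estimate.
    """
    ratio = primary_psd / (sec_noise_psd + 1e-12)
    # Effective smoothing: fully frozen when speech is certain
    # (p_speechless = 0), normal recursive update when certainly speechless.
    beta = 1.0 - (1.0 - alpha) * p_speechless
    comp = beta * comp_prev + (1.0 - beta) * ratio
    return comp * sec_noise_psd, comp
```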

[0020] In a seventh possible implementation by combining the sixth possible implementation of the first aspect, the calculating the prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal includes: calculating a coherence function of a noise of a scattered field according to a distance between a primary microphone and a secondary microphone; calculating a complex coherence function of the primary microphone signal and the secondary microphone signal; calculating an incident angle parameter of the primary microphone signal according to the coherence function of the noise of the scattered field and the complex coherence function of the primary microphone signal; calculating a complex coherence coefficient according to the incident angle parameter; calculating a prior speechless probability according to the incident angle parameter and the complex coherence coefficient; and performing smoothing on the prior speechless probability in a time-frequency domain, to obtain the prior global speechless probability of the primary microphone signal.
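The scattered-field coherence and the coherence-based speechless probability in the implementation above can be illustrated with the following sketch. The diffuse-field sinc model is a standard result, while `prior_speechless_probability` is a heuristic stand-in introduced here (an assumption, not the claimed calculation):

```python
import numpy as np

def diffuse_field_coherence(freqs_hz, mic_distance_m, c=343.0):
    # Coherence of an ideal spherically isotropic (scattered/diffuse)
    # noise field between two omni mics: sin(2*pi*f*d/c) / (2*pi*f*d/c).
    return np.sinc(2.0 * freqs_hz * mic_distance_m / c)

def complex_coherence(X1, X2, eps=1e-12):
    # Per-bin complex coherence estimated over a batch of STFT frames
    # (axis 0 = time); a single-frame estimate is degenerate (|coh| = 1).
    s12 = np.mean(X1 * np.conj(X2), axis=0)
    s11 = np.mean(np.abs(X1) ** 2, axis=0)
    s22 = np.mean(np.abs(X2) ** 2, axis=0)
    return s12 / (np.sqrt(s11 * s22) + eps)

def prior_speechless_probability(coh, coh_diffuse, sharpness=5.0):
    # Heuristic mapping (illustrative only): measured coherence close to
    # the diffuse-field model suggests a scattered noise field, so the
    # speechless probability is high; a coherent directional source
    # (speech) pushes it toward zero.
    return np.exp(-sharpness * np.abs(coh - coh_diffuse))
```

In a full system the probability would then be smoothed in the time-frequency domain, as the paragraph describes, to obtain the prior global speechless probability.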

[0021] In an eighth possible implementation by combining the first aspect or any one of the foregoing possible implementations of the first aspect, the calculating the noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal includes: obtaining a single-microphone noise spectrum of the primary microphone signal; obtaining a total noise spectrum of the primary microphone signal according to the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal; and calculating the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal.

[0022] By combining the primary microphone noise spectrum and the single-microphone noise spectrum to obtain the total noise spectrum, the non-stationary noises in the primary microphone signal can be estimated more accurately in real time.

[0023] In a ninth possible implementation by combining the eighth possible implementation of the first aspect, the calculating the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal includes: calculating the noise reduction gain for the primary microphone signal multiple times according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal.

[0024] In a tenth possible implementation by combining the ninth possible implementation of the first aspect, the calculating the noise reduction gain for the primary microphone signal multiple times according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal includes: calculating a prior signal-to-noise ratio (SNR) of the primary microphone signal according to the primary microphone signal and the total noise spectrum of the primary microphone signal; calculating an initial gain for the primary microphone signal according to the prior SNR of the primary microphone signal to obtain an initial gain result; performing harmonic enhancement on the primary microphone signal according to the initial gain result to obtain a harmonic enhanced primary microphone signal; calculating a secondary gain for the harmonic enhanced primary microphone signal to obtain a secondary gain result; performing cepstrum smoothing on the secondary gain result to obtain a cepstrum smoothed primary microphone signal; performing harmonic replacement on the cepstrum smoothed primary microphone signal when an amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold, to obtain a harmonic replaced primary microphone signal; performing inverse transformation to a frequency domain on the harmonic replaced primary microphone signal, to obtain a smoothed SNR; and calculating the noise reduction gain for the primary microphone signal according to the smoothed SNR, and outputting the noise-reduced primary speech signal.
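Parts of the gain chain above can be illustrated with the following sketch. The decision-directed prior-SNR recursion and the Wiener-style gain are standard choices stated here as assumptions (the text only names "prior SNR" and "initial gain"), and `cepstrum_smooth_gain` shows only the envelope-keeping core of cepstrum smoothing, without the pitch-preservation and harmonic-replacement steps the implementation describes:

```python
import numpy as np

def decision_directed_prior_snr(mag2, noise_psd, prev_gain, prev_mag2,
                                beta=0.98, eps=1e-12):
    # Decision-directed prior-SNR estimate; the text only says
    # "prior SNR", so this standard recursion is an assumption.
    post = mag2 / (noise_psd + eps)
    return (beta * prev_gain ** 2 * prev_mag2 / (noise_psd + eps)
            + (1.0 - beta) * np.maximum(post - 1.0, 0.0))

def wiener_gain(prior_snr, floor=0.1):
    # Wiener-style initial gain with a spectral floor against musical noise.
    return np.maximum(prior_snr / (1.0 + prior_snr), floor)

def cepstrum_smooth_gain(gain, keep=8, eps=1e-12):
    # Smooth the log-gain in the cepstral domain by keeping only the low
    # quefrencies (spectral envelope). A full implementation would, as
    # described in the text, also preserve the quefrency bin at the
    # detected pitch so that speech harmonics survive the smoothing.
    full = np.concatenate([gain, gain[-2:0:-1]])   # even extension
    cep = np.fft.ifft(np.log(full + eps)).real
    cep[keep:len(cep) - keep] = 0.0                # drop fine structure
    smoothed = np.exp(np.fft.fft(cep).real)
    return np.clip(smoothed[: len(gain)], 0.0, 1.0)
```

A flat gain curve passes through the cepstral smoother unchanged, while rapid bin-to-bin fluctuations (a common source of musical noise) are suppressed.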

[0025] By using the above cepstrum smoothing process, cepstrum smoothing and harmonic detection can be used cooperatively. The pitch calculated during harmonic detection can be combined with the decision based on the cepstrum pitch threshold, and interference frequency bins can be filtered out through harmonic selection. Thus, the target speech signal is better protected, and the noise residual is more stationary.

[0026] In an eleventh possible implementation by combining the tenth possible implementation of the first aspect, before the performing cepstrum smoothing on the secondary gain result to obtain the cepstrum smoothed primary microphone signal, the method further includes: performing harmonic selection according to the primary microphone noise spectrum of the primary microphone signal and pitch information of the primary microphone signal, to obtain a harmonic selection result; determining, according to the harmonic selection result, whether a speech harmonic wave exists in the secondary gain result; and setting the pitch information which needs to be detected during the cepstrum smoothing to 0 when there is no speech harmonic wave in the secondary gain result.

[0027] By setting the value of the pitch to 0 in the case where a pitch exists while no harmonic wave exists, pitch detection errors during cepstrum smoothing when non-stationary noise remains can be prevented. This provides a double check on the pitch detection and improves its accuracy.

[0028] In a twelfth possible implementation by combining any one of the eighth to eleventh possible implementations of the first aspect, the obtaining the single-microphone noise spectrum of the primary microphone signal includes: calculating a posterior global SNR of the primary microphone signal using global smoothing, and a posterior local SNR of the primary microphone signal using local smoothing; calculating a speech occurrence probability according to the posterior global SNR, the posterior local SNR and pitch information of the primary microphone signal; and estimating the single-microphone noise spectrum of the primary microphone signal according to the speech occurrence probability.
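A common speech-presence-probability noise tracker in the spirit of the implementation above is sketched below; the fixed prior SNR `xi_h1`, the prior `p_h1` and the smoothing constant are illustrative assumptions, not the claimed estimator:

```python
import numpy as np

def spp_noise_update(mag2, noise_psd, xi_h1=15.0, p_h1=0.5,
                     alpha=0.8, eps=1e-12):
    # Posterior SNR of the current periodogram against the noise track.
    post = mag2 / (noise_psd + eps)
    # Likelihood ratio of "speech present" (H1) vs "noise only" (H0)
    # under complex-Gaussian models with a fixed prior SNR xi_h1; the
    # exponent is clipped to avoid overflow for very loud bins.
    lr = (p_h1 / (1.0 - p_h1)) / (1.0 + xi_h1) \
        * np.exp(np.minimum(post * xi_h1 / (1.0 + xi_h1), 50.0))
    p_speech = lr / (1.0 + lr)
    # Update the noise PSD only in proportion to the noise probability,
    # so no explicit time window or VAD hangover is needed.
    periodogram_mix = p_speech * noise_psd + (1.0 - p_speech) * mag2
    new_noise = alpha * noise_psd + (1.0 - alpha) * periodogram_mix
    return p_speech, new_noise
```

Because the update weight comes from the per-bin speech probability rather than a minimum-statistics window, the noise estimate can rise quickly during noise-only bins yet stay frozen under strong speech, which matches the real-time tracking claim in the next paragraph.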

[0029] By using the above single-microphone noise spectrum estimation algorithm, the noise spectrum can be updated in real time by using the speech occurrence probability, which avoids the selection of a time window, thereby achieving noise tracking in real-time.

[0030] In a thirteenth possible implementation by combining the first aspect or any one of the foregoing possible implementations of the first aspect, when the primary microphone signal is collected in a handheld mode, after the performing harmonic detection on the primary microphone signal to obtain the frequency bin VAD flag information, the method further includes: calculating interaural level difference (ILD) information between a primary microphone and a secondary microphone; and controlling a call angle for the primary microphone signal according to the ILD information and the frequency bin VAD flag information.
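A frame-level ILD computation can be as simple as the following sketch (the function name and the energy-based definition are assumptions for illustration):

```python
import numpy as np

def ild_db(primary_frame, secondary_frame, eps=1e-12):
    # Frame-level level difference between the two microphones in dB.
    # In handheld mode the mouth is near the bottom (primary) mic, so
    # near-field speech yields a large positive ILD, while far-field
    # noise arrives at both mics with similar energy (ILD near 0 dB).
    e1 = np.sum(np.asarray(primary_frame, dtype=float) ** 2)
    e2 = np.sum(np.asarray(secondary_frame, dtype=float) ** 2)
    return 10.0 * np.log10((e1 + eps) / (e2 + eps))
```

Gated by the frequency bin VAD flag information, such an ILD value can then drive a frame-level decision on whether the dominant source lies within the allowed call angle.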

[0031] By using the multi-microphone noise reduction method based on ILD information, call angle control can be performed, and the harmonic detection result can be controlled based on a microphone energy ratio, so that whether the filter is updated is precisely controlled at a frame level, thereby controlling the degree of noise spectrum estimation.

[0032] A second aspect of the present application provides a multi-microphone noise reduction apparatus, including: a first harmonic detection module, configured to perform harmonic detection on a primary microphone signal to obtain frequency bin voice activity detection (VAD) flag information; a filter control module, configured to control, according to the frequency bin VAD flag information, a Kalman filter to filter out a target speech signal from a secondary microphone signal, to obtain a secondary microphone noise signal; a mapping module, configured to map the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal; a gain calculating module, configured to calculate a noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal; and an outputting module, configured to output a noise-reduced primary speech signal.

[0033] In a first possible implementation by combining the second aspect, the first harmonic detection module is specifically configured to obtain the frequency bin VAD flag information using a harmonic model and a state transfer probability matrix, where the harmonic model is used for detecting a speech harmonic characteristic in a cepstrum domain, and the frequency bin VAD flag information is a Boolean value for indicating whether a speech harmonic wave exists in the primary microphone signal.

[0034] In a second possible implementation by combining the first possible implementation of the second aspect, the first harmonic detection module includes: a training unit, configured to train the harmonic model according to a speech database; a first obtaining unit, configured to obtain speech state information of the primary microphone signal using the harmonic model and the state transfer probability matrix, where the speech state information includes a voiced state, an unvoiced state or a speechless state corresponding to each frequency bin; a first calculating unit, configured to calculate a cepstral excitation vector according to the speech state information; and a harmonic selection unit, configured to perform harmonic selection on the primary microphone signal according to the cepstral excitation vector and the harmonic model to determine whether the speech harmonic wave exists in the primary microphone signal, and output the frequency bin VAD flag information.

[0035] By using the above harmonic detection process, since the time-frequency distribution characteristic of speech and the state transfer probability matrix are used together, the determination of whether speech exists at a frequency bin draws on more dimensions and can therefore be made with higher accuracy.

[0036] In a third possible implementation by combining the second aspect or the first possible implementation of the second aspect, the filter control module includes: a filtering unit, configured to obtain a residual signal by using the primary microphone signal as a reference signal to adaptively remove the target speech signal in the secondary microphone signal using the Kalman filter, where the residual signal is the secondary microphone noise signal; a second calculating unit, configured to calculate a covariance matrix of the residual signal according to a covariance matrix of a filter coefficient error; and calculate a Kalman gain according to the covariance matrix of the residual signal; a determining unit, configured to determine, according to the frequency bin VAD flag information, whether the Kalman filter needs to be updated; and an updating unit, configured to update a filter coefficient according to the Kalman gain when the Kalman filter needs to be updated; and update the covariance matrix of the filter coefficient error according to the updated filter coefficient.

[0037] By using the above Kalman adaptive filtering process, only the target speech signal in the secondary microphone signal is removed by filtering, while the secondary microphone noise signal is retained, so that the noise spectrum estimation after dynamic mapping can be more accurate. The capability of filtering out the target speech can be enhanced, since the Kalman filter is updated according to the frequency bin Boolean value obtained through the harmonic detection; meanwhile, a location change of the target speech source can be rapidly tracked, so as to achieve sound pickup in any orientation.

[0038] In a fourth possible implementation by combining the third possible implementation of the second aspect, the determining unit is specifically configured to: determine that the Kalman filter needs to be updated when a value of the frequency bin VAD flag information is 1; and/or determine that updating of the Kalman filter needs to be suspended when a value of the frequency bin VAD flag information is 0. It should be noted that the value of the frequency bin VAD flag information, which is used for indicating the presence of a speech signal, is not limited in the present invention.

[0039] In a fifth possible implementation by combining the second aspect or any one of the foregoing possible implementations of the second aspect, the apparatus further includes: a second harmonic detection module, configured to perform harmonic detection on the secondary microphone noise signal; and an accelerating module, configured to accelerate updating of the Kalman filter when a speech harmonic wave exists in the secondary microphone noise signal.

[0040] By using the above filter updating process, the capability of filtering out the target speech signal can be further strengthened, and good robustness to location changes can be achieved.

[0041] In a sixth possible implementation by combining the second aspect or any one of the foregoing possible implementations of the second aspect, the mapping module includes: a third calculating unit, configured to calculate a prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal; calculate a dynamic compensation coefficient of the primary microphone signal according to the primary microphone signal, the prior global speechless probability of the primary microphone signal and the secondary microphone noise signal; and calculate the primary microphone noise spectrum of the primary microphone signal according to the dynamic compensation coefficient of the primary microphone signal and the secondary microphone noise signal.

[0042] By using the above dynamic noise spectrum mapping process, the acoustic transfer function and frequency response difference of the primary and secondary microphones can be dynamically calculated, making the noise spectrum estimation more accurate.

[0043] In a seventh possible implementation by combining the sixth possible implementation of the second aspect, the third calculating unit is specifically configured to: calculate a coherence function of a noise of a scattered field according to a distance between a primary microphone and a secondary microphone; calculate a complex coherence function of the primary microphone signal and the secondary microphone signal; calculate an incident angle parameter of the primary microphone signal according to the coherence function of the noise of the scattered field and the complex coherence function of the primary microphone signal; calculate a complex coherence coefficient according to the incident angle parameter; calculate a prior speechless probability according to the incident angle parameter and the complex coherence coefficient; and perform smoothing on the prior speechless probability in a time-frequency domain, to obtain the prior global speechless probability of the primary microphone signal.

[0044] In an eighth possible implementation by combining the second aspect or any one of the foregoing possible implementations of the second aspect, the gain calculating module includes: a second obtaining unit, configured to obtain a single-microphone noise spectrum of the primary microphone signal; and obtain a total noise spectrum of the primary microphone signal according to the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal; and a fourth calculating unit, configured to calculate the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal.

[0045] By combining the primary microphone noise spectrum and the single-microphone noise spectrum to obtain the total noise spectrum, the non-stationary noises in the primary microphone signal can be estimated more accurately in real time.

[0046] In a ninth possible implementation by combining the eighth possible implementation of the second aspect, the fourth calculating unit is specifically configured to calculate the noise reduction gain for the primary microphone signal multiple times according to the total noise spectrum of the primary microphone signal.

[0047] In a tenth possible implementation by combining the ninth possible implementation of the second aspect, the fourth calculating unit is specifically configured to: calculate a prior signal-to-noise ratio (SNR) of the primary microphone signal according to the primary microphone signal and the total noise spectrum of the primary microphone signal; calculate an initial gain for the primary microphone signal according to the prior SNR of the primary microphone signal to obtain an initial gain result; perform harmonic enhancement on the primary microphone signal according to the initial gain result to obtain a harmonic enhanced primary microphone signal; calculate a secondary gain for the harmonic enhanced primary microphone signal to obtain a secondary gain result; perform cepstrum smoothing on the secondary gain result to obtain a cepstrum smoothed primary microphone signal; perform harmonic replacement on the cepstrum smoothed primary microphone signal when an amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold, to obtain a harmonic replaced primary microphone signal; perform inverse transformation to a frequency domain on the harmonic replaced primary microphone signal, to obtain a smoothed SNR; and calculate the noise reduction gain for the primary microphone signal according to the smoothed SNR.

[0048] By using the above cepstrum smoothing process, cepstrum smoothing and harmonic detection can be used cooperatively. The pitch calculated during harmonic detection can be combined with the decision based on the cepstrum pitch threshold, and interference frequency bins can be filtered out through harmonic selection. Thus, the target speech signal is better protected, and the noise residual is more stationary.

[0049] In an eleventh possible implementation by combining the tenth possible implementation of the second aspect, the fourth calculating unit is further configured to: perform harmonic selection according to the primary microphone noise spectrum of the primary microphone signal and pitch information of the primary microphone signal, to obtain a harmonic selection result; determine, according to the harmonic selection result, whether a speech harmonic wave exists in the secondary gain result; and set the pitch information which needs to be detected during the cepstrum smoothing to 0 when there is no speech harmonic wave in the secondary gain result.

[0050] By setting the value of the pitch to 0 in the case where a pitch exists while no harmonic wave exists, pitch detection errors during cepstrum smoothing when non-stationary noise remains can be prevented. This provides a double check on the pitch detection and improves its accuracy.

[0051] In a twelfth possible implementation by combining any one of the eighth to eleventh possible implementations of the second aspect, the second obtaining unit is specifically configured to: calculate a posterior global SNR of the primary microphone signal using global smoothing, and a posterior local SNR of the primary microphone signal using local smoothing; calculate a speech occurrence probability according to the posterior global SNR, the posterior local SNR and pitch information of the primary microphone signal; and estimate the single-microphone noise spectrum of the primary microphone signal according to the speech occurrence probability.

[0052] By using the above single-microphone noise spectrum estimation algorithm, the noise spectrum can be updated in real time by using the speech occurrence probability, which avoids the selection of a time window, thereby achieving noise tracking in real-time.

[0053] In a thirteenth possible implementation by combining the second aspect or any one of the foregoing possible implementations of the second aspect, the apparatus further includes: an interaural level difference (ILD) calculating module, configured to calculate ILD information between a primary microphone and a secondary microphone; and a call angle control module, configured to control a call angle for the primary microphone signal according to the ILD information and the frequency bin VAD flag information.

[0054] By using the multi-microphone noise reduction method based on ILD information, call angle control can be performed, and the harmonic detection result can be controlled based on a microphone energy ratio, so that whether the filter is updated is precisely controlled at a frame level, thereby controlling the degree of noise spectrum estimation.

[0055] A third aspect of the present application provides a terminal device, including: a transmitter, a receiver, a processor, a memory, a primary microphone and a secondary microphone, where the memory stores program instructions that when executed by the processor cause the processor to perform the method according to any one of the above aspects.

[0056] A fourth aspect of the present application provides a computer readable storage medium, including non-transitory computer program instructions that when executed by a processor cause the processor to perform the method according to any one of the above described aspects.

[0057] A fifth aspect of the present application provides a computer program product, including non-transitory computer program instructions that when executed by a processor cause the processor to perform the method according to any one of the above described aspects.

[0058] A sixth aspect of the present application provides a computer program, including program code that when executed by a processor causes the processor to perform the method according to any one of the above described aspects.

[0059] In embodiments of the present application, harmonic detection is performed on the primary microphone signal to obtain the frequency bin VAD flag information; the Kalman filter is controlled according to the frequency bin VAD flag information to filter out the target speech signal from the secondary microphone signal, to obtain the secondary microphone noise signal; then the secondary microphone noise signal is mapped to the primary microphone signal through dynamic noise spectrum mapping, to obtain the primary microphone noise spectrum of the primary microphone signal; and the noise reduction gain is calculated for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and finally a noise-reduced primary speech signal (which is also referred to as the primary speech signal after noise reduction) is outputted. As can be seen from the above solutions, the multi-microphone noise reduction can be achieved based on the Kalman adaptive filter, which does not need to depend on ILD information and has a strong capability of filtering out the target speech signal. Thus the multi-microphone noise reduction method according to the present application has good robustness to location changes, various noises and application scenarios. The method also has an improved speech protection capability, and can be applied to both the handheld mode and the hands-free mode, which achieves close noise reduction effects in the two modes, and improves consistency of subjective experiences of a call during mode switching.

BRIEF DESCRIPTION OF DRAWINGS

[0060] In order to illustrate the technical solutions in embodiments of the present application more clearly, accompanying drawings needed in the embodiments are illustrated briefly as follows. Apparently, the accompanying drawings are merely some embodiments of the application, and persons skilled in the art can derive other drawings from them without creative efforts.

[0061] FIG. 1 is a schematic diagram of a speech communication system to which a multi-microphone noise reduction method according to an embodiment of the present application is applied;

[0062] FIG. 2 is a flowchart of a multi-microphone noise reduction method according to an embodiment of the present application;

[0063] FIG. 3 is a schematic diagram of a multi-microphone noise reduction method according to an embodiment of the present application;

[0064] FIG. 4 is a schematic diagram of a state transfer probability;

[0065] FIG. 5 is a schematic diagram of an enhanced minimum mean square error (MMSE) single-microphone noise spectrum estimation algorithm according to an embodiment of the present application;

[0066] FIG. 6 is a schematic diagram of cepstrum smoothing process in a multi-microphone noise reduction method in a specific scenario according to an embodiment of the present application;

[0067] FIG. 7 is a schematic diagram of a multi-microphone noise reduction method in a specific scenario according to an embodiment of the present application;

[0068] FIG. 8 is a schematic diagram of a multi-microphone noise reduction method in another specific scenario according to an embodiment of the present application;

[0069] FIG. 9 is a schematic diagram of an application scenario of a multi-microphone noise reduction method based on ILD information according to an embodiment of the present application;

[0070] FIG. 10 is a schematic diagram of a microphone array beamforming technology;

[0071] FIG. 11 is a schematic structural diagram of a multi-microphone noise reduction apparatus according to an embodiment of the present application;

[0072] FIG. 12 is a schematic structural diagram of a first harmonic detection module according to an embodiment of the present application;

[0073] FIG. 13 is a schematic structural diagram of a filter control module according to an embodiment of the present application;

[0074] FIG. 14 is a schematic structural diagram of a multi-microphone noise reduction apparatus according to another embodiment of the present application;

[0075] FIG. 15 is a schematic structural diagram of a mapping module according to an embodiment of the present application;

[0076] FIG. 16 is a schematic structural diagram of a gain calculating module according to an embodiment of the present application; and

[0077] FIG. 17 is a schematic structural diagram of a terminal device according to an embodiment of the present application.

DESCRIPTION OF EMBODIMENTS

[0078] Embodiments of the present application provide a multi-microphone noise reduction method, an apparatus and a terminal device, which are described in detail hereinafter.

[0079] In order to enable persons skilled in the art to better understand the technical solutions of embodiments of the present application, and to make the above objects, features and advantages more comprehensible, the technical solutions of embodiments of the present application are described in detail below with reference to the accompanying drawings.

[0080] FIG. 1 is a schematic diagram of a speech communication system to which a multi-microphone noise reduction method according to an embodiment of the present application is applied. As shown in FIG. 1, the speech communication system can include a first terminal device and a second terminal device, each having at least a primary microphone and a secondary microphone. The terminal device of the present application may be any electronic device that has a dual-microphone or multi-microphone sound pickup capability, such as a mobile phone (also referred to as a "cellular" phone), a computer with a mobile terminal, etc., and the terminal device may also be a portable, mini, hand-held, computer built-in or vehicle-mounted mobile device. The multi-microphone noise reduction method of the present application may be applied to, for example, products and applications such as a notebook computer or tablet computer call, a video conference system, speech recognition and front-end enhancement, etc., which is not limited in the present application. In embodiments of the present application, a communication connection can be established between the first terminal device and the second terminal device, for example, the first terminal device and the second terminal device can be connected via a wireless network, and the first terminal device and the second terminal device can conduct one-to-one speech communication. In embodiments of the present application, the first terminal device can also maintain a communication connection with the second terminal device and a third terminal device, that is, the first terminal device can perform conference communication with a plurality of terminal devices at the same time, which is not limited herein.
The communication mode between the first terminal device and the second terminal device is similar to that between the first terminal device and the plurality of terminal devices, and both will be described in combination with various application scenarios in the following embodiments of the present application.

[0081] FIG. 2 is a flowchart of a multi-microphone noise reduction method according to an embodiment of the present application. The method can be executed by a terminal device, such as the first terminal device or the second terminal device shown in FIG. 1. The terminal device can have at least a primary microphone and a secondary microphone, where the primary microphone may also be referred to as a bottom microphone while the secondary microphone may also be referred to as a top microphone. The number of the microphones in the terminal device is not limited in the present application. The terminal device can have a plurality of microphones, for example, the terminal device may have a plurality of bottom microphones and one top microphone, or may have a plurality of bottom microphones and a plurality of top microphones. The method can also be referred to as a dual microphone noise reduction method in the case where one primary microphone and one secondary microphone are used. As shown in FIG. 2, the multi-microphone noise reduction method can include:

[0082] Step 201: performing harmonic detection on a primary microphone signal to obtain frequency bin VAD flag information.

[0083] In embodiments of the present application, the primary microphone signal can be obtained by performing primary sound pickup on a signal inputted by the primary microphone of the terminal device. After the primary microphone signal is obtained, harmonic detection can be performed on the primary microphone signal, so as to determine whether the primary microphone signal includes a speech signal corresponding to each frequency bin, which can be represented by the frequency bin VAD flag information. In an implementation, the harmonic detection can be realized through a harmonic model and a state transfer probability matrix, where the harmonic model is used for detecting a speech harmonic characteristic in a cepstrum domain. The voice activity detection (VAD) is used for detecting whether there is a speech signal in the current primary microphone signal, i.e., determining the input signal and distinguishing the speech signal from various background noise signals. For example, when the value of the frequency bin VAD flag information (i.e. VAD flag information of a certain frequency bin) is 1, it is indicated that the primary microphone signal includes a speech signal corresponding to this frequency bin; when the value of the frequency bin VAD flag information is 0, it is indicated that the primary microphone signal includes no speech signal corresponding to this frequency bin. It should be noted that the value of the frequency bin VAD flag information, which is used for indicating the presence of a speech signal, is not limited in the present invention.
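A much-simplified stand-in for the harmonic detection of Step 201 is sketched below; it uses a plain cepstral pitch peak instead of the trained harmonic model and state transfer probability matrix described above, and the function name and all thresholds are illustrative assumptions:

```python
import numpy as np

def frame_vad_flags(frame, fs=16000, f0_lo=80.0, f0_hi=400.0, peak_ratio=3.0):
    # Find the dominant cepstral peak within the plausible pitch range
    # and, if it is strong enough relative to the cepstral floor, flag
    # the FFT bins near the pitch harmonics as speech (flag = 1).
    n = len(frame)
    spec = np.fft.rfft(frame * np.hanning(n))
    cep = np.fft.irfft(np.log(np.abs(spec) + 1e-12))
    q_lo, q_hi = int(fs / f0_hi), min(int(fs / f0_lo), n // 2 - 1)
    q = q_lo + np.argmax(cep[q_lo:q_hi])          # candidate pitch period
    voiced = cep[q] > peak_ratio * np.mean(np.abs(cep[q_lo:q_hi]))
    flags = np.zeros(len(spec), dtype=int)
    if voiced:
        f0_bin = n / q                            # fundamental in FFT-bin units
        k = np.arange(1, len(spec))
        nearest = np.round(k / f0_bin) * f0_bin   # closest harmonic position
        flags[k[np.abs(k - nearest) < 1.0]] = 1   # flag bins near harmonics
    return flags
```

For a clean voiced frame the flags mark the harmonic bins, while for a noise-only frame no strong cepstral peak is found and all flags stay 0, mirroring the Boolean per-bin behavior described above.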

[0084] Step 202: controlling, according to the frequency bin VAD flag information, a Kalman filter to filter out a target speech signal from a secondary microphone signal, to obtain a secondary microphone noise signal.

[0085] In embodiments of the present application, the secondary microphone signal can be obtained by performing secondary sound pickup on a signal inputted by the secondary microphone of the terminal device, and the secondary microphone signal can also be referred to as a reference microphone signal. After the secondary microphone signal is obtained, the terminal device can control a Kalman filter according to the frequency bin VAD flag information obtained in step 201 to filter out a target speech signal from the secondary microphone signal. In an implementation, when the value of the frequency bin VAD flag information is 1, it is determined that the Kalman filter needs to be updated; when the value of the frequency bin VAD flag information is 0, it is determined that updating of the Kalman filter needs to be suspended.

[0086] In this step, the target speech signal can be filtered out from the secondary microphone signal adaptively. In the handheld mode, the target speech signal may be a single target speech; while in a hands-free conference mode, the target speech signal may be a plurality of target speeches. The Kalman filter can also be referred to as a Kalman adaptive filter. For the Kalman filter, a target signal can be the secondary microphone signal, a reference signal can be the primary microphone signal, and an output signal can be the secondary microphone noise signal. In an implementation, the Kalman filter uses a covariance matrix of a filter coefficient error directly to change the step size, which is equivalent to automatic step size changes, and there is no need to change the step size manually, thereby achieving a good convergence performance of the Kalman filter.

[0087] Step 203: mapping the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal.

[0088] In embodiments of the present application, after the secondary microphone noise signal is estimated from the secondary microphone signal, the terminal device can use a manner of dynamic noise spectrum mapping to dynamically map the secondary microphone noise signal to the primary microphone signal, so as to estimate a primary microphone noise spectrum of the primary microphone signal. In an implementation, a dynamic compensation coefficient of the primary microphone signal can be calculated according to the primary microphone signal, a prior global speechless probability of the primary microphone signal and the secondary microphone noise signal; and then the primary microphone noise spectrum of the primary microphone signal can be estimated according to the dynamic compensation coefficient of the primary microphone signal and the secondary microphone noise signal.

[0089] Step 204: calculating a noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and outputting a noise-reduced primary speech signal.

[0090] In embodiments of the present application, after the primary microphone noise spectrum of the primary microphone signal is estimated, the terminal device can use the primary microphone noise spectrum of the primary microphone signal to calculate noise reduction gain for the primary microphone signal. In an implementation, the noise reduction gain can be calculated multiple times for the primary microphone signal to obtain a final gain and output the noise-reduced primary speech signal.

[0091] In an implementation, a total noise spectrum of the primary microphone signal can be obtained by combining the primary microphone noise spectrum of the primary microphone signal and a single-microphone noise spectrum of the primary microphone signal; and then the noise reduction gain can be calculated multiple times for the primary microphone signal according to the total noise spectrum of the primary microphone signal, so that the non-stationary noise in the primary microphone signal can be estimated more accurately in real time.

[0092] The multi-microphone noise reduction method of the present application can be achieved based on harmonic detection and the Kalman adaptive filter, which does not need to depend on ILD information and does not need to know an orientation of the target speech in advance, and thus can be applied to a scenario in which the location of the target acoustic source relative to the terminal device quickly changes in both handheld mode and hands-free mode, thereby improving the speech protection capability. Moreover, when the location of the target acoustic source relative to the terminal device does not change, stationary noises and non-stationary noises of a scattered field can be suppressed and an excellent speech protection capability can be realized. Thus the method can be applied to both the handheld mode and the hands-free mode, which achieves close noise reduction effects in the two modes, and improves consistency of subjective experiences of a call during mode switching.

[0093] FIG. 3 is a schematic diagram of a multi-microphone noise reduction method according to an embodiment of the present application. First, harmonic detection is performed on a primary microphone signal to obtain accurate frequency bin VAD flag information, which is used for controlling the updating of a Kalman adaptive filter. The target signal of the Kalman adaptive filter is a secondary microphone signal, and the reference signal is the primary microphone signal. When the value of the frequency bin VAD flag information is 1, the adaptive filter coefficient can be updated. When the value of the frequency bin VAD flag information is 0, the updating of the adaptive filter is suspended. In this step, the target speech signal can be filtered out from the secondary microphone signal adaptively by the Kalman adaptive filter. In the handheld mode, the target speech signal may be a single target speech; while in a hands-free conference mode, the target speech signal may be a plurality of target speeches. The secondary microphone noise signal outputted by the Kalman adaptive filter is used as a reference component of noise spectrum estimation and mapped to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal, which is then combined with a single-microphone noise spectrum of the primary microphone signal to obtain a total noise spectrum estimation of the primary microphone signal. Then the noise reduction gain can be calculated multiple times according to the total noise spectrum information to obtain a final gain and output a noise-reduced primary speech signal.
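The overall data flow of FIG. 3 can be sketched per frame as follows. This is a deliberately simplified, illustrative toy, not the patented method: an energy-threshold VAD stands in for harmonic detection, an NLMS-style step stands in for the Kalman filter, the mapping coefficient is fixed rather than dynamic, and a single spectral-subtraction gain replaces the multi-stage gain calculation.

```python
import numpy as np

def process_frame(primary, secondary, w, map_coeff=1.0, mu=0.1, eps=1e-12):
    """One-frame walk-through of the FIG. 3 data flow (toy version)."""
    # Step 201 (simplified): per-bin VAD from primary-signal energy.
    vad = (np.abs(primary) ** 2) > 1e-3
    # Step 202 (simplified): subtract the speech estimate w*primary from
    # the secondary signal; adapt w only where speech is flagged.
    residual = secondary - w * primary            # secondary-mic noise
    step = mu * residual * np.conj(primary) / (np.abs(primary) ** 2 + eps)
    w = np.where(vad, w + step, w)
    # Step 203 (simplified): map the noise reference to the primary mic.
    mapped_noise = map_coeff * np.abs(residual) ** 2
    # Step 204 (simplified): spectral-subtraction style gain with a floor.
    gain = np.maximum(1.0 - mapped_noise / (np.abs(primary) ** 2 + eps), 0.1)
    return gain * primary, w

primary = np.array([1.0, 0.5, 0.01])
secondary = np.array([0.9, 0.4, 0.2])
out, w = process_frame(primary, secondary, np.zeros(3))
print(w)   # bin 2 is below the VAD threshold, so w[2] stays 0
```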

[0094] According to some embodiments of the present application, the step 201 of performing harmonic detection on the primary microphone signal to obtain the frequency bin VAD flag information can include: obtaining the frequency bin VAD flag information using a harmonic model and a state transfer probability matrix, where the harmonic model is used for detecting a speech harmonic characteristic in a cepstrum domain, and the frequency bin VAD flag information is a Boolean value for indicating whether a speech harmonic wave exists in the primary microphone signal.

[0095] Further, according to some embodiments of the present application, the obtaining the frequency bin VAD flag information using the harmonic model and the state transfer probability matrix can include:

[0096] Step 2011: training the harmonic model according to a speech database;

[0097] Step 2012: obtaining speech state information of the primary microphone signal using the harmonic model and the state transfer probability matrix, where the speech state information includes a voiced state, an unvoiced state or a speechless state corresponding to each frequency bin;

[0098] Step 2013: calculating a cepstral excitation vector according to the speech state information; and

[0099] Step 2014: performing harmonic selection on the primary microphone signal according to the cepstral excitation vector and the harmonic model to determine whether the speech harmonic wave exists in the primary microphone signal, and outputting the frequency bin VAD flag information.

[0100] The terminal device can perform the aforementioned harmonic detection process on the primary microphone signal. In step 2011, the speech database used for training the harmonic model may be, for example, the TIMIT speech database. The TIMIT speech database provides accurate phonetic annotations, and thus can be applied to speech segmentation performance evaluation. The database also contains speech from a large number of speakers, and thus can be used to evaluate a speech of a speaker. The TIMIT speech database can be used for training a harmonic model of speech in the cepstrum domain, and the trained harmonic model can be represented using a harmonic masking coefficient matrix. The harmonic model can be used for detecting a speech harmonic characteristic in the cepstrum domain, and can be trained using a Gaussian mixture hidden Markov model, for example. It should be noted that the specific training method is not limited in the present application. The harmonic model of the present application can indicate the relationship between the speech and the energy corresponding to each frequency bin, which can be regarded as a distribution curve of amplitude versus frequency and represented by the harmonic masking coefficient matrix. In step 2012, the harmonic model and the state transfer probability matrix can be used to obtain the speech state information of the primary microphone signal. It can be assumed that each frame of the input primary microphone signal is in one of M+1 ranges. For example, the "M" ranges refer to M frequency ranges (expressed in the cepstrum domain) in which the pitch of the signal may be located, while the "1" range refers to an unvoiced or speechless segment. In this step, the state transfer probability matrix (e.g., of size (M+1)×(M+1)) can be designed so as to distinguish the transfer probability of the speech state at different frequencies (i.e., different periods in the cepstrum domain), where the speech state corresponding to a frequency bin may be a voiced state, an unvoiced state or a speechless state. As an example, for frequency ranges within 70-500 Hz, if the speech state of the current frame is voiced, the probability of the speech state of the next frame being voiced is close to 1; and for frequency ranges within 2000-3000 Hz, if the speech state of the current frame is voiced, the probability of the speech state of the next frame being voiced is relatively small. FIG. 4 shows a schematic diagram of a state transfer probability corresponding to a frequency range. According to some embodiments of the present application, the voiced state transfer probability of sequential frames can represent the possibility of transferring from one pitch period q_i to another pitch period q_j. If the pitch periods of two sequential frames are close, the transfer probability is relatively high; otherwise, the transfer probability is relatively low. That is, the state transfer probability matrix can represent probabilities of the pitch transferring among different periods, and can be an (M+1)×(M+1) matrix, for example. After the state transfer probability matrix is designed, the state transfer probability matrix can be used to obtain the speech state information of the primary microphone signal in combination with the harmonic model. As an example, the speech state of each frequency bin is first determined according to the speech state of the last frame and the state transfer probability matrix, and then compared with the energy of the corresponding frequency bin from the harmonic model, so as to determine an actual speech state of each frequency bin of the primary microphone signal. In step 2013, the cepstral excitation vector can be calculated according to the speech state information. As an example, after the speech state information of the primary microphone signal is obtained, pitch information can be obtained accordingly, and the cepstral excitation vector can be calculated according to the pitch information. The cepstral excitation vector can indicate the harmonic information distribution in the cepstrum domain. In step 2014, the harmonic selection can be performed according to the cepstral excitation vector and the harmonic masking coefficient matrix of the harmonic model, so as to obtain the Boolean information of whether a speech harmonic wave corresponding to a frequency bin exists, and output the Boolean information as the frequency bin VAD flag information.
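The cepstral part of steps 2012-2014 can be illustrated with a toy pitch detector. The trained harmonic model and the state transfer probability matrix are not modeled here; only the cepstrum computation, the pitch-quefrency search within the 70-500 Hz range, and a single-peak cepstral excitation vector are shown, and all names are illustrative.

```python
import numpy as np

def cepstral_pitch_and_excitation(frame, fs=8000, fmin=70.0, fmax=500.0):
    """Toy cepstral pitch search plus a single-peak excitation vector."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # Search only quefrencies corresponding to 70-500 Hz pitch.
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    pitch_q = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    excitation = np.zeros_like(cepstrum)
    excitation[pitch_q] = 1.0   # toy "cepstral excitation vector"
    return pitch_q, excitation

# A voiced-like pulse train with a 40-sample period (8000/40 = 200 Hz):
# the detected pitch quefrency should be near 40.
frame = np.zeros(512)
frame[::40] = 1.0
pitch_q, excitation = cepstral_pitch_and_excitation(frame)
print(pitch_q)
```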

[0101] In the above harmonic detection process, the speech harmonic characteristic can be trained in the cepstrum domain and the state transfer probability matrix can be calculated. Since a time- frequency distribution characteristic of the speech and the state transfer probability matrix are used to determine whether a speech exists at a frequency bin utilizing more dimensions, a higher accuracy can be achieved and whether a speech exists at a frequency bin can be detected more accurately.

[0102] According to some embodiments of the present application, the step 202 of controlling, according to the frequency bin VAD flag information, the Kalman filter to filter out the target speech signal from the secondary microphone signal, to obtain the secondary microphone noise signal can include:

[0103] Step 2021: obtaining a residual signal by using the primary microphone signal as a reference signal to adaptively remove the target speech signal in the secondary microphone signal using the Kalman filter, where the residual signal is the secondary microphone noise signal;

[0104] Step 2022: calculating a covariance matrix of the residual signal according to a covariance matrix of a filter coefficient error;

[0105] Step 2023: calculating a Kalman gain according to the covariance matrix of the residual signal;

[0106] Step 2024: determining, according to the frequency bin VAD flag information, whether the Kalman filter needs to be updated;

[0107] Step 2025: updating a filter coefficient according to the Kalman gain when the Kalman filter needs to be updated; and

[0108] Step 2026: updating the covariance matrix of the filter coefficient error according to the updated filter coefficient.

[0109] The terminal device can perform the aforementioned adaptive filtering process on the secondary microphone signal. An adaptive filtering process of a frame of the secondary microphone signal will be described as an example. In step 2021, the adaptive filtering can be performed on the primary microphone signal and the secondary microphone signal using the Kalman filter, and a filtered residual signal (which is also referred to as a residual signal after filtering) can be obtained. The primary microphone signal includes a speech signal and the primary microphone signal can be used as a reference signal to remove the speech signal in the secondary microphone signal using the Kalman filter in step 202, and the filtered residual signal is the secondary microphone noise signal.

[0110] In step 2022, the covariance matrix of the residual signal can be calculated by the following formula:

[0111] S_k = H_k · P_(k-1|k-1) · H_k^T + R_k ,

[0112] where S_k is the covariance matrix of the residual signal, P_(k-1|k-1) is the covariance matrix of the filter coefficient error, H_k is the reference signal (e.g., the primary microphone signal in the embodiments), R_k is the noise covariance, and k represents the current frame.

[0113] In step 2023, the Kalman gain can be calculated using the covariance matrix of the residual signal, and the Kalman gain can be calculated by the following formula:

[0114] K_k = P_(k-1|k-1) · H_k^T · S_k^(-1) ,

[0115] where K_k is the Kalman gain, S_k is the covariance matrix of the residual signal, P_(k-1|k-1) is the covariance matrix of the filter coefficient error, and H_k is the reference signal (e.g., the primary microphone signal in the embodiments).

[0116] In step 2024, the frequency bin VAD flag information can be used for determining whether the Kalman filter needs to be updated. In an implementation, it can be determined that the Kalman filter needs to be updated when the value of the frequency bin VAD flag information is 1; and/or it can be determined that updating of the Kalman filter needs to be suspended when the value of the frequency bin VAD flag information is 0.

[0117] In step 2025, the Kalman filter coefficient can be updated using the Kalman gain, and the Kalman filter coefficient can be calculated by the following formula:

[0118] X_(k|k) = X_(k-1|k-1) + K_k · Y_k ,

[0119] where X_(k|k) is the updated Kalman filter coefficient, X_(k-1|k-1) is the Kalman filter coefficient before the updating, K_k is the Kalman gain, and Y_k is the filtered residual signal.

[0120] In step 2026, the covariance matrix of the Kalman filter coefficient error can be updated, and the updated matrix is used in the filtering of the next frame. The updated covariance matrix of the Kalman filter coefficient error can be calculated by the following formula:

[0121] P_(k|k) = (I - K_k · H_k) · P_(k-1|k-1) + Q_k ,

[0122] where P_(k|k) is the updated covariance matrix of the Kalman filter coefficient error, P_(k-1|k-1) is the covariance matrix of the filter coefficient error before the updating, K_k is the Kalman gain, H_k is the reference signal, and Q_k is the variance expected value of K_k · Y_k.

[0123] By using the above Kalman adaptive filtering process, the filter coefficient updating can be achieved based on the harmonic detection. Only the target speech signal in the secondary microphone signal is removed through filtering, while the secondary microphone noise signal is retained, so that the noise spectrum estimation after dynamic mapping can be more accurate. The capability of filtering out the target speech can be enhanced since the Kalman filter is updated according to the frequency bin Boolean value obtained through the harmonic detection, and meanwhile a location change of the target speech source can be rapidly tracked, so as to achieve sound pickup in any orientation. Moreover, after obtaining the accurate frequency bin VAD flag information, the terminal device can use the frequency bin VAD flag information to control the updating of the Kalman adaptive filter. For example, when the value of the frequency bin VAD flag information is 1, which indicates that a speech signal exists in the primary microphone signal, the Kalman filter coefficient can be updated; when the value of the frequency bin VAD flag information is 0, which indicates that no speech signal exists in the primary microphone signal, the updating of the Kalman filter can be suspended. That is, the frequency bin VAD flag information can be used to determine whether a target speech signal exists, and to control the updating of the Kalman filter coefficient accordingly, so as to update the Kalman filter coefficient only when the target speech signal exists, thereby achieving the purpose of filtering out only the target speech while retaining the interference noises.
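A scalar, real-valued, per-bin sketch of steps 2021-2026 follows. The patent operates on matrices per frequency bin; here `r` and `q` are assumed constants standing in for R_k and Q_k, the covariance update uses the standard Kalman form, and the `vad` argument carries the frequency bin VAD flag.

```python
import numpy as np

def kalman_bin_update(d, h, x, p, r=1e-3, q=1e-6, vad=1):
    """Scalar per-bin Kalman adaptive filter step.

    d: secondary-mic bin (target), h: primary-mic bin (reference),
    x: filter coefficient, p: coefficient-error variance,
    vad: frequency bin VAD flag controlling the update.
    """
    y = d - h * x              # residual = secondary-mic noise (step 2021)
    s = h * p * h + r          # S_k = H_k P_(k-1|k-1) H_k^T + R_k (step 2022)
    k = p * h / s              # K_k = P_(k-1|k-1) H_k^T S_k^-1   (step 2023)
    if vad:                    # update only when speech is flagged (step 2024)
        x = x + k * y          # X_(k|k) = X_(k-1|k-1) + K_k Y_k  (step 2025)
        p = (1.0 - k * h) * p + q   # covariance update            (step 2026)
    return y, x, p

# Toy convergence check: the secondary mic sees 0.8x the primary plus noise.
rng = np.random.default_rng(1)
x, p = 0.0, 1.0
for _ in range(300):
    h = rng.standard_normal()
    d = 0.8 * h + 0.02 * rng.standard_normal()
    y, x, p = kalman_bin_update(d, h, x, p)
print(round(x, 2))
```

Because the gain is scaled by the coefficient-error variance `p`, the effective step size shrinks automatically as the estimate converges, which is the "automatic step size change" property noted in paragraph [0086].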
[0124] According to some embodiments of the present application, after the step 202 of controlling, according to the frequency bin VAD flag information, the Kalman filter to filter out the target speech signal from the secondary microphone signal, to obtain the secondary microphone noise signal, the method can further include:

[0125] Step 2027: performing harmonic detection on the secondary microphone noise signal; and

[0126] Step 2028: accelerating updating of the Kalman filter when a speech harmonic wave exists in the secondary microphone noise signal.

[0127] In order to improve the adaptive filtering efficiency, the terminal device can also accelerate the updating of the Kalman filter when there is still a speech harmonic wave in the secondary microphone noise signal. For example, the terminal device can detect whether there is a speech harmonic wave in the filtered residual signal obtained in step 2021, which can be realized by harmonic detection. If there is a speech harmonic wave in both the residual signal and the primary microphone signal, a minimum, i.e. a lower limit, can be set for the covariance matrix of the residual signal and the updating of the Kalman filter can be accelerated. By increasing the step size of the filter updating, the refresh rate of the filter coefficient can be accelerated; by adjusting the relevant parameters of the Kalman filter, the update speed of the filter can be increased.
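The leakage check of steps 2027-2028 can be sketched as a small control function; the floor value, the boost factor, and the function name are illustrative assumptions rather than values from the patent.

```python
def maybe_accelerate(s, p, residual_has_harmonic, primary_has_harmonic,
                     s_floor=1e-4, boost=4.0):
    """Toy version of steps 2027-2028: when a speech harmonic survives in
    the residual (target-speech leakage), floor the residual covariance
    and enlarge the coefficient-error variance so the next Kalman update
    takes a bigger step."""
    if residual_has_harmonic and primary_has_harmonic:
        s = max(s, s_floor)   # lower limit on the residual covariance
        p = p * boost         # larger p -> larger Kalman gain next frame
    return s, p

print(maybe_accelerate(1e-6, 0.01, True, True))   # -> (0.0001, 0.04)
```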

[0128] In the above filter updating process, the filter updating is controlled by using a Boolean value obtained through the first harmonic detection, and the second harmonic detection is used to determine whether target speech leakage exists and to accelerate the filter convergence. Thus the capability of filtering out the target speech signal can be stronger and good robustness to location changes can be achieved.

[0129] According to some embodiments of the present application, the step 203 of mapping the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain the primary microphone noise spectrum of the primary microphone signal can include:

[0130] Step 2031: calculating a prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal;

[0131] Step 2032: calculating a dynamic compensation coefficient of the primary microphone signal according to the primary microphone signal, the prior global speechless probability of the primary microphone signal and the secondary microphone noise signal; and

[0132] Step 2033: calculating the primary microphone noise spectrum of the primary microphone signal according to the dynamic compensation coefficient of the primary microphone signal and the secondary microphone noise signal.

[0133] In step 2031, the prior global speechless probability of the primary microphone signal can be calculated according to the primary microphone signal and the secondary microphone signal, and this probability can be used as a smoothing coefficient of a speechless period for dynamic noise spectrum mapping.

[0134] Optionally, a coherent noise in the primary microphone signal can be filtered out by a Kalman filter, to obtain a coherent-suppressed primary microphone signal (which is also referred to as a primary microphone signal after coherent suppression). The noises can be divided into scattering noises and coherent noises from the perspective of sound field distribution. The terminal device can perform adaptive filtering on the primary microphone signal and the secondary microphone noise signal using a Kalman filter, so that the coherent noise in the primary microphone signal can be filtered out, and the coherent-suppressed primary microphone signal can be obtained. That is, by filtering out the coherent noise, the coherent suppression of the primary microphone signal can be achieved. It should be noted that there is no sequential order between the step 2031 and the step of obtaining the coherent-suppressed primary microphone signal.

[0135] In step 2032, the terminal device can calculate the dynamic compensation coefficient of the primary microphone signal according to the primary microphone signal (or optionally the coherent-suppressed primary microphone signal), the prior global speechless probability and the secondary microphone noise signal. The dynamic compensation coefficient of the primary microphone signal can be continuously updated, for example, the dynamic compensation coefficient of the current frame can be updated using the primary microphone signal (or optionally the coherent-suppressed primary microphone signal), the prior global speechless probability and the secondary microphone noise signal obtained from the previous frame. For example, the dynamic compensation coefficient may be a ratio of primary microphone smoothing energy to secondary microphone smoothing energy. In step 2033, the primary microphone noise spectrum of the primary microphone signal can be calculated according to the updated dynamic compensation coefficient and the secondary microphone noise signal.
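Steps 2032-2033 can be sketched as below, assuming (as the text suggests) that the compensation coefficient is the ratio of smoothed primary energy to smoothed secondary noise energy, with the prior global speechless probability acting as the smoothing weight; the `state` dictionary layout and function name are hypothetical.

```python
import numpy as np

def update_mapping(primary_psd, noise_ref_psd, state, p_speechless, eps=1e-12):
    """Sketch of steps 2032-2033: smooth both energies, take their ratio
    as the dynamic compensation coefficient, then map the secondary-mic
    noise onto the primary mic."""
    a = p_speechless  # smooth faster when speech is unlikely
    state["prim"] = (1 - a) * state["prim"] + a * primary_psd
    state["sec"] = (1 - a) * state["sec"] + a * noise_ref_psd
    comp = state["prim"] / (state["sec"] + eps)     # compensation coefficient
    return comp * noise_ref_psd                     # primary-mic noise PSD

state = {"prim": np.ones(4), "sec": np.ones(4)}
noise = update_mapping(np.full(4, 2.0), np.full(4, 1.0), state, 0.5)
print(noise)   # smoothed energy ratio 1.5 applied to the noise reference
```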

[0136] In the above dynamic noise spectrum mapping process, since the primary microphone signal contains the target speech signal, there may be a problem of over-estimation or under-estimation if the noise spectrum is estimated directly from the primary microphone signal. Thus, by mapping the secondary microphone signal with speech removed (i.e. the residual secondary microphone noise signal) to the primary microphone signal and estimating the primary microphone noise spectrum, the acoustic transfer function and frequency response difference of the primary and secondary microphones can be dynamically calculated, making the noise spectrum estimation more accurate.

[0137] Further, according to some embodiments of the present application, the step 2031 of calculating the prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal can include:

[0138] Step 20311: calculating a coherence function of a noise of a scattered field according to a distance between a primary microphone and a secondary microphone;

[0139] Step 20312: calculating a complex coherence function of the primary microphone signal and the secondary microphone signal;

[0140] Step 20313: calculating an incident angle parameter of the primary microphone signal according to the coherence function of the noise of the scattered field and the complex coherence function of the primary microphone signal;

[0141] Step 20314: calculating a complex coherence coefficient according to the incident angle parameter;

[0142] Step 20315: calculating a prior speechless probability according to the incident angle parameter and the complex coherence coefficient; and

[0143] Step 20316: performing smoothing on the prior speechless probability in a time-frequency domain, to obtain the prior global speechless probability of the primary microphone signal.

[0144] The calculation process of the prior global speechless probability based on a complex coherence function is described in steps 20311-20316. In step 20311, the coherence function of the noise of the scattered field can be calculated according to the distance between the primary microphone and the secondary microphone. The coherence function in step 20311 can be a coherence function of an ideal scattered field based on a theoretical assumption, which is independent of specific signals and only related to the distance between the microphones, the frequency and the sound velocity. In step 20312, the complex coherence function of the two microphone signals can be calculated, and the real and imaginary parts can be obtained. The complex coherence function in step 20312 can be the actual complex coherence function calculated between the signals collected by the two microphones. In step 20313, the incident angle parameter of the speech can be calculated from the above information. In step 20314, the complex coherence coefficient can be calculated according to the incident angle parameter. In step 20315, the prior speechless probability can be calculated from the above information, which is then smoothed in the time-frequency domain to obtain the prior global speechless probability. The prior global speechless probability can be taken as the smoothing coefficient of the speechless period and used for the dynamic noise spectrum mapping, and the dynamic noise spectrum mapping can be used for estimating the energy of the smoothed primary and secondary microphone signals.
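The first two sub-steps can be illustrated directly. The ideal diffuse (scattered) field coherence depends only on microphone spacing, frequency, and sound speed, following the standard sinc model; the single-frame complex coherence below is a degenerate estimate (its magnitude is always 1), so a practical implementation would average cross- and auto-spectra over time before taking the ratio.

```python
import numpy as np

def diffuse_field_coherence(freqs, d, c=343.0):
    """Step 20311: theoretical coherence of an ideal scattered field
    between two mics spaced d metres apart: sin(2*pi*f*d/c)/(2*pi*f*d/c)."""
    return np.sinc(2.0 * freqs * d / c)   # np.sinc(x) = sin(pi x)/(pi x)

def complex_coherence(x1, x2, eps=1e-12):
    """Step 20312: per-bin complex coherence of two microphone spectra
    (single-frame estimate; real use averages the cross-spectra)."""
    return (x1 * np.conj(x2)) / (np.sqrt((np.abs(x1) ** 2) *
                                         (np.abs(x2) ** 2)) + eps)

freqs = np.array([0.0, 1000.0, 4000.0])
print(diffuse_field_coherence(freqs, d=0.1))   # decays with frequency
```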

[0145] According to some embodiments of the present application, the step 204 of calculating the noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal can include:

[0146] Step 2041: obtaining a single-microphone noise spectrum of the primary microphone signal;

[0147] Step 2042: obtaining a total noise spectrum of the primary microphone signal according to the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal; and

[0148] Step 2043: calculating the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal, and outputting the noise- reduced primary speech signal.

[0149] According to some embodiments of the present application, the step 2041 of obtaining a single-microphone noise spectrum of the primary microphone signal can include:

[0150] Step 20411: calculating a posterior global signal-to-noise ratio (SNR) of the primary microphone signal using global smoothing, and a posterior local SNR of the primary microphone signal using local smoothing;

[0151] Step 20412: calculating a speech occurrence probability according to the posterior global SNR, the posterior local SNR and pitch information of the primary microphone signal; and

[0152] Step 20413: estimating the single-microphone noise spectrum of the primary microphone signal according to the speech occurrence probability.

[0153] FIG. 5 is a schematic diagram of an enhanced minimum mean square error (MMSE) single-microphone noise spectrum estimation algorithm according to an embodiment of the present application. In step 20411, the posterior global SNR of the primary microphone signal can be calculated using global smoothing, and the posterior local SNR of the primary microphone signal can be calculated using local smoothing, as shown in FIG. 5, where γ is a posterior SNR, γ_global is the posterior global SNR, γ_local is the posterior local SNR, Y is a complex coefficient of the short-time Fourier transform of the current mixed signal of speech and noise, and the power spectral density estimation of the noise is also shown. In step 20412, the speech occurrence probability can be calculated according to the posterior global SNR, the posterior local SNR and the pitch information of the primary microphone signal. For example, as shown in FIG. 5, a global likelihood ratio Λ_global can be calculated according to the posterior global SNR, a local likelihood ratio Λ_local can be calculated according to the posterior local SNR, and a speech occurrence probability SPP can be calculated based on the posterior global SNR, the posterior local SNR, the global likelihood ratio and the local likelihood ratio, where K1 and K2 are constants. Then smoothing and lag protection can be performed on the speech occurrence probability. Meanwhile, the pitch information of the primary microphone signal can be obtained through pitch detection, and the cepstrum pitch threshold T_pitch can be used for achieving speech occurrence probability pitch protection, where Q_pitch represents a cepstrum period range over which the pitch may be distributed, and q is the current cepstrum period. In step 20413, the single-microphone noise spectrum of the primary microphone signal can be estimated according to the speech occurrence probability. For example, a transient noise spectrum |N| can be estimated according to the speech occurrence probability, and then a final noise spectrum can be estimated based on the transient noise spectrum, as shown in FIG. 5.
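A reduced form of the SPP-driven noise update of step 20413 can be sketched as follows. The fixed prior SNR, the simplified likelihood-ratio form of the SPP, and the smoothing constant are assumptions for illustration; the global/local likelihood ratios and the pitch protection of FIG. 5 are omitted.

```python
import numpy as np

def spp_noise_update(noisy_psd, noise_psd, prior_snr_db=15.0, alpha=0.8):
    """SPP-driven noise-PSD update (reduced sketch of step 20413)."""
    post_snr = noisy_psd / (noise_psd + 1e-12)
    xi = 10.0 ** (prior_snr_db / 10.0)          # fixed prior SNR model
    spp = 1.0 / (1.0 + (1.0 + xi) * np.exp(-post_snr * xi / (1.0 + xi)))
    # Transient noise spectrum: track the input only where speech is unlikely.
    transient = spp * noise_psd + (1.0 - spp) * noisy_psd
    # Final recursive smoothing toward the transient estimate.
    return alpha * noise_psd + (1.0 - alpha) * transient

# Low-SNR bins track upward; the high-SNR bin (speech) leaves noise frozen.
updated = spp_noise_update(np.array([2.0, 2.0, 50.0]), np.full(3, 1.0))
print(np.round(updated, 2))
```

Because the update is weighted by the speech-absence probability in every frame, no minimum-statistics time window is needed, which is the real-time tracking property claimed in paragraph [0154].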

[0154] In the above enhanced MMSE single-microphone noise spectrum estimation algorithm, the noise spectrum can be updated in real time by using the speech occurrence probability, which avoids the selection of a time window, thereby achieving noise tracking in real time. Moreover, in order to estimate the speech occurrence probability more accurately and overcome the short-time fluctuation of the speech and noise, global smoothing and local smoothing are used for calculating the posterior SNR, so that the noise estimation (especially for non-stationary noises) can be more accurate.

[0155] In step 2042, a total noise spectrum of the primary microphone signal can be obtained according to the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal. The terminal device can superimpose the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal so as to obtain the total noise spectrum of the primary microphone signal. In an implementation, the terminal device can control the degree of noise reduction by adjusting the proportion coefficients of the primary microphone noise spectrum and the single-microphone noise spectrum in the total noise spectrum. By combining the primary microphone noise spectrum and the single-microphone noise spectrum and calculating the total noise spectrum according to the present application, the non-stationary noises in the primary microphone signal can be estimated more accurately in real time.
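Step 2042 then reduces to a weighted sum, where the weights are the proportion coefficients mentioned in the text; the values used here are placeholders, not values specified by the patent.

```python
import numpy as np

def total_noise_spectrum(mapped_noise, single_mic_noise, w_map=1.0, w_single=1.0):
    """Step 2042: combine the dynamically mapped multi-mic noise spectrum
    with the single-mic estimate; the two weights let an implementation
    tune the degree of noise reduction."""
    return w_map * mapped_noise + w_single * single_mic_noise

print(total_noise_spectrum(np.array([0.2, 0.4]), np.array([0.1, 0.1])))
```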

[0156] According to some embodiments of the present application, the step 2043 of calculating the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal can include: calculating the noise reduction gain for the primary microphone signal multiple times according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal.

[0157] Further, according to some embodiments of the present application, the calculating the noise reduction gain for the primary microphone signal multiple times according to the total noise spectrum of the primary microphone signal, and outputting the noise-reduced primary speech signal can include:

[0158] Step 20431 : calculating a prior SNR of the primary microphone signal according to the primary microphone signal and the total noise spectrum of the primary microphone signal;

[0159] Step 20432: calculating an initial gain for the primary microphone signal according to the prior SNR of the primary microphone signal to obtain an initial gain result;

[0160] Step 20433: performing harmonic enhancement on the primary microphone signal according to the initial gain result to obtain a harmonic enhanced primary microphone signal;

[0161] Step 20434: calculating a secondary gain for the harmonic enhanced primary microphone signal to obtain a secondary gain result;

[0162] Step 20435: performing cepstrum smoothing on the secondary gain result to obtain a cepstrum smoothed primary microphone signal;

[0163] Step 20436: performing harmonic replacement on the cepstrum smoothed primary microphone signal when an amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold, to obtain a harmonic replaced primary microphone signal;

[0164] Step 20437: performing inverse transformation to a frequency domain on the harmonic replaced primary microphone signal, to obtain a smoothed SNR; and

[0165] Step 20438: calculating the noise reduction gain for the primary microphone signal according to the smoothed SNR, and outputting the noise-reduced primary speech signal.

[0166] The process of multiple times of noise reduction gain calculation is described in steps 20431 to 20438. In step 20431, the terminal device can calculate the prior SNR of the primary microphone signal according to the primary microphone signal and the total noise spectrum of the primary microphone signal. It should be noted that the terminal device can also calculate the prior SNR of the primary microphone signal according to the coherent-suppressed primary microphone signal and the total noise spectrum of the primary microphone signal, and the process of obtaining the coherent-suppressed primary microphone signal has been described above and will not be repeated here. The prior SNR can be calculated, for example, using a decision-directed (DD) method, and the prior SNR (in dB) can be obtained by calculating the energy ratio of the signal to the noise, taking the logarithm thereof, and multiplying the logarithm by 10. In step 20432, after the prior SNR is calculated, an initial gain can be calculated, for example, using a Wiener filter, and a gain-enabling process can be performed to obtain the initial gain result, which can also be referred to as a signal after initial noise reduction. In step 20434, the secondary gain for the harmonic enhanced primary microphone signal can be calculated to obtain the secondary gain result, which can also be referred to as a signal after secondary noise reduction. In step 20436, the harmonic replacement can be used for preserving the harmonic structure of the speech. For example, the result of the cepstrum smoothing can be used for a certain frequency range (for example, the formant range), this result including the enhancement of harmonic components, while for other frequency ranges (for example, the overtone range), the result of the cepstrum smoothing without enhancement can be directly used, and the two can be spliced to obtain the final result.
In these steps, the cepstrum smoothed SNR can be calculated by performing cepstrum smoothing on the initial prior SNR; the cepstrum smoothing is a smoothing of the estimated prior SNR, which is the most important parameter for calculating the noise reduction gain. In these steps, the calculation of the prior SNR, the calculation of the initial gain, the harmonic enhancement, the calculation of the secondary gain, the cepstrum smoothing and the calculation of the final gain can be regarded as an overall noise reduction gain calculation process. Through the multiple times of noise reduction gain calculation, noise reduction processing for the primary microphone signal can be achieved, and the noise-reduced primary speech signal can be finally outputted.
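The decision-directed prior SNR and the Wiener initial gain of steps 20431 and 20432 can be sketched as follows. The smoothing factor `beta`, the gain floor `g_min`, and the state carried between frames are common illustrative choices for a DD estimator, not parameters stated in the application:

```python
import numpy as np

def dd_prior_snr_and_gain(Y, total_noise_psd, prev_clean_mag2, beta=0.98, g_min=0.1):
    """Decision-directed prior SNR estimate followed by a Wiener initial gain.

    Y               : complex STFT bins of the (coherent-suppressed) primary signal
    total_noise_psd : total noise spectrum from step 2042
    prev_clean_mag2 : |clean estimate|^2 from the previous frame (DD memory)
    """
    periodogram = np.abs(Y) ** 2
    noise = np.maximum(total_noise_psd, 1e-12)
    gamma = periodogram / noise                                   # posterior SNR
    # DD rule: blend last frame's clean estimate with the instantaneous SNR
    xi = beta * prev_clean_mag2 / noise + (1 - beta) * np.maximum(gamma - 1.0, 0.0)
    xi_db = 10.0 * np.log10(np.maximum(xi, 1e-12))                # prior SNR in dB
    gain = np.maximum(xi / (1.0 + xi), g_min)                     # floored Wiener gain
    clean_mag2 = (gain * np.abs(Y)) ** 2                          # memory for next frame
    return xi_db, gain, clean_mag2
```

The `10 * log10` term realizes the "energy ratio, logarithm, times 10" description of the dB-valued prior SNR in the paragraph above.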

[0167] According to the cepstrum smoothing process of the present application, cooperative use of cepstrum smoothing and harmonic detection can be achieved. The pitch calculation result in harmonic detection can be used and combined with the determination based on the cepstrum pitch threshold, and an interference frequency bin can be filtered out by using harmonic selection. In contrast, a conventional method uses only the determination based on the cepstrum pitch threshold, which is prone to interference from speech-like noise, and its ability to distinguish speech from noise is relatively low. Thus, compared with the conventional cepstrum smoothing method, the protection of the target speech signal can be better, and the noise residual can be more stationary.

[0168] According to some embodiments of the present application, before the step 20435 of performing cepstrum smoothing on the secondary gain result to obtain the cepstrum smoothed primary microphone signal, the method can further include: performing harmonic selection according to the primary microphone noise spectrum of the primary microphone signal and pitch information of the primary microphone signal, to obtain a harmonic selection result; determining, according to the harmonic selection result, whether a speech harmonic wave exists in the secondary gain result; and setting the pitch information which needs to be detected during the cepstrum smoothing to 0 when there is no speech harmonic wave in the secondary gain result. In the foregoing embodiments provided by the present application, the harmonic selection can be a sub-process of the harmonic detection. The noise spectrum information and pitch information obtained from the harmonic detection can be used to determine whether a harmonic wave exists in the current frame, and, if there is no harmonic wave, to further guide the cepstrum smoothing and updating by setting the pitch in the cepstrum smoothing to 0. For example, in the case where a pitch exists while no harmonic wave exists, which may indicate that residual noise still remains, the value of the pitch can be set to 0. This prevents pitch detection errors in the cepstrum smoothing when non-stationary noise remains, provides a double check on the pitch detection, and improves the accuracy of the pitch detection.

[0169] FIG. 6 is a schematic diagram of a cepstrum smoothing process in a multi-microphone noise reduction method in a specific scenario according to an embodiment of the present application. First, a cepstral Fourier transformation can be performed on the signal after secondary noise reduction so as to transform it to the cepstrum domain, and harmonic selection can be performed according to the noise spectrum information and pitch information to obtain a harmonic selection result. Whether there is a speech harmonic wave in the noise spectrum can be determined according to the harmonic selection result, and if there is no harmonic wave, the pitch information which needs to be detected during the cepstrum smoothing can be set to 0. Then cepstrum smoothing can be performed on the transformed signal after secondary noise reduction, and whether the amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold can be determined. When the amplitude of the cepstrum smoothed primary microphone signal within the pitch distribution range is greater than the preset threshold, the corresponding smoothed place (i.e. the place corresponding to a certain cepstrum period) is replaced with an original value, to obtain a harmonic replaced primary microphone signal. The corresponding smoothed place refers to a place in the cepstrum domain, and places in the cepstrum domain can be understood as different periods of the signal. The original value refers to the current value without smoothing. Finally, inverse transformation to the frequency domain can be performed on the harmonic replaced primary microphone signal, to obtain a smoothed SNR, which is then used for calculation of the final noise reduction gain.
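The cepstrum smoothing with harmonic replacement of FIG. 6 can be sketched as below. The recursive smoothing factor `alpha`, the threshold `thresh`, and the width `q_range` of the pitch neighbourhood are illustrative assumptions; the replacement restores the unsmoothed cepstral values around the pitch period so that the harmonic structure survives the smoothing:

```python
import numpy as np

def cepstral_smooth_snr(snr, prev_ceps, pitch_q, q_range=2, alpha=0.8, thresh=0.2):
    """Sketch of cepstrum-domain smoothing of a prior-SNR curve.

    snr       : prior-SNR estimate over frequency bins (current frame)
    prev_ceps : smoothed cepstrum from the previous frame (complex array)
    pitch_q   : detected cepstral pitch period in bins (0 = no speech harmonic,
                as set by the harmonic selection step)
    """
    n = len(snr)
    log_snr = np.log(np.maximum(snr, 1e-12))
    ceps = np.fft.ifft(log_snr)                        # cepstrum of the SNR curve

    smoothed = alpha * prev_ceps + (1 - alpha) * ceps  # recursive smoothing

    # Harmonic replacement: where the smoothed amplitude near the pitch period
    # exceeds the threshold, keep the original (unsmoothed) value instead.
    # Mirror bins are replaced too, to preserve conjugate symmetry.
    if pitch_q > 0:
        for q in range(max(pitch_q - q_range, 1), min(pitch_q + q_range + 1, n // 2)):
            if np.abs(smoothed[q]) > thresh:
                smoothed[q] = ceps[q]
                smoothed[n - q] = ceps[n - q]

    smoothed_log = np.fft.fft(smoothed).real           # back to the frequency domain
    return np.exp(smoothed_log), smoothed
```

The returned smoothed SNR is what would then feed the final noise reduction gain calculation.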

[0170] Seen from the above solutions, the multi-microphone noise reduction can be achieved based on the Kalman adaptive filter, which does not need to depend on ILD information and has a strong capability of filtering out the target speech signal. Thus the multi-microphone noise reduction method according to the present application has good robustness to location changes, various noises and application scenarios. The method also has an improved speech protection capability, and can be applied to both the handheld mode and the hands-free mode, which achieves close noise reduction effects in the two modes, and improves consistency of subjective experiences of a call during mode switching. Moreover, by obtaining the total noise spectrum of the primary microphone signal according to the primary microphone noise spectrum of the primary microphone signal and the single- microphone noise spectrum of the primary microphone signal, and calculating the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal, the short-term fluctuation of the speech and noise can be overcome, so that the noise estimation can be more accurate, the noise residual and speech impairment can be reduced, and the effect of speech noise reduction can be improved.

[0171] To facilitate better understanding and implementation of the solutions in the embodiments of the present application, the solutions of the present application will be described in combination with the following specific application scenarios.

[0172] FIG. 7 is a schematic diagram of a multi-microphone noise reduction method in a specific scenario according to an embodiment of the present application. The multi-microphone noise reduction method can be performed by a terminal device, such as the first or second terminal device shown in FIG. 1. The method can include:

[0173] Step 601 : performing windowing Fourier transformation on a primary microphone time- domain signal and a secondary microphone time-domain signal to obtain a primary microphone signal and a secondary microphone signal.

[0174] Step 602: performing single-microphone noise spectrum estimation on a primary microphone time-frequency signal to obtain a single-microphone noise spectrum.

[0175] Step 603: performing harmonic detection on the primary microphone signal to obtain frequency bin VAD flag information.

[0176] Step 604: adaptively filtering out a target speech in the secondary microphone signal according to the frequency bin VAD flag information to obtain a filtered secondary microphone signal (which is also referred to as a secondary microphone signal after filtering).

[0177] Step 605 : performing harmonic detection on the filtered secondary microphone signal, and accelerating the filter using the frequency bin VAD flag information.

[0178] Step 606: performing global speechless probability calculation on the primary microphone signal and the secondary microphone signal based on a complex coherence function, and outputting a global speechless probability.

[0179] Step 607: adaptively filtering out a coherent noise in the primary microphone signal, and outputting a coherent-suppressed primary microphone signal.

[0180] Step 608: calculating a dynamic compensation coefficient to be used in the next frame, according to the coherent-suppressed primary microphone signal, the global speechless probability, the filtered secondary microphone signal and a signal after secondary noise reduction.

[0181] Step 609: calculating a total noise spectrum by using the secondary microphone signal, the dynamic compensation coefficient and the single-microphone noise spectrum.

[0182] Step 610: calculating a prior SNR of the primary microphone signal according to the coherent-suppressed primary microphone signal and the total noise spectrum of the primary microphone signal. In an implementation of the present application, a prior SNR smoothing coefficient can be calculated according to the coherent-suppressed primary microphone signal and the total noise spectrum of the primary microphone signal, and then the prior SNR can be calculated according to the prior SNR smoothing coefficient.

[0183] Step 611 : calculating an initial gain for the primary microphone signal according to the prior SNR of the primary microphone signal to obtain an initial gain result.

[0184] Step 612: performing harmonic enhancement on the primary microphone signal according to the initial gain result to obtain a harmonic enhanced primary microphone signal.

[0185] Step 613: calculating a secondary gain for the harmonic enhanced primary microphone signal to obtain a secondary gain result.

[0186] Step 614: performing cepstrum smoothing on the secondary gain result to obtain a cepstrum smoothed primary microphone signal.

[0187] Step 615: performing harmonic replacement on the cepstrum smoothed primary microphone signal when the amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold, to obtain a harmonic replaced primary microphone signal.

[0188] Step 616: performing inverse transformation to a frequency domain on the harmonic replaced primary microphone signal, to obtain a smoothed SNR.

[0189] Step 617: calculating the noise reduction gain for the primary microphone signal according to the smoothed SNR, and outputting the noise-reduced primary speech signal.

[0190] The specific processes or implementations of these steps are similar to those described in the foregoing embodiments, and will not be repeated here.

[0191] FIG. 8 is a schematic diagram of a multi-microphone noise reduction method in another specific scenario according to an embodiment of the present application. The embodiment shown in FIG. 8 is similar to the embodiment shown in FIG. 7, and the main difference lies in that the embodiment shown in FIG. 8 introduces ILD information of the primary (e.g. bottom) and secondary (e.g. top) microphones for call angle control.

[0192] FIG. 9 is a schematic diagram of an application scenario of a multi-microphone noise reduction method based on ILD information according to an embodiment of the present application. As an example, the spacing between the bottom microphone and the top microphone may be approximately 12 cm. In the handheld mode, the distance from the bottom microphone to the mouth of a speaker may be approximately 5 cm. Since near-field propagation of a voice is characterized by rapidly decreasing sound pressure, the speech energy collected by the bottom microphone is much greater than the speech energy collected by the top microphone. The difference between the two energies is generally referred to as an interaural level difference (ILD), whose value is generally between 6 dB and 15 dB. Assuming that the noise is in the far field, the noise energies received by the two microphones are approximately equal. Therefore, a speech segment and a noise segment can be distinguished by using this characteristic, thereby performing noise suppression processing. Since the distribution of the noise energies at the two microphones is unrelated to noise stationarity, the method can also achieve a relatively powerful suppression effect for a non-stationary noise.
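The frame ILD described above is simply the energy ratio of the two microphone frames in dB. A minimal sketch (the `eps` regularizer is an illustrative assumption):

```python
import numpy as np

def frame_ild_db(bottom_frame, top_frame, eps=1e-12):
    """Interaural level difference between the bottom (primary) and top
    (secondary) microphone frames, in dB.  Per the scenario above, handheld
    near-field speech yields roughly 6-15 dB, far-field noise roughly 0 dB."""
    e_bottom = np.sum(np.abs(bottom_frame) ** 2)
    e_top = np.sum(np.abs(top_frame) ** 2)
    return 10.0 * np.log10((e_bottom + eps) / (e_top + eps))
```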

[0193] The multi-microphone noise reduction method based on ILD information may be mainly used in the handheld mode. As shown in FIG. 8, the call angle control is based on the calculation of a frame ILD, and a determination based on a fixed ILD threshold is then performed so as to determine whether the Kalman filter coefficient needs to be updated, thereby controlling the pickup angle of the call. The target speech of the secondary microphone signal is filtered out only within a certain range of angles, while the speech signals outside the ILD threshold range are regarded as interferences, will not be filtered out, and will eventually be suppressed as noises, thereby better suppressing background vocals and music interferences. Using the ILD information can achieve better noise reduction in pickup areas.

[0194] An existing ILD-based dual-microphone noise reduction method is confronted with several difficulties. First, since this method depends on ILD information, and ILD information changes as the relative location of the mobile phone changes, using a fixed ILD threshold causes misjudgment of speech segments, and consequently speech impairment. Even if a dynamic ILD threshold is used, when the distance from the bottom microphone of the mobile phone to the mouth of a speaker is the same as that from the top microphone to the mouth of the speaker, the ILD is close to 0. In this case, speech and noise information in the two microphones cannot be distinguished, and the single-microphone noise reduction method has to be used. Thus, the noise reduction effect may degrade to that of the single-microphone noise reduction method when a dynamic ILD threshold is used, while severe speech impairment may be generated when a fixed ILD threshold is used. Additionally, when the energy of a target speech is relatively small, the speech energy does not show an obvious difference between the two microphones, and a noise residual or a speech impairment will also be caused.

[0195] However, according to this embodiment of the multi-microphone noise reduction method based on ILD information of the present application, the terminal device can perform call angle control and control the harmonic detection result based on a microphone energy ratio. A frame-smoothed energy ratio and a threshold-based determination are used to control the frequency bin VAD result of the harmonic detection, so that whether the filter is updated is precisely controlled at the frame level, thereby controlling the degree of noise spectrum estimation. If whether a noise is updated were controlled directly by using the energy ratio, a residual would occur in the noise after the noise reduction when the primary microphone noise is greater than the secondary microphone noise. However, after the harmonic detection is used, a harmonic wave will not be detected in this case, the noise spectrum is still dynamically updated, and the noise residual problem can be greatly alleviated. The method of this embodiment can support a call in a particular angle range, and a change of the angle range can be supported by setting a single parameter.
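The frame-level gating of the harmonic-detection VAD flags by the smoothed energy ratio can be sketched as follows. The threshold value, smoothing factor, and all-or-nothing gating are illustrative assumptions about how the control could be realized:

```python
def gate_vad_by_ild(vad_flags, ild_db, ild_threshold_db=6.0, smooth=0.9, prev_ild=0.0):
    """Frame-level call-angle control: keep the frequency-bin VAD flags from
    harmonic detection only when the smoothed frame ILD exceeds a threshold,
    so the Kalman filter is updated only for speech inside the pickup angle.

    vad_flags : list of 0/1 frequency-bin VAD flags for the current frame
    ild_db    : frame ILD in dB; prev_ild carries the smoothing state
    """
    ild_smoothed = smooth * prev_ild + (1 - smooth) * ild_db
    if ild_smoothed >= ild_threshold_db:
        gated = list(vad_flags)            # inside the pickup angle: keep flags
    else:
        gated = [0] * len(vad_flags)       # outside: treat speech as interference
    return gated, ild_smoothed
```

With the flags forced to 0 outside the angle range, the Kalman filter stops treating out-of-angle speech as target speech, so it is eventually suppressed as noise, matching the behaviour described above.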

[0196] According to some embodiments of the present application, in still another application scenario, the multi-microphone noise reduction method as described in FIG. 7 can also be achieved in combination with microphone array beamforming technology.

[0197] FIG. 10 is a schematic diagram of a microphone array beamforming technology, where s(r, t) is the speech signal, h_N(r, t) represents the channel condition, X_N is the microphone signal, and W*_bf,N(e^{jΩ}) is the filter coefficient. In this embodiment, a beam is formed by using the spatial characteristic of signals from multiple microphones to point to the direction of a target speech, filter calculation is performed by using a particular noise field model or an actual noise field model, and a signal after beamforming is obtained by means of filtering output.

[0198] In an existing noise reduction method, single-microphone noise reduction processing may be performed after the above beamforming process. This beamforming-based multi-microphone noise reduction algorithm is confronted with several difficulties. First, beamforming needs to obtain orientation information of the target speech. When the mobile phone is in the hands-free mode, the location of a user relative to the mobile phone usually changes, and consequently the beam direction has to be adjusted frequently. When multiple interference sources occur, the beam may point in an erroneous direction and cause a severe speech impairment. Second, beamforming has relatively high requirements on the arrangement locations and quantity of microphones; however, a mobile phone includes a relatively small quantity of microphones, and most mobile phones include only two. Therefore, the gain effect generated by beamforming is not obvious. Additionally, if beamforming in a fixed direction is used, the target speech can form a positive gain of speech enhancement only in the beam. In a low signal-to-noise ratio (SNR) and reverberation scenario, acoustic source localization may have a relatively large deviation, causing a speech impairment.
In a fixed beam method used on the mobile phone, it is generally assumed that the target speech is in a direction on the horizontal plane of the mobile phone, and such a method usually cannot pick up sound from arbitrary orientations. Moreover, from the perspective of the algorithm principle, beamforming needs a particular noise field model or a dynamically estimated noise field model. When the particular noise field model is used, the beamforming effect will be degraded if the actual noise field is not consistent with the assumption. When the dynamically estimated noise field model is used, the noise spectrum needs to be accurately estimated. However, it is difficult to accurately estimate a non-stationary noise, and the beamforming effect will also be degraded, causing a noise residual or speech impairment.
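As a concrete point of reference for the beamforming front end discussed above, a fixed delay-and-sum beamformer (one of the simplest fixed-direction designs, not necessarily the one in FIG. 10) can be sketched as follows. All function and parameter names are illustrative:

```python
import numpy as np

def delay_and_sum_weights(freqs_hz, mic_positions_m, look_dir, c=343.0):
    """Fixed delay-and-sum weights for a small microphone array.

    freqs_hz        : FFT bin frequencies, shape (F,)
    mic_positions_m : microphone positions in metres, shape (M, 3)
    look_dir        : vector toward the assumed target speech direction
    """
    look = np.asarray(look_dir, dtype=float)
    look = look / np.linalg.norm(look)
    delays = mic_positions_m @ look / c                  # per-mic delays (s)
    # Steering phases that time-align each microphone toward the look direction
    phases = np.exp(2j * np.pi * np.outer(freqs_hz, delays))
    return phases / mic_positions_m.shape[0]             # (F, M) weights

def apply_beamformer(W, X):
    """X: (F, M) microphone STFT frame; returns the (F,) beamformed signal."""
    return np.sum(np.conj(W) * X, axis=1)
```

A plane wave arriving exactly from `look_dir` is summed coherently (unity gain), while signals from other directions add with mismatched phases, which is the positive-gain-only-in-the-beam limitation the paragraph above describes.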

[0199] However, according to some embodiments of the present application, after the above beamforming process, the multi-microphone noise reduction method as described in FIG. 7 can be further performed, and thus good robustness to location changes, various noises and application scenarios can be obtained. In an implementation, the above beamforming process can be used as front-end processing. The signal generated after the above beamforming process can be used as the primary microphone signal in the aforementioned embodiments, while a certain path of signals picked up by the microphone array can be used as the secondary microphone signal. Other specific processes or implementations are similar to those described in the foregoing embodiments, and will not be repeated here.

[0200] It should be noted that, for simplicity of description, the foregoing method embodiments are described as a combination of a series of actions. However, those skilled in the art should understand that the present application is not limited to the described sequence of actions, and some of the steps may be performed in other sequences or concurrently according to the present application. Moreover, those skilled in the art should also know that the actions (or modules) involved in embodiments described above are not necessarily required in the present application.

[0201] In order to better implement the above solutions of the embodiments of the present application, related apparatus for implementing the solutions of the present application are provided below.

[0202] FIG. 11 is a schematic structural diagram of a multi-microphone noise reduction apparatus according to an embodiment of the present application. As shown in FIG. 11, the apparatus 1100 can include:

[0203] a first harmonic detection module 1101, configured to perform harmonic detection on a primary microphone signal to obtain frequency bin voice activity detection (VAD) flag information;

[0204] a filter control module 1102, configured to control, according to the frequency bin VAD flag information, a Kalman filter to filter out a target speech signal from a secondary microphone signal, to obtain a secondary microphone noise signal;

[0205] a mapping module 1103, configured to map the secondary microphone noise signal to the primary microphone signal through dynamic noise spectrum mapping, to obtain a primary microphone noise spectrum of the primary microphone signal; and

[0206] a gain calculating module 1104, configured to calculate a noise reduction gain for the primary microphone signal according to at least the primary microphone noise spectrum of the primary microphone signal, and

[0207] an outputting module 1105, configured to output a noise-reduced primary speech signal.

[0208] According to some embodiments, the first harmonic detection module 1101 can be specifically configured to obtain the frequency bin VAD flag information using a harmonic model and a state transfer probability matrix, where the harmonic model is used for detecting a speech harmonic characteristic in a cepstrum domain, and the frequency bin VAD flag information is a Boolean value for indicating whether a speech harmonic wave exists in the primary microphone signal.

[0209] FIG. 12 is a schematic structural diagram of a first harmonic detection module according to an embodiment of the present application. As shown in FIG. 12, the first harmonic detection module 1101 can include:

[0210] a training unit 11011, configured to train the harmonic model according to a speech database;

[0211] a first obtaining unit 11012, configured to obtain speech state information of the primary microphone signal using the harmonic model and the state transfer probability matrix, where the speech state information includes a voiced state, an unvoiced state or a speechless state corresponding to each frequency bin;

[0212] a first calculating unit 11013, configured to calculate a cepstral excitation vector according to the speech state information; and

[0213] a harmonic selection unit 11014, configured to perform harmonic selection on the primary microphone signal according to the cepstral excitation vector and the harmonic model to determine whether the speech harmonic wave exists in the primary microphone signal, and output the frequency bin VAD flag information.

[0214] FIG. 13 is a schematic structural diagram of a filter control module according to an embodiment of the present application. As shown in FIG. 13, the filter control module 1102 can include:

[0215] a filtering unit 11021, configured to obtain a residual signal by using the primary microphone signal as a reference signal to adaptively remove the target speech signal in the secondary microphone signal using the Kalman filter, where the residual signal is the secondary microphone noise signal;

[0216] a second calculating unit 11022, configured to calculate a covariance matrix of the residual signal according to a covariance matrix of a filter coefficient error; and calculate a Kalman gain according to the covariance matrix of the residual signal;

[0217] a determining unit 11023, configured to determine, according to the frequency bin VAD flag information, whether the Kalman filter needs to be updated; and

[0218] an updating unit 11024, configured to update a filter coefficient according to the Kalman gain when the Kalman filter needs to be updated; and update the covariance matrix of the filter coefficient error according to the updated filter coefficient.

[0219] According to some embodiments, the determining unit 11023 can be specifically configured to: determine that the Kalman filter needs to be updated when a value of the frequency bin VAD flag information is 1; and/or determine that updating of the Kalman filter needs to be suspended when a value of the frequency bin VAD flag information is 0.
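The interplay of units 11021 to 11024 (residual computation, Kalman gain, VAD-gated coefficient and covariance updates) can be sketched for a single frequency bin with a one-tap filter. The process/observation noise variances `q` and `r` are illustrative assumptions, and a real implementation would use a multi-tap filter per bin:

```python
import numpy as np

def kalman_bin_step(x_ref, y_sec, w, p, vad_flag, q=1e-4, r=1e-2):
    """One per-frequency-bin step of the VAD-gated adaptive Kalman filter.

    x_ref    : primary-mic STFT bin (reference carrying the target speech)
    y_sec    : secondary-mic STFT bin
    w, p     : filter coefficient and coefficient-error covariance
    vad_flag : frequency-bin VAD flag (1 = speech harmonic present)
    Returns the residual (the secondary-mic noise estimate) and updated state.
    """
    e = y_sec - w * x_ref                      # residual = estimated noise
    if vad_flag == 1:                          # update only while speech is present
        p = p + q                              # predict coefficient covariance
        s = p * abs(x_ref) ** 2 + r            # residual covariance
        k = p * np.conj(x_ref) / s             # Kalman gain
        w = w + k * e                          # coefficient update
        p = (1.0 - (k * x_ref).real) * p       # covariance update
    return e, w, p
```

When `vad_flag` is 0, both `w` and `p` are left untouched, which implements the "suspend updating" branch of the determining unit.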

[0220] FIG. 14 is a schematic structural diagram of a multi-microphone noise reduction apparatus according to another embodiment of the present application. As shown in FIG. 14, the apparatus can further include:

[0221] a second harmonic detection module 1106, configured to perform harmonic detection on the secondary microphone noise signal; and

[0222] an accelerating module 1107, configured to accelerate updating of the Kalman filter when a speech harmonic wave exists in the secondary microphone noise signal.

[0223] FIG. 15 is a schematic structural diagram of a mapping module according to an embodiment of the present application. As shown in FIG. 15, the mapping module 1103 can include:

[0224] a third calculating unit 11031, configured to calculate a prior global speechless probability of the primary microphone signal according to the primary microphone signal and the secondary microphone signal; calculate a dynamic compensation coefficient of the primary microphone signal according to the primary microphone signal, the prior global speechless probability of the primary microphone signal and the secondary microphone noise signal; and calculate the primary microphone noise spectrum of the primary microphone signal according to the dynamic compensation coefficient of the primary microphone signal and the secondary microphone noise signal.

[0225] According to some embodiments, the third calculating unit 11031 can be specifically configured to: calculate a coherence function of a noise of a scattered field according to a distance between a primary microphone and a secondary microphone; calculate a complex coherence function of the primary microphone signal and the secondary microphone signal; calculate an incident angle parameter of the primary microphone signal according to the coherence function of the noise of the scattered field and the complex coherence function of the primary microphone signal; calculate a complex coherence coefficient according to the incident angle parameter; calculate a prior speechless probability according to the incident angle parameter and the complex coherence coefficient; and perform smoothing on the prior speechless probability in a time-frequency domain, to obtain the prior global speechless probability of the primary microphone signal.
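A single-frame sketch of the coherence computation used by the third calculating unit follows. The diffuse-field model is the standard sinc coherence for a given spacing; the final mapping from the coherence mismatch to a probability is purely illustrative (the application derives an incident angle parameter and complex coherence coefficient first, which is omitted here):

```python
import numpy as np

def prior_speechless_probability(X1, X2, d_m, freqs_hz, c=343.0):
    """Coherence-based prior speechless probability, one frame.

    X1, X2   : primary / secondary microphone STFT frames
    d_m      : distance between the two microphones in metres
    freqs_hz : frequency of each bin
    """
    # Scattered (diffuse) field coherence model for spacing d_m:
    # sinc(2*pi*f*d/c); note np.sinc(x) = sin(pi x)/(pi x)
    gamma_diffuse = np.sinc(2.0 * freqs_hz * d_m / c)
    # Instantaneous complex coherence of the two signals
    s12 = X1 * np.conj(X2)
    s11 = np.abs(X1) ** 2
    s22 = np.abs(X2) ** 2
    gamma = s12 / np.maximum(np.sqrt(s11 * s22), 1e-12)
    # A directional (coherent) source pulls gamma away from the diffuse model;
    # map that distance to a speechless probability in [0, 1] (illustrative)
    return 1.0 - np.clip(np.abs(gamma - gamma_diffuse), 0.0, 1.0)
```

In the application this prior probability would then be smoothed in the time-frequency domain to yield the prior global speechless probability.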

[0226] FIG. 16 is a schematic structural diagram of a gain calculating module according to an embodiment of the present application. As shown in FIG. 16, the gain calculating module 1104 can include:

[0227] a second obtaining unit 11041, configured to obtain a single-microphone noise spectrum of the primary microphone signal; and obtain a total noise spectrum of the primary microphone signal according to the primary microphone noise spectrum of the primary microphone signal and the single-microphone noise spectrum of the primary microphone signal; and

[0228] a fourth calculating unit 11042, configured to calculate the noise reduction gain for the primary microphone signal according to the total noise spectrum of the primary microphone signal.

[0229] According to some embodiments, the fourth calculating unit 11042 can be specifically configured to calculate the noise reduction gain for the primary microphone signal multiple times according to the total noise spectrum of the primary microphone signal.

[0230] According to some embodiments, the fourth calculating unit 11042 can be specifically configured to: calculate a prior signal-to-noise ratio (SNR) of the primary microphone signal according to the primary microphone signal and the total noise spectrum of the primary microphone signal; calculate an initial gain for the primary microphone signal according to the prior SNR of the primary microphone signal to obtain an initial gain result; perform harmonic enhancement on the primary microphone signal according to the initial gain result to obtain a harmonic enhanced primary microphone signal; calculate a secondary gain for the harmonic enhanced primary microphone signal to obtain a secondary gain result; perform cepstrum smoothing on the secondary gain result to obtain a cepstrum smoothed primary microphone signal; perform harmonic replacement on the cepstrum smoothed primary microphone signal when an amplitude of the cepstrum smoothed primary microphone signal within a pitch distribution range is greater than a preset threshold, to obtain a harmonic replaced primary microphone signal; perform inverse transformation to a frequency domain on the harmonic replaced primary microphone signal, to obtain a smoothed SNR; and calculate the noise reduction gain for the primary microphone signal according to the smoothed SNR.

[0231] In some embodiments, the fourth calculating unit 11042 can be further configured to: perform harmonic selection according to the primary microphone noise spectrum of the primary microphone signal and pitch information of the primary microphone signal, to obtain a harmonic selection result; determine, according to the harmonic selection result, whether a speech harmonic wave exists in the secondary gain result; and set the pitch information which needs to be detected during the cepstrum smoothing to 0 when there is no speech harmonic wave in the secondary gain result.
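The harmonic selection and pitch-zeroing logic above can be sketched as follows. The choice of integer pitch-bin multiples and the mean-gain threshold are hypothetical details added for this sketch.

```python
def harmonic_bins(pitch_bin, num_bins):
    # Frequency bins at integer multiples of the detected pitch bin.
    if pitch_bin <= 0:
        return []
    return list(range(pitch_bin, num_bins, pitch_bin))


def speech_harmonics_present(secondary_gain, pitch_bin, threshold=0.5):
    # Decide whether a speech harmonic wave exists in the secondary gain
    # result: here, whether the mean gain at the harmonic bins exceeds an
    # illustrative threshold.
    bins = harmonic_bins(pitch_bin, len(secondary_gain))
    if not bins:
        return False
    mean_gain = sum(secondary_gain[b] for b in bins) / len(bins)
    return mean_gain > threshold


# Per the paragraph above: if no harmonic is found, the pitch information
# passed to the cepstrum smoothing stage is set to 0.
# pitch_bin = pitch_bin if speech_harmonics_present(gain, pitch_bin) else 0
```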

[0232] In some embodiments, the second obtaining unit 11041 can be specifically configured to: calculate a posterior global SNR of the primary microphone signal using global smoothing, and a posterior local SNR of the primary microphone signal using local smoothing; calculate a speech occurrence probability according to the posterior global SNR, the posterior local SNR and pitch information of the primary microphone signal; and estimate the single-microphone noise spectrum of the primary microphone signal according to the speech occurrence probability.
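The single-microphone noise estimation above can be sketched with a probability-gated recursive tracker. The thresholding rule, the probability weights, and the smoothing constant are assumptions for this sketch; the application only states that the probability combines the global SNR, the local SNR, and the pitch information.

```python
def posterior_snr(signal_psd, noise_psd):
    # Posterior SNR per bin (smoothing of signal_psd, global or local,
    # is assumed to have been applied by the caller).
    return [s / (n + 1e-12) for s, n in zip(signal_psd, noise_psd)]


def speech_presence_probability(global_snr, local_snr, pitch_flag,
                                snr_threshold=2.0):
    # Hypothetical combination: a bin is treated as likely speech when
    # both smoothed posterior SNRs exceed a threshold, with extra weight
    # when the pitch detector flags the bin as harmonic.
    probs = []
    for g, l, p in zip(global_snr, local_snr, pitch_flag):
        likely = (g > snr_threshold) and (l > snr_threshold)
        probs.append(min(1.0, (0.8 if likely else 0.1)
                              + (0.2 if p else 0.0)))
    return probs


def update_noise_psd(noise_psd, signal_psd, speech_prob, alpha=0.95):
    # Recursive noise update gated by the speech presence probability:
    # the tracker follows the input only where speech is unlikely.
    out = []
    for n, s, q in zip(noise_psd, signal_psd, speech_prob):
        a = alpha + (1.0 - alpha) * q    # effective smoothing factor
        out.append(a * n + (1.0 - a) * s)
    return out
```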

[0233] In some embodiments, as shown in FIG. 14, the apparatus can further include:

[0234] an interaural level difference (ILD) calculating module 1108, configured to calculate ILD information between a primary microphone and a secondary microphone; and

[0235] a call angle control module 1109, configured to control a call angle for the primary microphone signal according to the ILD information and the frequency bin VAD flag information.
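The two modules above can be sketched as follows. The dB formulation of the ILD is standard; the 6 dB threshold and the per-bin keep/reject rule are hypothetical details added for illustration.

```python
import math


def ild_db(primary_psd, secondary_psd):
    # Interaural level difference per frequency bin, in dB, between the
    # primary and secondary microphone power spectra.
    return [10.0 * math.log10((p + 1e-12) / (s + 1e-12))
            for p, s in zip(primary_psd, secondary_psd)]


def in_call_angle(ild, vad_flags, ild_min_db=6.0):
    # Hypothetical control rule: keep a bin only when the frequency bin
    # VAD marks it as speech and the ILD indicates the source is close
    # to the primary microphone (i.e., inside the call angle).
    return [v and (d > ild_min_db) for d, v in zip(ild, vad_flags)]
```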

[0236] By using the multi-microphone noise reduction apparatus according to embodiments of the present application, the multi-microphone noise reduction can be achieved based on the Kalman adaptive filter, which does not need to depend on ILD information and has a strong capability of filtering out the target speech signal. Thus the multi-microphone noise reduction apparatus according to the present application has good robustness to location changes, various noises and application scenarios. The apparatus also has an improved speech protection capability, and can be applied to both the handheld mode and the hands-free mode, which achieves close noise reduction effects in the two modes, and improves consistency of subjective experiences of a call during mode switching.

[0237] FIG. 17 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in FIG. 17, the terminal device 1700 can include: a transmitter 1701, a receiver 1702, a processor 1703, a memory 1704, a primary microphone 1705 and a secondary microphone 1706, where the memory 1704 stores program instructions that when executed by the processor 1703 cause the processor 1703 to perform the method according to any one of the above embodiments.

[0238] In some embodiments of the present application, the transmitter 1701, the receiver 1702, the processor 1703, the memory 1704, the primary microphone 1705 and the secondary microphone 1706 can be connected by a bus or other modes, which will not be limited herein.

[0239] The memory 1704 can include read only memory and random access memory, and can provide instructions and data to the processor 1703. The memory 1704 can store an operating system and operating instructions, an executable module or a data structure, or a subset thereof, or an extended set thereof. The operating instructions can include various operating instructions for implementing various operations. The operating system can include various system programs for implementing various basic services as well as handling hardware-based tasks.

[0240] The processor 1703 controls operations of a multi-microphone noise reduction apparatus. The processor 1703 can also be referred to as a central processing unit (CPU). In specific applications, components of the multi-microphone noise reduction apparatus can be coupled through a bus system, which can include a power bus, a control bus, a status signal bus and the like in addition to a data bus.

[0241] The method disclosed in the foregoing embodiments of the present application can be applied to the processor 1703 or implemented by the processor 1703. The processor 1703 may be an integrated circuit chip with signal processing capabilities. During the implementation, the steps of the above method may be implemented by integrated logic circuits of hardware in the processor 1703 or by instructions in the form of software. The processor 1703 may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor, or the processor may also be any conventional processor or the like.

[0242] The transmitter 1701 can include a display device such as a display screen, and the transmitter 1701 can be configured to output numerical or character information through an external interface. The receiver 1702 can be configured to receive inputted numerical or character information and generate signal input related to settings and function control.

[0243] Persons skilled in the art can clearly know that, for convenience and brevity of description, the detailed working procedures of the apparatus and terminal device described above can be readily deduced from the corresponding procedures in the method embodiments, and will not be repeated here.

[0244] According to some embodiments, the present application also provides a computer readable storage medium, including non-transitory computer program instructions that when executed by a processor cause the processor to perform the method according to any one of the above embodiments.

[0245] According to some embodiments, the present application also provides a computer program product, including non-transitory computer program instructions that when executed by a processor cause the processor to perform the method according to any one of the above embodiments.

[0246] According to some embodiments, the present application also provides a computer program, including program code that when executed by a processor causes the processor to perform the method according to any one of the above embodiments.

[0247] Embodiments disclosed herein may be implemented by using hardware only, or software, or some combination thereof. Based on such understandings, the technical solution may be embodied in the form of a software product. The software product may be stored in a non-volatile or non-transitory storage medium, which can be a compact disk read-only memory (CD-ROM), a USB flash disk, or a removable hard disk. The software product includes a number of instructions that enable a computer device (e.g., a personal computer, server, or network device) to execute the methods provided in the embodiments.

[0248] Program code may be applied to input data to perform the functions described herein and to generate output information. The output information is applied to one or more output devices. In some embodiments, the communication interface may be a network communication interface. In embodiments in which elements are combined, the communication interface may be a software communication interface, such as those for inter-process communication. In still other embodiments, there may be a combination of communication interfaces implemented as hardware, software, or a combination thereof.

[0249] Each computer program may be stored on a storage media or a device (e.g., ROM, magnetic disk, optical disc), readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. Embodiments of the system may also be considered to be implemented as a non-transitory computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

[0250] Furthermore, the systems and methods of the described embodiments are capable of being distributed in a computer program product including a physical, non-transitory computer readable medium that bears computer usable instructions for one or more processors. The medium may be provided in various forms, including one or more diskettes, compact disks, tapes, chips, magnetic and electronic storage media, volatile memory, non-volatile memory and the like. Non-transitory computer-readable media may include all computer-readable media, with the exception being a transitory, propagating signal. The term non-transitory is not intended to exclude computer readable media such as primary memory, volatile memory, RAM and so on, where the data stored thereon may only be temporarily stored. The computer usable instructions may also be in various forms, including compiled and non-compiled code.

[0251] Numerous references will be made regarding servers, services, interfaces, portals, platforms, or other systems formed from hardware devices. It should be appreciated that the use of such terms is deemed to represent one or more devices having at least one processor configured to execute software instructions stored on a computer readable tangible, non-transitory medium. One should further appreciate the disclosed computer-based algorithms, processes, methods, or other types of instruction sets can be embodied as a computer program product comprising a non-transitory, tangible computer readable media storing the instructions that cause a processor to execute the disclosed steps.

[0252] Various example embodiments are described herein. Although each embodiment represents a single combination of inventive elements, the inventive subject matter is considered to include all possible combinations of the disclosed elements. Thus if one embodiment comprises elements A, B, and C, and a second embodiment comprises elements B and D, then the inventive subject matter is also considered to include other remaining combinations of A, B, C, or D, even if not explicitly disclosed.

[0253] As used herein, and unless the context dictates otherwise, the term "coupled to" is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms "coupled to" and "coupled with" are used synonymously.

[0254] The embodiments described herein are implemented by physical computer hardware embodiments. The embodiments described herein provide useful physical machines and particularly configured computer hardware arrangements of computing devices, servers, processors, memory, networks, for example. The embodiments described herein, for example, are directed to computer apparatuses, and methods implemented by computers through the processing and transformation of electronic data signals.

[0255] The embodiments described herein may involve computing devices, servers, receivers, transmitters, processors, memory, display, networks particularly configured to implement various acts. The embodiments described herein are directed to electronic machines adapted for processing and transforming electromagnetic signals which represent various types of information. The embodiments described herein pervasively and integrally relate to machines, and their uses; and the embodiments described herein have no meaning or practical applicability outside their use with computer hardware, machines, and various hardware components.

[0256] Substituting non-physical means, mental steps for example, for the computing devices, servers, receivers, transmitters, processors, memory, display, and networks particularly configured to implement various acts would substantially affect the way the embodiments work.

[0257] Such hardware limitations are clearly essential elements of the embodiments described herein, and they cannot be omitted or substituted for mental means without having a material effect on the operation and structure of the embodiments described herein. The hardware is essential to the embodiments described herein and is not merely used to perform steps expeditiously and in an efficient manner.

[0258] Although the present application and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the application as defined by the appended claims.

[0259] Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present application, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present application. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.