DIGITAL FILTER DESIGN METHOD AND APPARATUS FOR NOISE SUPPRESSION BY SPECTRAL SUBTRACTION

Title:

DIGITAL FILTER DESIGN METHOD AND APPARATUS FOR NOISE SUPPRESSION BY SPECTRAL SUBTRACTION

Document Type and Number:

WIPO Patent Application WO/2001/018961

Kind Code:

A1

Abstract:

A digital filter design apparatus for noise suppression by spectral subtraction includes a first spectrum estimator (12) for determining a high frequency-resolution noisy speech power spectral density estimate from a noisy speech signal block. A second spectrum estimator (24) determines a high frequency-resolution background noise power spectral density estimate from a background noise signal block. Averaging units (20, 26) form a piece-wise constant noisy speech power spectral density estimate and a piece-wise constant background noise power spectral density estimate. These averaging units are controlled by means (14, 16, 18) for adapting the length of individual segments to the shape of the high frequency-resolution noisy speech power spectral density estimate and for using the same segmentation in both piece-wise constant estimates. Means (28) determine a piece-wise constant digital filter transfer function using spectral subtraction based on the piece-wise constant noisy speech power spectral density estimate and the piece-wise constant background noise power spectral density estimate.

Inventors:

ERIKSSON ANDERS

Application Number:

PCT/SE2000/001609

Publication Date:

March 15, 2001

Filing Date:

August 23, 2000

Assignee:

ERICSSON TELEFON AB L M (SE)

International Classes:

G10L21/0208; H03H17/02; (IPC1-7): H03H17/02; H04B15/02; G10L21/02

Domestic Patent References:

WO1996024128A1	1996-08-08
WO1996012128A2	1996-04-25

Foreign References:

US5394473A	1995-02-28
US4658426A	1987-04-14

Attorney, Agent or Firm:

Hedberg, Åke (Aros Patent AB P.O. Box 1544 Uppsala, SE)

Claims:

CLAIMS

1. A digital filter design method for noise suppression by spectral subtraction, including the steps of
determining a high frequency-resolution noisy speech power spectral density estimate from a noisy speech signal block,
forming a piece-wise constant noisy speech power spectral density estimate by averaging power densities within frequency bin segments of said noisy speech high frequency-resolution power spectral density estimate,
determining a high frequency-resolution background noise power spectral density estimate from a background noise signal block,
forming a piece-wise constant background noise power spectral density estimate by averaging power densities within frequency bin segments of said high frequency-resolution background noise power spectral density estimate,
determining a piece-wise constant digital filter transfer function using spectral subtraction based on said piece-wise constant noisy speech power spectral density estimate and said piece-wise constant background noise power spectral density estimate,
characterized by
adapting the length of individual segments to the shape of said high frequency-resolution noisy speech power spectral density estimate; and
using the same segments for both said high frequency-resolution noisy speech power spectral density estimate and said high frequency-resolution background noise power spectral density estimate.

2. The method of claim 1, characterized by centering segments on local maxima of said high frequency-resolution noisy speech power spectral density estimate.

3. The method of claim 1 or 2, characterized by increasing said segment length for high frequencies in accordance with the human auditory system.

4. A digital filter design method for non-linear echo cancellation, including the steps of
determining a high frequency-resolution speech power spectral density estimate from a residual echo containing speech signal block,
forming a piece-wise constant speech power spectral density estimate by averaging power densities within frequency bin segments of said speech high frequency-resolution power spectral density estimate,
determining a high frequency-resolution residual echo power spectral density estimate from an echo signal block,
forming a piece-wise constant residual echo power spectral density estimate by averaging power densities within frequency bin segments of said high frequency-resolution residual echo power spectral density estimate,
determining a piece-wise constant digital filter transfer function using said piece-wise constant speech power spectral density estimate and said piece-wise constant residual echo power spectral density estimate,
characterized by
adapting the length of individual segments to the shape of said high frequency-resolution speech power spectral density estimate; and
using the same segments for both said high frequency-resolution speech power spectral density estimate and said high frequency-resolution residual echo power spectral density estimate.

5. The method of claim 4, characterized by centering segments on local maxima of said high frequency-resolution speech power spectral density estimate.

6. The method of claim 4 or 5, characterized by increasing said segment length for high frequencies in accordance with the human auditory system.

7. A digital filter design method, including the steps of
determining a high frequency-resolution power spectral density estimate from an input signal block,
forming a piece-wise constant power spectral density estimate by averaging power densities within frequency bin segments of said high frequency-resolution power spectral density estimate,
determining a piece-wise constant digital filter transfer function using said piece-wise constant power spectral density estimate,
characterized by
adapting the length of individual segments to the shape of said high frequency-resolution power spectral density estimate.

8. The method of claim 1, characterized by centering segments on local maxima of said high frequency-resolution power spectral density estimates.

9. A digital filter design apparatus for noise suppression by spectral subtraction, including
means for determining a high frequency-resolution noisy speech power spectral density estimate from a noisy speech signal block,
means for forming a piece-wise constant noisy speech power spectral density estimate by averaging power densities within frequency bin segments of said noisy speech high frequency-resolution power spectral density estimate,
means for determining a high frequency-resolution background noise power spectral density estimate from a background noise signal block,
means for forming a piece-wise constant background noise power spectral density estimate by averaging power densities within frequency bin segments of said high frequency-resolution background noise power spectral density estimate,
means for determining a piece-wise constant digital filter transfer function using spectral subtraction based on said piece-wise constant noisy speech power spectral density estimate and said piece-wise constant background noise power spectral density estimate,
characterized by
means (14, 16, 18) for adapting the length of individual segments to the shape of said high frequency-resolution noisy speech power spectral density estimate; and
means (18) for using the same segments for both said high frequency-resolution noisy speech power spectral density estimate and said high frequency-resolution background noise power spectral density estimate.

10. The apparatus of claim 9, characterized by means (18) for centering segments on local maxima of said high frequency-resolution noisy speech power spectral density estimate.

11. The apparatus of claim 9 or 10, characterized by means (18) for increasing said segment length for high frequencies in accordance with the human auditory system.

12. A digital filter design apparatus for non-linear echo cancellation, including
means for determining a high frequency-resolution speech power spectral density estimate from a residual echo containing speech signal block,
means for forming a piece-wise constant speech power spectral density estimate by averaging power densities within frequency bin segments of said speech high frequency-resolution power spectral density estimate,
means for determining a high frequency-resolution residual echo power spectral density estimate from an echo signal block,
means for forming a piece-wise constant residual echo power spectral density estimate by averaging power densities within frequency bin segments of said high frequency-resolution residual echo power spectral density estimate,
means for determining a piece-wise constant digital filter transfer function using said piece-wise constant speech power spectral density estimate and said piece-wise constant residual echo power spectral density estimate,
characterized by
means (14, 16, 18) for adapting the length of individual segments to the shape of said high frequency-resolution speech power spectral density estimate; and
means (18) for using the same segments for both said high frequency-resolution speech power spectral density estimate and said high frequency-resolution residual echo power spectral density estimate.

13. The apparatus of claim 12, characterized by means (18) for centering segments on local maxima of said high frequency-resolution speech power spectral density estimate.

14. The apparatus of claim 12 or 13, characterized by means (20) for increasing said segment length for high frequencies in accordance with the human auditory system.

15. A digital filter design apparatus, including
means for determining a high frequency-resolution power spectral density estimate from an input signal block,
means for forming a piece-wise constant power spectral density estimate by averaging power densities within frequency bin segments of said high frequency-resolution power spectral density estimate,
means for determining a piece-wise constant digital filter transfer function using said piece-wise constant power spectral density estimate,
characterized by
means (14, 16, 18) for adapting the length of individual segments to the shape of said high frequency-resolution power spectral density estimate.

16. The apparatus of claim 15, characterized by means (18) for centering segments on local maxima of said high frequency-resolution power spectral density estimates.

Description:

A DIGITAL FILTER DESIGN METHOD AND APPARATUS FOR NOISE SUPPRESSION BY SPECTRAL SUBTRACTION

TECHNICAL FIELD

The present invention relates to digital filter design, and especially to filter design in the frequency domain.

BACKGROUND

There are several applications in which digital filters H[k] are designed "on the fly" in the frequency domain. One example is noise suppression using spectral subtraction. Another example is the design of frequency selective non-linear processors for echo cancellation. A characteristic feature of such applications is that the filter design method is quite complex. Since the filters are updated frequently, this puts a heavy burden on the hardware/software that implements these design algorithms.

Reference [1] describes a method that divides the frequency domain k into segments of equal or unequal length, and uses a constant value in each segment for H[k] and the underlying power spectral density estimate $\hat{\Phi}_x[k]$ of the noisy or echo contaminated speech signal. This reduces the complexity, since the filter H[k] only has to be determined for the frequency segments and not for each frequency bin k. However, this method also has the drawback that it may split a peak of H[k] into two different segments. This may lead to fluctuating peaks, which produce annoying "musical noise". It also reduces spectral sharpness, which further reduces speech quality.

SUMMARY

An object of the present invention is to reduce or eliminate these drawbacks of the prior art.

This object is achieved in accordance with the attached claims.

Briefly, the present invention dynamically adapts the segment lengths and positions to the current shape of the power spectrum of the speech signal.

The peaks and valleys of the spectrum are determined, and the method makes sure that peaks are not split between different segments when the segments are distributed over the frequency domain. Preferably each peak is covered by a segment centered on the peak. The segment length is preferably controlled by the frequency characteristics of the human auditory system.

This method has the advantage of reducing the complexity of the filter calculation without sacrificing accuracy at the important spectrum peaks.

Furthermore, the method also reduces the variance of the spectrum from frame to frame, which improves speech quality.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:

Fig. 1 is a diagram illustrating the power spectral density estimate of a noisy speech signal;
Fig. 2 is a diagram illustrating the power spectral density estimate in fig. 1 after segmentation and averaging in accordance with the prior art;
Fig. 3 is a diagram similar to the diagram in fig. 1;
Fig. 4 is a diagram illustrating the power spectral density estimate in fig. 1 after segmentation and averaging in accordance with the present invention;
Fig. 5 is a flow chart illustrating an exemplary embodiment of the method in accordance with the present invention; and
Fig. 6 is a block diagram illustrating an exemplary embodiment of a filter design and filtering apparatus in accordance with the present invention.

DETAILED DESCRIPTION

In some applications the filter is determined in the frequency domain. For example, in telephony applications noise suppression based on spectral subtraction is often used (see [2, 3]). In this case the filter is determined as a function H(ω) of frequency:

$$H(\omega) = \left(1 - \delta\left(\frac{\hat{\Phi}_v(\omega)}{\hat{\Phi}_x(\omega)}\right)^{\alpha}\right)^{\beta}$$

where α, β, δ are constants and $\hat{\Phi}_v(\omega)$ and $\hat{\Phi}_x(\omega)$ are estimates of the power spectral density of the pure noise and noisy speech, respectively. This expression is obtained from the model:

x[n] = y[n] + v[n]

where v[n] is the noise signal, x[n] is the noisy speech signal and y[n] is the desired signal. An estimate of the desired signal y[n] is obtained by applying the filter represented by H(ω) to the noisy signal x[n].
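As a minimal sketch of the spectral subtraction gain, assume the filter has the common form H = (1 - δ (Φv/Φx)^α)^β, evaluated bin by bin. The function name, the default constants and the clamping of negative values to a floor are illustrative assumptions and are not taken from the text.

```python
def spectral_subtraction_gain(phi_x, phi_v, alpha=1.0, beta=0.5, delta=1.0, floor=0.0):
    """Per-bin gain H = (1 - delta * (phi_v / phi_x) ** alpha) ** beta.

    phi_x: noisy speech power spectral density estimate, one value per bin.
    phi_v: background noise power spectral density estimate, same length.
    The constants and the floor are illustrative assumptions.
    """
    gains = []
    for px, pv in zip(phi_x, phi_v):
        ratio = pv / px if px > 0.0 else float("inf")
        base = 1.0 - delta * ratio ** alpha
        # Clamp so that bins where the noise estimate exceeds the noisy speech
        # estimate give a gain of floor ** beta instead of a complex number.
        gains.append(max(base, floor) ** beta)
    return gains


if __name__ == "__main__":
    print(spectral_subtraction_gain([4.0, 1.0], [1.0, 2.0]))
```

With α = 1 and β = 0.5 this reduces to a power-subtraction rule; other choices of the constants trade noise reduction against speech distortion.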

Another example of an application in which the filter is determined in the frequency domain is a frequency selective non-linear processor for echo cancellation. In this case the filter is defined by the function:

$$H(\omega) = f\left(\hat{\Phi}_x(\omega), \hat{\Phi}_e(\omega)\right)$$

where $\hat{\Phi}_x(\omega)$ represents an estimate of the power spectral density of a signal x[n] contaminated by residual echo and $\hat{\Phi}_e(\omega)$ represents an estimate of the power spectral density of the residual echo signal e[n]. An example of a suitable function f is the function above for noise suppression, with $\hat{\Phi}_x(\omega)$ now representing the power spectral density estimate of the residual echo contaminated signal and $\hat{\Phi}_v(\omega)$ being replaced by the residual echo power spectral density estimate $\hat{\Phi}_e(\omega)$. The filter H(ω) is based on the model:

x[n] = y[n] + e[n]

where y[n] is the desired signal. An estimate of the desired signal y[n] is obtained by applying the filter represented by H(ω) to the residual echo contaminated signal x[n].

In the above examples the filter is described by a real-valued continuous-frequency transfer function H(ω). This function is sampled to obtain a discrete-frequency transfer function H[k]. This step is typically performed when the estimates are based on parametric estimation methods. However, it is also possible to obtain the discrete-frequency transfer function H[k] directly, for example by using periodogram based estimation methods. An advantage of parametric estimation methods is that the estimates typically have lower variance from frame to frame than estimates from periodogram based methods.
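One possible parametric estimator is an auto-regressive (AR) model fitted with the autocorrelation method and sampled on a uniform frequency grid. The sketch below uses the Levinson-Durbin recursion; the text does not state which fitting method is used, so this choice, the function names and the biased autocorrelation estimate are assumptions.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of the block x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient of stage m
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k  # prediction error power after stage m
    return a, err


def ar_psd(x, order=10, n_bins=256):
    """Sample the AR spectrum err / |A(e^{jw})|^2 at n_bins frequencies."""
    a, err = levinson_durbin(autocorrelation(x, order), order)
    psd = []
    for k in range(n_bins):
        w = 2.0 * math.pi * k / n_bins
        A = sum(a[m] * cmath.exp(-1j * w * m) for m in range(order + 1))
        psd.append(err / abs(A) ** 2)
    return psd
```

Because the continuous-frequency spectrum is defined by only a handful of AR coefficients, sampling it at 128 or 256 bins is a separate step, which corresponds to the sampling of H(ω) described above.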

The following description will be restricted to noise suppression, but it is appreciated that the same principles may also be used in other applications, such as echo cancellation.

The discrete-frequency power spectral density estimates $\hat{\Phi}_x[k]$, $\hat{\Phi}_v[k]$ are initially known with high frequency resolution, typically 128 or 256 frequency bins. Fig. 1 is a diagram illustrating the power spectral density estimate $\hat{\Phi}_x[k]$ of a noisy speech signal, in this case with 256 frequency bins. This spectrum is obtained from a parametric estimation method (typically an auto-regressive model, for example of order 10). If the filter transfer function H[k] is to be determined with the same resolution, this will put a heavy computational burden on the hardware/software that implements the noise suppression algorithm described above. In accordance with [1] the frequency range is therefore divided into constant length segments, and an average of $\hat{\Phi}_x[k]$ is formed within each segment, as illustrated in fig. 2. This average $\hat{\Phi}_x[\text{segment}]$ is used instead of $\hat{\Phi}_x[k]$ in the computation of H[k]. A similarly segmented and averaged estimate $\hat{\Phi}_v[\text{segment}]$ is used instead of $\hat{\Phi}_v[k]$. In this way a single value of H[k], denoted H[segment] and defined by

$$H[\text{segment}] = \left(1 - \delta\left(\frac{\hat{\Phi}_v[\text{segment}]}{\hat{\Phi}_x[\text{segment}]}\right)^{\alpha}\right)^{\beta}$$

may be used for an entire segment and not just for a single value k. A drawback of this method is that, due to the constant length and location of the segments, the important peaks of the spectrum may be split between several segments, as illustrated at MAX2 and MAX3 in fig. 2. This leads to a poor resolution of these peaks. Furthermore, since the peaks may shift in position from speech frame to speech frame, they are sometimes split and sometimes not, which leads to very annoying "musical noise".
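The constant-length averaging of the prior art can be illustrated with a short sketch. The function name and the toy spectrum are hypothetical; the example only shows how a peak that straddles a segment boundary is smeared over two segments.

```python
def average_fixed_segments(phi, seg_len):
    """Replace each run of seg_len bins by its mean (piece-wise constant)."""
    out = []
    for start in range(0, len(phi), seg_len):
        seg = phi[start:start + seg_len]  # last segment may be shorter
        mean = sum(seg) / len(seg)
        out.extend([mean] * len(seg))
    return out


if __name__ == "__main__":
    # A two-bin peak sitting exactly on the boundary between two 4-bin segments.
    spectrum = [0.0, 0.0, 0.0, 10.0, 10.0, 0.0, 0.0, 0.0]
    print(average_fixed_segments(spectrum, 4))  # [2.5] * 8: the peak is flattened
```

If the same peak moved by two bins in the next frame it would fall entirely inside one segment and average to 5.0 there, which is the frame-to-frame fluctuation the text describes.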

As illustrated by the exemplary algorithm presented below, the present invention solves this problem by dynamically adapting the length and position of the segments to the shape of the current estimate $\hat{\Phi}_x[k]$. Briefly, the algorithm starts by finding local maxima (peak positions) and local minima (valley positions) of $\hat{\Phi}_x[k]$. It then centers a segment on each maximum and distributes segments between the peaks to cover the valleys.

When locating maxima and minima, a parametric spectrum estimate based on auto-regression is especially attractive, since such a spectrum is guaranteed to have at most M/2 peaks, where M is the model order.
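A simple way to locate the peaks and valleys is a neighbour comparison over the sampled spectrum. The sketch below is one possible detector, not the one used in the text: it treats an edge bin as an extremum when it differs from its single neighbour, and it ignores flat plateaus.

```python
def local_extrema(phi):
    """Return (maxima, minima) as sorted lists of bin indices.

    A bin is a maximum (minimum) if it is strictly greater (smaller) than
    every existing neighbour. Plateaus are not reported (an assumption that
    is reasonable for smooth AR spectra).
    """
    maxima, minima = [], []
    last = len(phi) - 1
    for k, value in enumerate(phi):
        neighbours = [phi[j] for j in (k - 1, k + 1) if 0 <= j <= last]
        if all(value > n for n in neighbours):
            maxima.append(k)
        elif all(value < n for n in neighbours):
            minima.append(k)
    return maxima, minima


if __name__ == "__main__":
    print(local_extrema([1, 3, 2, 0, 4, 5, 1]))  # ([1, 5], [0, 3, 6])
```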

Preferably the length of individual segments is adapted to the properties of the human auditory system, which has been studied in [4]. Based on this information the following relation between segment center $f_c$ and segment length may be obtained (for a frequency range of 256 bins and a sampling frequency of 8000 Hz, which gives a frequency resolution of 31.25 Hz/bin):

$$\text{segment}(f_c) = \begin{cases}
93\ \text{Hz (3 bins)}, & 0 < f_c \le 906\ \text{Hz} \\
155\ \text{Hz (5 bins)}, & 906 < f_c \le 1417\ \text{Hz} \\
218\ \text{Hz (7 bins)}, & 1417 < f_c \le 1812\ \text{Hz} \\
281\ \text{Hz (9 bins)}, & 1812 < f_c \le 2250\ \text{Hz} \\
343\ \text{Hz (11 bins)}, & 2250 < f_c \le 2593\ \text{Hz} \\
406\ \text{Hz (13 bins)}, & 2593 < f_c \le 2937\ \text{Hz} \\
468\ \text{Hz (15 bins)}, & 2937 < f_c \le 3250\ \text{Hz} \\
531\ \text{Hz (17 bins)}, & 3250 < f_c \le 4000\ \text{Hz}
\end{cases}$$

A conversion to the discrete frequency domain k gives:

$$\text{segment}(k_c) = \begin{cases}
3\ \text{bins}, & k_c \le 29 \\
5\ \text{bins}, & 29 < k_c \le 45 \\
7\ \text{bins}, & 45 < k_c \le 58 \\
9\ \text{bins}, & 58 < k_c \le 72 \\
11\ \text{bins}, & 72 < k_c \le 83 \\
13\ \text{bins}, & 83 < k_c \le 94 \\
15\ \text{bins}, & 94 < k_c \le 104 \\
17\ \text{bins}, & 104 < k_c \le 128
\end{cases}$$

Using this relation, the following algorithm, which is also illustrated in fig. 5, may be used to determine a segmented and averaged filter transfer function H[k] with dynamically determined segment lengths and positions:
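The discrete-frequency relation can be written as a small lookup. The table constant and function name are hypothetical, the bin limits are those read from the relation above, and returning 17 bins for any center above bin 104 (including centers beyond 128) is an assumption made so that callers never fall off the table.

```python
# (upper bin limit, segment length in bins), for 31.25 Hz per bin.
SEGMENT_TABLE = [(29, 3), (45, 5), (58, 7), (72, 9), (83, 11), (94, 13), (104, 15)]


def segment_length(kc):
    """Segment length in bins for a segment centred on bin kc."""
    for upper, length in SEGMENT_TABLE:
        if kc <= upper:
            return length
    return 17  # 104 < kc <= 128 in the text; also used beyond 128 (assumption)


if __name__ == "__main__":
    # The Hz band edges map onto the bin limits at 31.25 Hz per bin.
    edges_hz = (906, 1417, 1812, 2250, 2593, 2937, 3250, 4000)
    print([round(f / 31.25) for f in edges_hz])
    print([segment_length(k) for k in (20, 41, 73)])  # peaks of the fig. 4 example
```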

S1: Get next signal block of x[n]
S2: Determine $\hat{\Phi}_x[k]$ of signal block
S3: Determine local maxima and minima of $\hat{\Phi}_x[k]$
S4: Set kmax to k-value of first maximum
S5: Set kmin to k-value of first minimum
S6: Set kc = kmax
S7: Determine average of $\hat{\Phi}_x[k]$ in segment(kc) centered on kc
    Determine average of $\hat{\Phi}_v[k]$ in the same segment
    Determine H[segment] using averaged $\hat{\Phi}_x[k]$ and $\hat{\Phi}_v[k]$
S8: If kc - segment(kc)/2 > kmin, then perform S9-S10
S9: Set kc = kc - segment(kc) (the old value kc is used for segment(kc))
S10: Determine average of $\hat{\Phi}_x[k]$ in segment(kc) centered on kc
    Determine average of $\hat{\Phi}_v[k]$ in the same segment
    Determine H[segment] using averaged $\hat{\Phi}_x[k]$ and $\hat{\Phi}_v[k]$
    Go to S8
S11: Set kc = kmax
S12: Set kmin to k-value of next minimum
S13: If kc + segment(kc)/2 < kmin, then perform S14-S15
S14: Set kc = kc + segment(kc) (the old value kc is used for segment(kc))
S15: Determine average of $\hat{\Phi}_x[k]$ in segment(kc) centered on kc
    Determine average of $\hat{\Phi}_v[k]$ in the same segment
    Determine H[segment] using averaged $\hat{\Phi}_x[k]$ and $\hat{\Phi}_v[k]$
    Go to S13
S16: If kmax is the last maximum, go to S1
S17: Set kmax to k-value of next maximum and go to S6

Using this algorithm on the spectrum in fig. 1 produces the segmented and averaged spectrum in fig. 4. As an illustration, in fig. 4 the local maxima are located at:

MAX1: k = 20
MAX2: k = 41
MAX3: k = 73

and the local minima are located at:

MIN1: k = 0
MIN2: k = 31
MIN3: k = 61
MIN4: k = 128

Applying the algorithm above to the second maximum at k = 41, for example, gives a 5 bin segment centered on k = 41, two 5 bin segments to the left of the maximum and two 7 bin segments to the right of the maximum. When comparing fig. 2 to fig. 1, it is noted that segments covering a peak are always centered on the peak. Furthermore, it is noted that lower frequencies result in shorter segments. As noted above, if the peaks change position from frame to frame, the algorithm will guarantee that the segments are still centered on the peaks and that the segment width is adapted to the location of the peaks.
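Steps S4 to S17 can be sketched as a function that returns the segment centers and lengths. This is an illustrative reading of the listed steps, not a reference implementation: it assumes that step S14 moves the center to the right (kc + segment(kc)), the names are hypothetical, and centers near the band edges may fall slightly outside the bin range, so a caller would clip each segment to valid bins.

```python
# (upper bin limit, segment length in bins); 17 bins above bin 104 (assumption
# that the last entry also applies to centres beyond 128).
SEGMENT_TABLE = [(29, 3), (45, 5), (58, 7), (72, 9), (83, 11), (94, 13), (104, 15)]


def segment_length(kc):
    for upper, length in SEGMENT_TABLE:
        if kc <= upper:
            return length
    return 17


def distribute_segments(maxima, minima):
    """Return (centre, length) pairs in the order the steps visit them.

    maxima and minima are sorted bin indices of the local extrema.
    """
    segments = []
    min_iter = iter(minima)
    kmin = next(min_iter)                                # S5
    for kmax in maxima:                                  # S4, S16, S17
        kc = kmax                                        # S6
        segments.append((kc, segment_length(kc)))        # S7: segment on the peak
        while kc - segment_length(kc) / 2 > kmin:        # S8: room to the left?
            kc -= segment_length(kc)                     # S9: step uses old kc
            segments.append((kc, segment_length(kc)))    # S10
        kc = kmax                                        # S11
        kmin = next(min_iter, kmin)                      # S12
        while kc + segment_length(kc) / 2 < kmin:        # S13: room to the right?
            kc += segment_length(kc)                     # S14 (assumed "+")
            segments.append((kc, segment_length(kc)))    # S15
    return segments


if __name__ == "__main__":
    for centre, length in distribute_segments([20, 41, 73], [0, 31, 61, 128]):
        print(centre, length)
```

Each peak always receives a segment centered on it, and the step size grows with frequency because the length is looked up again after every move.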

Fig. 6 is a block diagram illustrating an exemplary embodiment of a filter design apparatus in accordance with the present invention, in this case used for noise suppression by spectral subtraction. A stream of noisy speech samples x[n] is forwarded to a buffer 10, which collects a block or frame of samples. A spectrum estimator 12 finds the AR parameters of this block and uses these parameters to determine the power spectral density estimate $\hat{\Phi}_x[k]$ of the current block of the noisy speech signal x[n]. Typically this estimate has 128 or 256 samples. A max-min detector 14 searches the estimate for local maxima and minima. The locations of the local maxima and minima are forwarded to a segment distributor 18 that distributes the segments in accordance with the method described with reference to fig. 5. The segment locations and lengths are forwarded to an averager 20. Averager 20 receives the samples of estimate $\hat{\Phi}_x[k]$ and forms the average in each specified segment.

During a block without speech, a block of background noise signal v[n] is collected in a buffer 22. A spectrum estimator 24 finds the AR parameters of this block and uses these parameters to determine the power spectral density estimate $\hat{\Phi}_v[k]$ of the block of the background noise signal v[n]. This estimate has the same number of samples as estimate $\hat{\Phi}_x[k]$. The segment locations and lengths from segment distributor 18 are also forwarded to another averager 26. Averager 26 receives the samples of estimate $\hat{\Phi}_v[k]$ and forms the average in each specified segment.

Both averagers 20, 26 forward the average values in each segment to a filter calculator 28, which determines a value of the filter transfer function for each segment. This produces a segmented filter H[k] represented by block 30. This filter is forwarded to an input of a multiplier 32. The signal block in buffer 10 is also forwarded to a Fast Fourier Transform (FFT) block 34, which transforms the block to the frequency domain. The length of the transformed block is the same as the length of the segmented filter. The transformed signal is forwarded to another input of multiplier 32, where it is multiplied by the segmented filter. Finally, the filtered signal is transformed back to the time domain in an Inverse Fast Fourier Transform (IFFT) block 36.
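The role of averagers 20, 26 and filter calculator 28 can be sketched as follows. The sketch assumes the gain form H = (1 - δ (Φv/Φx)^α)^β, strictly positive spectral estimates, odd segment lengths, clipping of segments to the valid bin range, and unit gain for bins not covered by any segment; all names and defaults are hypothetical.

```python
def segment_bins(centre, length, n_bins):
    """Bins covered by a segment of odd length, clipped to 0..n_bins-1."""
    half = length // 2
    lo = max(centre - half, 0)
    hi = min(centre + half, n_bins - 1)
    return range(lo, hi + 1)  # empty if the segment lies outside the band


def piecewise_filter(phi_x, phi_v, segments, alpha=1.0, beta=0.5, delta=1.0):
    """Piece-wise constant H[k] from two spectra averaged over shared segments."""
    n_bins = len(phi_x)
    h = [1.0] * n_bins  # uncovered bins pass unchanged (assumption)
    for centre, length in segments:
        bins = segment_bins(centre, length, n_bins)
        if len(bins) == 0:
            continue
        # The same segmentation is applied to both estimates.
        avg_x = sum(phi_x[k] for k in bins) / len(bins)
        avg_v = sum(phi_v[k] for k in bins) / len(bins)
        base = 1.0 - delta * (avg_v / avg_x) ** alpha
        gain = max(base, 0.0) ** beta  # clamp negative values (assumption)
        for k in bins:
            h[k] = gain
    return h


if __name__ == "__main__":
    phi_x = [4.0, 4.0, 4.0, 1.0, 1.0]
    phi_v = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(piecewise_filter(phi_x, phi_v, [(1, 3), (4, 3)]))
```

Only one gain is computed per segment, which is where the reduction in complexity comes from; the resulting H[k] would then multiply the FFT of the signal block bin by bin.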

Typically the different blocks in fig. 6 are implemented by one or several microprocessors or micro/signal processor combinations. They may, however, also be implemented by one or several ASICs (application specific integrated circuits).

A similar structure may be used for non-linear filtering in echo cancellation. In this case x[n] represents the residual echo contaminated signal and v[n] is replaced by an estimate of the residual echo e[n]. Another difference is that in this case the estimates $\hat{\Phi}_x[k]$ and $\hat{\Phi}_e[k]$ are from the same speech frame (in noise suppression by spectral subtraction the noise spectrum is considered stationary and estimated during speech pauses).

It will be understood by those skilled in the art that various modifications and changes may be made to the present invention without departure from the scope thereof, which is defined by the appended claims.

REFERENCES

[1] U.S. Patent No. 5,839,101 (A. Vähätalo et al.).

[2] J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech", Proc. of the IEEE, Vol. 67, No. 12, 1979, pp. 1586-1604.

[3] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 2, 1979, pp. 113-120.

[4] U. Zölzer, "Digital audio signal processing", John Wiley & Sons, Chichester, U.K., 1997, pp. 252-253.
