METHOD AND DEVICE AT STEREO ACOUSTIC ECHO CANCELLATION

Title:

METHOD AND DEVICE AT STEREO ACOUSTIC ECHO CANCELLATION

Document Type and Number:

WIPO Patent Application WO/1999/022460

Kind Code:

A2

Abstract:

The present invention relates to a method and device for stereo acoustic echo cancellation. Acoustic echo cancellation in stereo is considerably more difficult than in mono because of the strong correlation between the stereo channels. The invention uses a perceptual audio coder to reduce the correlation between the stereo channels without introducing audible distortion. As a result, the stereo canceller converges towards the correct echo paths and gives a more stable echo cancellation that does not depend on the transmission room (far end). The core of the invention is that the correlation can be reduced beyond what the audio coder itself achieves by modifying its decoder. Extra noise, uncorrelated between the channels, is added in the decoder at a level that keeps it inaudible, using information provided by the coder in combination with an estimated perceptual masking threshold. The solution is consequently flexible and does not require any change to the coding standard in use. Only a small number of operations need to be added to the decoder.

See also references of EP 1031191A2

Attorney, Agent or Firm:

Pragsten, Rolf (Telia Research AB Vitsandsgatan 9 Farsta, SE)

Claims:

PATENT CLAIMS

1.	Method for stereo acoustic echo cancellation, where the echo is created on a connection for transmission of a stereo acoustic signal, which signal is coded on the transmitter side (F) and decoded on the receiver side (N), characterized in that a perceptual audio coding is introduced, that side information in the coded signal is utilized, and that the echo can be identified and cancelled.

2.	Method according to patent claim 1, characterized in that the perceptual coding is realized by, for instance, MPEG coding, and that the perceptual coding allows the channel correlation between the stereo channels to be reduced.

3.	Method according to patent claim 1 and 2, characterized in that the perceptual coding is utilized with advantage at frequencies exceeding 2 kHz.

4.	Method according to patent claim 1, characterized in that the side information preferably is utilized at frequencies up to 2 kHz.

5.	Method according to patent claim 1 and 4, characterized in that, for each sub-band in the signal, the utilization of the signal and the quantizer used in the coding are indicated.

6.	Method according to patent claim 1, 4 and 5, characterized in that the quantizer is selected in the coding on the basis of an analysed segment of the signal, and that a masking threshold, indicating inaudible distortion levels within the segment, is selected.

7.	Method according to patent claim 1, 4, 5 and 6, characterized in that the masking threshold is selected so that a margin to a just noticeable distortion is attained, and in that noise which is uncorrelated between the channels is added within the margin.

8.	Device for stereo acoustic echo cancellation, where a signal is registered by sound registering equipment on the transmitter side (F), and the signal is coded in a coder (C) and transmitted on a connection to a decoder (D) on the receiver side (N), characterized in that the coder (C) is arranged to perform a perceptual coding of the signal, that side information in the coded signal is utilized, and that a stereo acoustic echo canceller is arranged to identify the echo and reduce it.

9.	Device according to patent claim 8, characterized in that the perceptual coding is performed in the coder (C), and that, for instance, MPEG coding is utilized for reduction of the channel correlation between the channels.

10.	Device according to patent claim 8, characterized in that the stereo acoustic echo canceller (AEC) is arranged to analyse a segment of the signal in order to determine a masking threshold defining inaudible distortion levels within the segment.

11.	Device according to patent claim 8 or 10, characterized in that the coder (C) is arranged to select the quantizer.

12.	Device according to patent claim 8, 9, 10 or 11, characterized in that the stereo acoustic echo canceller (AEC) is arranged to select the masking threshold so that a margin to a just noticeable distortion is attained, and that the stereo acoustic echo canceller (AEC) is arranged to add, to the signal, noise which is uncorrelated between the channels.

Description:

TITLE OF THE INVENTION: METHOD AND DEVICE AT STEREO ACOUSTIC ECHO CANCELLATION

TECHNICAL FIELD

The present invention relates to echo cancellation in combination with signal coding.

TECHNICAL PROBLEM

Acoustic echo cancellation in stereo channels is a more difficult problem than the corresponding mono case. This is because each channel carries similar speech signals, which causes problems for the adaptive algorithm that is used. The fields of application for stereo cancellation are, or are expected to be, high-quality video conference systems and tele-games. These fields, however, place different demands on quality, bandwidth, etc.

In the mono case, NLMS (the Normalized Least Mean Square algorithm) is used exclusively, owing to its robustness against noise and signal variations (non-stationarity).

The disadvantage of this algorithm is that its convergence depends on the spectral characteristics of the incoming (far-end) signals. A signal that is strongly autocorrelated in time gives slow convergence, and vice versa. In the stereo case the speech signal is correlated in time and also between the channels, which slows the convergence of NLMS to such an extent that it becomes useless. Echo cancellation must then be performed with some algorithm other than NLMS.
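As a rough illustration of the mono NLMS canceller discussed here, the following is a minimal Python sketch. The function name, the step size `mu`, the regularization constant `eps`, the filter length and the three-tap echo path are illustrative assumptions and are not taken from the application.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Run NLMS over the signals and return (residual echo, final weights)."""
    w = [0.0] * taps      # adaptive estimate of the echo path
    buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))   # predicted echo
        e = d - y_hat                                     # residual echo
        norm = sum(b * b for b in buf) + eps              # input power (normalization)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w

# Hypothetical test case: white far-end signal and a short, noise-free echo path.
random.seed(0)
true_path = [0.6, -0.3, 0.1]
far = [random.gauss(0.0, 1.0) for _ in range(2000)]
mic = [sum(true_path[k] * far[n - k] for k in range(len(true_path)) if n - k >= 0)
       for n in range(len(far))]
residual, w = nlms_echo_canceller(far, mic, taps=3)
```

With a white input the weights approach the assumed echo path quickly; a strongly autocorrelated input would converge more slowly, which is the weakness described above.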

Essentially there are two types of algorithm to choose from: sub-band algorithms or full-length RLS (Recursive Least Squares). These have different advantages and disadvantages in implementation. The channel correlation also means that there is no theoretically unique estimate of the echo paths towards which the echo canceller converges, but many solutions which all depend on the transmitter room (far end), Figure 1. This results in unstable echo cancellation, and the echo canceller diverges at irregular intervals. To make the echo canceller converge in a stable way towards the correct echo paths, the stereo signals have to be modified before they reach the echo canceller as reference signals.

Stereo cancellation involves the following set of problems:

* The echo paths w1(n), w2(n) at the near end, N (Figure 1), which are to be estimated by the AEC, are not uniquely identifiable from measured data.

* The echo cancellation achieved by the canceller depends on the variability of the channels g1(n), g2(n) at the far end, F.

Assume that the signals from the microphones at the far end are given by (Figure 3)

$x_i(n) = g_i(n) * s(n), \quad i = 1, 2$

where $s(n)$ is the source signal, $g_i(n)$, $i = 1, 2$, are the echo paths of the far end with length $M$, and $*$ denotes convolution.
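The signal model above can be sketched in a few lines of Python. The impulse responses `g1`, `g2` and the random source are made-up values for illustration only. The sketch also checks the cross relation g2 * x1 = g1 * x2, a standard consequence of both channels being filtered versions of the same source, which is one way to see why the two channels are linearly dependent.

```python
import random

def convolve(g, s):
    """Full linear convolution of impulse response g with signal s."""
    out = [0.0] * (len(g) + len(s) - 1)
    for i, gi in enumerate(g):
        for j, sj in enumerate(s):
            out[i + j] += gi * sj
    return out

random.seed(1)
s = [random.gauss(0.0, 1.0) for _ in range(16)]   # hypothetical source signal s(n)
g1 = [1.0, 0.5, 0.25]                             # hypothetical far-end path, M = 3
g2 = [0.8, -0.4, 0.2]

x1 = convolve(g1, s)   # x1(n) = g1(n) * s(n)
x2 = convolve(g2, s)   # x2(n) = g2(n) * s(n)

# Cross relation: filtering x1 with g2 gives the same signal as filtering x2 with g1.
lhs = convolve(g2, x1)
rhs = convolve(g1, x2)
```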

The residual echo after the echo canceller is expressed in terms of $h_{i,N}$, $i = 1, 2$, the true responses of length $N$ at the near end, and $\hat{h}_{i,L}$, $i = 1, 2$, the estimated responses of length $L$.
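The expression for the residual echo is not reproduced in this text. A form commonly used in the stereo echo cancellation literature, written with the responses named here, would be the following. This is an assumption for orientation and not the application's own formula; the symbols $y(n)$, $e(n)$ and the stacked input vectors $x_{i,K}(n) = [x_i(n), \ldots, x_i(n-K+1)]^T$ are introduced only for this sketch.

```latex
\begin{align}
y(n) &= \sum_{i=1}^{2} h_{i,N}^{T}\, x_{i,N}(n), \\
e(n) &= y(n) - \sum_{i=1}^{2} \hat{h}_{i,L}^{T}\, x_{i,L}(n).
\end{align}
```

Here $y(n)$ is the near-end microphone signal containing the true echo and $e(n)$ is what remains after the estimated echo has been subtracted.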

Minimization of the weighted least-squares criterion leads to a linear equation system in $R_{xx}(n)$, the correlation matrix, and $r_{xx}(n)$, the estimated cross-correlation vector. The problem in stereophonic echo cancellation is the condition number of this matrix. It has further been shown that

$L \geq M \Rightarrow R_{xx}(n)$ is singular $\forall n$

$L < M \Rightarrow R_{xx}(n)$ is poorly conditioned

$L \geq N \Rightarrow$ misalignment $\varepsilon(n) \to 0$, $n \to \infty$

$L < N \Rightarrow$ misalignment $\varepsilon(n) \not\to 0$, $n \to \infty$

where $\varepsilon(n)$ is the misalignment. A poorly conditioned $R_{xx}(n)$ increases the misalignment.
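The linear system and the definition of the misalignment are not reproduced in this text. In the usual least-squares formulation they take the form below, given here as an assumption using the correlation matrix and cross-correlation vector named above. The stacked vectors $h$ (true echo paths) and $\hat{h}(n)$ (their estimates) are notation introduced for this sketch.

```latex
\begin{align}
R_{xx}(n)\,\hat{h}(n) &= r_{xx}(n), \\
\varepsilon(n) &= \frac{\lVert h - \hat{h}(n) \rVert}{\lVert h \rVert}.
\end{align}
```

When $R_{xx}(n)$ is singular the first equation has infinitely many solutions, and when it is poorly conditioned small disturbances in $r_{xx}(n)$ produce large changes in $\hat{h}(n)$ and hence in $\varepsilon(n)$.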

Consequently there is a conflict: the solution is better conditioned if $L \ll M$, whereas the misalignment is reduced if $L \geq N$; in practice, however, $L < M = N$.

The remedy for this misalignment is to reduce the correlation between the stereo channels.

The eigenvalues of the correlation matrix can be bounded from below by an expression in $\gamma(f)$, the coherence between the stereo channels. Misalignment can therefore be assessed with the coherence function, which then serves as a measure of the decorrelation achieved.
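To make the coherence measure concrete, the following Python sketch estimates the magnitude-squared coherence between two channels by averaging cross- and auto-spectra over frames. The frame length, the plain (unwindowed) DFT and the test signals are illustrative assumptions. The added noise in the example is ordinary white noise and is not perceptually shaped.

```python
import cmath
import math
import random

def dft(frame):
    """Naive DFT returning the non-negative frequency bins."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def coherence(x1, x2, frame_len=32):
    """Magnitude-squared coherence per frequency bin, averaged over frames."""
    bins = frame_len // 2 + 1
    s11 = [0.0] * bins
    s22 = [0.0] * bins
    s12 = [0j] * bins
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        a = dft(x1[start:start + frame_len])
        b = dft(x2[start:start + frame_len])
        for k in range(bins):
            s11[k] += abs(a[k]) ** 2
            s22[k] += abs(b[k]) ** 2
            s12[k] += a[k] * b[k].conjugate()
    return [abs(s12[k]) ** 2 / (s11[k] * s22[k] + 1e-12) for k in range(bins)]

random.seed(2)
n = 32 * 40
s = [random.gauss(0.0, 1.0) for _ in range(n)]

# Two channels that are scaled copies of one source: coherence close to 1.
coherent = coherence(s, [0.8 * v for v in s])

# Independent noise added to each channel lowers the coherence.
noisy1 = [v + random.gauss(0.0, 1.0) for v in s]
noisy2 = [v + random.gauss(0.0, 1.0) for v in s]
decorrelated = coherence(noisy1, noisy2)
```

A coherence near 1 in a band indicates that the two channels are almost linearly dependent there, which corresponds to the poorly conditioned case.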

The present invention is therefore intended to solve the above-mentioned problems.

PRIOR ART

Two important applications for stereo acoustic echo cancellation are high-quality video conferencing and tele-games. In the future, desktop-based conference systems will also need stereo acoustic echo cancellers (AEC). These systems place different demands on bandwidth, bit rate, etc.

Stereo acoustic echo cancellation, however, has turned out to be more complicated than the mono-channel case. This is because, in the two-channel case, the signals are linearly dependent, which causes convergence problems for the echo canceller. Because of the linear dependence between the channels, there is theoretically no unique solution for the echo canceller to identify. Furthermore, all the non-unique solutions depend on the echo paths at the far end of the connection, F. In real situations, however, the solution is not singular but only poorly conditioned, owing to uncorrelated microphone noise and infinitely long impulse responses of the far-end echo paths. The convergence rate of the NLMS algorithm depends to a great extent on the condition number of the system, so more sophisticated algorithms are needed for stereo acoustic echo cancellation.

Even with more sophisticated algorithms, problems remain with unstable estimates of the echo paths.

In order to stabilize the solution, the correlation between the stereo channels must be reduced without introducing disturbing distortion. Different solutions to this have been presented, but have been rejected for various reasons (see for instance M. M. Sondhi, D. R. Morgan and J. L. Hall: "Stereophonic acoustic echo cancellation - an overview of the fundamental problem"; IEEE Signal Processing Letters, 2 (8): 148-151, 1995). The most promising solution at present is to distort the stereo channels non-linearly (for instance J. Benesty, D. R. Morgan and M. M. Sondhi: "A better understanding and an improved solution to the problem of stereophonic acoustic echo cancellation". IEEE Trans. on Speech and Audio Processing. To appear; a short version can be found in Proc. of ICASSP 1997, pp. 303-306), where half-wave rectified parts of the signal are added to the signal itself. This distortion does not destroy the stereophonic perception, but it introduces noise which is for the most part inaudible yet can be perceived, depending on the degree of non-linearity.
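The non-linear approach mentioned here, adding a half-wave rectified part of the signal to the signal itself, can be sketched as follows. The gain `alpha` and the choice of rectifying opposite polarities in the two channels are assumptions made for illustration; the text above does not specify them.

```python
def add_half_wave(x, alpha=0.5, positive=True):
    """Return x plus alpha times its half-wave rectified part."""
    out = []
    for v in x:
        # Keep only the positive (or only the negative) half of the waveform.
        rect = max(v, 0.0) if positive else min(v, 0.0)
        out.append(v + alpha * rect)
    return out

# Hypothetical stereo frame: one channel gets the positive half, the other the negative half.
left = add_half_wave([1.0, -1.0, 0.5, -0.5], alpha=0.5, positive=True)
right = add_half_wave([1.0, -1.0, 0.5, -0.5], alpha=0.5, positive=False)
```

A larger `alpha` decorrelates the channels more strongly but makes the distortion more likely to be heard, which is the trade-off noted above.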

In transmission of acoustic signals between parties, for instance in a telecommunication connection, a certain part of a party's own sound is returned and creates an echo. In most cases it is desirable to reduce this echo at least to a level that is not disturbing. This is achieved by means of a so-called echo canceller. The principle is that a part of the own signal is identified and subtracted from the received signal. Echo cancellation in the mono case is thus known. It uses previously known principles which are described, inter alia, in the patent literature, for instance US 5668865, US 5664011 and US 5610909. The patent documents US 5661813, US 574545, US 5323459, US 5369554, US 5555310 and US 5513265 deal more specifically with the problems of stereo acoustic echo cancellation.

THE SOLUTION

The present invention relates to a method for stereo acoustic echo cancellation, where the echo is created on a connection for transmission of a stereo acoustic signal.

The signal is coded on the transmitter side, F, and decoded on the receiver side, N. A perceptual audio coding is introduced. Perceptual coding exploits the fact that the signal can consist of different frequency components which are transmitted at the same time, where one of them dominates the other, which then gives no additional contribution to the received information. Furthermore, the side information of the coded signal is utilized. The echo can then be identified and cancelled. By using, for instance, MPEG coding, a perceptual coding is achieved which allows the channel correlation between the stereo channels to be reduced. At frequencies above 2 kHz the perceptual coding is advantageous. Below 2 kHz, the side information can be utilized to reduce the correlation further. For each sub-band into which the signal is divided, the utilization of the signal and the quantizer used in the coding are indicated. The quantizer is selected in the coding on the basis of an analysed segment of the signal. Further, a masking threshold is determined which defines distortion levels that cannot be heard within the segment. The masking threshold is selected so that a margin to the just noticeable distortion is attained. Noise which is uncorrelated between the channels is added within this margin in the decoder, whereby an improved echo cancellation can be achieved.

The invention further relates to a device for stereo acoustic echo cancellation. Sound registering equipment on the transmitter side, F, registers the signal, which is coded in a coder, C, and transmitted to a decoder, D, on the receiver side, N. In the coder, C, a perceptual coding of the signal is performed. Side information in the coded signal is further utilized. A stereo acoustic echo canceller, AEC, is used to identify and cancel the echo. Perceptual coding is performed in the coder, C, using, for instance, MPEG coding to reduce the channel correlation between the channels. The decoder analyses a segment of the signal in order to determine a masking threshold which defines inaudible distortion levels within the segment. The coder, C, further selects the quantizer, dq. The masking threshold is determined in the decoder in such a way that a margin to the just noticeable distortion is attained. Noise which is uncorrelated between the channels is added to the signal by the decoder.

ADVANTAGES

The invention makes it possible to carry out echo cancellation on connections over which stereo transmissions are made. The invention can be introduced without adding extra equipment, which may be expensive. The use of perceptual coders/decoders makes it possible to implement the solution on the decoder side without the coder needing to know about it. The solution has the further advantage that good conditioning is attained without introducing distortion that could interfere with the communication.

DESCRIPTION OF FIGURES

Figure 1 illustrates the microphones and loudspeakers at the near end, N, and the far end, F, respectively. The acoustic echo canceller (AEC) for the stereo case is shown within the broken-line frame. Only one of the return channels is shown.

Figure 2 illustrates the far-end room, stereo acoustic echo canceller, AEC, (stereo AEC) and perceptual audio coder, C/D (coder/decoder).

Figure 3 illustrates an MPEG-1 layer III decoder. The following designations have been used:

pi: PCM input
af: Filter bank analysis
md: MDCT
sq: Scaling device and quantizer
hc: Huffman coding
mp: Multiplexer
dm: Demultiplexer
hd: Huffman decoding
dd: Dequantizer and descaling device
im: Inverse MDCT
sfb: Synthesis filter bank
po: PCM output
dt: Decide masking thresholds
si: Side information
di: Decoding of side information
b: MPEG layer III bit stream

Figure 4 illustrates the masking threshold. The dotted areas are masked by the tone. The sound pressure is indicated in dB. The frequency is indicated on a log scale. DN signifies decorrelating noise level. Q signifies quantizing noise level.

PREFERRED EMBODIMENT

In the following, the invention is described with reference to the figures and the designations in them. Acoustic echo cancellation in stereo is considerably more difficult than echo cancellation in mono, owing to the strong correlation between the stereo channels.

This invention is based on using a perceptual audio coder to reduce the correlation between the stereo channels without introducing audible distortion. As a result, the stereo canceller converges towards the correct echo paths and gives a more stable echo cancellation which does not depend on the transmission room (far end). The core of the invention is that the correlation can be reduced beyond what the audio coder itself gives by modifying its decoder. Extra noise, uncorrelated between the channels, is added in the decoder at a level that keeps it inaudible, using the information from the coder in combination with an estimated perceptual masking threshold.

The solution is consequently flexible and does not require any change to the coding standard in use. Only a small number of operations need to be added to the decoder.

The invention is based on introducing the distortion as noise added to the speech signal without disturbing it. Further, the properties of the speech/audio coder (for instance an MPEG coder) on the transmission channel, C/D, between the near end and the far end are utilized. For this purpose a perceptual audio coder is used, which has the effect of reducing the channel correlation between the stereo channels. With the MPEG Layer III coder the coherence falls below 0.95 for frequencies above 2 kHz. A coherence below 0.95 is the target, in order to condition the solution that the echo canceller has to find. At frequencies below 2 kHz the coherence is still high, so further modification of the signal is necessary in that range. For this purpose, side information contained in the coded signal is utilized, without introducing disturbing distortion.

Within each sub-band of the decoded signal, it is indicated how the signal is utilized and which quantizer the coder has used. The coder selects the quantizer on the basis of the amount of energy in the analysed segment of speech (or audio signal) and the so-called masking threshold, which indicates inaudible distortion levels in the segment. The selection often leaves a margin to the just noticeable distortion level. This remaining margin is used by adding noise, uncorrelated between the channels, to the signal. This reduces the coherence so that stable, unique estimates of the echo paths at the near end, N, can be found.

The most advanced part of the MPEG-1 standard is layer III, which typically compresses stereo sound by up to 12 times without significant loss of sound quality. It is included in standards such as H.310 (broadband audiovisual communication systems) and H.323 (visual telephone systems and equipment for local networks). Layer III coders are also commonly used as high-quality coders on the World Wide Web (WWW).

The high compression is possible by removing parts of the signal that are inaudible or carry no information for the ear. In simultaneous masking, larger frequency components mask smaller ones in nearby frequency bands, whereas in temporal masking, components just before or after a large sound component (in the time domain) are masked. The audio coder estimates the global masking threshold, the just noticeable distortion, as a function of frequency and time segment.

The sound coder operates in parallel with the global algorithm for estimation of the masking. The signal of the sound source is divided into 32 critically downsampled band-pass signals in a filter bank. In layer III the frequency resolution is increased by processing each band-pass signal with a modified discrete cosine transform (MDCT). The length of the MDCT window is signal dependent and is either 6 or 18, where the shorter window is used for transients in the sound source. The MDCT components are subsequently scaled and quantized. The key for the coder is that a sufficient number of quantizing levels exist in each sub-band to keep the introduced quantizing noise below the global masking threshold. The data redundancy is reduced by applying Huffman coding to the signal before it is transmitted over the channel.
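The MDCT step can be illustrated with a small Python sketch. It reads the lengths 6 and 18 as the number of coefficients N produced from a block of 2N samples, which is an interpretation rather than something stated here, and it uses a sine window with 50 % overlap-add as a simplifying assumption. The function names and the test signal are hypothetical.

```python
import math

def mdct(block):
    """MDCT of a block of 2N samples, giving N coefficients."""
    n2 = len(block)
    n = n2 // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(n2))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients give 2N time-aliased samples."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def sine_window(n2):
    """Sine window; squared values of overlapping halves sum to one."""
    return [math.sin(math.pi * (t + 0.5) / n2) for t in range(n2)]

N = 18                                   # long block; N = 6 would be the short block
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(3 * N)]
w = sine_window(2 * N)
out = [0.0] * (3 * N)
for start in (0, N):                     # two blocks overlapping by N samples
    block = [w[t] * signal[start + t] for t in range(2 * N)]
    coeffs = mdct(block)                 # N coefficients per block
    rec = imdct(coeffs)
    for t in range(2 * N):
        out[start + t] += w[t] * rec[t]  # windowed overlap-add
```

In the region covered by both blocks, samples N to 2N-1, the time-domain aliasing of the two blocks cancels and `out` matches `signal`. Any modification of the coefficients, such as quantization, would be made between `mdct` and `imdct`.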

When the two channel signals are not identical, the quantization noise introduced in the two channels is almost independent. As a result, the correlation between the channels is reduced. Decoding is essentially performed in the same way as coding, but in reverse order.

The correlation between the channels is reduced even more if independent noise is added to the channels. The MDCT bands cannot each be optimally quantized because of the large overhead. They are instead divided into five ranges with a defined number of quantizing levels. Define the noise-to-mask relation (QMR) as the difference between the level of the quantizing noise and the level which is just audible in a given MDCT band. Inaudible noise can then be added to the MDCT components where QMR is positive. In the frequency ranges where the channel correlation needs to be reduced, the following is applied:

$\mathrm{QMR}(j) > 0 \;\Rightarrow\; \tilde{X}_j^{\mathrm{mdct}} = X_j^{\mathrm{mdct}} + f(Q(j))\, v$

$\mathrm{QMR}(j) \leq 0 \;\Rightarrow\; \tilde{X}_j^{\mathrm{mdct}} = X_j^{\mathrm{mdct}}$

where $X_j^{\mathrm{mdct}}$ is the MDCT component in band $j$, $\tilde{X}_j^{\mathrm{mdct}}$ is the modified component, and $f(\cdot)$ amplifies the noise component $v$ which is added. A block implementing this channel decorrelation is added to the decoder just before the inverse MDCT.
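The decorrelation rule can be sketched in Python as follows. The gain function is not specified in the text, so this sketch assumes one that scales bounded random noise to the power gap between the masking level and the quantizing noise level. The dB levels, the function name and the use of separate random generators per channel are likewise illustrative assumptions.

```python
import math
import random

def add_decorrelating_noise(x_mdct, quant_noise_db, mask_db, rng):
    """Add noise to the MDCT components whose band has a positive margin (QMR > 0)."""
    out = []
    for x, q_db, m_db in zip(x_mdct, quant_noise_db, mask_db):
        qmr = m_db - q_db                  # margin between mask and quantizing noise, in dB
        if qmr > 0:
            # Assumed gain: amplitude corresponding to the unused noise power in the band.
            headroom = 10 ** (m_db / 10) - 10 ** (q_db / 10)
            x = x + math.sqrt(headroom) * rng.uniform(-1.0, 1.0)
        out.append(x)                      # bands with QMR <= 0 pass through unchanged
    return out

# Hypothetical MDCT components and per-band levels (dB relative to an arbitrary reference).
x = [1.0, -2.0, 0.5, 0.0]
quant_db = [-30.0, -20.0, -40.0, -25.0]
mask_db = [-20.0, -25.0, -40.0, -10.0]

# Independent generators so that the noise in the two channels is uncorrelated.
left = add_decorrelating_noise(x, quant_db, mask_db, random.Random(10))
right = add_decorrelating_noise(x, quant_db, mask_db, random.Random(20))
```

Only the first and last bands have a positive margin in this example, so only those components are modified, and the added amplitude never exceeds the assumed headroom.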

The global masking information is not accessible in the decoder, but thanks to the high frequency resolution of the MDCT, a global masking estimate with reduced computational complexity can be produced. Independent noise is then added, before the inverse MDCT, to the MDCT components which have a sufficiently high SMR.

The invention is not restricted to the embodiment described above, or to the following patent claims, but may be subject to modifications within the scope of the idea of the invention.

US5661813A	1997-08-26
US5323459A	1994-06-21
US5513265A	1996-04-30
US5555310A	1996-09-10