METHOD, DEVICES AND SYSTEM FOR GENERATING BACKGROUND NOISE IN A TELECOMMUNICATIONS SYSTEM

Document Type and Number:

WIPO Patent Application WO/2000/057400

Kind Code:

A1

Abstract:

Method and devices for generating background noise in a telecommunication system comprising the steps of dividing the input signal into frames, computing the coefficients for an inverse filter A(z) for the present frame, wherein A(z) is the z-transform of the inverse filter, and computing auto correlation functions from original candidate vectors and an output signal after the inverse filter and applying a distance measure between transformed candidate vectors and a transformed output signal from an inverse filter. A best candidate vector is selected by means of a distance measure in the ACF domain in order to provide a good match in the autocorrelation domain. The method improves the perceived quality of regenerated background noise, for instance such as a reduction of swirling and fluttering noise caused by a limited set of spectral shapes caused by quantization of the signal spectrum. The cost for said improvement in extra bits sent is very moderate.

Inventors:

JOHANSSON INGEMAR

Application Number:

PCT/SE2000/000531

Publication Date:

September 28, 2000

Filing Date:

March 17, 2000

Assignee:

ERICSSON TELEFON AB L M (SE)

International Classes:

G10L19/012; G10L19/12; (IPC1-7): G10L19/00

Foreign References:

EP0501420A2

1992-09-02

Other References:

MORIYA T ET AL: "TRANSFORM CODING OF SPEECH USING A WEIGHTED VECTOR QUANTIZER", IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, vol. 6, no. 2, 1 February 1988 (1988-02-01), pages 425 - 431, XP000616836
PSUTKA J: "THE USE OF THE LPC RESIDUAL ERROR AUTOCORRELATION TO PITCH PERIOD EXTRACTION", PROCEEDINGS OF THE EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY (EUROSPEECH), PARIS, SEPT. 26 - 28, 1989, vol. 1, no. CONF. 1, 26 September 1989 (1989-09-26), TUBACH J P;MARIANI J J, pages 609 - 612, XP000209936

Attorney, Agent or Firm:

Mildh, Christer (Ericsson Radio Systems AB Ericsson Research/Patent Support Unit Stockholm, SE)

Claims:

CLAIMS

1.	A method for generating comfort noise in a digital cellular telecommunication system including a speech coder, wherein the comfort noise is generated at a receiver end, based on a type of environment in which the transmitter end is located, the environment being determined by transforming an input signal to the autocorrelation domain and applying a distance measure between transformed candidate vectors and a transformed output signal obtained from an inverse filter, to select the best candidate vector, characterized in that the method comprises the steps of dividing the input signal into frames, computing the coefficients for an inverse filter of the z transform for the present frame, and computing autocorrelation functions from original candidate vectors and an output signal after the inverse filter and applying a distance measure between transformed candidate vectors and a transformed output signal from an inverse filter.

2.	A method according to claim 1, characterized in that a best candidate vector is selected by means of a distance measure in the autocorrelation function domain in order to provide a good match in the frequency domain.

3.	A method according to claim 2, characterized in that the distance measure is a Minimum Squared Error Criterion.

4.	A method according to any of the claims 1-3, characterized in that a new set of codebook vectors is chosen from a Pseudo random Noise sequence for each frame, at the coder end and at the decoder end.

5.	A method according to any of the claims 1-3, characterized in that a fixed codebook is used and that the phase information in the candidate vector is scrambled.

6.	A method according to any of the claims 1-3, characterized in that the autocorrelation vectors are modified with coefficients of a perceptual weighting filter.

7.	A device for generating comfort noise in a digital cellular telecommunication system including a speech coder, wherein comfort noise is generated at a receiver end, based on a type of environment in which a transmitter end is located, the environment being determined by transforming an input signal to an autocorrelation domain and applying a distance measure between transformed candidate vectors and a transformed output signal obtained from an inverse filter, characterized by comprising: means for dividing the input signal into frames, means for computing the coefficients for an inverse filter for the present frame, and means for computing autocorrelation functions from original candidate vectors and an output signal obtained from the inverse filter and applying a distance measure between the transformed candidate vectors and a transformed output signal from an inverse filter.

8.	A device according to claim 7, characterized by means for selecting a best candidate vector through means for computing a distance measure in the autocorrelation function domain in order to provide a good match in the frequency domain.

9.	A device according to claim 8, characterized in that the distance measure is a Minimum Squared Error Criterion.

10.	A device according to any of claims 7-9, characterized by means for randomly generating a new set of codebook vectors for each frame, at the coder end and at the decoder end.

11.	A device according to any of claims 7-9, characterized in that a fixed codebook is used and by means for scrambling the phase information in the candidate vector.

12.	A device according to any of claims 7-9, characterized in that the autocorrelation vectors are modified with coefficients of a perceptual weighting filter.

13.	A digital cellular telecommunication system including a speech coder, wherein comfort noise is generated at a receiver end, based on the type of environment in which another part is located, the environment being determined by transforming an input signal to an autocorrelation domain and applying a distance measure between transformed candidate vectors and a transformed output signal from an inverse filter, characterized by comprising a device according to one of claims 7-12.

Description:

METHOD, DEVICES AND SYSTEM FOR GENERATING BACKGROUND NOISE IN A TELECOMMUNICATIONS SYSTEM

FIELD OF INVENTION

The present invention relates to a method and a device for regenerating a typical background noise in the receiver end of a transmission line.

DESCRIPTION OF RELATED ART

In telecommunication it is comforting for the listener to hear background noise when the other party is silent, since otherwise one of the communicating parties may believe that the connection has been lost. However, transmitting background noise signals can cost a lot of channel capacity. One way of reducing the channel capacity needed for background noise is not to send signals representing the actual background noise continuously, but instead to regenerate a previously transmitted version of it at the receiver end, so that signals representing the background noise have to be sent only at intervals. When such signals are actually transmitted, as few bits as possible should of course be used. The regenerated background noise is called comfort noise. It may be difficult to regenerate such background noise using very few bits, since the character of the generated noise should bear a reasonable resemblance to that of the original noise recorded at the transmitter end. However, it is not important to model the actual noise perfectly; for instance, it may be perfectly acceptable to hear that people are speaking in the background without being able to tell exactly what is being said.

Code-excited linear prediction (CELP) is a speech coding technique used to produce high quality synthesized speech. This class of speech coding, also known as vector-excited linear prediction, is used in numerous speech communication and speech synthesis applications. CELP is particularly applicable to digital speech encrypting and digital radiotelephone communications systems wherein speech quality, data rate, size and cost are significant issues.

In a CELP speech coder, the long-term (pitch) and the short-term (formant) predictors which model the characteristics of the input speech signal are incorporated in a set of time varying filters. Specifically, a long-term and a short-term filter may be used. An excitation signal for the filters is chosen from a codebook of stored innovation sequences, or codevectors.

For each frame of speech, an optimum excitation signal is chosen. The speech coders apply an individual codevector to the filters to generate a reconstructed speech signal. The reconstructed speech signal is compared to the original input signal, creating an error signal. The error signal is then weighted by passing it through a spectral noise weighting filter. The spectral noise weighting filter has a response based on human auditory perception. The optimum excitation signal is a selected codevector that produces the weighted error signal with the minimum energy for the current speech frame.

Speech coders typically use minimization of the Mean Squared Error (MSE) as the criterion for selecting the parameters of the speech coder. The MSE is the mean of the squared differences between corresponding elements of two signals. It is proportional to the square of the Euclidean distance between two points in a multidimensional space.
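As a small illustration of the MSE criterion, the following Python sketch computes the mean squared error between two short vectors and the squared Euclidean distance it is proportional to. The function names and the sample values are illustrative assumptions and do not come from the application.

```python
def mean_squared_error(a, b):
    """Mean of the squared element-wise differences of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def squared_distance(a, b):
    """Squared Euclidean distance, i.e. the MSE multiplied by the vector length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


original = [1.0, 2.0, 3.0, 4.0]          # illustrative input samples
reconstructed = [1.0, 2.5, 2.5, 4.0]     # illustrative reconstructed samples

print(mean_squared_error(original, reconstructed))   # 0.125
print(squared_distance(original, reconstructed))     # 0.5
```

Minimizing either quantity selects the same candidate, since the two differ only by the constant vector length.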

In a speech codec for a digital cellular system operating using source controlled variable bitrates, different bitrates are needed for different input signals. The highest bitrate is needed for speech signals while non-speech signals need a lower bit rate in order to be reproduced well.

Coding of background noise should preferably use as low a bitrate as possible. For spread spectrum systems (e.g. CDMA) a main objective is to reduce the average bitrate and thereby the total system load, and for TDMA systems the objective is a more efficient use of the battery, although system load can also be important.

A common way of encoding background noise is to apply brute quantization on the signal spectrum and the noise energy. Often the signal spectrum and energy are averaged over several frames. However, because the signal spectrum is averaged, this approach seldom gives any information on the kind of environment in which the other speaker is located during a conversation.

Another approach is not to average the signal spectrum and energy, in order to avoid smearing of the signal spectrum, and instead to increase the update rate at the cost of fewer bits per update in order to maintain a low average bitrate. The two estimates are transmitted to the decoder, either at regular intervals or when e.g. the signal spectrum has changed. The important issue is not to consume too many bits. In the decoder the spectrum and the energy estimates are interpolated in order to try to ensure smooth transitions. As an excitation source to an STP filter, which normally models the signal spectrum, either white noise or randomized versions of fixed and adaptive codebooks are used. The term STP means Short Term Predictor, which is a model of the acoustic characteristics of the oral cavity.

The document US-5692101, having the title "Speech coding vector energy matching method for CELP modifying mean-squared error criterion for speech code parameters and selecting corresponding gains according to gain bias factor to minimize total weighted error energy", for GERSON and assigned to MOTOROLA INC et al, discloses a method involving choosing a code vector to represent an input speech vector. A long-term predictor coefficient and a gain term for the code vector are optimized. A gain bias factor is determined to more closely match the energy of the code vector to the energy of the input speech vector. The optimal long-term predictor coefficient and the optimal gain term are altered using the gain bias factor. The gain bias factor is determined by forming a synthetic excitation signal using the code vector, the optimal long-term predictor and the optimal gain term. The energy of the input speech vector is calculated to form a speech data energy value. The energy of the synthetic excitation signal is calculated to form a synthetic excitation energy value. The ratio of the speech data energy value to the synthetic excitation energy value is determined. The gain bias factor is equal to the square root of this ratio. The method according to US-5692101 may be used for digital speech encryption in a digital radiotelephone communication system. It provides more natural-sounding speech replication, especially at high frequencies, without over-emphasis, by producing a more accurate representation of the input speech energy contour. The deficiency of the method according to US-5692101 is that ordinary CELP coders do not function properly at the low bitrates of comfort noise. The reason is that the waveform of the input signal is matched with the synthesized waveform, and as only a very limited number of synthesized waveforms are available, it is very difficult to find an adequate waveform.
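The gain bias factor summarized above for US-5692101 is the square root of an energy ratio. The following Python sketch shows only that arithmetic; the function names and sample vectors are hypothetical, and nothing here reproduces the rest of the cited method.

```python
import math


def energy(vector):
    """Sum of squared samples."""
    return sum(s * s for s in vector)


def gain_bias_factor(speech, synthetic_excitation):
    """Square root of (speech energy / synthetic excitation energy)."""
    synthetic_energy = energy(synthetic_excitation)
    if synthetic_energy == 0.0:
        raise ValueError("synthetic excitation has zero energy")
    return math.sqrt(energy(speech) / synthetic_energy)


speech = [2.0, -2.0, 2.0, -2.0]        # illustrative input vector, energy 16
synthetic = [1.0, -1.0, 1.0, -1.0]     # illustrative excitation, energy 4

print(gain_bias_factor(speech, synthetic))   # 2.0
```

Scaling the synthetic excitation by this factor gives it the same energy as the input vector.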

SUMMARY OF THE INVENTION

The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.

The present invention offers a novel method to model background noise, with improved naturalness, by regenerated noise (comfort noise). This is achieved with no or little extra transmission capacity needed.

The method described herein improves the perceived quality of regenerated background noise, for instance by reducing the swirling and fluttering noise that arises when quantization of the signal spectrum leaves only a limited set of spectral shapes. The cost of this improvement in extra bits sent is very moderate. Furthermore, the application of this invention is not limited to background noise encoding; it may for instance be used when searching for a codebook candidate in a second codebook while the first codebook is searched with e.g. a conventional wMSE criterion in the time domain. wMSE denotes weighted Mean Squared Error, i.e. the error signal is weighted so as to take advantage of the masking effects of the human ear.

An advantage of the invention is that waveforms are not matched, the signals being instead matched in the autocorrelation domain. In this way very few bits are required for the codebook index.

Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a radiotelephone system.

Fig. 2 is a block diagram illustrating a method for generating digital signals representing background noise.

Fig. 3 is a decoder according to the invention.

The invention will now be described in more detail with reference to preferred exemplifying embodiments thereof and also with reference to the accompanying drawings.

DESCRIPTION OF PREFERRED EMBODIMENTS

Fig. 1 is an illustration in block diagram form of a radio communication system 100. The radio communication system 100 includes two transceivers 101, 113, which transmit and receive speech data to and from each other. The two transceivers 101, 113 may be part of a trunked radio system or a radio telephone communication system or any other radio communication system which transmits and receives speech data. At a first transceiver 103, the speech signals are input into a microphone 108, and the speech coder 107 selects the quantized parameters of the speech model. The codes for the quantized parameters are then transmitted using the transmitter 105 to the other transceiver 113 via a radio channel. At the other transceiver 113, the transmitted codes for the quantized parameters are received by a receiver 121 and used to generate the speech in a speech decoder 123. The regenerated speech is output to a speaker 124. In the same way, at the second transceiver 113, the speech signals are input into a microphone 108, and the speech coder 119 selects the quantized parameters of the speech model. The codes for the quantized parameters are then transmitted using the transmitter 117 to the first transceiver 103 via a radio channel. At the first transceiver 103, the transmitted codes for the quantized parameters are received by a receiver 111 and used to generate speech in a speech decoder 109. The regenerated speech is output to the speaker 110.

A means of reaching a higher fidelity when regenerating a typical background noise in the other end of a transmission line would be to use some kind of codebook to model the signal.

A common error criterion is the wMSE criterion in the time domain, i.e. the candidate codebook vectors are compared with the weighted input signal sample by sample. This works well for speech signals when a high bitrate is used. For background noise, however, where the information content of the signal is quite high and very few bits are allowed to be used, the above approach is not good enough.

Fig. 2 shows a simple schematic illustrating the steps of a method for background noise coding. According to the method as disclosed herein the input signal is divided into frames; then, in block 223, the coefficients for an appropriate inverse filter A(z) are computed and quantized for the present frame.

Then, an input signal representing white noise is fed to the filter A(z), block 225, where the output signal corresponding to the input signal of the circuit computing the coefficients for the inverse filter A(z) is computed for the present frame. Then, ACFs (autocorrelation functions) are computed in block 114 from the candidate vectors and from the residual signal, i.e. the output signal from the inverse filter in block 225. The ACF of a block x of the time series input signal can be expressed, in its standard unnormalized form, as

$$R_m = \sum_{i=0}^{L-1-m} x_i \, x_{i+m}$$

where L is the number of samples in the block and m is the lag. The autocorrelation function R can be described as a representation of the input sequence x in the autocorrelation domain. Autocorrelation is a transform of a signal that is selected here for coding noisy signals having a high information content.
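The ACF computation for one block can be sketched in Python as follows. This is a minimal sketch assuming the standard unnormalized definition of the autocorrelation; the function name, the number of lags and the sample values are illustrative assumptions rather than details taken from the application.

```python
def autocorrelation(x, max_lag):
    """Return [R_0, ..., R_max_lag] with R_m = sum over i of x[i] * x[i + m]."""
    length = len(x)
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    return [
        sum(x[i] * x[i + m] for i in range(length - m))
        for m in range(max_lag + 1)
    ]


frame = [1.0, 2.0, 3.0, 4.0]           # illustrative block of samples
print(autocorrelation(frame, 2))       # [30.0, 20.0, 11.0]
```

The same function would be applied both to the residual signal and to each candidate vector, so that all comparisons take place in the autocorrelation domain.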

An advantage of the ACF is that, besides having a relatively low computational complexity in this particular application, it is also insensitive to phase information in the input signal and therefore does not "care" about waveform matching in the time domain. Thus, the ACF is a good transform if one wants to find the approximate distance between two signals in what can be described as the frequency domain. The relationship between the autocorrelation domain and the frequency domain has to be clarified here. The relationship between the autocorrelation function R and the power spectral density P can be expressed as:

$$P(\omega) = \sum_{m=-\infty}^{\infty} R_m \, e^{-j\omega m}$$

where m is the lag and ω is the angular frequency. Consider the simple case where the ACF of an input block consists of only two values R0 and R1, for lags 0 and 1 respectively. The tilt of the frequency domain spectrum of the signal is then described by the relation -R1/R0. However, no information about the phase as a function of frequency can be deduced. In this respect, the ACF can be described as a transformation to a set of coefficients that describe the amplitude response in the frequency domain. As mentioned above, no information about the phase exists. Using additional lags will improve the spectral matching properties of the ACF. It must be stressed, however, that in order to get the exact amplitude response, a transformation (e.g. a Discrete Fourier Transform) has to be applied to the ACF coefficients R.
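The two properties discussed above, insensitivity to phase and the link between the ACF and the power spectrum, can be checked numerically. The Python sketch below assumes the unnormalized ACF over all lags of a finite block, for which the power spectrum equals R_0 + 2 Σ R_m cos(ωm); the function names and sample values are illustrative assumptions.

```python
import cmath
import math


def autocorrelation(x):
    """Unnormalized ACF for all lags 0 .. len(x) - 1."""
    length = len(x)
    return [
        sum(x[i] * x[i + m] for i in range(length - m))
        for m in range(length)
    ]


def spectrum_from_acf(r, omega):
    """Power spectrum from the ACF: R_0 + 2 * sum over m of R_m * cos(omega * m)."""
    return r[0] + 2.0 * sum(r[m] * math.cos(omega * m) for m in range(1, len(r)))


def spectrum_direct(x, omega):
    """Squared magnitude of the Fourier transform of the block itself."""
    transform = sum(s * cmath.exp(-1j * omega * i) for i, s in enumerate(x))
    return abs(transform) ** 2


frame = [1.0, 2.0, 3.0, 4.0]
flipped = frame[::-1]                    # same spectrum magnitude, different phase

r = autocorrelation(frame)
print(r == autocorrelation(flipped))     # True: the ACF ignores the phase
print(spectrum_from_acf(r, 0.0))         # 100.0
print(spectrum_direct(frame, 0.0))       # 100.0
```

Two blocks with very different waveforms can therefore have identical ACFs, which is why matching in this domain needs no waveform information.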

Another advantage of autocorrelation is that it enhances the naturalness of the reproduction of a signal; the listener perceives if something happens in the background. The candidate vectors may be chosen from a Pseudo random Noise (PN) sequence.

A codebook could also be trained, i.e. an array of vectors best describing the background noise is chosen. An appropriate number of candidates is defined according to the number of index bits which may be used. The number of candidates is $N = 2^k$, k being the number of index bits; e.g. 3 bits result in 8 candidate vectors. Experiments have shown that 3 to 8 bits give a good enough result. In block 235 the difference between rs and the ACF of a gain-scaled codebook candidate, i.e. rn multiplied by gn, is calculated. In block 245 the energy of the difference between said vectors rs and rn is calculated. The lower the energy, the smaller the difference between the two vectors. The best candidate is selected in block 255 by means of e.g. an MSE criterion in the ACF domain. By doing this we try to reach a good match in the autocorrelation domain and do not have to waste a lot of effort on trying to model the background noise in the time domain. Thereafter, the quantized A(z) coefficients, the quantized optimal gain gn' and the index n of the best codebook candidate Cn are fed to the transmission channel, in block 230.

The codebook candidate Cn, wherein n is the index of an optimal candidate vector, which gives the best correlation with the input residual signal in the ACF domain, is selected.

Figure 2 shows a block diagram illustrating the search for the optimal codebook vector, which is performed in the following steps.

1. In block 205 a candidate vector C1 is chosen and different values of g1 (a gain factor) are tested. In block 215, only one ACF, r1, is calculated for the signal. r1 is subtracted from rs, the chosen g1 being the one providing the lowest sum of the squared differences between the elements of the vectors rs and r1.

2. The above procedure is repeated in blocks 206, 216 for the vector r2, and in blocks 207, 217 for the vector rN.

3. In block 245, the sum of the squared differences between rs and rn (MSE) is calculated, the vector providing the best match with rs being chosen.

4. Steps 1-3 are repeated for every candidate vector.

$$\mathrm{corr}_n = \frac{\left(r_s \, r_n^{T}\right)^{2}}{r_n \, r_n^{T}}, \quad n \in [1, N] \qquad (1)$$

where
corr_n is a normalized correlation between the two vectors rs and rn,
rs is the ACF sequence (vector) of the output signal from block 225,
rn is an ACF sequence calculated in block 215 when n=1, in block 216 when n=2, etc.,
rn^T is the transpose of rn.

The corresponding optimal gain gn is computed as

$$g_n = \frac{r_s \, r_n^{T}}{r_n \, r_n^{T}}, \quad n \in [1, N] \qquad (2)$$

(not shown in the figure), where T denotes the transpose of a matrix.

As testing different gain values for every candidate would require an enormous cost in complexity, the codebook vector n with the highest correlation between rs and rn is chosen instead (see Eq. (1)). The optimal gain is thereafter calculated according to Eq. (2). Block 255, "Selection logic", is a switch which chooses an optimal codebook vector. It chooses the vector Cn whose ACF rn provides the best match with rs.
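The search described above can be sketched in Python under some stated assumptions: the normalized correlation is taken as the squared inner product of rs and rn divided by the energy of rn, the gain is taken as the least-squares scale factor that best fits rn to rs, and the codebook is filled from a seeded Gaussian generator standing in for a PN sequence. The function names, the seed, the frame length and the number of lags are all hypothetical, and indices are 0-based here whereas the text counts from 1.

```python
import random


def autocorrelation(x, max_lag):
    """Unnormalized ACF for lags 0 .. max_lag."""
    return [sum(x[i] * x[i + m] for i in range(len(x) - m))
            for m in range(max_lag + 1)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def search_codebook(r_s, candidate_acfs):
    """Return (index, gain): highest normalized correlation, then its gain."""
    best_index, best_corr = None, -1.0
    for n, r_n in enumerate(candidate_acfs):
        energy = dot(r_n, r_n)
        if energy == 0.0:
            continue                                # all-zero ACF cannot match
        corr = dot(r_s, r_n) ** 2 / energy          # normalized correlation
        if corr > best_corr:
            best_index, best_corr = n, corr
    if best_index is None:
        raise ValueError("no usable candidate")
    r_best = candidate_acfs[best_index]
    gain = dot(r_s, r_best) / dot(r_best, r_best)   # least-squares gain
    return best_index, gain


rng = random.Random(2000)                       # hypothetical seed
index_bits, frame_len, max_lag = 3, 40, 10      # illustrative sizes
codebook = [[rng.gauss(0.0, 1.0) for _ in range(frame_len)]
            for _ in range(2 ** index_bits)]
candidate_acfs = [autocorrelation(c, max_lag) for c in codebook]

# Stand-in residual: candidate 5 at three times the amplitude (ACF is 9x).
residual = [3.0 * s for s in codebook[5]]
r_s = autocorrelation(residual, max_lag)

index, gain = search_codebook(r_s, candidate_acfs)
print(index, round(gain, 6))                    # 5 9.0
```

Note that in this sketch the gain is a scale factor in the ACF domain: tripling the amplitude of a vector multiplies its ACF by nine, so a time-domain amplitude would be the square root of the value returned here.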

Fig. 3 shows the decoder end. In block 330, the quantized A(z) coefficients, the quantized optimal gain gn' and the index n of the best codebook candidate are extracted from the transmission channel.

In the decoder the received codebook candidate (block 307) is multiplied by its optimal gain gn and synthesized in block 317. The optimal gain is the amplification factor minimizing the energy calculated in block 245. If error weighting of the input signal is desired, it is simply done by a sequence of multiplications of the ACFs with the ACF of the error weighting filter. This operation is applied to the calculated ACFs, i.e. to the output signals from blocks 214, 215, 216 and 217. An error weighting filter is a filter that places the major part of the error energy at the frequencies which are difficult for the human ear to perceive. Such a process is known to a man skilled in the art.
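The decoder step of scaling the received candidate and synthesizing the output can be sketched as follows. The text does not spell out the synthesis filter, so this sketch assumes the usual all-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...; the function names, the coefficient value and the candidate vector are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole synthesis 1/A(z): y[n] = e[n] - sum over k of a[k-1] * y[n-k]."""
    output = []
    for n, sample in enumerate(excitation):
        acc = sample
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc -= coeff * output[n - k]
        output.append(acc)
    return output


def inverse_filter(signal, a):
    """FIR inverse filter A(z): e[n] = y[n] + sum over k of a[k-1] * y[n-k]."""
    return [
        signal[n] + sum(coeff * signal[n - k]
                        for k, coeff in enumerate(a, start=1) if n - k >= 0)
        for n in range(len(signal))
    ]


def decode_frame(candidate, gain, a):
    """Scale the codebook candidate by the gain, then run the synthesis filter."""
    excitation = [gain * s for s in candidate]
    return synthesize(excitation, a)


a = [-0.9]                            # hypothetical A(z) = 1 - 0.9 z^-1
candidate = [1.0, 0.0, 0.0, 0.0]      # hypothetical codebook vector
noise = decode_frame(candidate, 2.0, a)
print(noise)
```

Passing the decoded frame back through `inverse_filter` recovers the scaled excitation, which shows that the two filters are inverses of each other.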

In implementing the invention, the same candidate vectors should not be used over and over again, since the noise would then have a periodic characteristic. Instead, a new set of candidate vectors may be chosen from a Pseudo random Noise (PN) sequence for each frame, at the coder end and at the decoder end. It is here assumed that the same PN sequence is used in the coder and the decoder. For a fixed codebook, the phase information in the candidate vectors must be scrambled, so as to avoid the periodic noise. Algorithms for scrambling phase information are well known to the man skilled in the art.
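The requirement that coder and decoder draw the same new candidate set for each frame can be illustrated with a seeded generator. In the Python sketch below, the standard library generator stands in for a PN sequence, and the shared seed, the way the frame index is folded into it, and the vector sizes are all hypothetical choices made for illustration.

```python
import random


def frame_codebook(shared_seed, frame_index, index_bits, vector_length):
    """Generate 2**index_bits candidate vectors for one frame.

    Both ends call this with the same arguments, so only the index of the
    chosen candidate has to be transmitted, never the vectors themselves.
    """
    rng = random.Random(shared_seed * 1_000_003 + frame_index)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(vector_length)]
        for _ in range(2 ** index_bits)
    ]


SHARED_SEED = 57400          # hypothetical value agreed by coder and decoder

coder_set = frame_codebook(SHARED_SEED, frame_index=0, index_bits=3, vector_length=40)
decoder_set = frame_codebook(SHARED_SEED, frame_index=0, index_bits=3, vector_length=40)
next_set = frame_codebook(SHARED_SEED, frame_index=1, index_bits=3, vector_length=40)

print(len(coder_set))              # 8 candidates for 3 index bits
print(coder_set == decoder_set)    # True: both ends hold identical vectors
print(coder_set == next_set)       # False: the set changes from frame to frame
```

Because the candidate set changes with every frame, a repeated index does not reproduce the same waveform, which avoids the periodic characteristic mentioned above.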

The invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.
