CODED COMMUNICATIONS SYSTEM - BRITISH TELECOMM

Title:

CODED COMMUNICATIONS SYSTEM

Document Type and Number:

WIPO Patent Application WO/1989/002148

Kind Code:

A1

Abstract:

In an LPC type coded communications system the excitation source is derived from previous filter outputs at the decoder; in one embodiment the speech output is used, in other embodiments, intermediate excitation outputs are used. To enable tracking the coder derives the filter parameters by using the same excitations, supplied by a local decoder, to synthesise locally the actual error produced at the decoder; the parameters are optimised iteratively by varying the delay of an FIR stage and deriving the actual error in a loop, and selecting the delay for minimum actual error. The IIR parameters may be calculated jointly with the other FIR parameters inside the loop, either for minimum prediction error or minimum actual error. The FIR stage may comprise several parallel FIRs, separately excited.

Inventors:

XYDEAS COSTAS (GB)
GOUVIANAKIS NIKOLAOS (GB)

Application Number:

PCT/GB1988/000711

Publication Date:

March 09, 1989

Filing Date:

August 26, 1988


Assignee:

BRITISH TELECOMM (GB)

International Classes:

G10L19/08; H04B14/04; (IPC1-7): G10L9/14

Foreign References:

US4301329A	1981-11-17
US4038495A	1977-07-26


Claims:

CLAIMS

1.	A method of transmitting speech in which filter parameters are periodically derived from an input speech signal in a coder and transmitted to a decoder to update the response of a decoder filter, in which the excitation input to the decoder filter is derived in the decoder from the speech output of the decoder filter.

2.	A method according to claim 1 in which the derivation of the filter parameters is carried out in the manner such as to reduce the actual error between the input speech signal and the output of the decoder.

3.	A method according to claim 2, in which the decoder includes a synthesis filter comprising a first filter and a second filter in series, the first filter having a variable delay parameter and the second filter being an infinite impulse response filter; and the filter parameters are periodically derived in the coder by an iterative process including the step of varying the delay parameter and deriving a corresponding estimate of the actual error between the input speech signal and the output of the decoder; and then selecting a value of the delay parameter which reduces this estimated actual error.

4.	A method according to claim 3, in which the coder also includes such a synthesis filter, and the estimate of the actual error is derived at the coder by producing a synthetic speech output for each value of the delay parameter and comparing the synthetic speech output with the input speech signal.

5.	A method according to claim 3 or claim 4 in which for each value of the delay parameter, values for other parameters of the first filter and for parameters of the second filter are jointly calculated and used to derive the estimated actual error.

6.	A method according to claim 3 or claim 4, in which values for parameters of the second filter are initially estimated outside the iterative process, and values for the other parameters of the first filter are calculated for each value of the delay parameter and used to derive the estimated actual error.

7.	A method of transmitting speech in which, in a coder, filter parameters are periodically derived from an input speech signal and transmitted to a decoder to update the response of a decoder filter comprising a first filter and a second filter in series, the first filter having a variable delay parameter and the second filter being an infinite impulse response filter; in which the excitation input to the decoder filter is derived at the decoder from one or more intermediate outputs of the decoder filter and in which the filter parameters are derived in the coder by an iterative process including the step of: varying the delay parameter, calculating the values of the other parameters of the first filter and of the second filter and using such values to derive an estimate of the actual error between the input speech signal and the output of the decoder; and then selecting the delay parameter value so as to reduce the estimated actual error.

8.	A method according to any one of claims 5, 6 or 7, wherein values of the other parameters of the first filter are calculated, for each value of the delay parameter, so as to give rise to a low prediction error.

9.	A method according to any one of claims 5, 6 or 7, wherein values of the other parameters of the first filter are calculated, for each value of the delay parameter, so as to give rise to a low actual error.

10.	A method according to any one of claims 5, 6 or 7, wherein after selection of the delay parameter, values of the other parameters of the first filter are recalculated so as to give rise to a low actual error.

11.	A method according to any preceding claim, in which the actual error is weighted so as to reduce its dependence upon perceptually less important spectral regions.

12.	A speech coder comprising means arranged to periodically derive filter characteristics from an input speech signal, in such a manner as to reduce an estimate of the actual error between the input speech signal and the synthetic speech output which would be produced by a filter having those characteristics when excited by a signal derived from its previous synthetic speech output.

13.	A coder according to claim 12 including a synthesis filter adapted to produce a synthetic speech output, the estimate of the actual error being the error formed between the input speech signal and the synthetic speech output of the synthesis filter.

14.	A speech coder including means arranged to periodically derive filter characteristics from an input speech signal, the filter characteristics being derived in such a manner as to reduce an estimate of the actual error between the input speech signal and the output which would be produced by a synthesis filter comprising a first filter and a second filter in series, the first filter comprising a plurality of parallel feed forward filters, at least one of which is connected to receive one of a plurality of different excitation sequences derived from combinations of the outputs of the parallel feed forward filters, and the second filter being an infinite impulse response filter, having those characteristics, when thus excited.

15.	A coder according to claim 14, including means arranged to derive the said filter characteristics by a method comprising the steps of: varying the value of a delay parameter of a first said parallel feed forward filter, deriving a corresponding estimate of the actual error for each such value, and selecting a value of that delay parameter which reduces the said actual error; and then repeating the step for the delay parameter of each further said parallel feed forward filter, each said estimate being produced using all previously selected said delay parameter values.

16.	A coder according to any one of claims 12 to 14, including also a local decoder adapted to produce a synthetic speech output.

17.	A decoder for coded speech which includes a filter, the decoder being arranged upon receipt of coded speech signals to update the characteristics of the filter therefrom, the filter having an excitation input and a speech output connected thereto so that in use the filter is excited from its previous speech output.

18.	A decoder for coded speech which includes a filter, the decoder being arranged upon receipt of coded speech signals to update the characteristics of the filter therefrom, the filter comprising a first filter and a second filter in series, the first filter comprising a plurality of parallel feed forward filters, at least one of which is connected to receive one of a plurality of different excitation sequences derived from combinations of the outputs of the parallel feed forward filters, and the second filter being an infinite impulse response filter.

19.	A decoder according to claim 18, in which at least one such feed forward filter is connected to receive an excitation sequence derived from the output of the second filter.

20.	A receiver including a decoder according to any one of claims 17 to 19.

21.	A receiver substantially as herein described with reference to the accompanying Figures 2 and 2a.

22.	A transmitter including a coder according to any one of claims 12 to 16.

23.	A transmitter substantially as herein described with reference to the accompanying Figures 1 and 1a.

24.	A speech transmission system comprising a transmitter and a receiver according to any preceding claim.

25.	A method of speech transmission substantially as herein described, with reference to the accompanying Figures.

Description:

CODED COMMUNICATIONS SYSTEM

This invention relates to a system for transmitting speech signals in coded form. This invention relates also to a transmitter and to a receiver, for coding and decoding respectively such a signal.

Many known systems for transmitting coded speech signals derive the characteristics of a filter from the input speech samples at a coder, and transmit these to a decoder, where they are used to configure a decoder filter which is then excited by a suitable excitation signal source to produce a synthesised reproduction of the input speech signal.

In Linear Predictive Coding (LPC), an all-pole recursive filter is used to model the short-term spectral envelope of a speech signal, using a limited number of coefficients which are updated at periodic intervals, the coefficients usually being directly calculated by solving a set of linear equations devised to minimise the 'prediction error' (a measure of the difference between the input speech signal and the predicted speech signal).

In some LPC systems, a predictor stage is included in the coder to remove long-term periodicity corresponding to speaker pitch from the difference signal; this predictor may also be considered as a filter, and the corresponding filter parameters are also transmitted to the decoder.

The prediction error, regardless of whether or not pitch prediction is included in the predicted speech signal, differs from the actual error because the prediction is based on a model of excitation, not on the excitation actually used at the decoder.

Recent LPC systems include, for example, the Multipulse Excited (MP), the Regular Pulse Excited (RPE), and the Code Book Excited (CE) LPC systems. The excitation to be used at the decoder is selected or derived at the coder; in the above systems, the decoder includes a controllable excitation signal generator, and the coder must therefore transmit control signals to the decoder.

The decoder may thus be a two-stage filter (with a relatively short delay all-pole filter and a relatively long delay filter) and an excitation generator.

The excitation control signals are themselves produced at the coder; in MPLPC, this has been achieved by using "analysis by synthesis" - i.e. synthesising speech locally using the unmodified excitation pulse sequence, subtracting this synthesis from the actual input speech sequence to form an actual error signal, perceptually weighting the error signal, and then using the weighted error signal as a control in a closed optimisation loop to select the excitation sequence for minimum (weighted) actual error.

According to the present invention there is provided a method of speech coding in which filter parameters are periodically derived from an input speech signal in a coder and transmitted to a decoder to update the response of a decoder filter, in which the excitation input to the decoder filter is derived in the decoder from the speech output of the decoder filter.

It is thus possible to reduce the bit rate by avoiding the transmission of data relating to the excitation sequence, as the excitation is simply derived from the previous output speech at the decoder. Preferably at least some of the filter parameters are derived so as to reduce the actual error which will be produced at the decoder, rather than the prediction error. A synthesis filter may be provided at the coder and the actual error may be derived by subtracting locally synthesised speech from the input speech signal.

Preferably the parameter derivation is achieved by iteratively deriving the delay parameter of an all-zero stage of the filter by minimising the actual error. Preferably at least some of the other parameters are calculated inside the loop. Preferably all other parameters are jointly optimised, by calculating them inside the delay parameter iteration loop. In another embodiment these optimisation methods may be applied to a system using previous excitations, rather than previous decoded speech, as excitation to a plurality of parallel feed-forward stages of the filter. Use of the invention can give a significant signal to noise ratio advantage and/or an ability to work effectively with small frame sizes and thus allows, for a given signal to noise ratio, a shorter coding delay.

Other aspects of the invention are recited in the claims.

The invention will now be described, by way of example only, with reference to the drawings in which:

- Figure 1 shows schematically a generalised transmitter employing a coder according to the invention,

- Figure 1a shows a specific embodiment of a transmitter employing a coder according to the invention,

- Figure 2 shows schematically a generalised receiver employing a decoder according to the invention,

- Figure 2a shows a specific embodiment of a receiver employing a decoder according to the invention,

- Figure 3 shows schematically the components of a single excitation synthesis filter of Figures 1 or 2,

- Figures 4a and 4b provide six algorithms corresponding to methods of optimising the parameters of the synthesis filter of a coder according to the invention,

- Figure 5 shows schematically the components of a multiple excitation synthesis filter of a coder according to another aspect of the invention,

- Figure 6 illustrates examples of methods of optimising the parameters of such a multiple input synthesis filter, and

- Figures 7a and 7b show excitation waveforms occurring in embodiments of the invention.

Referring to Figure 1, at a transmitter, incoming speech is sampled to give frames $[y_i]$ ($i = 0, 1, \ldots, n-1$) of samples. From each frame, a filter optimisation stage 1 derives filter parameters, using an 'analysis by synthesis' technique, as follows. An excitation sequence source 2 generates an initial excitation sequence $[x_{-i}]$, which drives a two stage synthesis filter 3 corresponding to a finite impulse response filter $B(z)$ and an infinite impulse response (all pole) filter $1/A(z)$ in series, having a response $H(z)$:

$$H(z) = \frac{B(z)\, z^{-(n+d_1)}}{A(z)} \qquad \text{(A)}$$

where $d_1 \geq 0$, $B(z) = \sum_{k=0}^{q_1} b_k z^{-k}$, $A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$, and $n$ is the size of the analysis frame from which $H(z)$ is to be estimated.
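As a rough illustration of equation (A), the filter can be run in the time domain as the difference equation $\hat{y}_i = \sum_{k=1}^{p} a_k \hat{y}_{i-k} + \sum_{k=0}^{q_1} b_k x_{i-k-(n+d_1)}$. The sketch below is a minimal pure-Python version under stated assumptions: the function name `synthesise`, the list layout (most recent past sample last, so that `past_x[-1]` holds $x_{-1}$) and the explicit `past_y` filter memory are illustrative choices and are not taken from the specification.

```python
def synthesise(a, b, d1, past_x, past_y, n):
    """Return one frame of n samples from H(z) = B(z) z^-(n+d1) / A(z).

    a      : [a_1, ..., a_p]   all-pole coefficients (assumed layout)
    b      : [b_0, ..., b_q1]  feed-forward coefficients
    d1     : variable delay, d1 >= 0
    past_x : past excitation samples, oldest first (past_x[-1] is x_{-1})
    past_y : past filter outputs used as the 1/A(z) memory, oldest first
    """
    p, q1 = len(a), len(b) - 1
    if len(past_x) < n + d1 + q1:
        raise ValueError("excitation history shorter than n + d1 + q1")
    if len(past_y) < p:
        raise ValueError("filter memory shorter than p")
    y_hat = list(past_y[-p:]) if p else []  # seed with the filter memory
    for i in range(n):
        # FIR stage: index i-k-(n+d1) is always negative, i.e. a past sample.
        fir = sum(b[k] * past_x[i - k - (n + d1)] for k in range(q1 + 1))
        # IIR stage: feedback from the p most recent outputs.
        iir = sum(a[k - 1] * y_hat[-k] for k in range(1, p + 1))
        y_hat.append(fir + iir)
    return y_hat[p:]  # drop the memory samples, keep the new frame
```

With `a = [0.5]`, `b = [1.0]`, `d1 = 0`, `n = 2`, `past_x = [1.0, 2.0]` and `past_y = [0.0]`, the frame is `[1.0, 2.5]`: the first sample is $x_{-2}$ alone and the second is $x_{-1}$ plus half the first.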

The filter parameters are thus coefficients $[a_k]$ and $[b_k]$ and delay $d_1$. A frame of synthesised speech output $[\hat{y}_i]$ is produced by the synthesis filter 3, and this is compared with the original input speech frame $[y_i]$ by subtractor 4 which produces a difference signal $\hat{e}_i$, where $\hat{e}_i = y_i - \hat{y}_i$. A better measure of the perceived error to the human ear is obtained by frequency-weighting the error signal in such a manner as to de-emphasise formant regions in the error spectrum. Accordingly, the difference signal $\hat{e}_i$ is preferably filtered by a weighting filter 5 with suitable transfer function, to form the perceptually weighted error signal $\hat{e}_w$. This signal is then minimised to optimise the parameters of synthesis filter 3 by optimisation calculator 6 in an iterative closed loop, which derives values for $[a_k]$, $[b_k]$ and $d_1$. These values are quantised by quantiser 7 and passed to a coder 8, which codes the parameters for transmission to a decoder station, and to a local decoder 10 which is functionally identical to the decoder at the receiver station.

At the local decoder 10, the parameters are used to configure a local decoder synthesis filter 11, which is then driven from the excitation sequence source 2 by the same excitation sequence $[x_{-i}]$ for which the parameters were optimised, to produce an output frame of synthesised speech $[\hat{y}_i]$.

This synthesised output is received and processed as necessary by an excitation adaptation calculator 12, which thus constructs a new excitation sequence $[x_{-i}]$ from the output of the decoder synthesis filter 11. The new excitation sequence is then passed to the excitation sequence source 2 for use on the next input speech frame, for which it will form part of the Past Decoded Speech (PDS) signal.

The excitation adaptation calculator 12 and Local Decoder 10 are not necessary in all embodiments of the invention, as shown in Figure 1a. In this simple embodiment, a frame of input speech $[y_i]$ is received and optimised filter parameters are derived in a closed loop using the excitation sequence supplied by excitation sequence generator 2. Once this is done, the synthesised speech signal $[\hat{y}_i]$ produced using these optimised parameters is synthesised by the synthesis filter 3 (rather than by the local decoder synthesis filter 11 as described above with reference to Figure 1) and supplied to the excitation sequence generator 2, which is simply a store capable of storing the synthesised frame $[\hat{y}_i]$ and supplying this as the next excitation sequence $[x_{-i}]$ (and also of storing the initial excitation sequence needed when the first frame of input speech is to be coded).

Referring to Figure 2, at the distant decoder 20 a decoder unit 28 performs the inverse operation to coder 8, to recover the filter parameters $[a_k]$, $[b_k]$ and $d_1$, which are passed to synthesis filter 21 to produce the configuration previously optimised at the transmitter. Excitation sequence generator 23 drives the filter with an excitation sequence $[x_{-i}]$ which will be the same as that used at the transmitter; for the first frame of speech it will be the initial excitation sequence identical to that used at the transmitter. A frame of synthesised speech output $[\hat{y}_i]$ is thus produced for the listener, and this output is also supplied to the excitation adaption calculator 22 to create a new excitation sequence $[x_{-i}]$ in the excitation sequence generator 23 in the same manner as at the local decoder 10.

As shown in Figure 2a, the excitation adaption calculator 22 is not necessary in single input embodiments, and the excitation sequence generator 23 may simply be a frame store holding the last decoded frame (and an initial excitation sequence) as discussed above.

Since the distant decoder 20 and the local decoder 10 perform exactly the same synthesis, using the same filter and the same initial excitation, the updated excitation sequences thus created are also the same and there is no need to transmit any excitation sequence data.
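The tracking argument can be illustrated with a small sketch of the single-input, frame-store arrangement. Everything named here is hypothetical: the class `FrameStoreDecoder`, its `history` buffer (which is assumed to hold more than one frame so that delays up to the maximum can be served), and the choice to reuse the same buffer as the $1/A(z)$ memory are assumptions made for illustration, not details from the specification.

```python
def synthesise(a, b, d1, past_x, past_y, n):
    """One frame from H(z) = B(z) z^-(n+d1) / A(z); histories are oldest first."""
    p, q1 = len(a), len(b) - 1
    y_hat = list(past_y[-p:]) if p else []
    for i in range(n):
        fir = sum(b[k] * past_x[i - k - (n + d1)] for k in range(q1 + 1))
        iir = sum(a[k - 1] * y_hat[-k] for k in range(1, p + 1))
        y_hat.append(fir + iir)
    return y_hat[p:]


class FrameStoreDecoder:
    """Decoder whose excitation is its own past decoded speech (assumed design)."""

    def __init__(self, initial_excitation, n):
        self.history = list(initial_excitation)  # pre-agreed initial sequence
        self.n = n                               # frame size

    def decode(self, a, b, d1):
        # Past decoded speech serves both as excitation and as filter memory.
        frame = synthesise(a, b, d1, self.history, self.history, self.n)
        self.history.extend(frame)  # becomes excitation for later frames
        return frame


if __name__ == "__main__":
    start = [1.0, 0.5, -0.5, 0.25]
    local, distant = FrameStoreDecoder(start, 2), FrameStoreDecoder(start, 2)
    for params in [([0.5], [1.0], 0), ([0.3], [0.8], 1)]:
        # Identical parameters and identical history give identical frames.
        print(local.decode(*params) == distant.decode(*params))
```

Two instances started from the same initial sequence and fed the same parameters stay in step frame after frame; feeding one of them a different parameter in any frame changes its stored history and hence its later output, which is the error propagation the text goes on to discuss.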

There is of course a tendency for the propagation of transmission errors, since a wrongly-received filter parameter will produce an "incorrect" excitation sequence which is not that for which the next set of filter parameters were optimised at the transmitter.

Accordingly, it is advantageous to provide, in a known manner, re-initialisation sequences occasionally so as to bring the distant decoder and the local decoder back into agreement; this may take the form of a periodic control bit transmitted to both decoders, instructing each excitation sequence generator 23 to re-start with the preprogrammed initial excitation sequence.

It will, of course, be understood that many of or all the above functions may be performed by a single digital processor and therefore need have no separate physical existence.

The operation of the filter optimisation stage 1 of the transmitter of Figures 1 and 1a will now be described in greater detail.

In many prior art systems using LPC coding techniques, the optimisation process is performed by minimising the value of the prediction (residual) error, which is derived by comparing the current value of the speech signal and its predicted value based on past samples,

$$e_i = y_i - \sum_{k=1}^{p} a_k y_{i-k} - \sum_{k=0}^{q_1} b_k x_{i-k-(n+d_1)} \qquad \text{(B)}$$

where $e_i$ is the prediction error for the $i$th sample.
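A minimal sketch of equation (B) follows, assuming the same list layout as before (histories stored oldest first, so `past_x[-1]` is $x_{-1}$). The function name `prediction_error` and its argument order are hypothetical. Note that the recursive term here uses the input speech samples $y_{i-k}$, not the synthesised output, which is why this error generally differs from the actual error formed at the decoder.

```python
def prediction_error(y, past_y, past_x, a, b, d1):
    """Return [e_0, ..., e_{n-1}] as defined by equation (B).

    y      : current input speech frame of n samples
    past_y : earlier input speech samples, oldest first (at least p of them)
    past_x : past excitation samples, oldest first (at least n + d1 + q1)
    """
    n, p, q1 = len(y), len(a), len(b) - 1
    if len(past_y) < p or len(past_x) < n + d1 + q1:
        raise ValueError("not enough history for the requested orders/delay")
    hist = list(past_y) + list(y)   # y_i lives at hist[off + i]
    off = len(past_y)
    errors = []
    for i in range(n):
        short_term = sum(a[k - 1] * hist[off + i - k] for k in range(1, p + 1))
        long_term = sum(b[k] * past_x[i - k - (n + d1)] for k in range(q1 + 1))
        errors.append(y[i] - short_term - long_term)
    return errors
```

For a frame that the model reproduces exactly, for example `y = [1.0, 2.5]` with `past_y = [0.0]`, `past_x = [1.0, 2.0]`, `a = [0.5]`, `b = [1.0]` and `d1 = 0`, every element of the result is zero.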

This technique might also be used in implementing the invention, but preferably at least one filter characteristic is optimised by minimising the actual error $\hat{e}_i = y_i - \hat{y}_i$ between the input speech and the synthesised speech output of the synthesis filter 3, or rather in a preferred embodiment the weighted actual error $\hat{e}_w$ produced by passing $\hat{e}_i$ through the perceptual weighting filter 5. The weighting filter 5 is defined in the same way as in MPLPC or RPE-LPC, and will henceforth be omitted from the analysis of operation of the invention; the term "actual error" will be understood to encompass the perceptually weighted actual error.

Since minimising $\hat{e}$ (the error vector) by calculation is a non-linear problem of high computational complexity, in preferred embodiments of the invention the filter characteristics are selected by partly linearising the problem, by calculating initially some parameters for minimum prediction error energy $e^T e$ (using a least mean square solution), which is straightforward. Using these values, the optimum values for the remaining parameters are derived by finding the actual error $\hat{e}_i$ (for example by synthesising the output speech signal $\hat{y}_i$ in a loop for each possible delay parameter value and subtracting it from the input speech signal $y_i$), finding the minimum mean squared weighted actual error $\hat{e}_w^T \hat{e}_w$, and selecting the parameter values which gave rise to this minimum.

The values of the other parameters may be calculated either inside or outside the loop, as discussed below.

Referring to Figure 3, the synthesis filter 3 of the filter optimisation stage 1 comprises essentially two stages in series: a first filter element 31 comprising a finite impulse response filter with transfer characteristic $B(z)\, z^{-(n+d_1)}$, and a second filter element 32 comprising an infinite impulse response filter with transfer function $1/A(z)$, so that the overall filter response $H(z)$ is

$$H(z) = \frac{B(z)\, z^{-(n+d_1)}}{A(z)}$$

as required by equation (A). The first filter element 31 consists of $q_1+1$ coefficients $[b_k]$ ($k = 0, 1, \ldots, q_1$) and $q_1+d_1$ delay elements, so that it comprises a delay of length $(d_1+n-1)$ followed by a $q_1+1$ stage non-recursive filter; in practice, a small number ($q_1 = 0$, 1 or 2) of such coefficients are used.

The number of delay elements (and hence the delay length) $d_1$ employed is variable in operation up to a predetermined maximum number $N-1$ (which may be greater or smaller than the frame size $n$).

This first filter element 31 therefore receives the excitation signal $[x_{-i}]$ and contains the most recent $q_1 + d_1 + n$ samples of the $[x_{-i}]$ sequence.

The second filter element comprises a recursive filter with an all-pole response, defined by filter coefficients $[a_k]$ ($k = 1, 2, \ldots, p$).

In the following description, vector notation will be adopted whereby the vector $a$ denotes the set of $[a_k]$ coefficients and the vector $b$ denotes the set of $[b_k]$ coefficients.

For each received frame of input speech, the filter optimisation calculator 6 initially calculates values for the $[a_k]$ coefficients for minimum mean square prediction error energy $e^T e$.

In preferred embodiments (referred to generally as Method I in Figures 4 and 5), the $a$ vector is calculated jointly with the $b$ vector inside a loop for each value of $d_1$ between 0 and $N-1$, and $d_1$ is selected for minimum $\hat{e}_w$ as discussed above.

For each specific value of $d_1$, the $a$ and $b$ vectors which minimise the prediction error energy $e^T e$ are given by equation (D): [...] where $X$ is an $n \times (q_1+1)$ matrix of excitation samples [...] samples from $y$[...] to [...]
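The expression for the joint solution did not survive in this copy of the text. A plausible reconstruction, offered only as an assumption, is the ordinary least-squares solution of equation (B) written in matrix form. The row layouts of $Y$ and $X$ given in the comments are inferred from the indices in equation (B) and are not quoted from the specification.

```latex
% Assumed layout: for i = 0, ..., n-1,
%   row i of Y (n x p)        is [ y_{i-1}, ..., y_{i-p} ]
%   row i of X (n x (q_1+1))  is [ x_{i-(n+d_1)}, ..., x_{i-q_1-(n+d_1)} ]
% so that equation (B) reads, for the whole frame,
e = y - Y a - X b
% and minimising e^T e jointly over a and b gives
\begin{bmatrix} a \\ b \end{bmatrix}
  = \left( [\,Y \mid X\,]^T [\,Y \mid X\,] \right)^{-1} [\,Y \mid X\,]^T\, y
```

Under this layout the excitation samples entering $X$ run from $x_{-(n+d_1+q_1)}$ to $x_{-(d_1+1)}$, which mirrors the range the text later states for the second-stage matrix $U$.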

The actual error energy $\hat{e}^T \hat{e}$ is minimised using one of the following three methods:

In a first embodiment, for each value of $d_1$ in the loop, $a$ and $b$ are computed jointly from (D) for minimum prediction error and used to synthesise $\hat{y}$. The actual error energy $\hat{e}^T \hat{e}$ is also determined using $\hat{e} = y - \hat{y}$. $H(z)$ is therefore determined within an 'analysis by synthesis' loop where $d_1$ varies over a range of $N$ values, i.e. $0 \leq d_1 \leq N-1$, and the $a$, $b$ vectors are defined for the value of $d_1$ which minimises $\hat{e}^T \hat{e}$. In this method only the $d_1$ filter parameter is optimised for a minimum actual error energy $\hat{e}^T \hat{e}$, while both $a$ and $b$ are selected for minimum prediction error energy $e^T e$. This is referred to in Figure 4 as 'scheme 1'.
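The loop of the first embodiment can be sketched as follows. This is an illustrative pure-Python outline, not the specification's implementation: the names `synthesise`, `solve`, `least_squares` and `scheme1_search` are hypothetical, the least-squares step uses plain normal equations (an assumed stand-in for the joint solution referred to as equation (D)), perceptual weighting is omitted, and no safeguard is included against a singular normal-equation matrix.

```python
def synthesise(a, b, d1, past_x, past_y, n):
    """One frame from H(z) = B(z) z^-(n+d1) / A(z); histories are oldest first."""
    p, q1 = len(a), len(b) - 1
    y_hat = list(past_y[-p:]) if p else []
    for i in range(n):
        fir = sum(b[k] * past_x[i - k - (n + d1)] for k in range(q1 + 1))
        iir = sum(a[k - 1] * y_hat[-k] for k in range(1, p + 1))
        y_hat.append(fir + iir)
    return y_hat[p:]


def solve(M, v):
    """Solve M w = v by Gaussian elimination with partial pivoting."""
    m = len(v)
    A = [list(M[r]) + [v[r]] for r in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, m):
            f = A[r][c] / A[c][c]   # fails if the system is singular
            for j in range(c, m + 1):
                A[r][j] -= f * A[c][j]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][j] * w[j] for j in range(r + 1, m))
        w[r] = (A[r][m] - s) / A[r][r]
    return w


def least_squares(rows, target):
    """Minimise |target - rows @ w|^2 through the normal equations."""
    m = len(rows[0])
    G = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    g = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(m)]
    return solve(G, g)


def scheme1_search(y, past_y, past_y_hat, past_x, p, q1, N):
    """Try d1 = 0..N-1; return (actual_error_energy, d1, a, b) for the best d1."""
    n = len(y)
    hist, off = list(past_y) + list(y), len(past_y)
    best = None
    for d1 in range(N):
        # Regressors of equation (B): p past speech samples, q1+1 excitations.
        rows = [[hist[off + i - k] for k in range(1, p + 1)]
                + [past_x[i - k - (n + d1)] for k in range(q1 + 1)]
                for i in range(n)]
        w = least_squares(rows, y)          # minimum prediction error
        a, b = w[:p], w[p:]
        y_hat = synthesise(a, b, d1, past_x, past_y_hat, n)
        energy = sum((s - t) ** 2 for s, t in zip(y, y_hat))  # actual error
        if best is None or energy < best[0]:
            best = (energy, d1, a, b)
    return best
```

Only the delay is chosen on the actual error; the coefficient vectors are whatever the prediction-error fit returns at that delay, which is the distinction the text draws between this scheme and the two that follow.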

In a second embodiment, the approach for estimating $b$ is a modification of Scheme 1 above. In this case, $a$ and $b$ are defined by equation (D) for the value of $d_1$ which minimises prediction error $e^T e$. Once $d_1$ is optimised the $b$ vector is re-estimated (outside the optimisation loop) for minimum actual error, using:

$$b = (X^T Q^T Q X)^{-1} X^T Q^T (y - m) \qquad \text{(F)}$$

where $m$ represents the output of the $1/A(z)$ filter when its input is zero and its "memory" consists of the most recent samples of the previous synthesised speech frame; $Q$ is an $(n \times n)$ convolution matrix; and $q_k$ is the $k$th value of the impulse response of the $1/A(z)$ filter, employing the $a$ and $d_1$ values obtained from the previous analysis by synthesis search process. This is referred to in Figure 4 as 'scheme 2'.
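The form of equation (F) can be checked with a short derivation. The sketch assumes, as is usual for a convolution matrix, that $Q$ is lower-triangular Toeplitz with entries drawn from the impulse response $q_0, q_1, \ldots, q_{n-1}$ of $1/A(z)$; that structure is an assumption, since the matrix itself is not legible in this copy.

```latex
% With a and d_1 fixed, the synthesised frame is linear in b:
\hat{y} = Q X b + m
% so the actual error is
\hat{e} = y - \hat{y} = (y - m) - Q X b
% Setting the gradient of the error energy to zero,
\frac{\partial\, \hat{e}^T \hat{e}}{\partial b}
  = -2\,(Q X)^T \left[ (y - m) - Q X b \right] = 0
% gives the normal equations and hence equation (F):
b = (X^T Q^T Q X)^{-1} X^T Q^T (y - m)
```

Once $a$ and $d_1$ are fixed, minimising the actual error over $b$ is therefore a linear least-squares problem, which is why it can be solved in a single step outside the search loop.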

In a third embodiment, for each given value of $d_1$, $a$ (and $b$) is calculated for minimum prediction error using equation (D). The values of $b$ are then (re-)calculated inside the loop for minimum actual error, using equation (F).

Thus within the analysis by synthesis loop, defined for each $d_1$ value ($0 \leq d_1 \leq N-1$), $a$ is first calculated using equation (D) and $b$ is then re-estimated in the loop. For each set of $a$, $b$ and $d_1$ values the actual error energy $\hat{e}^T \hat{e}$ can be measured by synthesising $\hat{y}$ and forming $\hat{e}$. Alternatively $\hat{e}^T \hat{e}$ can be calculated directly from the expression:

$\hat{e}^T \hat{e} = [\,1 \mid -b^T\,]$ [...]

The $d_1$ value, and corresponding $a$ and $b$ values, giving rise to the minimum actual error energy are then selected as optimal. This is referred to in Figure 4 as "Scheme 3". Schemes 2 and 3 above are preferred on accuracy grounds, as $b$ is (re-)defined by minimising the actual error energy $\hat{e}^T \hat{e}$ rather than the prediction error energy $e^T e$.

In an alternative embodiment, referred to as Method II in Figures 4 and 5, $a$ is initially calculated for minimum prediction error energy outside the loop and independently of $b$, by assuming $b$ is equal to zero, yielding

$$a = (Y^T Y)^{-1} Y^T y \qquad \text{(H)}$$

Then given $a$, the $b$ and $d_1$ parameters can be optimised by following schemes 1, 2 or 3 described above. In particular, the value of $b$ producing a minimum prediction error energy $e^T e$ is given by [...]
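The expression that should follow is missing from this copy. One plausible form, stated here as an assumption and not as the specification's own equation, follows from holding $a$ fixed in equation (B) and solving the remaining least-squares problem for $b$:

```latex
% With a fixed, the prediction error is linear in b:
e = (y - Y a) - X b
% and minimising e^T e over b gives
b = (X^T X)^{-1} X^T (y - Y a)
```

This has the same structure as the second-stage expression (F1) given later in the description, with the $X b^{(1)}$ term absent.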

The corresponding approaches to schemes 1, 2 and 3 above are shown in Figure 4 as Schemes 4, 5 & 6 respectively and form fourth, fifth and sixth embodiments of the invention.

The parameters produced are then quantised by a quantiser 7, which allocates available bits between parameters.

Although the excitation to the synthesis filter has been described in terms of a single excitation sequence $[x_{-i}]$, it may be desirable to use several excitation sequences and accordingly, referring to Figure 1, excitation sequence source 2 is adapted to supply a number of different sequences $[x_{-i}]$, $[u_{-i}]$, $[v_{-i}]$ ... to synthesis filter 3.

One excitation sequence, $[x_{-i}]$, may be the Past Decoded Speech $\hat{y}_i$ output of the synthesis filter 3; the other sequences are derived by the excitation adaption calculator 12 from intermediate outputs of the synthesis filter 3 (or all such sequences may be thus derived as discussed below).

Referring to Figure 5, the filter 3 (and, of course, filters 11 and 21 of Figures 1, 2 and 2a) comprises two stages in series, the first filter element 51 comprising a number $j$ of filters 51a, 51b, ... 51j, each receiving an excitation sequence $[x_{-i}]$, $[u_{-i}]$, $[v_{-i}]$ etc., and producing an output. As shown, the filters 51a, 51b etc. have responses $B^{(1)}(z)\, z^{-(n+d_1)}$, $B^{(2)}(z)\, z^{-(n+d_2)}$, $B^{(3)}(z)\, z^{-(n+d_3)}$ etc., respectively, and their combined output is passed to the second filter element 32 which is as before a recursive filter with response $1/A(z)$.

As shown, each filter 51a, 51b, etc., corresponds to a filter element with a delay of length $d_j$ and coefficients [...]. As with the case of a single excitation filter, the parameters [...] computational complexity by partially linearising (i.e. calculating some parameters by solving for minimum prediction energy) using methods based on those described above in the first to sixth embodiments.

In general, the filter parameters are sequentially optimised so that the filter "grows" - in other words, values for $a$, $b^{(1)}$ and $d_1$ are derived first, then values for $b^{(2)}$ and $d_2$, and so on - and consequently improves its accuracy.

This lends itself to adaptive transmission schemes, wherein if a bit-rate reduction is required, only earlier coefficients ($a$, $b^{(1)}$, $d_1$ for example) need be transmitted.

Referring to Figure 6, the filter optimisation means 6 in one embodiment is adapted to employ a strategy based on that of the first embodiment (Method I) of the single excitation case. In a first stage of the process, $a$, $b^{(1)}$ and $d_1$ are found by minimising actual error energy following scheme 1, 2 or 3 of Figure 4, with other parameters assumed to be zero. In a second stage these values of $a$, $b^{(1)}$ and $d_1$ are used to derive $b^{(2)}$ and $d_2$ values which, in a preferred embodiment, may then themselves be used to redefine $a$ and $b^{(1)}$ in various ways, some of which are discussed below. A first approach is based on scheme 1 (or scheme 2).

If $b^{(2)}$ is calculated independently of $a$ and $b^{(1)}$, a minimum prediction error energy $e^T e$ solution is given by

$$b^{(2)} = (U^T U)^{-1} U^T (y - Y a - X b^{(1)}) \qquad \text{(F1)}$$

where $U$ is an $n \times (q_2+1)$ matrix of excitation samples $u_i$ from $u_{-(n+d_2+q_2)}$ to $u_{-(d_2+1)}$, which corresponds to equation F in the first stage.

$b^{(2)}$ may alternatively be calculated jointly with $a$, or jointly with $a$ and $b^{(1)}$, using straightforward matrix expressions corresponding to equation (D) for the first stage.

Thus the second analysis by synthesis loop can calculate, for each value of $d_2$ ($0 \leq d_2 \leq N-1$), minimum prediction error energy solutions for $b^{(2)}$, or $a$, $b^{(2)}$, or $a$, $b^{(1)}$, $b^{(2)}$ respectively. Regardless of how these are calculated, however, for each of the $N$ resulting $[a, b^{(1)}, b^{(2)}, d_1, d_2]$ filters, $[\hat{y}_i]$ is synthesised and the filter which offers minimum actual error energy $\hat{e}^T \hat{e}$ is selected.

Some of the filter parameters can be re-estimated (outside the loop) for the specific values $d_1 = d_1'$ and $d_2 = d_2'$ which have been already determined by the process. Specifically, when the second analysis by synthesis process estimates filter parameters for minimum prediction energy, some or all filter parameters can be re-defined, for either a minimum prediction error energy or a minimum actual error energy, as follows:

(i) given $d_1$ and $d_2$, prediction error energy solutions for $a$, $b^{(1)}$ and $b^{(2)}$ can be calculated.

(ii) given $d_1$, $d_2$ and $a$, the $b^{(1)}$, $b^{(2)}$ vectors can be re-calculated for minimum prediction error from:

$$\begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix} = \left[ (QX \mid QU)^T (QX \mid QU) \right]^{-1} (QX \mid QU)^T (y - m) \qquad \text{(D3)}$$

Alternatively, approaches based on the algorithms of scheme 3 may be employed, for example;

- i

(i) b 2'- may be optimi .sed, i ■ndependently of a and within an analysis by synthesis loop which is executed for N different values of d -2.' F -o ^{^}r e"a"c-h" d ^u2.' the vector b is calculated for minimum actual error

-(2) energy, so that b ' is selected for the value of d = d ₂ which minimises e e.

-(2 ) —

(ii) b ' may be calculated jointly with a for

-T— minimum prediction error energy e e. For each value of

- ( 2 ) . d ₂, a is retained and b ' is re-estimated for minimum prediction error.

(iii) b^(2) is calculated jointly with a and b^(1) for minimum prediction error. The new a vector is retained while b^(1) and b^(2) are re-estimated for minimum prediction error from equation (D3).

Now, given the optimised values of d_1 and d_2, i.e. d_1' and d_2', some of the filter parameters can be re-optimised in a single step (outside the analysis by synthesis loop) using equation (D3) to re-define b^(1) and b^(2).
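As an illustration of a single-step joint re-estimation of this least-squares kind, the sketch below solves the normal equations for the smallest possible case. It assumes each of the two FIR stages has been reduced to a single gain, so that the combined matrix has just two columns (`col1` and `col2`, standing in for the filtered excitation segments at the chosen delays). The function name and this reduction are assumptions made for illustration, not part of the specification.

```python
def joint_gains(target, col1, col2):
    """Least-squares gains (g1, g2) minimising |target - g1*col1 - g2*col2|^2.

    Solves the 2x2 normal equations (A^T A) g = A^T target, A = [col1 col2].
    """
    r11 = sum(x * x for x in col1)
    r12 = sum(x * y for x, y in zip(col1, col2))
    r22 = sum(y * y for y in col2)
    c1 = sum(x * t for x, t in zip(col1, target))
    c2 = sum(y * t for y, t in zip(col2, target))
    det = r11 * r22 - r12 * r12
    if abs(det) < 1e-12:
        raise ValueError("columns are (nearly) linearly dependent")
    # Cramer's rule for the symmetric 2x2 system.
    g1 = (c1 * r22 - c2 * r12) / det
    g2 = (c2 * r11 - c1 * r12) / det
    return g1, g2
```

With more coefficients per stage the same normal equations apply, but a general linear solver would replace the closed-form 2x2 solution.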

The extension of the above calculations to a third stage for calculating b^(3) and d_3, and recalculating previous filter characteristics, is simple if the equations are appropriately modified to include higher terms, and the size of the filter, i.e. the number of summation terms used in producing y, increases progressively as the optimisation process evolves from one stage to the next.

In another embodiment of the invention, the approach of schemes 4, 5 or 6 of Figure 4b may be followed, the [a] coefficients being initially optimised for minimum prediction error energy e^T e using a conventional LPC solution.

The method used in deriving filter coefficients can be chosen from a wide class of algorithms based on one or other of the six schemes which could be used in stage 1. Inspection of the second stage equations shows their formal similarity to the stage 1 equations, however, and it is preferable that the filter optimisation calculator 6 employs the same method for stage 2 and each subsequent stage as for stage 1, with necessary extensions to the equations, since this lends itself easily to iterative programming techniques and hence reduces system complexity.

Although in the foregoing the selection of some filter parameters was effected by calculating them for minimum prediction error energy e^T e, it will be understood that they could be calculated or estimated otherwise (for example by inverse filtering [y_i]).

It is important to realise that the adaptation of the H(z) filter as described above is not confined to the values of the coefficient sets but is also applied to the "structure" of the filter, i.e. the lengths of the delay lines d_1, d_2 etc., so that the operation of H(z) is re-defined on a frame-by-frame basis (in contradistinction to conventional adaptive filters).

The operation of the excitation adaptation calculator 12 will now be described. From the foregoing, it is clear that the excitation used at the receiver station to drive the synthesis filter 21 must be that for which the filter is optimised; when the receiver filter 21 is identical to the optimisation filter 3 and local decoder filter 11 at the transmitter, the excitation must be the same as that for which the transmitted parameters were optimised.

Initially this is arranged, as described above, by providing the excitation sequence generators at the transmitter 2 and the receiver 23 with a preprogrammed initial sequence - the exact nature of the sequence is not critical, but it may be for example a zero mean Gaussian random sequence.

This single sequence may be used to drive each input of a multiple-input synthesis filter 3 of the kind shown in Figure 5. Considering the first three filters 51a, 51b and 51c having responses B^(1)(z)z^(-n-d_1), B^(2)(z)z^(-n-d_2), B^(3)(z)z^(-n-d_3) as discussed above, where n is the frame size of the analysis frame, the values of the d_1, d_2 and d_3 parameters define specific segments in the [x_{-j}] sequence which are then inserted into the FIR components 51a, 51b, 51c of the H(z) synthesis filter 3 to act as excitations thereto.
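The way a delay parameter picks out a segment of the stored sequence can be sketched as follows. This Python illustration assumes the store is a list `history` whose last element is the most recent past sample (x_{-1}), and that `coeffs` holds the FIR coefficients b_0..b_q of one branch. The function name is hypothetical.

```python
def delayed_fir_output(history, coeffs, n, d):
    """Output of B(z) z^-(n+d) over the n samples of the current frame.

    Sample i of the frame uses x[i - n - d - k] for k = 0..q, so the samples
    read run from x[-(n + d + q)] to x[-(d + 1)].
    """
    q = len(coeffs) - 1
    if n + d + q > len(history):
        raise ValueError("history too short for this delay and filter order")
    base = len(history)  # list index that sample x[0] would occupy
    out = []
    for i in range(n):
        acc = 0.0
        for k, b in enumerate(coeffs):
            acc += b * history[base + i - n - d - k]
        out.append(acc)
    return out
```

For example, with an 8-sample history, n = 3 and d = 1, a single unit coefficient simply copies out the three samples that end one sample before the newest stored value.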

In the PDS embodiment of the invention, the adaptation calculator 12 is simply a link between the speech output and the excitation store. Use of the PDS signal as an excitation sequence has significant advantages in increased signal to noise ratio for a given transmission rate, and/or low coding delay.
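A minimal sketch of such a link, assuming the excitation store is a fixed-length buffer that is refreshed with each decoded speech frame, might look like this in Python. The class name, the capacity and the initial contents are illustrative assumptions only.

```python
from collections import deque


class ExcitationStore:
    """Fixed-length store of past samples used as the next excitation."""

    def __init__(self, initial, capacity):
        # The same preprogrammed initial sequence is assumed at both ends.
        self._buf = deque(initial, maxlen=capacity)

    def update(self, decoded_frame):
        # New speech output pushes the oldest samples out of the store.
        self._buf.extend(decoded_frame)

    def samples(self):
        return list(self._buf)
```

If the coder's local decoder and the receiver both start from the same initial sequence and apply the same updates, their stores hold identical contents frame by frame, which is the tracking condition the text requires.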

As discussed above, it is of course possible to use other inputs for X(z): in particular, X(z) may be derived from the output of the first filter element 31 or 51 (i.e. an intermediate output of the whole filter H(z)). The filter element may then be considered to be an infinite impulse response filter, since there is a feedback path between its output and its input. Where the excitation is derived only from such intermediate outputs, the input to the A(z) filter may in one particular case be equivalent to that produced in the 'self-excited' vocoder described by Rose and Barnwell ('The self-excited vocoder - an alternative approach to Toll Quality at 4800 bps', IEEE Proc of ICASSP April 1986 pp 453 - 456).

The parameter optimization methods described above are equally applicable to such parallel filter structures.

Referring to Figure 7a, a typical signal used as an input to the all-pole filter 32 is the PDS signal filtered by all-zero filter 31, which is essentially very similar to the speech waveforms shown. The speech-like nature of the PDS excitation differs markedly from the PES excitations shown in Figure 7b; these latter waveforms are very much more similar to the excitations used in prior art systems such as MPLPC.

These excitations generally resemble a train of glottal pulses like those produced by the human vocal cords.

At first sight, it is therefore surprising that the PDS excitation of the invention produces better results than the PES excitation. Since the PDS signal is 'speech-like', however, it requires less filtering: the filter merely needs to modify the previous speech frame to take account of the change between frames, rather than having to re-synthesise each speech frame from a random waveform as in the traditional source/filter model.

It has been found that SNR decreases with increasing frame size for each embodiment but also that the PDS embodiment using a single excitation sequence has an SNR advantage - especially at low frame sizes - which is very noticeable in subjective speech quality. Also the single excitation PDS embodiment is capable of operating on smaller frame sizes which reduces coding delay to considerable advantage.

Joint rather than separate evaluation of filter parameters has been found particularly effective.

It was found that increasing either the number of filter coefficients, or the maximum value of d_i, results in a higher SNR for both codecs. When the SNR improvement achieved is costed against the number of extra bits required to be transmitted per frame, however, it is clear that the allocation of the extra bits to the delay parameters pays off considerably better than the allocation of bits to new filter coefficients.
