


Title:
CONTEXT CODING
Document Type and Number:
WIPO Patent Application WO/2003/083856
Kind Code:
A1
Abstract:
The present invention provides a compression system for one-bit audio signals that uses entropy coding whose context includes information transmitted for the purpose. The context information will itself usually be transmitted in compressed form, for example using lossless compression. The additional context information may be a decimated version of the one-bit signal. The decimated signal will usually be multi-bit. The one-bit signal is divided into 'tuples' of eight bits, and each tuple is entropy coded using nearby samples of the decimated signal as context.

Inventors:
CRAVEN PETER G (GB)
LAW MALCOLM J (GB)
Application Number:
PCT/GB2003/001392
Publication Date:
October 09, 2003
Filing Date:
March 28, 2003
Assignee:
CRAVEN PETER G (GB)
LAW MALCOLM J (GB)
International Classes:
G11B20/00; H03M7/30; (IPC1-7): G11B20/00; H03M7/30
Foreign References:
US6061007A 2000-05-09
US6269338B1 2001-07-31
US5835043A 1998-11-10
Attorney, Agent or Firm:
GILL JENNINGS & EVERY (Broadgate House 7 Eldon Street London EC2M 7LH, GB)
Claims:
1. A method of losslessly compressing a one-bit signal having a first sampling rate, comprising the steps of: deriving an auxiliary data stream from the one-bit signal; coding the auxiliary data stream; deriving a context stream from the auxiliary data stream or coded auxiliary data stream; entropy coding the one-bit signal in dependence on the context stream; and, transmitting the coded auxiliary data stream together with the entropy encoded one-bit signal.
2. A method according to claim 1, wherein the coding of the auxiliary data is lossless.
3. A method according to claim 1 or 2, wherein the method further comprises the step of dividing the one-bit signal into tuples, each tuple consisting of at least one bit from the one-bit signal.
4. A method according to claim 3, wherein each tuple is entropy coded in dependence on a part of the context stream derived from said tuple.
5. A method according to claims 3 or 4, wherein each tuple is entropy coded using a probability table in dependence on the context stream.
6. A method according to any of claims 3 to 5, wherein the bits contained within a tuple are contiguous bits from the one-bit signal.
7. A method according to any of claims 3 to 6, wherein each tuple contains eight bits.
8. A method according to any preceding claim, wherein the auxiliary data stream preserves a first temporal characteristic of the one-bit signal.
9. A method according to claim 8, wherein the first temporal characteristic of the one-bit signal is a long-term temporal structure.
10. A method according to any preceding claim, wherein the auxiliary data stream is derived by sampling the one-bit signal at a second sampling rate, the second sampling rate being lower than the first sampling rate.
11. A method according to claim 10, wherein the second sampling rate is a sub-multiple of the first sampling rate.
12. A method according to claim 11, wherein the second sampling rate is one eighth of the first sampling rate.
13. A method according to any of claims 1 to 9, wherein the auxiliary data stream is derived by application of a many-to-one constraining function.
14. A method according to claim 13, wherein the many-to-one constraining function is a causal function.
15. A method according to claims 13 or 14, wherein the many-to-one constraining function is a deterministic function.
16. A method according to any of claims 13 to 15, wherein a decimation filter is applied to provide a many-to-one constraining function.
17. A method according to claim 16, wherein the decimation filter is a finite impulse response filter.
18. A method according to claims 16 or 17, wherein the decimation filter has integer coefficients.
19. A method according to claim 18, wherein the decimation filter has the coefficients {1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1}.
20. A method according to any of claims 10 to 19, wherein derivation of the auxiliary data stream further comprises the step of quantisation by means of a quantiser.
21. A method according to claim 20, wherein the quantisation is noise-shaped quantisation.
22. A method according to any of claims 3 to 21, wherein the context stream provides, in relation to each tuple to be encoded, a local context that defines a subset of the set of all possible values for that tuple.
23. A method according to any preceding claim, wherein the context stream is derived from the auxiliary data stream or coded auxiliary data stream by applying an inverse constraining function.
24. A method according to any preceding claim, wherein the context stream is derived in dependence on the one-bit signal.
25. A method according to any of claims 20 to 24, wherein the context stream is further derived from the auxiliary data stream in dependence on an error signal from the quantiser.
26. A method according to any preceding claim, wherein the step of entropy coding the one-bit signal comprises the steps of: deriving a selection data stream in dependence on the one-bit signal and the context stream; and, entropy coding the selection data stream.
27. A method according to any preceding claim, wherein the context stream is derived in dependence on the one-bit signal and the auxiliary data stream by information subtraction.
28. A method according to claims 26 or 27, wherein the derivation of the selection data stream exploits a second temporal characteristic of the one-bit signal.
29. A method according to claim 28, wherein the second temporal characteristic is a short-term temporal structure.
30. A method according to any preceding claim, wherein the coded auxiliary data stream and the entropy encoded one-bit signal are transmitted in a single bitstream.
31. A method of losslessly compressing a one-bit signal having a first sampling rate, comprising the steps of: deriving a context stream for the one-bit signal by sampling the one-bit signal at a second sampling rate, the second sampling rate being lower than the first sampling rate; and, entropy coding a portion of the one-bit signal at least in part in dependence on a portion of said context stream derived from said portion of the one-bit signal.
32. A method of losslessly compressing a 1-bit signal stream, comprising the steps of splitting the 1-bit signal stream into a first data stream which is encoded so as to exploit a first temporal structural characteristic of the 1-bit signal stream and a second data stream which is encoded so as to exploit a second temporal structural characteristic of the 1-bit signal stream, and subsequently transmitting the first and second data streams.
33. A method according to claim 32, in which the first temporal characteristic is a long-term temporal structure and the second temporal characteristic is a short-term temporal structure.
34. A method according to any preceding claim, wherein the one-bit signal represents an audio signal.
35. An encoder which implements the method according to any preceding claim.
36. An encoder according to claim 35, implemented on a digital signal processor.
37. A method of losslessly decompressing a losslessly compressed one-bit signal to furnish a replica of the one-bit signal, comprising the steps of: receiving an encoded data stream; recovering a context stream from the encoded data stream; and, decoding to furnish a portion of the replica in dependence on at least a portion of the context stream, said portion of the context stream being derived in dependence on a portion of the one-bit signal which corresponds to the portion of the replica.
38. A method according to claim 37, in which the step of recovering a context stream includes the substep of decompressing auxiliary data contained in the encoded data stream.
39. A method according to claim 38, in which the substep of decompressing the auxiliary data is performed losslessly.
40. A method according to claim 38 or claim 39, in which the decompressed auxiliary data preserves a long-term temporal structure of the one-bit signal.
41. A method according to any of claims 37 to 40, in which the step of recovering a context stream is performed in dependence on previously decoded portions of the replica.
42. A method according to any of claims 37 to 41, in which the step of recovering a context stream includes the substep of applying an inverse constraining function.
43. A method according to claim 42, in which the output from applying the inverse constraining function is represented as an integer.
44. A method according to claims 42 or 43, in which the step of applying an inverse constraining function comprises the substep of subtracting a weighted sum of previously decoded bits of the one-bit signal.
45. A method according to claim 44, in which the weights of the weighted sum are {1, 2, 3, 4, 5, 6, 7}.
46. A method according to any of claims 42 to 45, in which the inverse constraining function is applied in dependence on an error signal from a quantiser.
47. A method according to any of claims 42 to 46, in which the inverse constraining function furnishes implicitly a list of possible values for the one-bit signal.
48. A method according to any of claims 37 to 47, in which the step of decoding to furnish a portion of the replica comprises the substep of recovering a selection data stream from the encoded data stream.
49. A method according to claim 48, in which the selection data stream is recovered by entropy decoding.
50. A method according to claim 49, in which the entropy decoding is performed in dependence on probability tables that are supplied in the encoded data stream.
51. A method according to claims 49 or 50, in which the entropy decoding is performed in dependence on previously decoded bits of the onebit signal.
52. A method according to any of claims 48 to 51, in which the step of decoding to furnish a portion of the replica comprises the substep of selecting a value from a list of possible values for the one-bit signal in dependence on the selection data stream.
53. A method according to claim 52, in which the step of selecting is accomplished by indexing a table.
54. A method according to any of claims 37 to 53, in which said portion of the one-bit signal comprises a tuple of contiguous bits of the one-bit signal.
55. A method according to claim 54, in which the or each tuple contains eight bits.
56. A method according to claims 54 or 55, in which the or each tuple is furnished in the natural order in which it occurs in the one-bit signal.
57. A method according to claims 54 or 55, in which even-numbered tuples are furnished in advance of odd-numbered tuples, the numbering referring to the order in which the tuples occur in the one-bit signal.
58. A method according to claims 54 or 55, in which odd-numbered tuples are furnished in advance of even-numbered tuples, the numbering referring to the order in which the tuples occur in the one-bit signal.
59. A method of losslessly decompressing a losslessly compressed 1-bit signal, comprising the steps of: receiving an encoded data stream; recovering a first data stream from the encoded data stream; recovering a second data stream from the encoded data stream; and, processing the first data stream and the second data stream in combination to recover the 1-bit signal.
60. A method according to claim 59 in which the step of recovering the first data stream includes the substep of losslessly decompressing data contained in the encoded data stream.
61. A method according to claim 59 or 60, in which the second data stream is recovered by entropy decoding.
62. A method according to claim 61, in which the entropy decoding is performed in dependence on previously decoded bits of the one-bit signal.
63. A method according to claims 61 or 62, in which the entropy decoding is performed in dependence on probability tables that are supplied in the encoded data stream.
64. A method according to any of claims 59 to 63, in which the step of processing the first and second data streams includes the substep of deriving context from the first data stream.
65. A method according to claim 64, in which the context is derived in dependence on previously decoded portions of the one-bit signal.
66. A method according to claims 64 or 65, in which the substep of deriving context comprises applying an inverse constraining function.
67. A method according to claim 66, in which the output from applying the inverse constraining function is represented as an integer.
68. A method according to claims 66 or 67, in which the step of applying an inverse constraining function comprises the substep of subtracting a weighted sum of previously decoded bits of the one-bit signal.
69. A method according to claim 68, in which the weights of the weighted sum are {1, 2, 3, 4, 5, 6, 7}.
70. A method according to any of claims 66 to 69, in which the inverse constraining function is applied in dependence on an error signal from a quantiser.
71. A method according to any of claims 66 to 70, in which the inverse constraining function furnishes implicitly a list of possible values for the one-bit signal.
72. A method according to any of claims 66 to 71, in which the second data stream is entropy decoded in dependence on the context.
73. A method according to any of claims 59 to 72, in which the second data stream is a selection signal.
74. A method according to claim 73, in which the selection signal is used to select a value from a list of possible values for the one-bit signal.
75. A method according to claim 74, in which the step of selecting is accomplished by indexing a table.
76. A method according to any of claims 59 to 75, in which the first data stream preserves a first temporal structural characteristic of the 1-bit signal.
77. A method according to claim 76, in which the first temporal structural characteristic is a long-term temporal structure.
78. A method according to any of claims 59 to 77, in which the one-bit signal has a first sampling rate, and the first data stream has a second sampling rate, the second sampling rate being lower than the first sampling rate.
79. A method according to claim 78, in which the second sampling rate is one eighth of the first sampling rate.
80. A method according to any of claims 59 to 79, wherein the step of processing the first and second data streams comprises a sequence of substeps, each of which furnishes a tuple of contiguous bits of the one-bit signal.
81. A method according to claim 80, in which each tuple contains eight bits.
82. A method according to claims 80 or 81, in which each tuple is furnished in the natural order in which it occurs in the one-bit signal.
83. A method according to claims 80 or 81, in which even-numbered tuples are furnished in advance of odd-numbered tuples, the numbering referring to the order in which the tuples occur in the one-bit signal.
84. A method according to claims 80 or 81, in which odd-numbered tuples are furnished in advance of even-numbered tuples, the numbering referring to the order in which the tuples occur in the one-bit signal.
85. A method according to any of claims 37 to 84, wherein the one-bit signal represents an audio signal.
86. A decoder which implements the method according to any of claims 37 to 85.
87. A decoder according to claim 86, implemented on a digital signal processor.
88. A data stream encoded according to the method of any of claims 1 to 34.
89. A storage medium storing data encoded according to the method of any of claims 1 to 34.
Description:
CONTEXT CODING Background to the Invention The traditional representation of high fidelity audio in a digital system has been by Pulse Code Modulation (PCM), sampling the instantaneous value of the waveform at regular intervals (typically in the range 44.1 kHz to 96 kHz) and quantising this instantaneous value to one of several regularly spaced possibilities spanning a finite range. Which possibility occurred is represented as a signed multi-bit integer with a typical wordlength in the region of 16 to 24 bits. At comparable quality levels higher sampling rates can be combined with shorter wordwidths by means of noise shaping, which moves the quantisation noise from the low frequency regions where the ear is most sensitive to higher frequency regions where it is less sensitive. Recently a fashion has arisen for taking this to the extreme of 1 bit per sample and a very high sample rate, for example, the proprietary system known as "Direct Stream Digital" (DSD) which uses a sampling rate of 2.8224 MHz.

However, DSD requires a data rate of 2.8224 Mbits/channel/second, 4 times higher than the CD rate of 705.6 kbits/channel/second. This data rate is inconveniently high: for example, DVD-Audio has a peak data rate limit of 9.6 Mbits/second, which only suffices to code 3 raw DSD channels. Further, its capacity of 4.7 Gbytes corresponds to only 3.7 raw DSD channel-hours, to be compared with the 7.4 channel-hours required for 74 minutes of 6 channels.
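As a rough check of these figures, the short Python sketch below reproduces the quoted ratios; the constants are taken directly from the preceding paragraph and no other assumptions are made.

```python
# Back-of-envelope check of the data-rate figures quoted above.
DSD_RATE = 2.8224e6            # bits per channel per second
CD_RATE = 44100 * 16           # 705.6 kbits per channel per second
DVD_PEAK = 9.6e6               # DVD-Audio peak data rate, bits per second
DVD_CAPACITY = 4.7e9 * 8       # DVD-Audio capacity, bits

print(DSD_RATE / CD_RATE)                  # 4.0   -> DSD is 4 times the CD channel rate
print(int(DVD_PEAK // DSD_RATE))           # 3     -> only 3 raw DSD channels fit the peak rate
print(DVD_CAPACITY / (DSD_RATE * 3600))    # ~3.7  -> raw DSD channel-hours on a 4.7 Gbyte disc
print(74 / 60 * 6)                         # 7.4   -> channel-hours for 74 minutes of 6 channels
```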

Consequently a compression scheme is required to allow multi-channel audio to be carried on such a medium.

US2001/3165A1 and US6269338B1 disclose that standard data compression techniques such as Huffman or Lempel-Ziv can give moderate compression ratios on one-bit audio signals.

They also disclose techniques illustrated in Figure 1. Here, the input of the encoder takes a one-bit audio signal. Delayed samples of this signal are used in a prediction unit which estimates whether the next bit is likely to be a 0 or a 1. The resultant prediction signal is exclusive-ORed with the one-bit audio signal, yielding a one-bit error signal. The benefit of good prediction is that this one-bit error signal is predominantly zero, a fact that can be exploited by an entropy encoder. In some embodiments the prediction unit is also able to furnish the entropy encoder with "context": in this case a bit-by-bit estimate of the probability that the error signal is zero, further improving compression. The output of the entropy encoder then goes via the transmission medium to the input of the decoder, where a corresponding entropy decoder decodes the one-bit error signal. This is then exclusive-ORed with the decoder prediction signal to regenerate the one-bit audio signal. The decoder prediction signal is generated from delayed samples of the output one-bit audio signal.

Two forms of prediction unit are disclosed. In one form table lookup is used to predict the next bit based on the recent history of the one-bit audio signal. This prediction table may be calculated by the encoder and communicated to the decoder, or may be adaptive. In another form, linear prediction is used. This works off a far longer history of the one-bit audio signal and so is able to exploit characteristics of a one-bit audio signal such as the significant non-flatness of the spectrum below 150 kHz (cf. Figure 5). Many forms of entropy encoder are mentioned, such as run-length coding and simple Huffman coding (of blocks), but arithmetic coding is preferred.
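By way of illustration only, the following Python sketch outlines the table-lookup form of prediction described above. The history length and the majority-vote prediction rule are assumptions made for the example, not parameters taken from the cited disclosures, and the entropy coding stage that would follow is omitted.

```python
HISTORY_BITS = 4  # illustrative context length; prior-art systems may use other values

def build_prediction_table(bits, history_bits=HISTORY_BITS):
    """For each recent-history pattern, record whether the next bit is more often 0 or 1."""
    counts = [[0, 0] for _ in range(1 << history_bits)]
    history = 0
    for b in bits:
        counts[history][b] += 1
        history = ((history << 1) | b) & ((1 << history_bits) - 1)
    return [0 if c0 >= c1 else 1 for c0, c1 in counts]

def prediction_residual(bits, table, history_bits=HISTORY_BITS):
    """XOR each bit with its prediction; good prediction gives a residual that is mostly zero."""
    residual = []
    history = 0
    for b in bits:
        residual.append(b ^ table[history])
        history = ((history << 1) | b) & ((1 << history_bits) - 1)
    return residual
```

An entropy coder (run-length, Huffman or arithmetic) would then be applied to the residual, optionally using a per-bit probability estimate as context.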

Combinations of some of these techniques are of low complexity. However, the good reported compression comes from the techniques incorporating high order prediction, as this allows the non-flatness of the one-bit audio spectrum at low frequencies to be exploited. FIR filters of order 100 are reported running at the full 2.8 MHz sampling rate. This is well beyond the current capacity of general purpose DSPs, although if implemented in dedicated hardware the filters are simplified by the observation that their input is binary valued.

It will be appreciated that, in the prior art, entropy coding is applied to the bits of the one-bit signal, or to an error signal resulting from prediction. The context for the entropy coding is generally provided by the previous history of decoded bits.

Known compression techniques for one-bit audio either achieve poor compression or have high computational requirements, sufficiently so as to require implementation on special purpose dedicated hardware. For an application such as a carrier for consumer audio, the decoder will be replicated in every consumer replay device and so it is important to minimise the computational requirements.

It is an object of the present invention to provide methods of coding one-bit audio so as to simultaneously achieve good compression and to reduce the computational complexity of the decoder to the point where implementation on general purpose digital signal processors is feasible.

Summary of the Invention The present invention provides a compression system suitable for one-bit audio signals that uses entropy coding whose context includes information transmitted for the purpose, as well as (usually) the previous history of decoded bits or tuples.

The context information will itself usually be transmitted in compressed form, for example using lossless compression. If the compression involves entropy coding, the same principle may be used recursively.

In one class of embodiments, the additional context information is a decimated version of the one-bit signal. A decimation ratio of eight is convenient.

The decimated signal will usually be multi-bit. The one-bit signal is divided into "tuples" of eight bits, and each tuple is entropy coded using nearby samples of the decimated signal as context.

This scheme is well suited to implementation on general purpose Digital Signal Processors (DSPs) because it allows the main operations to be performed at a lower sampling rate than that of the incoming one-bit signal (typically 352 kHz instead of 2.8 MHz). The decimated signal is well suited to lossless compression using prediction, and prediction using transversal filters is much more efficient on the decimated signal than it would be on the original one-bit signal, not only because of the lower sampling rate but also because fewer taps are required for a given spectral resolution.

The essence of lossless compression is to identify characteristics of the class of signals that are to be compressed. Having identified appropriate characteristics, we seek techniques that will compress signals which possess those characteristics (which will inevitably expand the majority of signals which do not possess those characteristics).

The characteristics that we identify for one-bit audio are:
1. The signal does not possess a white spectrum at low frequencies; and,
2. Although the majority of the power in the signal is quantisation noise, this noise is not random but exhibits short-term structure due to the deterministic action of whatever modulator produced it.

Most of the power of a one-bit signal is quantisation noise. A one-bit modulator will have used noise shaping to shape this noise away from the low frequency (e.g. <100 kHz) regions so that it doesn't audibly interfere with the audio, spreading this excess noise out smoothly over the wideband region (e.g. 100 kHz to 1.4 MHz). Below 100 kHz though the spectrum usually shows quite wide variation (perhaps of the order of 80 dB) and further detail corresponding to the music (which typically might have most of its power below 1 kHz, falling off at higher frequencies).

Thus exploitation of the non-white spectrum is advantageously narrow band (i.e. we are exploiting long-term structure), but also the region capable of exploitation is a small proportion of the 1.4 MHz bandwidth of a DSD signal.

The low bandwidth of the interesting portions of the spectrum suggests that attempts to exploit the spectrum at the full sampling rate are wasteful of computational resource. If we decimate the signal to a lower sampling rate, we will still be left with a signal which contains the characteristic of a significantly non-white spectrum which we aim to exploit. In decimating though, we will have discarded the bulk of the wideband signal. But this portion of the original is essentially the noise which we hope to exploit for short term structure.

Accordingly, we propose to split the original one-bit-audio stream into two information streams, one of which can profitably be coded exploiting long term structure, the other of which can be coded exploiting short term structure. In combination they are sufficient to reconstruct the original one-bit-audio stream. Both information streams will be transmitted from the encoder to the decoder. Clearly we must take care not to introduce additional information in the splitting process, else we will negatively impact data-rate.

One way to do this is to choose a many-to-one function (constraining function) of the one-bit-audio which preserves one of the two characteristics we aim to exploit. By virtue of the constraining function being many to one, the resultant constraining signal contains incomplete information about the original one-bit-audio signal and additional information (selection signal) is required to allow the original one-bit-audio signal to be reconstructed. We hope that this selection signal will be amenable to compression by exploiting the other characteristic.

As an example, suppose we are coding an n bit fragment of one-bit signal (n might typically be several thousand). There are 2^n possible such signals, and different one-bit signals will in general produce different constraining signals. Only some number k (very much smaller than 2^n) of those possible one-bit signals would produce the actual constraining signal we have. The selection signal is the information stating which of those k plausible one-bit signals was actually present.

This is a 1 in k decision, which can then be entropy coded given assumptions about the relative probabilities of those k possibilities.

There is quite a correspondence with conditional probabilities. A priori, all 2^n one-bit signals were possible, with our thoughts about the characteristics of one-bit audio signals leading us to view some as more probable than others. We make an observation of the result of applying the constraining function. A posteriori the probabilities are modified by our knowledge of the constraining signal, with the probability of occurrence of all (2^n - k) one-bit signals which correspond to different constraining signals being zero.

The act of producing the selection signal can also be viewed as being a process of information subtraction. Starting with the one-bit signal and the constraining signal, which contains some of the information in the one-bit signal, we form the selection signal, which by construction contains precisely that information in the one-bit signal that was not in the constraining signal.

A sensible choice for the constraining function is decimation. Decimation lowers the sampling rate of the signal and thus its bandwidth. The decimation filter keeps the low frequency information (which was where the spectral properties we plan to exploit reside) and discards the rest of the original spectrum (which we expect, for one-bit-audio, to be fairly flat and unexploitable by spectral means). The computational load of running an FIR filter with a given spectral resolution (a common technique for exploiting non-flat spectra) reduces as the square of the sampling rate, making the reduction in sampling rate very beneficial for computational cost.

We now need a practical method of performing the information subtraction step, since the above description operates in terms of whole signal fragments which are inconveniently large to work with. For a wide class of constraining functions we can calculate the selection signal in an incremental manner. Broadly these are constraining functions where the constraining function is causal, i.e. there is a sequence of bits in the one-bit signal (roughly evenly spaced) such that each successive value of the constraining signal depends on the corresponding bit and previous bits of the one-bit signal, but not any subsequent bits. Here we identify each value of the constraining signal with the portion of the one-bit signal between the last bit that influences its value and the last bit that influences the previous value of the constraining signal. Now, we can produce a sequence of selection signal values corresponding to each portion so that the sequence of portions is communicated efficiently to the decoder. If the decoder can calculate all previous portions, and knows the current value of the constraining signal, then it can work out the contribution of the previous portions to this value and derive a partial constraining function as a function purely of the current portion of one-bit signal.

Knowing the value of the partial constraining function at this point, most settings for the portion are known to be impossible (since they would yield different values of the partial constraining function) and the current value of the selection signal is a decision amongst the plausible candidates for the portion which yield the correct value for the partial constraining function.

There are a number of considerations in choosing a suitable decimation filter:
1. We want to decimate by as much as possible in order to reduce the computational load of the filtering to exploit the non-flat spectrum;
2. Alias products from inadequate stopband rejection can fill in quiet spectral regions we hoped to exploit, thus limiting compression;
3. We must obtain a good balance between information in the constraining signal and information in the selection signal so that both have room to benefit from exploiting their structure; and,
4. A linear constraining function makes working with the selection signal far easier, which suggests integer decimation filter coefficients so that signal quantisation is not required.

We find that a good balance between these aims is achieved by decimating by a factor of 8 using a decimation filter with coefficients {1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1}. A regular decimation ratio of 8 leads to regular portions of 8 bits (8-tuples) of the one-bit signal. The linearity of the constraining function makes the partial constraining function a function purely of the corresponding tuple: it is the weighted sum (with weights 8, 7, 6, 5, 4, 3, 2, 1) of the bits in the tuple. The value of the partial constraining function is easily computed from the value of the constraining signal by subtracting the weighted sum (with weights 0, 1, 2, 3, 4, 5, 6, 7) of the bits in the previous tuple. Consequently, instead of selecting from the 256 possible tuples, the selection signal is only selecting between those tuples with the correct value of the weighted sum: there are at most 14 of those.
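Purely to illustrate this arithmetic, the Python sketch below applies the quoted weights to bits taking values 0 and 1; the bit ordering within a tuple and the ordering of the candidate list are assumptions made for the example.

```python
from itertools import product

W_CURRENT = [8, 7, 6, 5, 4, 3, 2, 1]    # weights applied to the bits of the current 8-tuple
W_PREVIOUS = [0, 1, 2, 3, 4, 5, 6, 7]   # weights applied to the bits of the previous 8-tuple

def constraining_sample(current, previous):
    """One sample of the decimated (constraining) signal for this pair of tuples."""
    return sum(w * b for w, b in zip(W_CURRENT, current)) + \
           sum(w * b for w, b in zip(W_PREVIOUS, previous))

def partial_value(sample, previous):
    """Subtract the previous tuple's contribution, leaving a function of the current tuple only."""
    return sample - sum(w * b for w, b in zip(W_PREVIOUS, previous))

def candidates(w):
    """All 8-tuples whose weighted sum equals w: the set the selection signal chooses from."""
    return [t for t in product((0, 1), repeat=8)
            if sum(wt * b for wt, b in zip(W_CURRENT, t)) == w]

previous = (1, 0, 1, 1, 0, 0, 1, 0)                   # previously coded tuple (arbitrary example)
current = (0, 1, 1, 0, 1, 0, 0, 1)                    # tuple to be coded
w = partial_value(constraining_sample(current, previous), previous)
selection_index = candidates(w).index(current)        # the 1-in-k decision that is entropy coded

# The largest candidate set over all partial values has 14 members, as stated above.
print(max(len(candidates(v)) for v in range(0, 37)))  # -> 14
```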

Brief Description of the Drawings Examples of the present invention will now be described in detail with reference to the accompanying drawings, in which: Figure 1 shows prior art methods of losslessly encoding and decoding one-bit audio; Figure 2 shows a more general method; Figure 3 shows a method according to the present invention; Figure 4 shows a method including lossless compression of an auxiliary signal; Figure 5 shows a spectrum of an audio signal sampled at 2.82 MHz and quantised to 1 bit using a 7th order delta-sigma modulator; Figure 6 shows in more detail an implementation of an encoder and decoder pair according to the present invention; Figure 7 is the same implementation as Figure 6 but configured to process a signal consisting of 8-bit tuples;

Figure 8 shows an embodiment of a lossless encoder according to the present invention; Figure 9 shows a corresponding lossless decoder to the encoder of Figure 8; Figure 10 illustrates a technique for recovering an original one-bit-audio signal utilising knowledge of a constraining signal; Figure 11 shows an encoder that generates a noise-shaped constraining signal; Figure 12 shows a decoder corresponding to the encoder of Figure 11.

Figure 13 shows an example of an embedded lossless encoder adapted to compress multi-bit audio; Figure 14 shows an example of an embedded lossless decoder corresponding to the multi-bit audio lossless encoder of Figure 13; and, Figure 15 shows an example spectrum of the constraining signal of the first embodiment.

Detailed Description Figure 1 shows a conventional encoder and decoder for the compression of one-bit audio signals. The encoder receives a one-bit audio input signal that is passed through a one-sample delay element z^-1 to a prediction unit. The prediction unit predicts the current sample value, and the predicted value and the input sample are fed to an exclusive-OR gate. If the prediction is correct, the output of the gate will be 0, otherwise the output of the gate will be 1. If the prediction is correct most of the time, the binary stream from the gate output will consist mainly of 0's, a situation that can advantageously be handled using entropy encoding [Langdon, G. G. Jr., "An Introduction to Arithmetic Coding", IBM J. Research & Development, vol. 28 no. 2 (March 1984)]. The binary stream is therefore passed to an entropy encoder, whose output is transmitted.

In the decoder on the right of Figure 1, the transmitted signal is received and is entropy decoded. The resulting binary stream is passed to an exclusive-OR gate, the other input of which is fed from a prediction unit identical to that in the encoder. Assuming that initialisation of the prediction unit and of the z^-1 delay are consistent between the encoder and the decoder, it is easily verified that the output of the exclusive-OR gate in the decoder will be the same as the input that was fed to the encoder.

The z^-1 delay element is needed in order to avoid a delay-free feedback loop in the decoder.

Shown dotted in Figure 1 is a path for additional information to pass from each prediction unit to the corresponding entropy encoder or decoder. This takes the form of a bit-by-bit estimate of the probability that the prediction is correct. In the field of entropy coding, such information is known as "context", and its presence allows the entropy encoder to operate more efficiently.

From an information-theoretic point of view, and given the presence of context, we have recognised that the exclusive-OR is in fact irrelevant: the prediction of the current value could be regarded as part of the context. This point is made explicit in Figure 2, which is our generalisation of the prior art encoder and decoder of Figure 1.

In Figure 2, the input signal 130 is fed through a sample delay 10 to a context deriver 20. Context from the context deriver 20 is fed to an entropy encoder 30 and the resulting coded signal is transmitted through a path 70.

The decoder portion of Figure 2 comprises an entropy decoder 60, which receives the coded signal from the transmission path 70 and decodes to furnish the output signal 140. Context for the entropy decoding is provided by a context deriver 50, which is fed from the output signal 140 through a sample delay 40.

It will be seen that the context derivers 20 and 50 in Figure 2 replace or generalise the prediction units and exclusive-OR gates of Figure 1.

Figure 3 shows an example of a compression system according to the present invention. It will be seen that Figure 3 is like Figure 2 but with additionally a data path from the encoder's input 130 to an auxiliary data deriver 80. Auxiliary data from the auxiliary data deriver 80 are transmitted to the decoder through an auxiliary data transmission path 90. The auxiliary data are fed to the context derivers 20 and 50 in the encoder and the decoder respectively, in order to provide additional context and thereby improve the compression.

In practice, the signals through the transmission path 70 and the auxiliary data transmission path 90 may be combined into a single composite stream.

Figure 4, also according to the present invention, is like Figure 3 except in the treatment of the auxiliary data. An auxiliary signal deriver 100 derives an auxiliary signal 150 that is fed through a lossless encoder 110 in order to provide compressed auxiliary data that is transmitted through the auxiliary data path 90. In the decoder portion of Figure 4, a lossless decoder 120 decodes the auxiliary data in order to recover an auxiliary signal 160. By virtue of the lossless property of the encoder 110 and the decoder 120, the auxiliary signals 150 and 160 are identical.

In essence, the difference between Figure 3 and Figure 4 is that the auxiliary information supplied to the context deriver 20 is transmitted verbatim in Figure 3, but in coded form in Figure 4.

Figure 3 and Figure 4 show the flow of information, but do not specify the granularity of the data flowing along each path. If we delete the paths through the delays 10 and 40, we can consider the entire one-bit signal, or at least a substantial chunk of it, for example a few tens of milliseconds, as the unit of information. In this case, referring to Figure 4, the auxiliary signal 150 corresponding to the entire chunk will be computed and context derived, then the entire chunk will be entropy coded.

The auxiliary signal 150 will also be losslessly encoded and transmitted. The decoder will losslessly decode the auxiliary signal and so derive context for the entire chunk, which can then be entropy decoded.

At the other extreme, the data can be processed bit-by-bit, and the delays 10 and 40 can then be delays of one sample, as required to prevent a delay-free feedback loop in the decoder.

An advantageous intermediate possibility is processing of tuples of, for example, eight bits. Thus the input 130 can be considered as a stream of tuples, each tuple being entropy coded in sequence by the encoder 30. The delay 10 would then normally be a delay of one tuple.

If the entropy encoder 30 codes tuples individually (or bits, since a bit can be regarded as a degenerate case of a tuple), then context is provided separately for each one: we refer to this as "local context". One use for local context is to take advantage of the local structure of a typical one-bit signal produced by a Delta Sigma Modulator (DSM). For example, a run of three consecutive ones will typically occur with probability much smaller than the (1/2)^3 = 1/8 that would be expected from a random stream, and a run of four consecutive ones is still more rare. Hence, if the previous tuple ends with a run of two or three ones, then the first bit of the current tuple is known to be zero with high probability. Local context derived from one or more adjacent tuples can be regarded as a means of mitigating the "end effects" incurred in coding tuples separately.
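As a sketch of how such a conditional statistic might be gathered from a one-bit stream, one could count the first bit of each tuple whose predecessor ends in a run of ones; the tuple size and run length below are simply the values used in the discussion above.

```python
def first_bit_after_trailing_ones(bits, tuple_size=8, run_length=3):
    """Count the first bit of each tuple whose preceding tuple ends in a run of ones."""
    counts = [0, 0]                                   # counts[b] = occurrences of first bit == b
    for i in range(tuple_size, len(bits), tuple_size):
        if all(b == 1 for b in bits[i - run_length:i]):
            counts[bits[i]] += 1
    return counts   # for a typical DSM output, counts[0] is expected to dominate
```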

In general, increasing the amount of context provided to the encoder 30 will reduce the quantity of data that needs to be transmitted through the path 70. If this context is transmitted explicitly through the path 90, the additional data through path 90 must be balanced against the saving through path 70. Context that is derived from previously encoded or decoded tuples (for example, through the paths 10 and 20, or paths 40 and 50 in Figure 4) does not incur additional transmission overheads and so should be used as much as possible, subject to the computational cost.

The use of context from "previously" encoded or decoded tuples does not require that the tuples be processed in the natural order in which they appear in the one-bit signal. For example, it is possible to encode all the even-numbered tuples within a segment of the signal, and then to encode all the odd-numbered tuples. In this case the even tuples are encoded without the benefit of context from adjacent tuples, but the odd-numbered tuples are encoded with the benefit of two-sided context.

Thus in general, the tuples can be encoded in any order, provided the order is either predetermined or is communicated to the decoder, and the entropy coding of a tuple may use context derived from any or all previously encoded tuples, in addition to the explicitly transmitted context.

The use of context from previously encoded tuples imposes an order, or a partial order, on the decoding of tuples. For example, in the even/odd case mentioned above, at least some even samples must be decoded before any odd samples, though there may well still be considerable freedom of choice.

The above description does not imply that all tuples are of the same size, nor even that the bits that constitute a tuple are consecutive within the one-bit stream, though practical considerations may make it unusual to take advantage of such generality.

Consider an embodiment of the present invention in which the auxiliary signal 150 is a decimated version of the input signal 130. This decimated signal will generally be multi-bit, and will be at a sampling rate (for example 352 kHz) not greatly higher than some of the higher rates generally used to transmit Pulse Code Modulation (PCM) audio. Consequently, the lossless compression techniques conventionally used for PCM audio can in principle be adapted for use on the auxiliary signal.

Typically, such a lossless encoder incorporates linear signal prediction followed by entropy coding. Prediction is effective on signals whose frequency spectrum is non-flat; indeed its aim is to produce a prediction residual with a flat or almost flat spectrum. Linear prediction has also been used in prior art one-bit compression systems [Bruekers, F., Oomen, W., van der Vleuten, R., and van de Kerkhof, L., "Improved Lossless Coding of 1-bit Audio Signals", AES 103rd Convention, preprint no. 4563 (1997 Sept)].

Prediction is of value when compressing signals whose frequency spectrum deviates substantially from flatness: indeed it is the aim to replace the signal by a prediction residual whose spectrum is flat or nearly so.

Figure 5 shows a spectrum of a typical one-bit signal produced by a DSM sampled at 2.82 MHz. It will be seen that the spectrum is already fairly flat at frequencies above 70 kHz. Therefore, a predictor will be of little value unless it has sufficient spectral resolution to exploit the substantial but narrow dip below that frequency.

In the case of the usual type of predictor using a transversal (Finite Impulse Response) filter, high spectral resolution implies a substantial number of taps. The above-cited Bruekers et al. reference reports 100 taps which, when executed 2.82x10^6 times per second, represents a substantial computational load. However, when prediction is applied to a decimated signal with a decimation ratio of, say, 8, then a much smaller number of taps is required in order to achieve the same spectral resolution. As the filter is also executed at one-eighth the rate, the number of operations required per second to implement the filter is reduced by a factor of approximately 64.
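The factor of approximately 64 follows from the two effects just mentioned: roughly 1/D as many taps for the same spectral resolution, executed 1/D as often. A sketch of the arithmetic, using the tap count reported in the cited reference and a decimation ratio of 8:

```python
full_rate = 2.8224e6   # one-bit sampling rate, samples per second
taps_full = 100        # filter order reported in the cited prior art
D = 8                  # decimation ratio

ops_full = full_rate * taps_full                    # multiply-accumulates per second at full rate
ops_decimated = (full_rate / D) * (taps_full / D)   # same spectral resolution after decimation
print(ops_full / ops_decimated)                     # -> 64.0, roughly a 64-fold reduction
```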

In order better to describe specific implementations of the present invention, it will be convenient to use slightly different language.

We start with the observation that, if the additional context signal is derived as a deterministic function of the one-bit signal, then knowledge of the context signal partitions the set of all possible one-bit signals into a first set of those that would result in that context signal and a second set of those that would not. Considering the entropy coding process, the context information thus "constrains" the signals that need to be considered. The context information is described as a "constraining signal", and the deterministic function used to derive it is described as a "constraining function". The information referred to above as "the entropy coded one-bit signal" is described as a "selection signal" that selects from the first set the unique one-bit signal that was presented to an encoder.

A decoder starts by decoding the constraining signal to determine the first set, and then uses the selection signal to "disambiguate" which of the one-bit signals in that set was actually presented to the encoder.

This process is illustrated in Figure 6. The reference numerals 130, 100, 150, 110, 90, 120, 160 and 140 in Figure 6 correspond precisely to the items with the same numerals in Figure 4. The input signal 130, illustrated as containing bits 101100... 110010, is passed through the constraining function 100 to furnish the constraining signal 150, which is normally a multi-bit signal and is losslessly compressed by the multi-bit lossless encoder 110. The compressed constraining signal is transmitted through path 90, where an identical copy 160 of the constraining signal 150 is furnished by the multi-bit lossless decoder 120.

Mathematically, the constraining function 100 is a many-to-one mapping, and therefore the inverse constraining function 32 yields not a unique result, but a set 33 of possible results, one element of which must be the input 130 to the constraining function 100. Thus, the inverse constraining function 32 determines, from the constraining signal 150, the set 33 of possible signals that might have given rise to the signal 150. An identification unit 35 compares the input signal 130 to the set 33, and the identification decision is passed to an entropy encoder 37 and transmitted through path 70.

Comparing Figure 6 with Figure 4, blocks 35 and 37 together in Figure 6 correspond to the entropy encoder 30 of Figure 4, and the inverse constraining function 32 in Figure 6 corresponds to the context deriver 20 of Figure 4.

Considering the decoding part of Figure 6, an inverse constraining function 62 furnishes a set 63 of possible signals, identical to the set 33. An entropy decoder 67 receives the coded identification decision and decodes it to furnish a replica, which is then used as a selection signal. The selection signal is used by a selection unit 65 to select a signal 101100... 110010 from the set 63. This signal is a replica of the one-bit signal that was identified by the identification unit 35 in the encoder, and it is passed to the output 140.

The concepts of "constraining", "selection" and "disambiguation" can be applied to the whole signal, or, if entropy coding and decoding is performed on tuples individually, it can be applied more locally to the individual tuples. In this case we define the "local context" of a tuple as the context that is used for encoding or decoding that tuple.

A preferred embodiment uses tuples in one-to-one correspondence with samples of the constraining signal. In this case the local context is taken from the corresponding sample of the constraining signal, and potentially also from previous samples of the constraining signal and from previously encoded tuples. Context from previous samples and previous tuples is potentially advantageous to compression, and is obtained without the need to transmit additional information, though its use needs to be balanced against computational cost.

Figure 7 is akin to Figure 6, but is adjusted to show the encoding and decoding of individual tuples. It is assumed that the tuples are all of the same size, D bits, and that they are processed in the natural order in which they appear in the input sequence. We shall refer to them as D-tuples. The samples of the constraining signal 150 have a one-to-one correspondence with the D-tuples, and the constraining signal thereby has a sampling rate equal to 1/D times the sampling rate of the one-bit signal. Therefore, the D-tuples are processed at the same rate as the samples of the constraining signal, and Figure 7 can be interpreted as a signal flow diagram that is executed at the subsampled rate of 1/D times the sampling rate of the one-bit signal. In particular, the unit delays 10 and 40 are delays of one D-tuple.

Figure 7 illustrates the case in which the local context derived by the inverse constraining function 32 is derived in dependence on the sample of the constraining signal that corresponds to the current tuple, and also in dependence on the previous tuple. The dependency on the previous tuple is provided through the unit delay 10.

In more detail, a D-tuple 130, representing D bits of a one-bit signal, is fed to a constraining function 100, which furnishes a sample 150 of the constraining signal.

The constraining function 100 may have memory, and the sample 150 may thereby depend also on previous tuples. The sample 150 is fed to the multi-bit lossless encoder 110, whose output is fed through transmission path 90 and losslessly decoded by the multi-bit lossless decoder 120, which produces a replica 160 of the sample 150.

The sample 150 of the constraining signal, and the previous tuple provided by the delay path 10, are fed to the inverse constraining function 32. The inverse constraining function 32 may have memory, so its output may depend also on previous values of the constraining signal and/or on tuples prior to the immediately previous one. It furnishes a set 33 of D-tuple values. The set 33 consists of those D-tuple values that, if presented alternatively as input to the constraining function 100 in place of the actual D-tuple 130, would result in the same output from the constraining function as results when the actual D-tuple 130 is presented.

By construction, the set 33 must contain the value of the input D-tuple 130 as an element. The identification unit 35 identifies which element of the set 33 is the input D-tuple 130. This identification decision is the information that is entropy coded by the entropy encoder 37, and transmitted through the path 70.

Although the constraining function 100 and the inverse constraining function 32 both have memory, it is possible for the inverse constraining function 32 to be constructed such that dependencies on previous values cancel and that the context is thereby a function of the current tuple only. In this case, the context is described as"locally constraining".

Knowledge of the locally constraining context allows those values of the current tuple that would result in a different context to be excluded immediately from consideration. In a table-driven implementation, this reduces the sizes of the necessary tables. Thus, although the present invention does not require the locally constraining property, it is advantageous to the implementation.

There may be additional "guidance context" supplied to the entropy encoding and decoding process, as well as the "constraining local context". The guidance context may not have the locally constraining property, for example it may be a function also of tuples that have not yet been encoded. In that case it can still be part of a signal that is regarded as "constraining" at the global level.

In a practical implementation, it may be difficult or impossible to isolate the identification unit 35 as a functional unit that is separate from the entropy encoder 37. For example, suppose that D=8, and that the set 33 contains 18 elements. Then one way to convey the identification decision from the identification unit 35 to the entropy encoder 37 would be to encode the decision as an integer in the range 0 to 17. In this case the identification unit 35 would perform the task of encoding the decision as an integer. Another way would be to identify the 8-tuple 130 by sending its 8 bits explicitly, as an integer in the range 0 to 255. In this case the identification unit is vacuous, and there would instead be a direct path from the input 130 to the entropy encoder 37. However, in order to encode the integer efficiently, the entropy encoder 37 needs to "know" that it must take one of a set of 18 possible values. In this case the set 33 is also communicated directly to the entropy encoder 37.

To state the same thing more concretely, if the entropy encoder is of a type that employs a table of probabilities, indexed by the value to be encoded, in the first case the table will have 18 entries, and in the second case the table will have 256 entries, all but 18 of which are conceptually zero. (However, the conceptually zero entries will never be accessed, a fact that makes it possible to merge the tables for the different values wi into a single table of length 256.)

In a practical implementation, the set 33 need not necessarily be represented by explicit enumeration of its elements. If the number of possible values for a set is conveniently small, the set can be represented by an index to a table. Alternatively the set 33 may be represented by a property of the D-tuples it contains, for example the property that the number of bits set to binary "1" is 5. In the case that the inverse constraining function 32 furnishes a property, the property is a property of the current tuple only, whereas the constraining signal 150 is usually a function of previous tuples also. In the event that the constraining signal 150 were a function of the current tuple only, it could be fed directly to the identification unit 35, and the inverse constraining function 32 would be vacuous. If the identification unit 35 were also vacuous as suggested above, then the constraining signal 150 would be fed directly to the entropy encoder 37.
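A small Python sketch of the merged-table arrangement described above, using the 8-tuple weights of the first embodiment; the uniform probabilities are placeholders for whatever statistics an implementation would actually gather.

```python
from itertools import product

W_CURRENT = [8, 7, 6, 5, 4, 3, 2, 1]

def weighted_sum(t):
    return sum(w * b for w, b in zip(W_CURRENT, t))

# Conceptually there is one small probability table per partial value w; merging them gives a
# single 256-entry table indexed by the tuple itself. Entries for tuples inconsistent with the
# observed w are never accessed, so the merged table behaves like the separate per-w tables.
tuples_by_w = {}
for t in product((0, 1), repeat=8):
    tuples_by_w.setdefault(weighted_sum(t), []).append(t)

merged_table = {}                              # tuple -> probability, 256 entries in total
for w, ts in tuples_by_w.items():
    for t in ts:
        merged_table[t] = 1.0 / len(ts)        # placeholder: uniform within each candidate set
```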

So far we have mostly assumed that the only context provided is locally constraining context. Such context constrains the possible values that a tuple can take, but it does not provide information on the relative probabilities of those values.

In order to code efficiently, an entropy encoder needs to know those probabilities.

They may be estimated by gathering statistics over a segment of the one-bit signal, or, for known types of signal, they may be predetermined. A decoder must use the same probability estimates as the encoder. If the probability estimates are not predetermined they may be communicated from the encoder to the decoder, possibly with parallel adaptation in the encoder and decoder as the processing proceeds.

Guidance context, if provided, can be used to adjust the probability estimates on a tuple-by-tuple basis. For example, there is no reason why evaluation of the constraining function should not run ahead of the entropy coding, so that the "following" sample of the constraining signal is available to provide guidance context. This information would be fed to the entropy encoder 37 in order to refine the probability estimates used to encode the current tuple.

In the decoding part of Figure 7, the inverse constraining function 62 furnishes a set 63 of D-tuples that is a replica of the set 33. The entropy decoder 67 receives the encoded information from the transmission path 70 and decodes it to an identification decision that is a replica of the one produced by the identification unit 35 in the encoder. This decision is fed to the selection unit 65 that selects the correct D-tuple from the set 63 and feeds it to output 140. The unit delay 40 is analogous to the delay 10 in the encoder.

It is possible that the selection unit 65 may not be separately identifiable, in which case the set 63 is fed directly to the entropy decoder 67, as discussed above in the case of the encoder.

Likewise, if guidance context is used in the encoder, it must also be used in the decoder, and would be fed to the entropy decoder 67.

Figure 8 shows a lossless encoder in which the constraining function takes the form of a decimator. The input is adapted to receive a one-bit-audio signal 1300, which is divided into D-tuples by a serial-to-parallel converter 170, thus furnishing the stream of D-tuples 130.

The input signal 1300 feeds also the decimator 1000, which decimates by a factor D and thus furnishes the decimated sequence of samples 150. The decimator 1000 in Figure 8 is to be identified with the constraining function 100 in Figure 7. There is an apparent difference in that the decimator 1000 is shown in Figure 8 as fed from the one-bit signal 1300 rather than from the stream of D-tuples 130, but it will be clear that these two signals convey the same information and the difference is more a matter of description of the interface than any change of functionality.

The decimator 1000 consists conceptually of a decimation filter followed by a sub-sampling (not shown). In what follows we shall assume that the decimation filter is causal, so that each decimated sample 150 is a function of the corresponding tuple and previous tuples only. Thus, the decimated signal 150 in Figure 8 corresponds precisely to the constraining signal 150 in Figure 7, and in both figures the embedded encoder 110 losslessly encodes this signal 150.

In Figure 8 the D-tuple 130 is fed through a delay element 10 having a delay of one D-tuple, to a summer 34, which thereby receives the preceding D-tuple. The summer 34 furnishes a weighted sum of the bits in that preceding D-tuple. A subtractor 38 subtracts the weighted sum from the constraining signal 150 to produce the value 33, also labelled w.

We now make the further assumption that the decimator 1000 uses a Finite Impulse Response decimation filter (not shown) having no more than 2xD taps, such that its output 150 is a function of the current tuple 130 and the immediately preceding tuple only. We assume further that the summer 34 has coefficients equal to the coefficients of the decimation filter that are applied to the bits of the immediately preceding tuple, and that its output is thereby equal to the contribution of these bits to the signal 150. Therefore, by virtue of the delay 10, the subtractor 38 removes the contribution of these bits and its output 33 (=wd is a function or property of the current tuple only.

The value 33 is therefore a representation of a set of possible values of the current tuple. Thus, the value 33 (= wi) in Figure 8 can be identified with the set 33 in Figure 7, and the summer 34 and subtractor 38 in Figure 8 together perform the function of the inverse constraining function 32 in Figure 7.

In order that the output 33 correctly represent a set that contains D-tuple 130 on the very first execution of the algorithm, the state of the filter in the decimator 1000 and of the delay element 10 must be initialised consistently.

More generally, the decimator 1000 could use a longer decimation filter that spans several D-tuples, provided that the summer 34 is adjusted to include appropriately weighted bits from several previous tuples such that the subtractor 38 subtracts the contribution from all previous tuples. The possibility of Infinite Impulse Response filtering in the decimator 1000 and in the summer 34 is not excluded.

The identification unit 35 identifies which of the tuples in the set represented by the value 33 is equal to the tuple 130, and the entropy encoder 37 encodes this identification decision.

The outputs of the entropy encoder 37 and the embedded lossless encoder 110 are combined, along with any configuration information or other useful data 180, by a combiner 190, to furnish a single compressed bitstream that is delivered to an output 200.

Figure 9 shows the lossless decoder corresponding to the encoder of Figure 8. A splitter 210 splits the compressed bitstream 200 (as output by the encoder) into its constituent parts. Any configuration parameters 185 are extracted and used for initialisation etc. Entropy decoder 67 decodes the entropy coded identification decisions. The losslessly coded decimated signal is decoded by the lossless decoder 120 to furnish a replica 160 of the corresponding signal 150 in the encoder of Figure 8.

Assuming inductively that the delay element 40 produces the same output as the delay element 10 in the encoder, a summer 64 in the decoder will furnish the same output as the summer 34 in the encoder, and a subtractor 68 will furnish a value 63 that is a replica of the corresponding value 33 in the encoder. As has already been explained in relation to the encoder, the value 33 represents a set of tuples, and therefore so also does the value 63. The value 63 is therefore fed to the selection unit 65 along with the identification decision from the entropy decoder 67, and the selection unit 65 selects the tuple 140 that is a replica of the tuple 130 in the encoder. If the delay unit 40 has first been initialised to the same value as the delay unit 10 in the encoder, the two delay units will thereafter be presented with the same input value, thus confirming the inductive assumption above.

Finally the tuple 140 is serialised by a serialiser 175, which produces a replica 1400 of the one-bit signal 1300.

In order to get from the value 63 to the set that it "represents", the decoder needs to know what "property" was used to obtain the value. This depends on the form of the decimation function 1000, which must therefore be known to the decoder, even though decimation is not performed in the decoder.

We will discuss the present invention further by means of describing several embodiments in greater detail.

In a first embodiment, the constraining function is of the form of a decimation filter with integer coefficients. Specific values that are used for illustration are a decimation ratio D=8 with a 15-tap decimation filter with coefficients {1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1}.
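
By way of illustration only (not part of the specification), the constraining function of this first embodiment could be computed as in the following Python sketch; the names, the history parameter and the inclusion of the offset of 32 that appears later in the description are our own choices.

```python
# Illustrative sketch only: decimation by D = 8 with the 15-tap filter
# {1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1}.

TAPS = [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
D = 8

def constraining_signal(bits, history):
    """Return f[0] .. f[N-1] for a block of 8N bits (values 0 or 1).

    'history' supplies the seven bits b[-7] .. b[-1] that precede the block,
    agreed in advance between encoder and decoder.
    """
    extended = list(history) + list(bits)
    f = []
    for i in range(len(bits) // D):
        # f[i] depends on b[8i-7] .. b[8i+7]: the current tuple and the last
        # seven bits of the preceding tuple, so the filter is causal in tuples.
        window = extended[i * D : i * D + 2 * D - 1]
        f.append(sum(t * b for t, b in zip(TAPS, window)) - 32)
    return f
```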

The operation of the subtractor 38 and the identification unit 35, or alternatively of the subtractor 68 and the selection unit 65, is illustrated in more detail in Figure 10. In this figure, the horizontal axis represents time, or sample number of the one-bit signal. The rectangle 400 represents those bits that have already been fully processed or "disambiguated", which means "decoded" in the case of a decoder, or, in the case of an encoder, that any identification decisions required to determine these bits have already been passed to the relevant entropy encoder.

The item 1500 represents the weighting given to each bit in calculating the current sample of the constraining function. This is shown as an isosceles triangle to reflect the symmetry of the assumed weights {1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1}. This weighting is now separated into a weighting 340 that is applied to those bits that have already been disambiguated, and a weighting 330 that is applied to those bits that have not. Given knowledge of the sum of bits weighted by weights 1500 and the sum of bits weighted by weights 340, the sum of bits weighted by weights 330 can be computed by subtraction. This is the function of the subtractor 38 in Figure 8, or of the subtractor 68 in Figure 9. It will be seen that the sum of bits weighted by weights 330 is precisely the value 33 in Figure 8, or the value 63 in Figure 9. We refer to this sum as wi.

Item 440 in Figure 10 illustrates how the value of wi can define a set.

Suppose for example that wi = 8. In the case considered, the weights 330 are {8, 7, 6, 5, 4, 3, 2, 1}, and it is easily verified that the only possible 8-tuples of bits that sum to 8 when added with these weights are the six 8-tuples:

{1, 0, 0, 0, 0, 0, 0, 0}
{0, 1, 0, 0, 0, 0, 0, 1}
{0, 0, 1, 0, 0, 0, 1, 0}
{0, 0, 0, 1, 0, 1, 0, 0}
{0, 0, 0, 1, 0, 0, 1, 1}
{0, 0, 0, 0, 1, 1, 0, 1}

Therefore in this instance, the identification decision that must be encoded by encoder 37 and, in the decoder, decoded and used as a selector by the selection unit 65, is a 6-way choice between these possibilities. Without this information, we have a 6-way ambiguity about the next eight bits, which is resolved or "disambiguated" by the identification/selection information.
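
This 6-way ambiguity can be checked by brute-force enumeration, purely for illustration (the function name is ours, not the specification's):

```python
from itertools import product

WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 1]

def tuples_with_weight(w):
    """All 8-tuples of bits whose weighted sum, with weights {8,7,6,5,4,3,2,1}, is w."""
    return [bits for bits in product((0, 1), repeat=8)
            if sum(wt * b for wt, b in zip(WEIGHTS, bits)) == w]

print(tuples_with_weight(8))                                    # the six tuples listed above
print(len(tuples_with_weight(0)), len(tuples_with_weight(36)))  # 1 1: no ambiguity at all
```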

The amount of information required to disambiguate depends strongly on the value wi. For example, if wi = 0 or if wi = 36, the only possible 8-tuple is {0, 0, 0, 0, 0, 0, 0, 0} or {1, 1, 1, 1, 1, 1, 1, 1} respectively, and no information is required at all.

Nothing in the above discussion requires the decimation function to be constant or the number of bits considered at each stage to be constant. However, for reasons of regularity it would be normal to encode a constant number of bits at each stage, and encoding of 8-tuples is convenient both because it results in tables of a convenient size and because of compatibility with conventional computer architectures.

For implementation on a conventional computer or Digital Signal Processor, the serial-to-parallel conversion 170 in Figure 8 would probably take place in external hardware. The decimator 1000 would receive 8-tuples 130 instead of bits 1300. Its output 150 is the sum of the bits in the previous tuple weighted with weights {0, 1, 2, 3, 4, 5, 6, 7}, plus the sum of the bits in the current tuple weighted with weights {8, 7, 6, 5, 4, 3, 2, 1}. Each of these sums could be computed by a single look-up in a table of length 256.
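
A minimal sketch of this table-driven evaluation follows, assuming byte-packed tuples with the earliest bit in the most significant position (our convention, not stated in the specification):

```python
PREV_WEIGHTS = [0, 1, 2, 3, 4, 5, 6, 7]   # applied to the previous tuple
CURR_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 1]   # applied to the current tuple

def make_table(weights):
    table = []
    for value in range(256):
        bits = [(value >> (7 - k)) & 1 for k in range(8)]   # earliest bit = MSB
        table.append(sum(w * b for w, b in zip(weights, bits)))
    return table

PREV_TABLE = make_table(PREV_WEIGHTS)
CURR_TABLE = make_table(CURR_WEIGHTS)

def decimator_output(prev_tuple, curr_tuple):
    """Sum of the two weighted partial sums, each obtained by a single look-up.
    (The constant offset of 32 discussed below is omitted, as in the text above.)"""
    return PREV_TABLE[prev_tuple] + CURR_TABLE[curr_tuple]
```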

We now repeat the description of the encoder of Figure 8 in order to introduce more mathematical language that will be used to explain other embodiments.

Let us denote the values of the one-bit signal 1300 by {bi}, the values of the constraining signal 150 by {fi}, and the values 33 that result from subtracting the output of summer 34 from the constraining signal 150 by {wi}. We shall denote successive D-tuples of one-bit-audio bits as Bi = bDi .. bDi+D-1. We shall suppose the binary values of the bi to be 0 and 1.

We shall assume further that the encoder has divided the long or semi-infinite stream of one-bit samples presented to its input into segments or "blocks" of some convenient length for encoding, say of order 10^5 samples (representing a few tens of milliseconds if the one-bit signal is a DSD audio signal). Let N be the number of 8-tuples in the block currently being encoded.

The encoder decimates the 8N bits (b0 .. b8N-1) in the block by 8 using the 15-tap decimation filter with taps {1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1} to yield the multi-bit constraining signal f0 .. fN-1. Subtraction of the output of summer 34 leaves a signal filtered by the last eight of these taps: {8, 7, 6, 5, 4, 3, 2, 1}.

Thus wi = 8b8i + 7b8i+1 + 6b8i+2 + 5b8i+3 + 4b8i+4 + 3b8i+5 + 2b8i+6 + b8i+7.

And fi = b8i-7 + 2b8i-6 + 3b8i-5 + 4b8i-4 + 5b8i-3 + 6b8i-2 + 7b8i-1 + wi - 32.

The offset of 32 is to compensate for the asymmetry in considering the values of {bi} to be 0 and 1. To allow f0 to be calculated, b-7 .. b-1 are assumed to take some predetermined values, e.g. {1, 1, 0, 0, 1, 1, 0}. Possible values of wi lie in the range 0 to 36, so the 256 possible values that Bi can take are partitioned into 37 disjoint subsets by the value of wi. Some of these subsets contain only one member (for example wi = 0 corresponds only to Bi = {0, 0, 0, 0, 0, 0, 0, 0}) but most contain more, up to a maximum of 14 in the subset corresponding to wi = 18.

In order to furnish probability estimates for use by the entropy encoder, the number of occurrences of each 8-tuple in the block to be coded is counted, and the counts are used to compute conditional probability estimates P(Bi = x | wi = y): the probability that Bi takes each possible value x given the value y of wi.

In more detail, for any 8-tuple B, define W(B) as the weighted sum of the bits of B, with weights {8, 7, 6, 5, 4, 3, 2, 1}. Thus W(Bi) = wi. In the following, the index i denotes the sequence number of tuple Bi within the stream, but s and t are indices over the possible values that the ith tuple Bi might take.

Let Ns be the number of occurrences of the 8-tuple value Bs within the block to be coded. Then the a priori probability estimate that tuple Bi takes the value Bs is Ns / N, and the corresponding estimate given the context information that wi = W(Bs) is Ns / Σt Nt χ(W(Bt) = W(Bs)), where χ is the characteristic function.

If guidance context is used, these probability estimates will be modified accordingly.

It may be necessary to quantise these probability estimates to values that the entropy coder can handle. If so, this quantisation must ensure that no 8-tuple which actually occurs in the block has its probability quantised to zero (which would make it impossible to code), and preferably it should also ensure that the quantised probabilities sum to unity. This set of probabilities (or some equivalent) is communicated to the decoder as configuration information for the block. All the Bi are entropy coded using those probabilities, furnishing the entropy coded selection signal.
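
The counting and conditional estimation described above might be sketched as follows (quantisation of the probabilities is omitted; the names and the byte-packed tuple representation are our own, not the specification's):

```python
from collections import Counter

WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 1]

def weighted_sum(x):
    """W(B) for a byte-packed 8-tuple x, earliest bit in the most significant position."""
    return sum(w * ((x >> (7 - k)) & 1) for k, w in enumerate(WEIGHTS))

def conditional_probabilities(block_tuples):
    """Estimate P(Bi = x | wi = W(x)) from the tuple counts within one block."""
    counts = Counter(block_tuples)          # N_x for each 8-tuple value x that occurs
    totals_per_w = Counter()                # sum of N_t over values t with W(t) = y
    for x, n in counts.items():
        totals_per_w[weighted_sum(x)] += n
    return {x: n / totals_per_w[weighted_sum(x)] for x, n in counts.items()}
```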

The constraining signal {fi} is encoded with some embedded lossless encoder suitable for multi-bit audio. Suitable multi-bit lossless coding systems will be described later.

The decoder operates as follows. The fi for the block are recovered by means of the embedded lossless audio decoder corresponding to that in the encoder.

The initial values b-7 .. b-1 are assumed by agreement with the encoder to have the predetermined values {1, 1, 0, 0, 1, 1, 0}. Knowing f0 and b-7 .. b-1, the decoder calculates w0 = f0 + 32 - b-7 - 2b-6 - 3b-5 - 4b-4 - 5b-3 - 6b-2 - 7b-1. (Practical decoders can omit the constant 32.) Knowing w0, the decoder reduces the 256 possible values for B0 to a small subset (as previously discussed) and selects which was actually presented to the encoder by using the entropy decoded selection signal.

Knowing B0, B1 can be decoded in the same manner, and so on until the whole block of one-bit audio has been decoded losslessly.
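
Purely as an illustration of this decoding recursion, and reusing tuples_with_weight from the earlier enumeration sketch, the loop might look like this; the assumption that the candidates are enumerated in an order agreed with the encoder is ours:

```python
def decode_block(f, selections, history=(1, 1, 0, 0, 1, 1, 0)):
    """Reconstruct a block of one-bit audio from the constraining signal f[]
    and the entropy-decoded selection decisions, one per tuple.

    'selections[i]' is assumed to index the candidate list for tuple i, with
    candidates enumerated in an order agreed with the encoder.
    """
    bits = list(history)                     # b[-7] .. b[-1], agreed with the encoder
    out = []
    for i, fi in enumerate(f):
        prev7 = bits[-7:]
        # w[i] = f[i] + 32 - b[8i-7] - 2*b[8i-6] - ... - 7*b[8i-1]
        w = fi + 32 - sum(k * b for k, b in zip(range(1, 8), prev7))
        candidates = tuples_with_weight(w)   # as in the earlier enumeration sketch
        tup = candidates[selections[i]]      # resolve the remaining ambiguity
        bits.extend(tup)
        out.extend(tup)
    return out
```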

When coding a multichannel one-bit-audio signal, the probabilities for entropy coding the selection signals may be calculated across all channels and the same probabilities used for all channels. This is appropriate because these probabilities are largely a function of the modulator used to generate the one-bit-audio signal (which we may reasonably assume had the same architecture on all channels) rather than the audio signal that was encoded.

In a second embodiment, the first n D-tuples of the block of one-bit-audio signal are transmitted verbatim, the rest of the block being coded as described above. This is done because the lossless encoder 110 may code badly at first, especially if it uses prediction, because there will initially be no history of the constraining signal 150 on which to base the prediction.

Practical multi-bit audio lossless coders have means of dealing with this issue; for example the first few samples of the signal 150 could be transmitted either verbatim or without the benefit of prediction. However this second embodiment allows the embedded lossless encoder 110 to be simplified by addressing the issue at a higher level instead.

In a third embodiment, both the encoder of Figure 8 and the decoder of Figure 9 perform a many-to-one function on the signal wi, labelled 33 and 63 in the two figures respectively, before using it to identify the subset of the possible D-tuples which the selection signal chooses between.

In the first embodiment as illustrated above, wi was an integer in the range [0, 36]. Two examples of a many-to-one function that could be applied to wi are clipping to a smaller range (e.g. [4, 32]) or reducing modulo 32. The following discussion will assume we reduce wi modulo 32.

This step represents a discarding of information, and hence a coding inefficiency, but the loss is very small. The modulo operation confuses values in the range [0, 4] with values in the range [32, 36], but these values of wi are extremely rare when coding a one-bit signal from a DSM. The modulo operation can bring practical advantages in an implementation, as will now be discussed.

If the decoder were to be presented with erroneous input, we could expect erroneous output, but we would not tolerate the decoder crashing. On erroneous input, the wi signal might be calculated with out-of-range values, and so the decoder requires some range limiting process before using it to index a table. A modulo 32 operation accomplishes this nicely, as well as reducing the number of table entries (which may themselves be tables) to a convenient power of two.
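
For example, a decoder might guard its table look-up in the following way (illustrative only, not the specification's implementation):

```python
def safe_context_index(w):
    # Reduce modulo 32: even an erroneous or out-of-range w then indexes a
    # valid table of power-of-two size; values 0-4 alias with 32-36, which
    # are extremely rare for one-bit signals from a delta-sigma modulator.
    return w & 31
```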

Moreover, sometimes the coding inefficiency from discarding information can be recovered by not coding it in the first place. It is usually sensible for the embedded lossless encoder in the encoder to transmit the signal {fi}, as this signal can be expected to possess exploitable properties of an audio signal. However the embedded lossless encoder of this embodiment is not obliged to do this. The information that has to be losslessly conveyed to the decoder is in fact not {fi} but {fi modulo 32}. Should samples of {fi} prove awkward to code, the embedded lossless encoder is free to add or subtract multiples of 32 from the signal as it finds convenient.

It can be seen that this embodiment is equivalent to one of the form of Figure 6 where the constraining function is fi = (b8i-7 + 2b8i-6 + 3b8i-5 + 4b8i-4 + 5b8i-3 + 6b8i-2 + 7b8i-1 + 8b8i + 7b8i+1 + 6b8i+2 + 5b8i+3 + 4b8i+4 + 3b8i+5 + 2b8i+6 + b8i+7) modulo 32.

In a fourth embodiment, the encoder decimates by 4 instead of 8, using a decimation filter of {1, 2, 3, 4, 3, 2, 1}.

The procedure, however, is the same as for the first embodiment. The 16 possible 4-tuples of bits are split into 11 disjoint subsets by the value of wi. Six values of wi correspond to exactly one 4-tuple; the other five values correspond to exactly two possible 4-tuples. This means the coding system requires only binary decisions from the entropy coded selection signal. This is a benefit because arithmetic decoding of binary decisions can often be implemented more easily than that of multi-way decisions.
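
The partition into 11 subsets can again be checked by enumeration (an illustrative sketch with names of our own choosing):

```python
from itertools import product

def partition_by_weight(weights):
    """Group all tuples of len(weights) bits by their weighted sum."""
    groups = {}
    for bits in product((0, 1), repeat=len(weights)):
        w = sum(wt * b for wt, b in zip(weights, bits))
        groups.setdefault(w, []).append(bits)
    return groups

groups = partition_by_weight([4, 3, 2, 1])
print(len(groups))                              # 11 disjoint subsets
print(sorted(len(v) for v in groups.values()))  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```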

In a fifth embodiment, the decoder does not directly compute the one-bit signal values bi from the constraining signal fi combined with the selection information.

Instead a first portion of the selection information is used to regenerate an intermediate signal, which is then used in combination with further selection information to regenerate the one-bit-audio signal.

As a variation of the first embodiment, such an intermediate signal {gi} might be the original one-bit-audio signal decimated by 4 using the filter {1, 2, 3, 4, 3, 2, 1}.

Decimating this intermediate signal by 2 with the filter {1,2, 1} yields the signal {fi} of the first embodiment.

Information to allow regeneration of {gi} might be chosen in the following manner.

Since fi = g2i-1 + 2g2i + g2i+1, given fi and g2i-1 the decoder can deduce 2g2i + g2i+1, but will need additional information from the encoder in order to decode g2i and g2i+1 individually. A sensible choice of additional information is g2i - g2i+1. We note that (2g2i + g2i+1) + (g2i - g2i+1) = 3g2i, so the decoder can deduce (g2i - g2i+1) modulo 3 without the additional information. Therefore, the additional information (g2i - g2i+1) is entropy coded using (2g2i + g2i+1) mod 3 as context (possibly combined with other context). Since the supports of g2i and g2i+1 overlap, they are not completely independent, and the possible range of g2i - g2i+1 is less than twice the range of gi: -12 <= g2i - g2i+1 <= 12.

Knowing both (2g2i + g2i+1) and (g2i - g2i+1), the decoder can recover g2i and g2i+1.
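
An illustrative sketch of this recovery step (integer arithmetic; the names are ours):

```python
def recover_pair(f_i, g_prev, diff):
    """Recover g[2i] and g[2i+1] from f[i] = g[2i-1] + 2*g[2i] + g[2i+1],
    the already-decoded g[2i-1], and the transmitted difference g[2i] - g[2i+1]."""
    s = f_i - g_prev            # = 2*g[2i] + g[2i+1], known to the decoder
    g_even = (s + diff) // 3    # since (2a + b) + (a - b) = 3a
    g_odd = g_even - diff
    return g_even, g_odd

# The difference would be entropy coded using s % 3 as (part of) the context,
# because (g[2i] - g[2i+1]) is congruent to -s modulo 3.
```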

Regeneration of the one-bit signal can then proceed in the manner described in the fourth embodiment, as if the constraining signal had been {gi}.

In a sixth embodiment, further "guidance" context is used to entropy code the selection signal, as well as the value of wi. This means that the probability estimates used to entropy code the D-tuples depend not only on wi but also on additional information which is available to both the encoder and the decoder.

An example of guidance context is the next decimated value fi+1. This value is clearly not a deterministic function of current and previous tuples. It is found that the usefulness of this additional guidance context is different for different values of wi. Accordingly, additional context may only be used for some values of wi, or different items of additional context may be used for different values of wi.

Whilst it is sensible and convenient for the constraining function to be calculated by means of a small integer-valued decimation filter, it is not essential to the invention. Indeed at high decimation ratios it is increasingly difficult to find a suitable integer-valued decimation filter which achieves adequate alias rejection (to avoid filling in the quiet portions of the spectrum which the embedded lossless audio encoder aims to exploit) without putting excessive information into the decimated signal, which is also found to be deleterious to compression.

Accordingly in a seventh embodiment, the encoder illustrated in Figure 11 comprises a constraining function 101 that produces a high precision "real valued" constraining signal. This function might be a decimation filter with fractional coefficients, or even an IIR decimation filter (with well-defined internal quantisation to ensure output identical to that produced by the corresponding function 201 in the decoder of Figure 12). The high precision constraining signal is then quantised by the quantiser 102 to produce the constraining signal 150. The scaling of this quantisation step allows the design to trade off the amount of information which is conveyed in the constraining signal against the information conveyed in the selection signal.

Ordinary quantisation is unlikely to prove satisfactory because the noise introduced by the quantisation will fill in quiet portions of the spectrum, reducing the ability of the embedded lossless encoder to exploit them for compression gain.

Accordingly this embodiment of the invention uses noise shaped quantisation so that the introduced quantisation noise can be kept out of spectral regions where the high precision constraining signal is quiet. The coefficients governing this noise shaping can be fixed, or adaptive at both encoder and decoder, or chosen by the encoder and transmitted to the decoder as side information. They may be chosen in tandem with a prediction filter in the embedded lossless audio encoder.
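
A minimal sketch of an error-feedback (noise-shaped) quantiser of the kind described follows; the FIR error-feedback structure, step size, coefficients and sign convention are illustrative choices of ours, not taken from the specification:

```python
def noise_shaped_quantise(x, step, feedback_coeffs, initial_errors):
    """Error-feedback ("noise shaped") quantiser, illustrative only.

    x: high precision constraining samples; step: quantiser step size;
    feedback_coeffs: error-feedback filter coefficients; initial_errors:
    the agreed initial state of the shaper (most recent error first).
    """
    errors = list(initial_errors)
    out = []
    for v in x:
        # Subtract a weighted sum of previous quantisation errors before
        # quantising, so the quantisation noise is shaped away from spectral
        # regions where the high precision signal is quiet.
        fb = sum(c * e for c, e in zip(feedback_coeffs, errors))
        target = v - fb
        q = round(target / step) * step
        errors = [q - target] + errors[:len(feedback_coeffs) - 1]
        out.append(q)
    return out
```

As the text goes on to note, the encoder and decoder must agree on the initial state of the shaper for such a scheme to remain lossless.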

Persons skilled in the art of noise-shaping will understand that, if the noise-shaping filter and the prediction filter have inverse responses, then the same effect can be achieved with an ordinary quantiser and a lossless encoder that does not use prediction, by preceding the quantiser by a filter, preferably a "whitening" filter, and placing an inverse filter in the path from the constraining signal to the inverse constraining function.

The encoder and decoder will also need to agree a suitable initial state for the noise shaper, either by setting the state to prearranged values or by the encoder choosing an appropriate state and communicating it as side information to the decoder.

Previously described embodiments of the invention that employ integer coefficient decimation filters are able to derive from the constraining signal fi a signal wi that is a deterministic function of the current tuple Bi only, and knowledge of wi thereby allows attention to be focussed on a smaller list of possible tuples {Bi} for the purpose of entropy encoding and decoding. In these embodiments, the mapping from wi to the list {Bi} is fixed.

In this seventh embodiment, it is not possible to derive a signal that is a deterministic function of the current tuple Bi only, but it is still true that the output gi (150 in Figure 11) of the noise shaped quantiser is a deterministic function of the current tuple Bi and previous tuples. Therefore, given knowledge of all previous tuples it is possible to deduce from gi what are the possible values for the current tuple Bi. In this case, we have a variable mapping from gi to the list {Bi}.

In the case that the constraining function 101 in Figure 11 is a Finite Impulse Response filter, the mapping from gi to the list {Bi} depends on the tuples that contribute to the output of the filter, and on the error feedback signal S that is used by the noise shaper. S is a single scalar value that is a function of previous samples of the signal 149 only, and with careful design it can be quantised to a reasonably small number of values.

This mapping from gi and S to the list {Bi} is performed by the list generator 36 in Figure 11, which takes the place of the summer 34 and subtractor 38 in Figure 8. There is an additional path 39 in Figure 11, which communicates the error feedback S used in the noise-shaper to the list generator 36.

Error feedback inherently involves a z^-1 delay element. To make the delay explicit in Figure 11, undelayed feedback is taken from the noise shaped quantiser and the delay element is duplicated externally in path 39.

Other numbered processing elements in Figure 11 have exact counterparts on Figure 8, and their descriptions will not be repeated.

Similarly, most of the numbered processing elements in the corresponding decoder of Figure 12 have exact counterparts in Figure 9. The list generator 136 in Figure 12 corresponds to the summer 64 and subtractor 68 in Figure 9.

Because the list generator 136 needs to duplicate the action of the list generator 36, it needs to know the error feedback signal S that was used in the encoder. One way to provide this is to provide duplicates 201 and 202 of the corresponding elements 101 and 102 in the encoder. The "main" output of the noise shaped quantiser is not required in the decoder, and is discarded.

The present invention calls for an embedded lossless encoder 110 and decoder 120. These should be adapted to compress the constraining signal, which in many embodiments of the invention will be of the form of a decimated one-bit-audio signal.

Several prior art techniques are known for losslessly encoding multi-bit audio signals, many of which will be suitable for use in this application. For example, transform coding methods may be used to code an approximation to the audio, and the difference between that approximation and the actual audio may be coded as a "touchup" signal [Purat, M., Liebchen, T. and Noll, P., "Lossless Transform Coding of Audio Signals", AES 102nd Convention, preprint no. 4414 (1997 March)]. However, we prefer techniques similar to those illustrated in Figure 13 and Figure 14.

Figure 13 shows a lossless encoder adapted to compress multi-bit-audio. A prediction filter estimates the current sample of the input audio signal from previous samples, yielding a prediction signal. The prediction signal is quantised and subtracted from the input audio signal to yield a prediction residual. This prediction residual is entropy coded to produce the output-encoded bitstream.

Figure 14 shows a lossless decoder corresponding to the multi-bit audio lossless encoder of Figure 13. The encoded bitstream is entropy decoded to regenerate the prediction residual signal. A prediction filter estimates the current sample of the audio signal from previously "regenerated" samples, yielding a prediction signal. The prediction signal is quantised and added to the prediction residual to furnish a regenerated sample, which is fed to the output.
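
The structure of Figures 13 and 14 could be sketched as follows; entropy coding of the residual is omitted, and the coefficient and history handling are simplified illustrations of ours rather than the specification's method:

```python
def predictive_encode(samples, coeffs, history):
    """Residual generation in the style of Figure 13 (entropy coding omitted)."""
    hist = list(history)
    residuals = []
    for s in samples:
        # Quantised FIR prediction from previous samples.
        prediction = int(round(sum(c * h for c, h in zip(coeffs, reversed(hist)))))
        residuals.append(s - prediction)     # this residual is entropy coded in practice
        hist.append(s)
    return residuals

def predictive_decode(residuals, coeffs, history):
    """Inverse process in the style of Figure 14: regenerate the samples."""
    hist = list(history)
    out = []
    for r in residuals:
        prediction = int(round(sum(c * h for c, h in zip(coeffs, reversed(hist)))))
        s = prediction + r
        out.append(s)
        hist.append(s)
    return out
```

Provided the two prediction filters start from the same history, the decoder regenerates the samples exactly, which is what makes the scheme lossless.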

This kind of lossless audio coding achieves compression by exploiting the non-flatness of the audio spectrum by means of a good choice of prediction filter, and by efficiently entropy coding the prediction residual signal.

There are some characteristics of the typical constraining signal that can be taken into consideration in choosing and tuning such a system. The sampling frequency may be high, perhaps 350 kHz, whereas many lossless audio coding systems are designed with 44.1 kHz in mind. The typical constraining signal may only exercise a few levels (37 in the first embodiment), while a typical audio coding system might be designed with 65536 levels or 16777216 levels in mind. The spectrum of the constraining signal is similar to that of audio below 20 kHz or so, but above that frequency the noise floor rises sharply to quite a high level, as shown in Figure 15.

A configuration that has been found to give good results in the context of the first embodiment is an FIR prediction filter having of the order of 20 to 25 taps, followed by a type of entropy encoder, such as an arithmetic encoder, that can efficiently encode fractional bits. It has also been found helpful for compression to use the fractional portion of the prediction signal as context to the entropy coding of the prediction residual signal.