Title:
STEREO AUDIO SIGNAL ENCODER
Document Type and Number:
WIPO Patent Application WO/2018/142018
Kind Code:
A1
Abstract:
A method comprising: receiving at least two audio channel signals; determining, for a first frame, at least two parameters representing a difference between the at least two channel audio signals; scalar quantising (451) the at least two parameters to generate at least two index values; determining an initial index map for reordering (453) one of the at least two index values, and determining at least one further index map for reordering at least one further of the at least two index values, wherein the at least one further index map is determined based on the one of the at least two index values; reordering the one of the at least two index values based on the initial index map; reordering the further of the at least two index values based on the at least one further index map; encoding (455) the reordered one of the at least two index values dependent on an order position of the reordered one of the at least two index values; encoding the reordered further of the at least two index values based on an order position of the reordered further of the at least two index values; generating a single channel representation of the at least two audio channel signals dependent on the at least two parameters; and encoding (456) the single channel representation.

Inventors:
VASILACHE ADRIANA (FI)
Application Number:
PCT/FI2018/050018
Publication Date:
August 09, 2018
Filing Date:
January 11, 2018
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
G10L19/035; G10L19/008; G10L19/02; H03M7/40
Foreign References:
US20160027445A1 2016-01-28
US20050177360A1 2005-08-11
US20160217800A1 2016-07-28
Other References:
BREEBAART, J. ET AL.: "Parametric Coding of Stereo Audio", EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, vol. 2005, no. 9, 21 June 2005 (2005-06-21), pages 1305 - 1322, XP055532228, Retrieved from the Internet [retrieved on 20180509]
See also references of EP 3577649A4
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:

1. A method comprising:

receiving at least two audio channel signals;

determining, for a first frame, at least two parameters representing a difference between the at least two channel audio signals;

scalar quantising the at least two parameters to generate at least two index values;

determining an initial index map for reordering one of the at least two index values, and determining at least one further index map for reordering at least one further of the at least two index values, wherein the at least one further index map is determined based on the one of the at least two index values;

reordering the one of the at least two index values based on the initial index map;

reordering the further of the at least two index values based on the at least one further index map;

encoding the reordered one of the at least two index values dependent on an order position of the reordered one of the at least two index values;

encoding the reordered further of the at least two index values based on an order position of the reordered further of the at least two index values;

generating a single channel representation of the at least two audio channel signals dependent on the at least two parameters; and

encoding the single channel representation.

2. The method as claimed in claim 1, wherein scalar quantising the parameter further comprises ordering the scalar quantized output according to a predetermined map.

3. The method as claimed in any of claims 1 to 2, wherein encoding the reordered one and further index values dependent on an order position of the reordered index one and further index values comprises applying a Golomb-Rice encoding to the reordered one and further index values dependent on an order position of the reordered index one and further index values.

4. The method as claimed in any of claims 1 to 3, wherein determining, for a first frame, at least two parameters comprises determining at least three parameters;

scalar quantising the at least two parameters comprises scalar quantising the at least three parameters to generate at least three index values, the at least three index values comprising a first index value, a first further index value and a second further index value; and

determining at least one further index map comprises:

determining a first further index map for reordering the first further index value, wherein the first further index map is determined based on the first index value; and

determining a second further index map for reordering the second further index value, wherein the second further index map is determined based on the first further index value.

5. The method as claimed in claim 4, wherein determining the first further index map for reordering the first further index value comprises selecting, from a first array of index maps, the first further index map based on the first index value.

6. The method as claimed in claim 5, wherein determining the second further index map for reordering the second further index value comprises selecting, from a second array of index maps, the second further index map based on the first further index value.

7. The method as claimed in claim 6, wherein the second array of index maps is the first array of index maps.

8. The method as claimed in any of claims 1 to 4, wherein determining the at least one further index map for reordering at least one further of the at least two index values comprises selecting, from an array of index maps, the at least one further index map based on the one of the at least two index values.

9. The method as claimed in claim 8, wherein determining the at least one further index map for reordering at least one further of the at least two index values comprises generating, from a compressed array of index maps, the at least one further index map based on the one of the at least two index values.

10. The method as claimed in any of the claims 1 to 9, wherein the at least one further index map is further determined based on a further one of the at least two index values.

11. The method as claimed in any of claims 1 to 10, wherein encoding the single channel representation comprises:

determining the number of bits used for encoding the reordered further of the at least two index values; and

encoding the single channel representation based on the determined number of bits.

12. A method comprising:

decoding from a first part of a signal at least two parameter index values, wherein the parameters represent a difference between at least two channel audio signals, and wherein the signal is an encoded multichannel audio signal;

reordering a first of the at least two parameter index values based on a first determined reordering to generate a first reordered index value;

reordering a second of the at least two parameter index values based on a second determined reordering to generate a second reordered index value, wherein the second determined reordering is based on the first reordered index value; and

dequantizing the first and the second reordered index value to generate the at least two parameters.

13. The method as claimed in claim 12, wherein decoding from a first part of a signal comprises decoding a first part of a signal using a Golomb-Rice decoding.

14. The method as claimed in any of claims 12 to 13, wherein reordering a first of the at least two parameter index values based on a first determined reordering to generate a first reordered index value comprises:

determining an inverse ordering; and

applying the inverse ordering.

15. The method as claimed in claim 14, wherein reordering a second of the at least two parameter index values based on a first determined reordering to generate a first reordered index value comprises:

determining a second inverse ordering based on the first reordered index value; and

applying the second inverse ordering.

16. The method as claimed in any of claims 12 to 15, further comprising:

receiving from a further part of a signal an encoded downmix channel signal;

determining a number of bits used in the first part of the signal;

decoding the encoded downmix channel signal based on the number of bits used in the first part of the signal.

17. An apparatus configured to perform the method of any of claims 1 to 11.

18. An apparatus configured to perform the method of any of claims 12 to 16.

19. An apparatus comprising: a parameter determiner configured to determine, for a first frame, at least two parameters representing a difference between the at least two channel audio signals;

a scalar quantiser configured to scalar quantise the at least two parameters to generate at least two index values;

a map determiner configured to determine an initial index map for reordering one of the at least two index values, and determine at least one further index map for reordering at least one further of the at least two index values, wherein the at least one further index map is determined based on the one of the at least two index values;

a reorderer configured to reorder the one of the at least two index values based on the initial index map and further configured to reorder the further of the at least two index values based on the at least one further index map;

an encoder configured to encode the reordered one of the at least two index values dependent on an order position of the reordered one of the at least two index values, and encode the reordered further of the at least two index values based on an order position of the reordered further of the at least two index values;

a mono channel generator configured to generate a single channel representation of the at least two audio channel signals dependent on the at least two parameters; and

a mono channel encoder configured to encode the single channel representation.

20. The apparatus as claimed in claim 19, wherein the scalar quantizer is further configured to order the scalar quantized output according to a predetermined map.

21. The apparatus as claimed in any of claims 19 to 20, wherein the encoder is configured to apply a Golomb-Rice encoding to the reordered one and further index values dependent on an order position of the reordered index one and further index values.

22. The apparatus as claimed in any of claims 19 to 21, wherein the parameter determiner is configured to determine at least three parameters, the scalar quantiser is configured to scalar quantise the at least three parameters to generate at least three index values, the at least three index values comprising a first index value, a first further index value and a second further index value; and the map determiner is configured to:

determine a first further index map for reordering the first further index value, wherein the first further index map is determined based on the first index value; and

determine a second further index map for reordering the second further index value, wherein the second further index map is determined based on the first further index value.

23. The apparatus as claimed in claim 22, wherein the map determiner is configured to select, from a first array of index maps, the first further index map based on the first index value.

24. The apparatus as claimed in claim 23, wherein the map determiner is configured to select, from a second array of index maps, the second further index map based on the first further index value.

25. The apparatus as claimed in claim 24, wherein the second array of index maps is the first array of index maps.

26. The apparatus as claimed in any of claims 19 to 22, wherein the map determiner is configured to select, from an array of index maps, the at least one further index map based on the one of the at least two index values.

27. The apparatus as claimed in claim 26, wherein the map determiner is configured to generate, from a compressed array of index maps, the at least one further index map based on the one of the at least two index values.

28. The apparatus as claimed in any of the claims 19 to 27, wherein the map determiner is configured to determine the at least one further index map based on a further one of the at least two index values.

29. The apparatus as claimed in any of the claims 19 to 28, wherein the mono channel encoder is configured to:

determine the number of bits used for encoding the reordered further of the at least two index values; and

encode the single channel representation based on the determined number of bits.

30. An apparatus comprising:

a decoder configured to decode from a first part of a signal at least two parameter index values, wherein the parameters represent a difference between at least two channel audio signals, and wherein the signal is an encoded multichannel audio signal;

a reorderer configured to reorder a first of the at least two parameter index values based on a first determined reordering to generate a first reordered index value and further configured to reorder a second of the at least two parameter index values based on a second determined reordering to generate a second reordered index value, wherein the second determined reordering is based on the first reordered index value; and

a dequantizer configured to dequantize the first and the second reordered index value to generate the at least two parameters.

31. The apparatus as claimed in claim 30, wherein the decoder is configured to decode a first part of a signal using a Golomb-Rice decoding.

32. The apparatus as claimed in any of claims 30 to 31, wherein the reorderer is configured to:

determine an inverse ordering; and

apply the inverse ordering.

33. The apparatus as claimed in claim 32, wherein the reorderer configured to reorder a second of the at least two parameter index values based on a first determined reordering to generate a first reordered index value is configured to:

determine a second inverse ordering based on the first reordered index value; and

apply the second inverse ordering.

34. The apparatus as claimed in any of claims 30 to 33, further comprising a mono/downmix decoder configured to:

receive from a further part of a signal an encoded downmix channel signal;

determine a number of bits used in the first part of the signal; and

decode the encoded downmix channel signal based on the number of bits used in the first part of the signal.

Description:
Stereo Audio Signal Encoder

Field

The present application relates to a stereo audio signal encoder, and in particular, but not exclusively to a stereo audio signal encoder for use in portable apparatus.

Background

Audio signals, like speech or music, are encoded for example to enable efficient transmission or storage of the audio signals.

Audio encoders and decoders (also known as codecs) are used to represent audio based signals, such as music and ambient sounds (which in speech coding terms can be called background noise). These types of coders typically do not utilise a speech model for the coding process, rather they use processes for representing all types of audio signals, including speech. Speech encoders and decoders (codecs) can be considered to be audio codecs which are optimised for speech signals, and can operate at either a fixed or variable bit rate.

An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may be optimized to work with speech signals at a coding rate equivalent to a pure speech codec. At higher bit rates, the audio codec may code any signal including music, background noise and speech, with higher quality and performance. A variable-rate audio codec can also implement an embedded scalable coding structure and bitstream, where additional bits (a specific amount of bits is often referred to as a layer) improve the coding upon lower rates, and where the bitstream of a higher rate may be truncated to obtain the bitstream of a lower rate coding. Such an audio codec may utilize a codec designed purely for speech signals as the core layer or lowest bit rate coding.

An audio codec is designed to maintain a high (perceptual) quality while improving the compression ratio. Thus instead of waveform matching coding it is common to employ various parametric schemes to lower the bit rate. For multichannel audio, such as stereo signals, it is common to use a larger amount of the available bit rate on a mono channel representation and encode the stereo or multichannel information exploiting a parametric approach which uses relatively few bits.

Current speech and audio standardization efforts at the 3rd Generation Partnership Project (3GPP) aim to increase the quality of the encoded signal through coding efficiency, bandwidth, as well as number of channels. A stereo/binaural extension is being prepared for the Enhanced Voice Services (EVS) speech and audio codec candidate. The coding efficiency of this proposal is important, especially at lower codec bitrates, because a large-bitrate extension would lose its benefit if the total bitrate equalled or exceeded the bitrate of a dual mode.

The proposed stereo/binaural extension is composed of encoded stereo parameters. Increasing the coding efficiency for these parameters means reducing the bitrate of the extension and using the 'saved' bits for better encoding of the mono downmix. This is particularly useful at low bit rates where the quality of the encoded downmix is more sensitive to the bitrate.

In addressing the coding efficiency of the stereo parameters a significant saving of bits may be made. Efficient coding of stereo parameters has involved quantization of the values (levels), followed by entropy encoding to further reduce the bitrate. A previously proposed method for encoding the stereo parameters, disclosed in EP2856776, uses an adaptive version of Golomb-Rice coding.

Summary

There is provided according to a first aspect a method comprising: receiving at least two audio channel signals; determining, for a first frame, at least two parameters representing a difference between the at least two channel audio signals; scalar quantising the at least two parameters to generate at least two index values; determining an initial index map for reordering one of the at least two index values, and determining at least one further index map for reordering at least one further of the at least two index values, wherein the at least one further index map is determined based on the one of the at least two index values; reordering the one of the at least two index values based on the initial index map; reordering the further of the at least two index values based on the at least one further index map; encoding the reordered one of the at least two index values dependent on an order position of the reordered one of the at least two index values; encoding the reordered further of the at least two index values based on an order position of the reordered further of the at least two index values; generating a single channel representation of the at least two audio channel signals dependent on the at least two parameters; and encoding the single channel representation.

Scalar quantising the parameter may further comprise ordering the scalar quantized output according to a predetermined map.

Encoding the reordered one and further index values dependent on an order position of the reordered index one and further index values may comprise applying a Golomb-Rice encoding to the reordered one and further index values dependent on an order position of the reordered index one and further index values.
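The dependence on order position can be illustrated with a plain Golomb-Rice code, in which a value costs (value >> k) + 1 + k bits, so that low order positions produce short codewords. The following is a minimal Python sketch only; the function name golomb_rice_encode, the fixed parameter k and the unary convention (ones terminated by a zero) are assumptions made for illustration and are not taken from the application, which refers to an adaptive variant.

```python
def golomb_rice_encode(value: int, k: int) -> str:
    """Return the Golomb-Rice codeword of a non-negative integer as a bit string.

    Assumed convention (not from the application): the quotient value >> k is
    written in unary as ones terminated by a zero, followed by the k
    least-significant bits of the value.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    prefix = "1" * quotient + "0"
    # The remainder is written with exactly k bits; nothing is written for k = 0.
    suffix = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return prefix + suffix
```

With k = 0 in this sketch, order position 0 costs one bit and order position 3 costs four bits, which is why mapping the most probable index values to the lowest order positions reduces the expected bit count.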

Determining, for a first frame, at least two parameters may comprise determining at least three parameters; scalar quantising the at least two parameters may comprise scalar quantising the at least three parameters to generate at least three index values, the at least three index values comprising a first index value, a first further index value and a second further index value; and determining at least one further index map may comprise: determining a first further index map for reordering the first further index value, wherein the first further index map is determined based on the first index value; and determining a second further index map for reordering the second further index value, wherein the second further index map may be determined based on the first further index value.

Determining the first further index map for reordering the first further index value may comprise selecting, from a first array of index maps, the first further index map based on the first index value.

Determining the second further index map for reordering the second further index value may comprise selecting, from a second array of index maps, the second further index map based on the first further index value.

The second array of index maps may be the first array of index maps.

Determining the at least one further index map for reordering at least one further of the at least two index values may comprise selecting, from an array of index maps, the at least one further index map based on the one of the at least two index values.

Determining the at least one further index map for reordering at least one further of the at least two index values may comprise generating, from a compressed array of index maps, the at least one further index map based on the one of the at least two index values.

The at least one further index map may be further determined based on a further one of the at least two index values.

Encoding the single channel representation may comprise: determining the number of bits used for encoding the reordered further of the at least two index values; and encoding the single channel representation based on the determined number of bits.

According to a second aspect there is provided a method comprising: decoding from a first part of a signal at least two parameter index values, wherein the parameters represent a difference between at least two channel audio signals, and wherein the signal is an encoded multichannel audio signal; reordering a first of the at least two parameter index values based on a first determined reordering to generate a first reordered index value; reordering a second of the at least two parameter index values based on a second determined reordering to generate a second reordered index value, wherein the second determined reordering is based on the first reordered index value; and dequantizing the first and the second reordered index value to generate the at least two parameters.

Decoding from a first part of a signal may comprise decoding a first part of a signal using a Golomb-Rice decoding.

Reordering a first of the at least two parameter index values based on a first determined reordering to generate a first reordered index value may comprise: determining an inverse ordering; and applying the inverse ordering.

Reordering a second of the at least two parameter index values based on a first determined reordering to generate a first reordered index value may comprise: determining a second inverse ordering based on the first reordered index value; and applying the second inverse ordering.

The method may further comprise: receiving from a further part of a signal an encoded downmix channel signal; determining a number of bits used in the first part of the signal; and decoding the encoded downmix channel signal based on the number of bits used in the first part of the signal.
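The decoding steps above can be sketched as follows, under stated assumptions. The sketch reads Golomb-Rice codewords from a bit string, applies the inverse of the current index map to each decoded order position, selects the next map from the recovered index value, and reports how many bits the first part of the signal consumed. The names gr_decode, invert and decode_indices, the fixed Golomb-Rice parameter k, and the convention that index_map[index] gives the order position are all hypothetical choices for illustration, not details taken from the application.

```python
from typing import List, Sequence, Tuple


def gr_decode(bits: str, pos: int, k: int) -> Tuple[int, int]:
    """Decode one Golomb-Rice codeword starting at pos; return (value, next_pos)."""
    quotient = 0
    while bits[pos] == "1":  # unary prefix: ones terminated by a zero (assumed)
        quotient += 1
        pos += 1
    pos += 1  # skip the terminating zero
    remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (quotient << k) | remainder, pos + k


def invert(index_map: Sequence[int]) -> List[int]:
    """Invert a map in which index_map[index] is the order position of index."""
    inverse = [0] * len(index_map)
    for index, position in enumerate(index_map):
        inverse[position] = index
    return inverse


def decode_indices(bits: str, count: int, initial_map: Sequence[int],
                   conditional_maps: Sequence[Sequence[int]],
                   k: int = 0) -> Tuple[List[int], int]:
    """Recover count quantiser indices and the number of bits consumed."""
    indices: List[int] = []
    pos = 0
    current_map = initial_map
    for _ in range(count):
        position, pos = gr_decode(bits, pos, k)
        index = invert(current_map)[position]
        indices.append(index)
        # The next inverse ordering depends on the index just recovered.
        current_map = conditional_maps[index]
    return indices, pos
```

In this sketch the returned bit count is what a downmix decoder would subtract from the frame's bit budget to find how many bits remain for the encoded downmix channel signal; the budget itself is not specified here.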

An apparatus may be configured to perform the method of encoding as described herein.

An apparatus may be configured to perform the method of decoding as described herein.

According to a third aspect there is provided an apparatus comprising: a parameter determiner configured to determine, for a first frame, at least two parameters representing a difference between the at least two channel audio signals; a scalar quantizer configured to scalar quantise the at least two parameters to generate at least two index values; a map determiner configured to determine an initial index map for reordering one of the at least two index values, and determine at least one further index map for reordering at least one further of the at least two index values, wherein the at least one further index map is determined based on the one of the at least two index values; a reorderer configured to reorder the one of the at least two index values based on the initial index map and further configured to reorder the further of the at least two index values based on the at least one further index map; an encoder configured to encode the reordered one of the at least two index values dependent on an order position of the reordered one of the at least two index values, and encode the reordered further of the at least two index values based on an order position of the reordered further of the at least two index values; a mono channel generator configured to generate a single channel representation of the at least two audio channel signals dependent on the at least two parameters; and a mono channel encoder configured to encode the single channel representation.

The scalar quantizer may be further configured to order the scalar quantized output according to a predetermined map.

The encoder may be configured to apply a Golomb-Rice encoding to the reordered one and further index values dependent on an order position of the reordered index one and further index values.

The parameter determiner may be configured to determine at least three parameters, the scalar quantizer may be configured to scalar quantise the at least three parameters to generate at least three index values, the at least three index values comprising a first index value, a first further index value and a second further index value; and the map determiner may be configured to: determine a first further index map for reordering the first further index value, wherein the first further index map is determined based on the first index value; and determine a second further index map for reordering the second further index value, wherein the second further index map is determined based on the first further index value.

The map determiner may be configured to select, from a first array of index maps, the first further index map based on the first index value.

The map determiner may be configured to select, from a second array of index maps, the second further index map based on the first further index value.

The second array of index maps may be the first array of index maps.

The map determiner may be configured to select, from an array of index maps, the at least one further index map based on the one of the at least two index values.

The map determiner may be configured to generate, from a compressed array of index maps, the at least one further index map based on the one of the at least two index values.

The map determiner may be configured to determine the at least one further index map based on a further one of the at least two index values.

The mono channel encoder may be configured to: determine the number of bits used for encoding the reordered further of the at least two index values; and encode the single channel representation based on the determined number of bits.

According to a fourth aspect there is provided an apparatus comprising: a decoder configured to decode from a first part of a signal at least two parameter index values, wherein the parameters represent a difference between at least two channel audio signals, and wherein the signal is an encoded multichannel audio signal; a reorderer configured to reorder a first of the at least two parameter index values based on a first determined reordering to generate a first reordered index value and further configured to reorder a second of the at least two parameter index values based on a second determined reordering to generate a second reordered index value, wherein the second determined reordering is based on the first reordered index value; and a dequantizer configured to dequantize the first and the second reordered index value to generate the at least two parameters.

The decoder may be configured to decode a first part of a signal using a Golomb-Rice decoding.

The reorderer may be configured to: determine an inverse ordering; and apply the inverse ordering.

The reorderer configured to reorder a second of the at least two parameter index values based on a first determined reordering to generate a first reordered index value may be configured to: determine a second inverse ordering based on the first reordered index value; and apply the second inverse ordering.

The apparatus may further comprise a mono/downmix decoder configured to: receive from a further part of a signal an encoded downmix channel signal; determine a number of bits used in the first part of the signal; and decode the encoded downmix channel signal based on the number of bits used in the first part of the signal.

The parameters representing a difference between the at least two channel audio signals may be at least one of: a side gain, an interphase difference, a residual prediction gain.

A computer program product may cause an apparatus to perform the method as described herein.

An electronic device may comprise apparatus as described herein.

A chipset may comprise apparatus as described herein.

Brief Description of Drawings

For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:

Figure 1 shows schematically an electronic device employing some embodiments;

Figure 2 shows schematically an audio codec system according to some embodiments;

Figure 3 shows schematically an encoder as shown in Figure 2 according to some embodiments;

Figure 4 shows schematically a channel analyser as shown in Figure 3 in further detail according to some embodiments;

Figure 5 shows schematically a stereo channel encoder as shown in Figure 3 in further detail according to some embodiments;

Figure 6 shows a flow diagram illustrating the operation of the encoder shown in Figure 2 according to some embodiments;

Figure 7 shows a flow diagram illustrating the operation of the channel analyser as shown in Figure 4 according to some embodiments;

Figure 8 shows a flow diagram illustrating the operation of the channel encoder as shown in Figure 5 according to some embodiments;

Figure 9 shows schematically the decoder as shown in Figure 2 according to some embodiments; and

Figure 10 shows a flow diagram illustrating the operation of the decoder as shown in Figure 9 according to some embodiments.

Description of Some Embodiments of the Application

The following describes in more detail possible stereo and multichannel speech and audio codecs, including layered or scalable variable rate speech and audio codecs. As discussed above, a previously proposed method for encoding the stereo parameters, disclosed in EP2856776, uses an adaptive version of Golomb-Rice coding.

The concept as expressed in the embodiments described hereafter is one which attempts to better capture and exploit intraframe value correlation and as a consequence further reduce bitrate consumption for encoding the stereo parameters.

As such the embodiments explicitly store the order of first order probabilities of the symbols to be encoded (instead of having them adaptively sorted). In other words, for a single data frame, an array of integers selected based on a previously encoded symbol keeps the order of probabilities for each symbol: 0 if the symbol is the most probable, 1 if it is the second most probable, and so on. The probability order value is then encoded with an adaptive Golomb-Rice (GR) code.
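The concept can be sketched in Python as follows. This is an illustrative sketch only: the three-level quantiser, the contents of the maps, the names gr_encode and encode_indices, and the use of a fixed rather than adaptive GR parameter k are assumptions and do not come from the application. Each map is an array of integers in which map[index] gives the order position of that index (0 for the most probable), and the map used for each symbol after the first is selected by the previously encoded index value.

```python
from typing import List, Sequence


def gr_encode(value: int, k: int) -> str:
    """Golomb-Rice codeword as a bit string (unary ones, a zero, then k bits)."""
    prefix = "1" * (value >> k) + "0"
    suffix = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return prefix + suffix


def encode_indices(indices: Sequence[int], initial_map: Sequence[int],
                   conditional_maps: Sequence[Sequence[int]], k: int = 0) -> str:
    """Reorder each quantiser index to its order position and GR-encode it."""
    codewords: List[str] = []
    current_map = initial_map
    for index in indices:
        position = current_map[index]  # 0 = most probable under current_map
        codewords.append(gr_encode(position, k))
        # The map for the next symbol is selected by the index just encoded.
        current_map = conditional_maps[index]
    return "".join(codewords)


# Hypothetical maps for a three-level quantiser (illustrative values only).
INITIAL_MAP = [1, 0, 2]                 # index 1 assumed most probable at first
CONDITIONAL_MAPS = [[0, 1, 2],          # after index 0, index 0 most probable
                    [1, 0, 2],          # after index 1, index 1 most probable
                    [2, 1, 0]]          # after index 2, index 2 most probable
```

For the index sequence [1, 1, 2], encode_indices with these hypothetical maps and k = 0 yields the five-bit string 00110, whereas GR-encoding the raw index values with the same k would take seven bits (10, 10, 110).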

In this regard reference is first made to Figure 1 which shows a schematic block diagram of an exemplary electronic device or apparatus 10, which may incorporate a codec according to an embodiment of the application.

The apparatus 10 may for example be a mobile terminal or user equipment of a wireless communication system. In other embodiments the apparatus 10 may be an audio-video device such as a video camera, a television (TV) receiver, an audio recorder or audio player such as an mp3 recorder/player, a media recorder (also known as an mp4 recorder/player), or any computer suitable for the processing of audio signals.

The electronic device or apparatus 10 in some embodiments comprises a microphone 11, which is linked via an analogue-to-digital converter (ADC) 14 to a processor 21. The processor 21 is further linked via a digital-to-analogue converter (DAC) 32 to loudspeakers 33. The processor 21 is further linked to a transceiver (RX/TX) 13, to a user interface (UI) 15 and to a memory 22.

The processor 21 can in some embodiments be configured to execute various program codes. The implemented program codes in some embodiments comprise a multichannel or stereo encoding or decoding code as described herein.

The implemented program codes 23 can in some embodiments be stored for example in the memory 22 for retrieval by the processor 21 whenever needed.

The memory 22 could further provide a section 24 for storing data, for example data that has been encoded in accordance with the application.

The encoding and decoding code in embodiments can be implemented in hardware and/or firmware.

The user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display. In some embodiments a touch screen may provide both input and output functions for the user interface. The apparatus 10 in some embodiments comprises a transceiver 13 suitable for enabling communication with other apparatus, for example via a wireless communication network.

It is to be understood again that the structure of the apparatus 10 could be supplemented and varied in many ways.

A user of the apparatus 10 for example can use the microphone 11 for inputting speech or other audio signals that are to be transmitted to some other apparatus or that are to be stored in the data section 24 of the memory 22. A corresponding application in some embodiments can be activated to this end by the user via the user interface 15. This application in these embodiments can be performed by the processor 21, causing the processor 21 to execute the encoding code stored in the memory 22.

The analogue-to-digital converter (ADC) 14 in some embodiments converts the input analogue audio signal into a digital audio signal and provides the digital audio signal to the processor 21. In some embodiments the microphone 11 can comprise an integrated microphone and ADC function and provide digital audio signals directly to the processor for processing.

The processor 21 in such embodiments then processes the digital audio signal in the same way as described with reference to the system shown in Figure 2, the encoder shown in Figures 2 to 8 and the decoder as shown in Figures 9 and 10.

The resulting bit stream can in some embodiments be provided to the transceiver 13 for transmission to another apparatus. Alternatively, the coded audio data in some embodiments can be stored in the data section 24 of the memory 22, for instance for a later transmission or for a later presentation by the same apparatus 10.

The apparatus 10 in some embodiments can also receive a bit stream with correspondingly encoded data from another apparatus via the transceiver 13. In this example, the processor 21 may execute the decoding program code stored in the memory 22. The processor 21 in such embodiments decodes the received data, and provides the decoded data to a digital-to-analogue converter 32. The digital-to-analogue converter 32 converts the digital decoded data into analogue audio data and can in some embodiments output the analogue audio via the loudspeakers 33. Execution of the decoding program code in some embodiments can be triggered as well by an application called by the user via the user interface 15.

The received encoded data in some embodiments can also be stored in the data section 24 of the memory 22 instead of being presented immediately via the loudspeakers 33, for instance for later decoding and presentation or for decoding and forwarding to still another apparatus.

It would be appreciated that the schematic structures described in Figures 3, 5, 7 and 9, and the method steps shown in Figures 4, 6, 8 and 10 represent only a part of the operation of an audio codec and specifically part of a stereo encoder/decoder apparatus or method as exemplarily shown implemented in the apparatus shown in Figure 1.

The general operation of audio codecs as employed by embodiments is shown in Figure 2. General audio coding/decoding systems comprise both an encoder and a decoder, as illustrated schematically in Figure 2. However, it would be understood that some embodiments can implement one of either the encoder or decoder, or both the encoder and decoder. Illustrated by Figure 2 is a system 102 with an encoder 104 and in particular a stereo encoder 151, a storage or media channel 106 and a decoder 108. It would be understood that as described above some embodiments can comprise or implement one of the encoder 104 or decoder 108 or both the encoder 104 and decoder 108.

The encoder 104 compresses an input audio signal 110 producing a bit stream 112, which in some embodiments can be stored or transmitted through a media channel 106. The encoder 104 furthermore can comprise a stereo encoder 151 as part of the overall encoding operation. It is to be understood that the stereo encoder may be part of the overall encoder 104 or a separate encoding module. The encoder 104 can also comprise a multi-channel encoder that encodes more than two audio signals.

The bit stream 112 can be received within the decoder 108. The decoder 108 decompresses the bit stream 112 and produces an output audio signal 114. The decoder 108 can comprise a stereo decoder as part of the overall decoding operation. It is to be understood that the stereo decoder may be part of the overall decoder 108 or a separate decoding module. The decoder 108 can also comprise a multi-channel decoder that decodes more than two audio signals. The bit rate of the bit stream 112 and the quality of the output audio signal 114 in relation to the input signal 110 are the main features which define the performance of the coding system 102.

With respect to Figure 3 an example encoder 104 is shown according to some embodiments.

The encoder 104 in some embodiments comprises a frame sectioner/transformer 201. The frame sectioner/transformer 201 is configured to receive the left and right (or more generally any multichannel audio representation) input audio signals and generate frequency domain representations of these audio signals to be analysed and encoded. These frequency domain representations can be passed to the channel parameter determiner 203.

In some embodiments the frame sectioner/transformer 201 can be configured to section or segment the audio signal data into sections or frames suitable for frequency domain transformation. The frame sectioner/transformer 201 in some embodiments can further be configured to window these frames or sections of audio signal data according to any suitable windowing function. For example the frame sectioner/transformer 201 can be configured to generate frames of 20ms which overlap preceding and succeeding frames by 10ms each.
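The 20 ms frame with 10 ms overlap mentioned above corresponds to a frame start every 10 ms. A minimal C sketch of that arithmetic follows; the helper names and the 32 kHz sampling rate used in the usage note are assumptions for illustration and are not taken from the application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of samples spanned by `ms` milliseconds. */
static size_t ms_to_samples(unsigned ms, unsigned sample_rate_hz)
{
    return (size_t)ms * sample_rate_hz / 1000u;
}

/* Number of complete frames of `frame_len` samples that fit into
 * `n_samples` when consecutive frames start `hop` samples apart. */
static size_t count_frames(size_t n_samples, size_t frame_len, size_t hop)
{
    assert(frame_len > 0 && hop > 0);
    if (n_samples < frame_len)
        return 0;
    return 1 + (n_samples - frame_len) / hop;
}

/* Index of the first sample of frame `frame_index`. */
static size_t frame_start(size_t frame_index, size_t hop)
{
    return frame_index * hop;
}
```

With an assumed 32 kHz input, `ms_to_samples(20, 32000)` gives a 640-sample frame and `ms_to_samples(10, 32000)` a 320-sample hop, so each frame shares 320 samples with the frame before it and 320 with the frame after it.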

In some embodiments the frame sectioner/transformer 201 can be configured to perform any suitable time to frequency domain transformation on the audio signal data. For example the time to frequency domain transformation can be a discrete Fourier transform (DFT), a fast Fourier transform (FFT) or a modified discrete cosine transform (MDCT). In the following examples a fast Fourier transform (FFT) is used. Furthermore the output of the time to frequency domain transformer can be further processed to generate separate frequency band domain representations (sub-band representations) of each input channel audio signal data. These bands can be arranged in any suitable manner. For example these bands can be linearly spaced, or be perceptually or psychoacoustically allocated. In some embodiments the frequency domain representations are passed to a channel analyser 203.

In some embodiments the encoder 104 can comprise a channel analyser 203. The channel analyser 203 can be configured to receive the sub-band filtered representations of the multichannel or stereo input. The channel analyser 203 can furthermore in some embodiments be configured to analyse the frequency domain audio signals and determine parameters associated with each sub-band with respect to the stereo or multichannel audio signal differences. Furthermore the channel analyser 203 can use these parameters and generate a mono channel.

The stereo parameters and the mono parameters/signal can then be output to a quantizer processor/mono encoder 205.

In some embodiments the encoder 104 comprises a quantizer processor/mono encoder 205. The quantizer processor/mono encoder 205 can be configured to receive the stereo (difference) parameters determined by the channel analyser 203. The quantizer processor/mono encoder 205 can then in some embodiments be configured to perform a quantization on the parameters and furthermore encode the parameters so that they can be output (either to be stored on the apparatus or passed to a further apparatus). The quantizer processor/mono encoder 205 may furthermore be configured to receive the mono parameters/channel and furthermore encode the mono parameters/channel using any suitable encoding and furthermore based on the number of bits used to encode the stereo parameters. In other words the stereo parameters are first encoded and then the downmixed signal is encoded. The bits that are saved by using entropy encoding for the stereo parameters may be used to encode the downmixed signal.

In some embodiments the encoder comprises a signal output 207. The signal output as shown in Figure 3 represents an output configured to pass the encoded stereo parameters to be stored or transmitted to a further apparatus.

With respect to Figure 4 a summary of the encoding process according to some embodiments and the operation of the encoder 104 shown in Figure 3 is shown as a flow diagram.

The operation of generating audio frame band frequency domain representations is shown in Figure 4 by step 501.

The operation of determining the stereo parameters is shown in Figure 4 by step 502.

The operation of generating the mono (downmix) channel parameters is shown in Figure 4 by step 503.

The operation of quantizing the stereo (multichannel) parameters and encoding the quantized stereo (multichannel) parameters is shown in Figure 4 by step 504.

The operation of encoding the mono (downmix) channel parameters based on the bit usage of the optimised quantized stereo parameters is shown in Figure 4 by step 505.

The outputting of the encoded quantized stereo (multichannel) parameters and encoded mono (downmix) parameters/signal is shown in Figure 4 by step 507.

With respect to Figure 5 an example channel analyser 203 according to some embodiments is described in further detail. In some embodiments the channel analyser 203 comprises a channel difference parameter determiner 301. The channel difference parameter determiner 301 is configured to determine the various channel difference parameters. In the following examples the input audio signals are left and right audio signals. In some embodiments this may be generalised as the j'th and j+1'th audio channels from a multichannel audio system.

For example the channel difference parameter determiner 301 may be configured to receive the following parameters from the frame sectioner/transformer 201:

- component i of the DFT of the right channel,
- component i of the DFT of the left channel.

These may furthermore be represented as real and imaginary parts, such as for the right channel:

- real part of the i-th component of the DFT of the right channel,
- imaginary part of the i-th component of the DFT of the right channel.

From these components the channel difference determiner may be configured to generate channel energy parameters, for example:

- energy of the right channel,
- energy of sub-band b of the right channel,
- energy of the left channel,
- energy of sub-band b of the left channel,
- geometric mean of the left and right energies,
- dot product real,
- dot product imaginary.

Furthermore the channel difference determiner may be configured to determine the following difference (stereo) parameters:

- side gain for sub-band b,
- non-normalized residual prediction gain for sub-band b,
- residual prediction gain (normalized with downmix energy).

Furthermore in some embodiments the channel difference determiner may be configured to generate for non-speech signals further parameters such as:

For speech signals and for the higher sub-bands the channel difference determiner may be configured to generate:

- inter channel phase difference for sub-band b (for higher sub-bands this value may be set to 0).

The difference parameters such as the interchannel phase difference, the side gain and the residual prediction gain parameter values can be passed to the mono channel generator and as stereo channel parameters to the quantizer processor.

In some embodiments the encoder 104 (or as shown in Figure 5, the channel analyser 203) comprises a mono channel generator 305. The mono channel generator is configured to receive the channel analyser values such as the side gains and inter channel phase differences from the channel difference determiner 301. Furthermore in some embodiments the mono channel generator/encoder 305 can be configured to further receive the input multichannel audio signals. The mono channel generator 305 can in some embodiments be configured to generate an 'aligned' or downmixed channel which is representative of the audio signals. In other words the mono channel generator 305 can generate a mono (or downmixed) channel signal which represents an aligned multichannel audio signal. For example in some embodiments where there is a left channel audio signal and a right channel audio signal, one of the left or right channel audio signals is delayed with respect to the other according to a determined delay difference, and then the delayed channel and the other channel audio signals are averaged to generate a mono channel signal. However it would be understood that in some embodiments any suitable mono channel generating method can be implemented.
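The delay-and-average example above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the delay is a whole number of samples applied to the right channel, and samples before the delayed signal starts are treated as silence. The application does not specify any of these details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: delay the right channel by `delay` samples and
 * average it with the left channel to form a mono (downmix) signal. */
static void downmix_delay_average(const float *left, const float *right,
                                  size_t n, size_t delay, float *mono)
{
    assert(left != NULL && right != NULL && mono != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Assumption: the delayed channel is silent before it starts. */
        float r = (i >= delay) ? right[i - delay] : 0.0f;
        mono[i] = 0.5f * (left[i] + r);
    }
}
```

When the right channel leads the left by exactly `delay` samples, the two aligned signals coincide and the average reproduces the common signal except at the frame edge.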

The mono channel parameters/signal can then be output. In some embodiments the mono channel signal is output to the quantizer processor/mono encoder 205 to be encoded.

With respect to Figure 6 a summary of the analysis process (such as described in Figure 4 by steps 502 and 503) according to some embodiments and the operation of the channel analyser 203 shown in Figure 5 is shown as a flow diagram.

The operation of receiving the multichannel audio signal frequency components is shown in Figure 6 by step 551.

The operation of determining intermediate parameters (e.g. energy parameters for the audio signal channels) is shown in Figure 6 by step 552.

The operation of determining the difference parameters (e.g. side gain, interchannel phase difference, residual prediction gain) which are generated at least partially from the intermediate parameters is shown in Figure 6 by step 553.

The operation of generating a mono (downmix) channel signal/parameters from a stereo (multichannel) signal is shown in Figure 6 by step 555.

With respect to Figure 7 an example quantizer processor/mono encoder 205 is shown in further detail. In some embodiments the quantizer processor/mono encoder 205 comprises a scalar quantizer 451. The scalar quantizer 451 is configured to receive the stereo parameters from the channel analyser 203.

The scalar quantizer can be configured to perform a scalar quantization on these values. For example the scalar quantizer 451 can be configured to quantize the values with quantisation partition regions defined by the following array.

Q= {-10000.0, -8.0, -5.0, -3.0, 0.0, 3.0, 5.0, 8.0, 100000.0}

The scalar quantizer 451 can thus output an index value symbol identifying the quantization partition region within which the level difference value falls. For example an initial quantisation index value output can be as follows:

The index values can in some embodiments be output to a remapper 453.
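The scalar quantisation against the partition array Q given above can be sketched in C as below. The function name, the choice to treat each lower boundary as inclusive, and the clamping of out-of-range inputs to the first or last region are assumptions; the application gives only the boundary values.

```c
#include <assert.h>
#include <stddef.h>

/* Partition boundaries as listed in the text: 9 boundaries, 8 regions. */
static const float Q[] = { -10000.0f, -8.0f, -5.0f, -3.0f, 0.0f,
                           3.0f, 5.0f, 8.0f, 100000.0f };
#define NUM_REGIONS (sizeof(Q) / sizeof(Q[0]) - 1)

/* Return the index i of the region with Q[i] <= value < Q[i + 1].
 * Assumption: values outside the outer boundaries are clamped. */
static int quantize_index(float value)
{
    assert(value == value); /* a NaN belongs to no region */
    for (size_t i = 0; i + 1 < NUM_REGIONS; i++) {
        if (value < Q[i + 1])
            return (int)i;
    }
    return (int)(NUM_REGIONS - 1);
}
```

For instance, under these assumptions a value of -4.0 lies between -5.0 and -3.0 and is assigned index 2, while a value of 0.0 sits on a lower boundary and is assigned index 4.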

In some embodiments the quantizer processor/mono encoder 205 comprises a remapper 453. The remapper 453 can in some embodiments be configured to receive the output of the scalar quantizer 451, in other words an index value associated with the quantization partition region within which the stereo or difference parameter is found, and then map or reorder the index value according to a defined mapping.

In some embodiments the index (re)mapping (or reordering) is based on an adaptive map selected from a range of defined maps. The defined maps may be maps which are determined from training data or any other suitable manner which exploit intraframe correlation. For example these maps may exploit the correlation between adjacent symbols representing adjacent sub-band parameters.

As such the first symbol within a frame may be mapped according to a default or defined map. The second symbol within a frame is then mapped according to a map which is selected based on the first symbol, and so on.

For example a first symbol may be remapped according to the table

The next (second) symbol may then be remapped based on a map which depends on the previous (first) symbol. For example the reordering or remapping of the second symbol may be defined as

Where previous (first) symbol =0

Where previous (first) symbol =1

Where previous (first) symbol =2

Where previous (first) symbol =3

Where previous (first) symbol =4

Where previous (first) symbol =5

Where previous (first) symbol =6

These mappings may be stored as an array of mappings, such as for example

Where if the previous symbol has been '0' then the first line from the above 2 dimensional array is used as the map, if the previous symbol has been '1' then the second line, and so on.
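The line selection just described can be sketched in C as a two-dimensional lookup. The table contents below are hypothetical placeholders for a 4-symbol alphabet and are not the trained maps of the application; the function name is also an assumption. Each line is a permutation that gives, for every symbol, its order position (0 for the most probable).

```c
#include <assert.h>

#define NUM_SYMBOLS 4

/* Hypothetical default map used for the first symbol of a frame. */
static const int initial_map[NUM_SYMBOLS] = { 2, 0, 1, 3 };

/* Hypothetical maps: line p is used when the previous symbol was p.
 * Entry [p][s] is the order position of symbol s given previous p. */
static const int maps[NUM_SYMBOLS][NUM_SYMBOLS] = {
    { 0, 1, 2, 3 },
    { 1, 0, 2, 3 },
    { 2, 1, 0, 3 },
    { 3, 2, 1, 0 },
};

/* Replace each symbol of a frame by its order position. */
static void remap_frame(const int *symbols, int n, int *ranks)
{
    for (int i = 0; i < n; i++) {
        assert(symbols[i] >= 0 && symbols[i] < NUM_SYMBOLS);
        const int *map = (i == 0) ? initial_map : maps[symbols[i - 1]];
        ranks[i] = map[symbols[i]];
    }
}
```

With these placeholder tables a frame whose neighbouring sub-bands carry similar symbols produces mostly small order positions, which is what makes the subsequent Golomb-Rice code short.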

In the above example the array of reordering or remapping functions is the same for each symbol. In some embodiments each symbol may have a separate array of reordering or remapping functions. For example

the second symbol may have an array

where each array may be different.

This may provide the ability to tune the coding efficiency with respect to the specific sub-band to sub-band correlations at the cost of requiring additional arrays to be stored at the encoder and decoder.

Furthermore in some embodiments the array may be defined or selected from more than first order relationships. For example the array mapping function may be determined based on more than one previously determined symbol (sub-band) within the frame. This may also provide the ability to tune the coding efficiency at the cost of requiring additional arrays to be stored at the encoder and decoder.

Furthermore in some embodiments the array mapping function may be determined based on a time previous symbol. For example the mapping function may exploit any frame to frame correlation. The implementation of time and sub-band based adaptive mapping causes the table ROM to increase significantly: for 8 symbols the mapping table will have 64 lines instead of 8 lines. In some embodiments, and depending on the data, only the interframe correlation could be used instead of the intraframe correlation. In some examples the interframe correlation is exploited by applying GR coding to the difference between the current and previous frame. The numbers 0, 1, -1, 2, -2, ... are mapped to 0, 1, 2, 3, 4, ... and then encoded with GR of order 0 or 1, whichever is best.
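The interframe variant can be sketched in C as below: the signed difference is folded onto the non-negative integers in the stated order, and the GR order giving the fewer bits is chosen. The function names are hypothetical, and the codeword length formula assumes a unary quotient of q + 1 bits followed by k remainder bits, a common Golomb-Rice convention that the application does not spell out. How the chosen order is signalled to the decoder is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ... */
static unsigned fold_signed(int d)
{
    return d > 0 ? 2u * (unsigned)d - 1u : 2u * (unsigned)(-d);
}

/* Assumed GR codeword length: unary quotient (q + 1 bits) plus k bits. */
static unsigned gr_length(unsigned v, unsigned k)
{
    return (v >> k) + 1u + k;
}

/* Return the GR order (0 or 1) that needs fewer bits for the frame to
 * frame index differences, and store that bit count in *bits_out. */
static unsigned best_gr_order(const int *cur, const int *prev, int n,
                              unsigned *bits_out)
{
    unsigned bits0 = 0, bits1 = 0;
    assert(cur != NULL && prev != NULL && bits_out != NULL);
    for (int i = 0; i < n; i++) {
        unsigned v = fold_signed(cur[i] - prev[i]);
        bits0 += gr_length(v, 0);
        bits1 += gr_length(v, 1);
    }
    *bits_out = (bits0 <= bits1) ? bits0 : bits1;
    return (bits0 <= bits1) ? 0u : 1u;
}
```

Small differences favour order 0, while larger differences make the unary part long enough that order 1 becomes cheaper.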

The output of the remapper 453 is then output to the Golomb-Rice encoder 455.

In some embodiments the quantizer processor/mono encoder 205 may comprise a map selector (or next symbol map selector) 454. The map selector 454 or map determiner may be configured to select or determine the map or ordering which is to be applied by the remapper 453. The map selector 454 may therefore receive a symbol or parameter index value from the scalar quantizer and from this value determine the map. In some embodiments as described in detail herein the selection or determination may be based on a look-up-table implementation. However in some embodiments the selection or determination may be made at least partially algorithmically.

The quantizer processor/mono encoder 205 can in some embodiments comprise a Golomb-Rice encoder 455. The Golomb-Rice encoder (GR encoder) 455 is configured to receive the remapped index values or symbols generated by the remapper and encode the index values according to the Golomb-Rice encoding method. The Golomb-Rice encoder 455 in such embodiments therefore outputs a codeword representing the current and previous index values.

An example of a Golomb-Rice integer code for the first symbol is one where the output is as follows.

It would be understood that any suitable entropy encoding can be used in place of the GR integer code described herein.
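As an illustration of a GR integer code, the following C sketch writes the codeword for a non-negative value as a string of '0' and '1' characters. The bit convention (quotient as a run of ones terminated by a zero, then the k remainder bits with the most significant bit first) and the function name are assumptions; the application may use a different but equivalent convention.

```c
#include <assert.h>
#include <stddef.h>

/* Write the order-k Golomb-Rice codeword of `value` into `out` as text.
 * Returns the codeword length in bits, or 0 if `out` is too small. */
static size_t gr_encode(unsigned value, unsigned k, char *out, size_t cap)
{
    unsigned q = value >> k;          /* quotient, sent in unary */
    size_t len = (size_t)q + 1u + k;  /* unary part plus k remainder bits */
    size_t pos = 0;

    assert(out != NULL);
    if (cap < len + 1)
        return 0;
    for (unsigned i = 0; i < q; i++)
        out[pos++] = '1';
    out[pos++] = '0';                 /* terminator of the unary part */
    for (unsigned b = k; b-- > 0; )   /* remainder, most significant first */
        out[pos++] = ((value >> b) & 1u) ? '1' : '0';
    out[pos] = '\0';
    return len;
}
```

Under this convention the order-0 code of 0 is the single bit `0`, so the most probable symbol after remapping costs one bit.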

The GR encoder 455 can then output the stereo codewords. In some embodiments the codewords are passed to a multiplexer to be mixed with the encoded mono channel audio signal. However in some embodiments the stereo codewords can be stored or passed to a further apparatus as a separate stream.

The encoding method may be used for the DFT parameters within a parametric stereo audio encoder. In some embodiments the parameters to be encoded are side gains, residual prediction gains and interchannel phase differences. For an example superwideband case, for a frame of audio data there may be:

- 12 side gain values that need to be transmitted, corresponding to the first 12 sub-bands;
- 5 residual prediction gains; and
- 8 interchannel phase differences.

The values of all parameters may be scalar quantized and their indices encoded with the adaptive GR code.

In some embodiments there may be 31 (from 0 to 30) values for the side gains (quantized using 5 bits), 8 values for the residual prediction gains (quantized using 3 bits), 8 values for the first 7 interchannel phase differences (quantized using 3 bits) and 4 values for the last interchannel phase difference component (quantized using 2 bits).
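For comparison, the parameter counts and bit widths listed above imply a fixed-length cost per frame. The C sketch below adds them up under the assumption, not stated in the application, that every index would otherwise be written with its plain binary width; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One group of stereo parameters sharing the same quantizer width. */
struct param_group {
    const char *name;
    unsigned count; /* indices per frame */
    unsigned bits;  /* fixed-length bits per index */
};

/* Counts and widths taken from the superwideband example in the text. */
static const struct param_group swb_groups[] = {
    { "side gain", 12, 5 },
    { "residual prediction gain", 5, 3 },
    { "interchannel phase difference (first 7)", 7, 3 },
    { "interchannel phase difference (last)", 1, 2 },
};
#define NUM_SWB_GROUPS (sizeof(swb_groups) / sizeof(swb_groups[0]))

/* Total bits per frame if each index is sent with its fixed width. */
static unsigned fixed_rate_bits(const struct param_group *g, size_t n)
{
    unsigned total = 0;
    assert(g != NULL);
    for (size_t i = 0; i < n; i++)
        total += g[i].count * g[i].bits;
    return total;
}
```

The total is 12 x 5 + 5 x 3 + 7 x 3 + 1 x 2 = 98 bits per frame; any bits the entropy code saves relative to such a baseline are what the text describes as becoming available for the downmix signal.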

An example of the encoding function, written in C, is:

The 'maps' arrays for the three parameter types may be:

For the side gain

For the residual prediction gain

and for the interchannel phase differences

As shown above, as there are 31 symbols for the side gains (the side gains are first scalar quantized using 5 bits), the 'maps' table for the side gains is relatively large compared with the other 'maps' tables.

In some embodiments the structure of the maps table is analysed and where there is any defined structure in the table that can be exploited then this can be used to compress the maps table. For example in the example sg maps table defined above the analysis may enable the following data to be stored:

This data may be used such that, in order to obtain for instance the 5th line of 'maps'

The 4th pseudo-line of sg_data1 indicates that its data (in bold and underlined in the above line taken as example) is part of the corresponding line in 'maps'. The 4th pseudo-line of sg_data2 states that there are 14 components in sg_data1 that should be copied into 'maps' (first parameter of sg_data2, line 4), and that starting with position 16 the corresponding 'maps' line will be automatically filled. The automatic filling is such that the first consecutive number after the last value from the sg_data1 pseudo-line will be at the beginning of the string right before 8, i.e. the value 14, then 15 will be at the other end, 16 at the beginning and so on. If there is no possibility to continue at one end, then numbers are filled consecutively just on one side.

An example function which may be used to re-create the 'maps' array for the side gains is:

In some embodiments the quantizer processor/mono encoder 205 further comprises a mono (downmix) channel encoder 456. The mono (downmix) channel encoder 456 may be configured to receive the mono (downmix) channel or parameters. Furthermore the mono (downmix) channel encoder 456 may be configured to receive an indication of the number of bits which have been used in the GR encoder for encoding the current frame. The mono (downmix) channel encoder 456 may then be configured to encode the mono (downmix) channel or parameters using any suitable encoding method, based on the knowledge of the number of bits used by the stereo parameter encoding. The mono (downmix) channel encoder 456 can encode the generated mono channel audio signal using any suitable encoding format. For example in some embodiments the mono channel audio signal can be encoded using an Enhanced Voice Services (EVS) mono channel encoded form, which may contain a bit stream interoperable version of the Adaptive Multi-Rate Wideband (AMR-WB) codec.

With respect to Figure 8 a summary of the encoding process (such as described in Figure 4 by steps 505) according to some embodiments and the operation of the quantizer processor/mono encoder 205 shown in Figure 7 is shown as a flow diagram.

The operation of receiving the stereo parameters is shown in Figure 8 by step 701.

The operation of quantizing the stereo parameters to generate index values or symbols is shown in Figure 8 by step 703.

The operation of retrieving a map based on at least one previous index value or symbol (within the frame) is shown in Figure 8 by step 704.

The operation of reordering or remapping the symbol or index value based on the retrieved map is shown in Figure 8 by step 705.

The operation of generating codewords according to the Golomb-Rice coding system from the remapped symbol values is shown in Figure 8 by step 707.

The operation of outputting stereo codewords is shown in Figure 8 by step 709.

Furthermore the operation of receiving the mono parameters is shown in Figure 8 by step 702.

The operation of encoding the mono parameters/channel based on the Golomb-Rice encoding bit usage is shown in Figure 8 by step 708.

The operation of outputting mono codewords is shown in Figure 8 by step 710.

In order to fully show the operations of the codec, Figures 9 and 10 show a decoder and the operation of the decoder according to some embodiments.

In some embodiments the decoder 108 comprises a mono channel decoder 801. The mono channel decoder 801 is configured in some embodiments to receive the encoded mono channel signal.

Furthermore the mono channel decoder 801 can be configured to decode the encoded mono channel audio signal using the inverse process to the mono channel coder shown in the encoder. In some embodiments the mono channel decoder 801 may be configured to receive an indicator from the stereo channel decoder 803 indicating the number of bits used for the stereo signal to assist the decoding of the mono channel.

In some embodiments the mono channel decoder 801 can be configured to output the mono channel audio signal to the stereo channel generator 809.

In some embodiments the decoder 108 can comprise a stereo channel decoder 803. The stereo channel decoder 803 is configured to receive the encoded stereo parameters.

Furthermore the stereo channel decoder 803 can be configured to decode the stereo channel signal parameters from the entropy code to a symbol value.

The stereo channel decoder 803 is further configured to output the decoded index values to a symbol reorderer (demapper) 807.

In some embodiments the decoder comprises a symbol map selector 805 (or map determiner or order determiner or order selector). The symbol map selector 805 can be configured to receive the current frame stereo channel index values (decoded and reordered symbols) and select a symbol map to reverse the mapping used in the encoder. In other words the symbol map selector 805 is configured to determine a map based on a previously determined symbol decoded within a frame.

The (symbol) map can be output to the symbol reorderer 807.

In some embodiments the decoder 108 comprises a symbol reorderer 807. The symbol or index reorderer (demapper) in some embodiments is configured to receive the symbol map from the map selector 805 and reorder the decoded symbols received from the stereo channel decoder 803 according to the selected map. In other words the symbol reorderer 807 is configured to re-order the index values to the original order output by the scalar quantizer within the encoder. Furthermore in some embodiments the symbol reorderer 807 is configured to de-quantize the demapped or re-ordered index value into a parameter (such as the interaural time difference/correlation value; and interaural level difference/energy difference value) using the inverse process to that defined within the quantizer section of the quantizer processor within the encoder.
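The reordering at the decoder can be sketched in C as the inverse of the encoder-side lookup: each decoded order position is converted back to a symbol using the map selected by the previously recovered symbol. The tables are hypothetical 4-symbol placeholders rather than the maps of the application, and the function names are assumptions.

```c
#include <assert.h>

#define NUM_SYMBOLS 4

/* Hypothetical default map used for the first symbol of a frame. */
static const int initial_map[NUM_SYMBOLS] = { 2, 0, 1, 3 };

/* Hypothetical maps: entry [p][s] is the order position assigned by
 * the encoder to symbol s when the previous symbol was p. */
static const int maps[NUM_SYMBOLS][NUM_SYMBOLS] = {
    { 0, 1, 2, 3 },
    { 1, 0, 2, 3 },
    { 2, 1, 0, 3 },
    { 3, 2, 1, 0 },
};

/* Find the symbol whose order position in `map` equals `rank`. */
static int inverse_lookup(const int *map, int rank)
{
    for (int s = 0; s < NUM_SYMBOLS; s++) {
        if (map[s] == rank)
            return s;
    }
    return -1; /* rank outside the alphabet */
}

/* Recover the original index values from decoded order positions. */
static void demap_frame(const int *ranks, int n, int *symbols)
{
    for (int i = 0; i < n; i++) {
        const int *map = (i == 0) ? initial_map : maps[symbols[i - 1]];
        symbols[i] = inverse_lookup(map, ranks[i]);
        assert(symbols[i] >= 0); /* every valid rank maps to a symbol */
    }
}
```

Because each map line is a permutation, the search always finds exactly one symbol, and the decoder needs only the same tables as the encoder plus the symbols it has already recovered within the frame.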

In some embodiments the decoder comprises a stereo channel generator 809 configured to receive the reordered decoded symbols (the stereo parameters) and the decoded mono channel and regenerate the stereo channels, in other words applying the level differences to the mono channel to generate a second channel.

With respect to Figure 10 a summary of the decoding process according to some embodiments and the operation of the decoder 108 shown in Figure 9 is shown as a flow diagram.

The operation of receiving the encoded mono channel audio signal is shown in Figure 10 by step 901.

The operation of receiving the encoded stereo parameters is shown in Figure 10 by step 902.

The operation of decoding the mono channel (based on the number of bits used by the stereo channel) is shown in Figure 10 by step 903.

The operation of decoding the stereo parameters is shown in Figure 10 by step 904.

The operation of re-ordering and dequantizing the decoded symbols to generate dequantized (regenerated) stereo parameters for each frame is shown in Figure 10 by step 906.

The operation of selecting the map for a next symbol based on a current symbol value is shown in Figure 10 by step 907.

The outputting of the stereo parameters to the stereo channel generator is shown in Figure 10 by step 908.

The operation of generating the stereo channels from the mono channel and the stereo parameters is shown in Figure 10 by step 909.

Although in the examples above the map is selected from a stored table it is understood that in some embodiments the map for the current symbol may be determined algorithmically based on a function which receives as an input a previously determined symbol.

Although the above examples describe embodiments of the application operating within a codec within an apparatus 10, it would be appreciated that the invention as described below may be implemented as part of any audio (or speech) codec, including any variable rate/adaptive rate audio (or speech) codec. Thus, for example, embodiments of the application may be implemented in an audio codec which may implement audio coding over fixed or wired communication paths.

Thus user equipment may comprise an audio codec such as those described in embodiments of the application above.

It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.

Furthermore elements of a public land mobile network (PLMN) may also comprise audio codecs as described above.

In general, the various embodiments of the application may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the application may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

The embodiments of this application may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.

The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.

Embodiments of the application may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

As used in this application, the term 'circuitry' refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and

(b) to combinations of circuits and software (and/or firmware), such as: (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and

(c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

This definition of 'circuitry' applies to all uses of this term in this application, including any claims. As a further example, as used in this application, the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone, or a similar integrated circuit in a server, a cellular network device, or other network device.

The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. Nevertheless, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.