Title:
ENCODING AND DECODING OF AN AUDIO SIGNAL
Document Type and Number:
WIPO Patent Application WO/2009/047675
Kind Code:
A3
Abstract:
An encoder comprises a processor (301) which divides an input audio signal into a plurality of frames. A first encoder unit (303) generates, for each frame, first encoding data and a residual signal and a second encoder unit (305) encodes the residual signal to generate second encoding data. A combine processor (307) generates output encoded data comprising at least the first encoding data and the second encoding data. At least one of the first encoder unit (303) and the second encoder unit (305) employs a gradual frame transition extending into a neighboring frame. The encoder includes a processor (313) which determines a time interval of each frame corresponding to the gradual frame transition and a processor (311) for delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame. A complementary decoder is also provided. The invention may provide improved framing and reduced delay for cascaded coder/decoder arrangements.

Inventors:
VAN SCHIJNDEL NICOLLE H (NL)
RIJNBERG ADRIAAN J (NL)
VAN DE PAR STEVEN L J D E (NL)
EDLER BERND (DE)
Application Number:
PCT/IB2008/054016
Publication Date:
July 02, 2009
Filing Date:
October 02, 2008
Assignee:
KONINKL PHILIPS ELECTRONICS NV (NL)
VAN SCHIJNDEL NICOLLE H (NL)
RIJNBERG ADRIAAN J (NL)
VAN DE PAR STEVEN L J D E (NL)
EDLER BERND (DE)
International Classes:
G10L19/02; G10L19/022; G10L19/14; G10L19/24
Domestic Patent References:
WO2006030340A22006-03-23
Foreign References:
EP1793372A12007-06-06
Attorney, Agent or Firm:
UITTENBOGAARD, Frank (5656 AE Eindhoven, NL)
Claims:

CLAIMS:

1. An encoder for encoding an audio signal, the encoder comprising: means (301) for dividing the audio signal into a plurality of frames; a first encoder unit (303) for, for each frame, generating first encoding data and a residual signal; a second encoder unit (305) for, for each frame, encoding the residual signal to generate second encoding data; and means (307) for generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit (303) and the second encoder unit (305) is arranged to employ at least one gradual frame transition extending into a neighboring frame; and the encoder further comprises: time interval means (313) for providing a time interval of each frame corresponding to the gradual frame transition; and delaying means (311) for delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

2. The encoder of claim 1 further comprising a transmitting unit (309) arranged to transmit first encoding data and at least some second encoding data for a first frame before first encoding data is generated for a subsequent frame.

3. The encoder of claim 1 wherein the second encoder unit (305) is arranged to determine the at least some second encoding data for the time interval of a previous frame in response to the residual signal of a subsequent frame.

4. The encoder of claim 1 wherein the second encoder unit (305) further comprises: means for determining a plurality of encoding segments within a first frame in response to characteristics of the residual signal for the first frame; means for determining transition encoding segments of the plurality of encoding segments, the transition encoding segments overlapping the time interval of the first frame; and wherein the delaying means (311) is arranged to delay inclusion of only second encoding data for the transition encoding segments to the subsequent frame.

5. The encoder of claim 1 wherein the time interval means (313) is arranged to determine the time interval for a first frame as a time interval having a start time corresponding to an earliest time for an overlap of a subsequent frame of the first encoder unit and the first frame of the first encoder unit, and an end time corresponding to a frame end time for the first frame for the second encoder unit.

6. The encoder of claim 5 wherein the first encoder unit further comprises: means for determining a plurality of encoding segments within a first frame in response to characteristics of the audio signal for the first frame; means for determining the start time as a beginning time of an encoding segment comprising earliest time for an overlap of a subsequent frame of the first encoder unit and the first frame of the first encoder unit.

7. The encoder of claim 1 further comprising means for grouping (307) the at least some second encoding data for the time interval of a first frame together with second encoding data of a subsequent frame.

8. The encoder of claim 7 further comprising a packet transmitter (309) arranged to transmit first encoding data for a first frame in a first data packet and first encoding data for a subsequent frame in a second data packet and to transmit second encoding data of the first frame not representing the time interval of the first frame in the first data packet and the at least some second encoding data in the second data packet.

9. The encoder of claim 1 wherein at least one of the first encoder unit (303) and the second encoder unit (305) is a rate distortion optimized encoder.

10. The encoder of claim 9 wherein the second encoder unit (305) is arranged to perform a rate distortion parameter determination for a first frame based on the residual signal of a set of frames including the first frame and excluding a subsequent frame of the first frame.

11. The encoder of claim 10 wherein the second encoder unit (305) is arranged to apply parameters of the rate distortion parameter determination for the first frame to the time interval of the first frame.

12. The encoder of claim 1 further comprising a combination unit for controlling the combination of the first encoder unit (303) and the second encoder unit (305), the combination unit being arranged to perform a rate distortion optimization for determining encoding parameters for the first encoder unit (303) and the second encoder unit (305).

13. The encoder of claim 1 wherein the second encoder unit is arranged to delay generation of at least some second encoding data for the time interval of a first frame until a subsequent frame has been received by the encoder.

14. A decoder for generating a decoded audio signal, the decoder comprising: a receiver (601) for receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit (603) for, for each frame, generating a first decoded signal in response to first encoding data for the frame; a second decoder unit (605) for, for each frame, generating a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; a combining unit (607) for combining the first decoded signal and the second decoded signal into the decoded output signal; time interval means (609) for determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining unit (607) is arranged to generate the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

15. The decoder of claim 14 wherein the receiver (601) is arranged to receive a first encoding data group comprising first encoding data for a first frame and second encoding data for the second time interval of the first frame prior to receiving a second encoding data group comprising first encoding data for a subsequent frame and second encoding data for the first time interval of the first frame; and the decoder is arranged to generate the decoded audio signal for the second interval of the first frame prior to completely receiving the second encoding data group.

16. The decoder of claim 15 wherein the receiver (601) is arranged to receive the first encoding data group in a first data packet set and to receive the second encoding data group in a later data packet set.

17. A method of generating an audio signal, the method comprising: dividing the audio signal into a plurality of frames; a first encoder unit (303) generating, for each frame, first encoding data and a residual signal; a second encoder unit (305) encoding, for each frame, the residual signal to generate second encoding data; and generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit (303) and the second encoder unit (305) employs at least one gradual frame transition extending into a neighboring frame; and the method further comprises: providing a time interval of each frame corresponding to the gradual frame transition; and delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

18. A method of encoding an audio signal, the method comprising: dividing the audio signal into a plurality of frames; a first encoder unit (303) generating, for each frame, first encoding data and a residual signal; a second encoder unit (305) encoding, for each frame, the residual signal to generate second encoding data; and generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit (303) and the second encoder unit (305) employs at least one gradual frame transition extending into a neighboring frame; and the method further comprises: providing a time interval of each frame corresponding to the gradual frame transition; and delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

19. A method of transmitting an audio signal, the method comprising: dividing the audio signal into a plurality of frames; a first encoder unit (303) generating, for each frame, first encoding data and a residual signal; a second encoder unit (305) encoding, for each frame, the residual signal to generate second encoding data; generating output encoded data comprising at least the first encoding data and the second encoding data; and transmitting the output encoded data; wherein at least one of the first encoder unit (303) and the second encoder unit (305) employs at least one gradual frame transition extending into a neighboring frame; and the method further comprises: providing a time interval of each frame corresponding to the gradual frame transition; and delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

20. A method of decoding an audio signal, the method comprising: receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit (603) generating, for each frame, a first decoded signal in response to first encoding data for the frame; a second decoder unit (605) generating, for each frame, a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; combining the first decoded signal and the second decoded signal into a decoded output signal; determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining comprises generating the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

21. A method of receiving an audio signal, the method comprising: receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit (603) generating, for each frame, a first decoded signal in response to first encoding data for the frame; a second decoder unit (605) generating, for each frame, a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; combining the first decoded signal and the second decoded signal into a decoded output signal; determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining comprises generating the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

22. A method of transmitting and receiving an audio signal, the method comprising a transmitter performing the steps of: dividing the audio signal into a plurality of frames, a first encoder unit (303) generating, for each frame, first encoding data and a residual signal, a second encoder unit (305) encoding, for each frame, the residual signal to generate second encoding data, at least one of the first encoder unit (303) and the second encoder unit (305) employs at least one gradual frame transition extending into a neighboring frame, generating output encoded data comprising at least the first encoding data and the second encoding data, providing a time interval of each frame corresponding to the gradual frame transition; delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame, and transmitting the output encoded data; and a receiver performing the steps of: receiving the encoded audio signal, a first decoder unit (603) generating, for each frame, a first decoded signal in response to first encoding data for the frame, a second decoder unit (605) generating, for each frame, a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal, combining the first decoded signal and the second decoded signal into a decoded output signal, determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining comprises generating the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

23. A computer program product for executing the method of any of the previous claims 17 to 22.

24. A transmitter (201) for transmitting an audio signal, the transmitter comprising: means (301) for dividing the audio signal into a plurality of frames; a first encoder unit (303) for, for each frame, generating first encoding data and a residual signal; a second encoder unit (305) for, for each frame, encoding the residual signal to generate second encoding data; and means (307) for generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit (303) and the second encoder unit (305) is arranged to employ at least one gradual frame transition extending into a neighboring frame; transmitting the output encoded data; and the transmitter further comprises: time interval means (313) for providing a time interval of each frame corresponding to the gradual frame transition; and delaying means (311) for delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

25. A receiver (203) for receiving an audio signal, the receiver (203) comprising: a receiver unit (601) for receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit (603) for, for each frame, generating a first decoded signal in response to first encoding data for the frame; a second decoder unit (605) for, for each frame, generating a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; a combining unit (607) for combining the first decoded signal and the second decoded signal into the decoded output signal; time interval means (609) for determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining unit (607) is arranged to generate the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

26. A transmission system for transmitting an audio signal, the transmission system comprising: a transmitter (201) comprising: means (301) for dividing the audio signal into a plurality of frames, a first encoder unit (303) for, for each frame, generating first encoding data and a residual signal, a second encoder unit (305) for, for each frame, encoding the residual signal to generate second encoding data, at least one of the first encoder unit (303) and the second encoder unit (305) being arranged to employ at least one gradual frame transition extending into a neighboring frame, means (307) for generating output encoded data comprising at least the first encoding data and the second encoding data, means (211) for transmitting the output encoded data, first time interval means (313) for providing a time interval of each frame corresponding to the gradual frame transition, and delaying means (311) for delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame; and a receiver (203) comprising: a receiver (601) for receiving the encoded audio signal, a first decoder unit (603) for, for each frame, generating a first decoded signal in response to first encoding data for the frame, a second decoder unit (605) for, for each frame, generating a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal, a combining unit (607) for combining the first decoded signal and the second decoded signal into the decoded output signal, second time interval means (609) for determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining unit (607) is arranged to generate the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

27. An encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; wherein at least some second encoding data for a time interval corresponding to the gradual frame transitions of one frame to a subsequent frame.

28. A storage medium having stored thereon a signal according to claim 27.

29. An audio playing device (203) comprising a decoder (200) according to claim 14.

30. An audio recording device (201) comprising an encoder (209) according to claim 1.

Description:

ENCODING AND DECODING OF AN AUDIO SIGNAL

FIELD OF THE INVENTION

The invention relates to encoding and decoding of an audio signal and in particular to encoding using a cascaded arrangement of encoder units and decoding using a cascaded arrangement of decoding units.

BACKGROUND OF THE INVENTION

Digital encoding of various source signals has become increasingly important over the last decades as digital signal representation and communication has increasingly replaced analogue representation and communication. For example, mobile telephone systems, such as the Global System for Mobile communication, are based on digital speech encoding. Also distribution of media content, such as video and music, is increasingly based on digital content encoding.

In many sound-coding applications, the signal cannot be encoded as a single complete signal but is divided into smaller time segments which are then encoded. These segments are known as frames and typically a signal to be encoded is divided into consecutive frames which are then individually processed.

In many applications, the encoding data for each frame is then transmitted individually and in many cases the encoding data for a frame is transmitted before encoding of the following frame is completed. This is, for example, often the case for real time communication and streaming applications as these applications operate under a time constraint. Typically, in a communication or distribution system, the delay or latency from the start of encoding a certain frame at the source until it is presented to a user (or users) at a destination (or destinations) is constrained to be below a given value. For example, for many real time communication services, such as two way speech, the total delay is required to be below around 50 ms to 100 ms. Such constraints translate into individual delay constraints for the encoding, communication and decoding of the audio signal. In order to achieve sufficiently low encoding and decoding delays it is necessary to use relatively short frames for the encoding and decoding processes. Often frame lengths of around 10-50 ms are used.

In recent years, a new class of coders has been proposed which uses a cascading of a plurality of encoders which typically use different encoding algorithms and principles. Thus, in such cascaded (hybrid) encoders, an encoder unit at a given position in the cascaded arrangement encodes an input signal which is provided by the previous encoder unit (or the original input signal for the first encoder unit). It then generates encoding data and an error signal representing the difference between the signal corresponding to the encoding data and the input signal. The error signal is then fed to the next encoder in the cascade which proceeds to encode the error signal to generate more encoding data. The decoders use a complementary cascaded structure to sequentially regenerate the encoded signals. Examples of such coders may for example be found in the articles Hamdy, K. N., Ali, M., and Tewfik, A. H. (1996). "Low bit rate high quality audio coding with combined harmonic and wavelet representations," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP 1996), Atlanta, USA, May, 1996, Vol. II, pp. 1045-1048; Pena, A. S., Serantes, C., and Prelcic, N. G. (1996). "ARCO (Adaptive Resolution COdec): a Hybrid Approach to Perceptual Audio Coding," in Proc. 100th AES Convention, Copenhagen, Denmark, 11-14 May, 1996, Convention Paper 4178; Pena, A. S., Serantes, C. A., and Gonzalez-Prelcic, N. (1997). "New improvements in ARCO (Adaptive Resolution Codec)," in Proc. 102nd AES Convention, Munich, Germany, 22-25 March, 1997, Convention Paper 4419; and Riera-Palou, F., den Brinker, B., and Gerrits, A. J. (2004). "A Hybrid Parametric-Waveform Approach to Bit Stream Scalable Audio Coding," in Proc. Thirty-Eighth Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove (California), USA, November 7-10, 2004, pp. 2250-2254.
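The cascaded structure described above can be illustrated with a minimal sketch. It is not taken from the cited coders: each "encoder unit" here is assumed to be a toy uniform quantizer, and the names `quantize`, `dequantize`, `cascade_encode` and `cascade_decode` are hypothetical. The point is only that each stage encodes the error signal left by the previous stage, and that the decoder sums the contributions of all stages.

```python
def quantize(signal, step):
    """Toy encoder unit (assumption): uniform quantization to integer codes."""
    return [round(x / step) for x in signal]


def dequantize(codes, step):
    """Synthesize the signal represented by the integer codes."""
    return [c * step for c in codes]


def cascade_encode(signal, steps):
    """Run a cascade of toy encoder units.

    Each stage encodes the error (residual) signal of the previous stage.
    Returns the per-stage encoding data and the final residual.
    """
    data = []
    residual = list(signal)
    for step in steps:
        codes = quantize(residual, step)
        synth = dequantize(codes, step)
        # Error signal fed to the next encoder unit in the cascade.
        residual = [r - s for r, s in zip(residual, synth)]
        data.append(codes)
    return data, residual


def cascade_decode(data, steps):
    """Complementary cascade: sum the signals synthesized by every stage."""
    out = [0.0] * len(data[0])
    for codes, step in zip(data, steps):
        out = [o + c * step for o, c in zip(out, codes)]
    return out


if __name__ == "__main__":
    signal = [0.9, -0.37, 0.12, 0.5]
    data, residual = cascade_encode(signal, steps=[0.5, 0.1])
    print(data)
    print(cascade_decode(data, steps=[0.5, 0.1]))
```

In this sketch the second stage uses a finer step than the first, so it refines what the first stage could not represent; real hybrid coders would typically combine different algorithms (for example parametric and waveform coding) rather than two quantizers.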

This new class of encoders provides very efficient encoding resulting in a high audio quality to data rate ratio. However, for such cascaded coders and decoders the issues related to dividing the signals into frames for individual processing become substantially more complicated.

Specifically, many encoding algorithms apply soft transitions between different frames such that consecutive frames are overlapping. Thus, rather than simply dividing the signal into non-overlapping sample groups, a window is applied which has smooth transitions. This introduces an overlapping signal segment which contributes to more than one frame. An example of such framing is illustrated in Fig. 1 where a gradual transition between frames is used for both a first encoder and a second encoder of a cascaded arrangement (where the second encoder follows the first encoder). As can be seen, the window applied to frame n overlaps both frame n-1 and frame n+1. Also, the windows of frames n-1 and n+1 overlap frame n. Such windowing with gradual transitions provides a much smoother overlap between different frames thereby mitigating or eliminating audible coding artefacts resulting from a sharp division into non-overlapping frames (this often results in audible "clicks"). However, for hybrid coders wherein a frame is encoded by all encoders before the following frame is encoded by the previous encoder(s) in the cascaded arrangement (in order to reduce delay) the input signal to a subsequent coder from the previous coder is not fully known. Specifically, the error signal from the previous encoder in frame n has not been fully generated for any time interval in which the window of frame n+1 overlaps into frame n for the previous encoder. Thus, the subsequent encoder does not have a full input signal to work on and accordingly it cannot fully encode the entire frame. If the incompleteness of the input signal to the subsequent coder is ignored, the encoding quality will be reduced and typically noticeable coding artefacts will be introduced.
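The overlap problem can be made concrete with a small sketch. The window shape and the parameters are assumptions chosen for illustration (a trapezoidal window with linear fades, a hop of `hop` samples between frames and `overlap` samples of cross-fade, with `overlap <= hop`); Fig. 1 is not reproduced here and may use different windows. The hypothetical function `incomplete_interval` returns the sample range of frame n in which the window of frame n+1 also contributes, i.e. the range where the first encoder's error signal is not complete until frame n+1 has been encoded.

```python
def window(hop, overlap):
    """Trapezoidal window of length hop + overlap with linear fades.

    Assumes overlap <= hop. The fade-in of one frame and the fade-out of
    the previous frame sum to one at every sample of the overlap.
    """
    length = hop + overlap
    w = []
    for i in range(length):
        if i < overlap:
            w.append((i + 0.5) / overlap)            # fade-in
        elif i >= hop:
            w.append((length - i - 0.5) / overlap)   # fade-out
        else:
            w.append(1.0)                            # flat part
    return w


def frame_span(n, hop, overlap):
    """Half-open sample range [start, end) covered by the window of frame n."""
    start = n * hop
    return start, start + hop + overlap


def incomplete_interval(n, hop, overlap):
    """Range of frame n that is also covered by the window of frame n+1."""
    _, end_n = frame_span(n, hop, overlap)
    start_next, _ = frame_span(n + 1, hop, overlap)
    return start_next, end_n


def window_sum(sample, hop, overlap, frames):
    """Sum of the window values of the given frames at one sample index."""
    w = window(hop, overlap)
    total = 0.0
    for n in frames:
        start, end = frame_span(n, hop, overlap)
        if start <= sample < end:
            total += w[sample - start]
    return total


if __name__ == "__main__":
    print(incomplete_interval(0, hop=8, overlap=4))
    print([window_sum(s, 8, 4, frames=[0, 1]) for s in range(8, 12)])
```

With only frame n available, the window sum inside the returned interval is below one, which is one way to see why a subsequent encoder unit lacks a full input signal there.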

One coding method which has been found particularly advantageous for cascaded encoders is known as a rate-distortion optimized hybrid encoding technique.

In this technique, encoding parameters such as segmentation and bit allocation over the encoding algorithms etc are determined based on an optimization of a measure that indicates a perceptual distortion resulting from various settings of the encoding parameters. Perceptual distortion calculations quantify the difference between the original sound signal and the coded signal, as perceived by a listener. The smaller the perceptual distortion, the smaller this difference is, and the higher the perceptual quality of the coded sound. An intelligent search is performed over the possible parameter settings and the set of parameters that results in the lowest perceived distortion is selected as the encoding parameters. The rate distortion optimization algorithm can specifically use so-called Lagrange multipliers λ which are related to the trade-off between the bit rate and corresponding perceptual sound quality (or perceptual distortion).
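As a hedged illustration of how a Lagrange multiplier can express the trade-off mentioned above, the sketch below selects the candidate parameter setting that minimizes a cost of the form distortion + λ · rate. This cost function is a common formulation and is assumed here; the source does not spell out the exact cost used. The candidate settings, their rate and distortion values, and the name `rd_select` are hypothetical.

```python
def rd_select(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost D + lam * R.

    candidates: list of dicts with 'name', 'rate' (bits) and 'distortion'
    (a perceptual distortion measure; the numbers used below are made up).
    """
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])


if __name__ == "__main__":
    candidates = [
        {"name": "coarse", "rate": 10, "distortion": 5.0},
        {"name": "medium", "rate": 20, "distortion": 2.0},
        {"name": "fine", "rate": 40, "distortion": 1.0},
    ]
    for lam in (0.01, 0.2, 1.0):
        print(lam, rd_select(candidates, lam)["name"])
```

A small λ makes bits cheap and favors the low-distortion setting, while a large λ favors the low-rate setting. A real coder would search over segmentations and bit allocations rather than a fixed list.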

In the absence of a complete input signal it is suboptimal for a rate distortion coder to parameterize (encode) a frame as this will lead to suboptimal decisions and erroneous encoding data. Specifically, in order to perform a rate distortion optimization the distortion calculations must be based on the result of the last coder and as this signal is not complete, suboptimal decisions may be taken by the optimization process. Furthermore, the optimization process may be biased, due to the incomplete input signals and the effect thereof in the distortion calculations. Erroneous encoding data are especially problematic because they may directly result in audio encoding artefacts. Thus, the gradual transitions between frames can result in a significant degradation.

However, delaying the encoding until the subsequent frame wherein the full input signal is available to all encoders introduces an additional frame delay to the encoding process. As the frame intervals are typically of the same order of magnitude as the delay constraints for many applications, this is unacceptable in many cases including e.g. communication, streaming and real-time encoding applications.

Hence, an improved audio encoding/decoding using a cascaded arrangement would be advantageous and in particular a system allowing increased flexibility, improved framing, reduced delays, improved quality, facilitated implementation, reduced data rate and/or improved performance would be advantageous.

SUMMARY OF THE INVENTION

Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.

According to one aspect of the invention there is provided an encoder for encoding an audio signal, the encoder comprising: means for dividing the audio signal into a plurality of frames; a first encoder unit for, for each frame, generating first encoding data and a residual signal; a second encoder unit for, for each frame, encoding the residual signal to generate second encoding data; and means for generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit and the second encoder unit is arranged to employ at least one gradual frame transition extending into a neighboring frame; and the encoder further comprises: time interval means for providing a time interval of each frame corresponding to the gradual frame transition; and delaying means for delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

The invention may provide improved performance for encoders using cascaded encoder units. In particular, improved framing may be performed which reduces transition encoding effects at frame boundaries while maintaining low encoding delays. In particular, encoding delays may be less than two frame durations thereby allowing e.g. longer frame lengths thus facilitating implementation and/or improving performance. The invention may allow e.g. improved communication, streaming and/or real time applications using cascaded encoding techniques. An improved encoded audio quality to data rate ratio may be achieved relative to many other encoding techniques including many more complex encoding techniques.

The residual signal determined for a frame may specifically be an error signal. The individual frame may be generated by applying a window of a fixed length to a section of the signal. Consecutive frames may be generated by moving the window by a fixed amount for each new frame. The window may result in overlapping transitions if the length of the window is longer than the interval between frames. At least one of the start or end transitions of at least one of the first and second encoder unit may be gradual, i.e. the transition will not completely take place between two samples. The residual signal for a given frame may not be a complete representation of a difference between the audio signal and the synthesized signal within the full frame length (where the frame length may correspond to the time interval wherein the applied window is above a threshold which specifically may be zero) due to the gradual frame transition extending into a neighboring frame. Specifically, due to the gradual frame transition extending into a neighboring frame a complete residual signal may not be determined until encoding of the subsequent frame by the first encoder unit. The time interval may specifically be determined to correspond to a time interval in which a complete residual signal representing the difference between the synthesized signal and the audio signal cannot be generated. The encoder may e.g. generate (partial) second encoding data for the time interval for a first frame during the processing of the first frame. This data may be delayed and included in the output encoded data when the subsequent frame is processed. As another example, no second encoding data may be generated for the time interval of a first frame until the subsequent frame is processed. In some embodiments, the delay means may be arranged to delay the inclusion of the second encoding data for the time interval by delaying the residual signal prior to the encoding by the second encoder unit.
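The delayed inclusion described above can be sketched as a simple packing loop. This is an illustrative model under stated assumptions, not the encoder of the application: each frame is represented as a dict whose keys `first`, `second_main` and `second_transition` are hypothetical names for the first encoding data, the second encoding data outside the time interval, and the second encoding data for the time interval. The function name `pack_frames` is also hypothetical.

```python
def pack_frames(frames):
    """Group encoding data so that transition data is sent one frame late.

    The output group for frame n carries:
      - the first encoding data of frame n,
      - the second encoding data of frame n outside the time interval,
      - the second encoding data for the time interval of frame n-1.
    Returns the list of groups and the transition data still held back
    after the last frame (it would need a final flush).
    """
    packets = []
    held = None  # transition data of the previous frame, if any
    for n, frame in enumerate(frames):
        packets.append({
            "frame": n,
            "first": frame["first"],
            "second": frame["second_main"],
            "delayed_second": held,
        })
        # In a real encoder this data may only become computable once the
        # next frame is processed; here it is simply held for one frame.
        held = frame["second_transition"]
    return packets, held


if __name__ == "__main__":
    frames = [
        {"first": "A0", "second_main": "B0", "second_transition": "T0"},
        {"first": "A1", "second_main": "B1", "second_transition": "T1"},
        {"first": "A2", "second_main": "B2", "second_transition": "T2"},
    ]
    packets, leftover = pack_frames(frames)
    for p in packets:
        print(p)
    print("held back:", leftover)
```

The first encoding data and most of the second encoding data for a frame leave the encoder without waiting for the next frame; only the data for the transition interval is postponed.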

The cascading of encoder units may extend to more than the first and second encoder unit. For example, the audio signal may be a residual signal from a previous encoder unit and/or the second encoder unit may generate a residual signal which is encoded by a subsequent encoder unit. The same principles may be applied to framing for the additional encoder units.

The residual signal represents a difference between the audio signal and a signal synthesized from the first encoding data. Specifically, the residual signal for a frame may represent an error signal between the synthesized signal and a windowed version of the audio signal (e.g. the frame signal).

According to an optional feature of the invention, the encoder further comprises a transmitting unit arranged to transmit first encoding data and at least some second encoding data for a first frame before first encoding data is generated for a subsequent frame.

The invention may allow an efficient system transmitting/distributing an encoded audio signal with improved quality and/or reduced delay. The invention may in particular allow improved communication, streaming and/or real time applications.

According to an optional feature of the invention, the second encoder unit is arranged to determine the at least some second encoding data for the time interval of a previous frame in response to the residual signal of a subsequent frame.

This may allow improved performance and/or facilitated implementation.

According to an optional feature of the invention, the second encoder unit further comprises: means for determining a plurality of encoding segments within a first frame in response to characteristics of the residual signal for the first frame; means for determining transition encoding segments of the plurality of encoding segments, the transition encoding segments overlapping the time interval of the first frame; and wherein the delaying means is arranged to delay inclusion of only second encoding data for the transition encoding segments to the subsequent frame.

This may allow improved performance and/or facilitated implementation. In particular, the feature may allow an improved second encoder unit in many embodiments and/or may allow backwards compatibility with many existing encoding techniques.

According to an optional feature of the invention, the time interval means is arranged to determine the time interval for a first frame as a time interval having a start time corresponding to an earliest time for an overlap of a subsequent frame of the first encoder unit and the first frame of the first encoder unit, and an end time corresponding to a frame end time for the first frame for the second encoder unit.

This may allow improved performance and/or facilitated implementation. A gradual frame transition of a frame may fall within a neighbor frame if the gradual frame transition overlaps a window for the following frame where the window value is above a threshold which specifically is zero.
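The determination of the time interval described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation of the time interval means: the function name `transition_interval` and the parameters `hop`, `first_frame_len` and `second_frame_len` are hypothetical, frames of both encoder units are assumed to start at `frame_index * hop`, and windows are assumed to be non-zero over their whole length (threshold of zero).

```python
def transition_interval(frame_index, hop, first_frame_len, second_frame_len=None):
    """Return (start, end) sample indices of the time interval of a frame.

    Assumptions (not from the source): frame k of both encoder units starts
    at sample k * hop, and every window is non-zero over its whole length.
    `end` is exclusive.
    """
    if second_frame_len is None:
        # Assume both encoder units use the same frame length.
        second_frame_len = first_frame_len
    if first_frame_len <= hop:
        raise ValueError("frames of the first encoder unit do not overlap")

    frame_start = frame_index * hop
    # Earliest sample at which the subsequent frame of the first encoder
    # unit overlaps this frame: the start of the subsequent frame.
    start = frame_start + hop
    # Frame end time of this frame for the second encoder unit.
    end = frame_start + second_frame_len
    return start, end


# Usage: 16-sample frames advanced by 12 samples overlap by 4 samples.
interval = transition_interval(frame_index=0, hop=12, first_frame_len=16)
```

With segment-based encoders, the start time could instead be moved back to the beginning of the encoding segment containing this sample, as the following optional feature describes.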

According to an optional feature of the invention, the first encoder unit further comprises: means for determining a plurality of encoding segments within a first frame in response to characteristics of the audio signal for the first frame; means for determining the start time as a beginning time of an encoding segment comprising the earliest time for an overlap of a subsequent frame of the first encoder unit and the first frame of the first encoder unit.

This may allow improved performance and/or facilitated implementation. In particular, the feature may allow an improved second encoder unit in many embodiments and/or may allow backwards compatibility with many existing encoding techniques. According to an optional feature of the invention, the encoder further comprises means for grouping the at least some second encoding data for the time interval of a first frame together with second encoding data of a subsequent frame. This may allow improved performance and/or facilitated implementation. The second encoding data for the first frame outside the time interval may be transmitted prior to the completion of the generation of the second encoding data for the subsequent frame (outside the time interval for the subsequent frame).

According to an optional feature of the invention, the encoder further comprises a packet transmitter arranged to transmit first encoding data for a first frame in a first data packet and first encoding data for a subsequent frame in a second data packet and to transmit second encoding data of the first frame not representing the time interval of the first frame in the first data packet and the at least some second encoding data in the second data packet. This may allow improved performance and/or facilitated implementation for an encoding system using packet based communication. In particular, an improved packet based communication, streaming and/or real time application may be achieved.
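The packet arrangement described above may be illustrated by the following minimal sketch. The dictionary keys (`first`, `second_main`, `second_transition`, `delayed_second`) and the function name `packetize` are hypothetical names chosen for the illustration; the encoding data itself is treated as opaque.

```python
def packetize(frames):
    """Build one packet per frame, delaying transition data by one frame.

    `frames` is a list of dicts with the hypothetical keys:
      'first'             - first encoding data of the frame
      'second_main'       - second encoding data outside the time interval
      'second_transition' - second encoding data for the time interval
    Returns (packets, pending), where `pending` is the transition data of
    the last frame, which still has to be carried by a later packet.
    """
    packets = []
    pending = None  # transition data held back from the previous frame
    for frame in frames:
        packets.append({
            "first": frame["first"],
            "second_main": frame["second_main"],
            # Second encoding data for the time interval of the previous
            # frame travels in this frame's packet.
            "delayed_second": pending,
        })
        pending = frame["second_transition"]
    return packets, pending


frames = [
    {"first": "A1", "second_main": "A2", "second_transition": "A2t"},
    {"first": "B1", "second_main": "B2", "second_transition": "B2t"},
]
packets, pending = packetize(frames)
```

In this sketch the first packet can be sent as soon as the first frame has been processed, since it does not depend on any data from the subsequent frame.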

According to an optional feature of the invention, at least one of the first encoder unit and the second encoder unit is a rate distortion optimized encoder. This may allow particularly advantageous performance in many embodiments.

According to an optional feature of the invention, the second encoder unit is arranged to perform a rate distortion parameter determination for a first frame based on the residual signal of a set of frames including the first frame and excluding a subsequent frame of the first frame. This may allow improved performance and/or facilitated implementation. In particular, the rate distortion parameter determination for a first frame may be performed based on the entire residual signal generated for the first frame including a partial residual signal for the time interval. The determination may be without consideration of a residual signal for the time interval determined in response to an encoding of the subsequent frame by the first encoder but may e.g. take into account a partial residual signal component for the time interval determined in response to an encoding of the first frame by the first encoder. The feature may in particular allow a rate distortion parameter determination to be based on the entire frame without necessitating a wait for a subsequent frame to be encoded. The rate distortion parameter determination may e.g. determine encoding segments and/or bit allocations for the encoding of the first frame by the second encoder unit.

According to an optional feature of the invention, the second encoder unit is arranged to apply parameters of the rate distortion parameter determination for the first frame to the time interval of the first frame. This may allow improved performance and/or facilitated implementation. In particular, the rate distortion parameter determination for a first frame may determine parameters applied to the whole first frame before a complete residual signal representing the difference between the synthesized signal and the audio signal can be determined. According to an optional feature of the invention, the encoder further comprises a combination unit for controlling the combination of the first encoder unit and the second encoder unit, the combination unit being arranged to perform a rate distortion optimization for determining encoding parameters for the first encoder unit and the second encoder unit.

This may allow particularly advantageous performance in many embodiments. According to an optional feature of the invention, the second encoder unit is arranged to delay generation of at least some second encoding data for the time interval of a first frame until a subsequent frame has been received by the encoder.

This may provide particularly advantageous performance and/or facilitated implementation.

According to another aspect of the invention, there is provided a decoder for generating a decoded audio signal, the decoder comprising: a receiver for receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit for, for each frame, generating a first decoded signal in response to first encoding data for the frame; a second decoder unit for, for each frame, generating a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; a combining unit for combining the first decoded signal and the second decoded signal into the decoded output signal; time interval means for determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining unit is arranged to generate the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

The invention may provide improved performance for decoders decoding encoded signals generated from encoders using cascaded encoder units. In particular, improved framing may be used which reduces transition encoding effects at frame boundaries while maintaining low encoding delays. The invention may allow e.g. improved communication, streaming and/or real time applications using cascaded encoding techniques. It will be appreciated that the comments provided previously with reference to the encoder may apply equally to the decoder as appropriate.

According to an optional feature of the invention, the receiver is arranged to receive a first encoding data group comprising first encoding data for a first frame and second encoding data for the second time interval of the first frame prior to receiving a second encoding data group comprising first encoding data for a subsequent frame and second encoding data for the first time interval of the first frame; and the decoder is arranged to generate the decoded audio signal for the second time interval of the first frame prior to completely receiving the second encoding data group. This may allow improved performance and/or facilitated implementation. In particular, a reduced delay may be achieved in many embodiments.

According to an optional feature of the invention, the receiver is arranged to receive the first encoding data group in a first data packet set and to receive the second encoding data group in a later data packet set. This may allow improved performance and/or facilitated implementation for an audio decoding system using packet based communication. In particular, an improved packet based communication, streaming and/or real time application may be achieved. The first and second data packet sets are disjoint sets that each may include one or more data packets.

According to another aspect of the invention, there is provided a method of generating an audio signal, the method comprising: dividing the audio signal into a plurality of frames; a first encoder unit generating, for each frame, first encoding data and a residual signal; a second encoder unit encoding, for each frame, the residual signal to generate second encoding data; and generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit and the second encoder unit employs at least one gradual frame transition extending into a neighboring frame; and the method further comprises: providing a time interval of each frame corresponding to the gradual frame transition; and delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

According to another aspect of the invention, there is provided a method of encoding an audio signal, the method comprising: dividing the audio signal into a plurality of frames; a first encoder unit generating, for each frame, first encoding data and a residual signal; a second encoder unit encoding, for each frame, the residual signal to generate second encoding data; and generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit and the second encoder unit employs at least one gradual frame transition extending into a neighboring frame; and the method further comprises: providing a time interval of each frame corresponding to the gradual frame transition; and delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

According to another aspect of the invention, there is provided a method of transmitting an audio signal, the method comprising: dividing the audio signal into a plurality of frames; a first encoder unit generating, for each frame, first encoding data and a residual signal; a second encoder unit encoding, for each frame, the residual signal to generate second encoding data; generating output encoded data comprising at least the first encoding data and the second encoding data; and transmitting the output encoded data; wherein at least one of the first encoder unit and the second encoder unit employs at least one gradual frame transition extending into a neighboring frame; and the method further comprises: providing a time interval of each frame corresponding to the gradual frame transition; and delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

According to another aspect of the invention, there is provided a method of decoding an audio signal, the method comprising: receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit generating, for each frame, a first decoded signal in response to first encoding data for the frame; a second decoder unit generating, for each frame, a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; combining the first decoded signal and the second decoded signal into a decoded output signal; determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining comprises generating the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

According to another aspect of the invention, there is provided a method of receiving an audio signal, the method comprising: receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit generating, for each frame, a first decoded signal in response to first encoding data for the frame; a second decoder unit generating, for each frame, a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; combining the first decoded signal and the second decoded signal into a decoded output signal; determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining comprises generating the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

According to another aspect of the invention, there is provided a method of transmitting and receiving an audio signal, the method comprising: a transmitter performing the steps of: dividing the audio signal into a plurality of frames, a first encoder unit generating, for each frame, first encoding data and a residual signal, a second encoder unit encoding, for each frame, the residual signal to generate second encoding data, at least one of the first encoder unit and the second encoder unit employing at least one gradual frame transition extending into a neighboring frame, generating output encoded data comprising at least the first encoding data and the second encoding data, providing a time interval of each frame corresponding to the gradual frame transition, delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame, and transmitting the output encoded data; and a receiver performing the steps of: receiving the encoded audio signal, a first decoder unit generating, for each frame, a first decoded signal in response to first encoding data for the frame, a second decoder unit generating, for each frame, a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal, combining the first decoded signal and the second decoded signal into a decoded output signal, determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining comprises generating the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

According to another aspect of the invention, there is provided a computer program product for executing any of the previously defined methods.

According to another aspect of the invention, there is provided a transmitter for transmitting an audio signal, the transmitter comprising: means for dividing the audio signal into a plurality of frames; a first encoder unit for, for each frame, generating first encoding data and a residual signal; a second encoder unit for, for each frame, encoding the residual signal to generate second encoding data; and means for generating output encoded data comprising at least the first encoding data and the second encoding data; wherein at least one of the first encoder unit and the second encoder unit is arranged to employ at least one gradual frame transition extending into a neighboring frame; transmitting the output encoded data; and the transmitter further comprises: time interval means for providing a time interval of each frame corresponding to the gradual frame transition; and delaying means for delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame.

According to another aspect of the invention, there is provided a receiver for receiving an audio signal, the receiver comprising: a receiver unit for receiving an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; a first decoder unit for, for each frame, generating a first decoded signal in response to first encoding data for the frame; a second decoder unit for, for each frame, generating a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal; a combining unit for combining the first decoded signal and the second decoded signal into the decoded output signal; time interval means for determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining unit is arranged to generate the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

According to another aspect of the invention, there is provided a transmission system for transmitting an audio signal, the transmission system comprising: a transmitter comprising: means for dividing the audio signal into a plurality of frames, a first encoder unit for, for each frame, generating first encoding data and a residual signal, a second encoder unit for, for each frame, encoding the residual signal to generate second encoding data, at least one of the first encoder unit and the second encoder unit being arranged to employ at least one gradual frame transition extending into a neighboring frame, means for generating output encoded data comprising at least the first encoding data and the second encoding data, means for transmitting the output encoded data, first time interval means for providing a time interval of each frame corresponding to the gradual frame transition, and delaying means for delaying the inclusion in the output encoded data of at least some second encoding data for the time interval to a subsequent frame; and a receiver comprising: a receiver for receiving the encoded audio signal, a first decoder unit for, for each frame, generating a first decoded signal in response to first encoding data for the frame, a second decoder unit for, for each frame, generating a second decoded signal in response to second encoding data for the frame, the second decoded signal being a residual signal for the first decoded signal, a combining unit for combining the first decoded signal and the second decoded signal into the decoded output signal, second time interval means for determining a first time interval of each frame corresponding to the gradual frame transitions; and wherein the combining unit is arranged to generate the decoded audio signal for a second time interval of each frame not including the first time interval prior to receiving complete second encoding data for the first time interval of the frame.

According to another aspect of the invention, there is provided an encoded audio signal comprising first encoding data and second encoding data for a plurality of frames, the second encoding data representing an encoding of a residual signal from a first encoding generating the first encoding data and the plurality of frames having gradual frame transitions extending into a neighboring frame; wherein at least some second encoding data for a time interval corresponding to the gradual frame transitions of one frame is delayed to a subsequent frame.

According to another aspect of the invention, there is provided a storage medium having stored thereon an encoded audio signal as defined above.

According to another aspect of the invention, there is provided an audio playing device comprising a decoder as defined previously. According to another aspect of the invention, there is provided an audio recording device comprising an encoder as defined previously.

These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which

Fig. 1 is an illustration of an example of a framing in a cascaded encoder;

Fig. 2 illustrates an example of a transmission system in accordance with some embodiments of the invention;

Fig. 3 illustrates an example of an encoder in accordance with some embodiments of the invention;

Fig. 4 is an illustration of an example of a framing in a cascaded encoder in accordance with some embodiments of the invention;

Fig. 5 is an illustration of an example of framing and data packet generation in a cascaded encoder in accordance with some embodiments of the invention;

Fig. 6 illustrates an example of a decoder in accordance with some embodiments of the invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

The following description focuses on embodiments of the invention applicable to a hybrid encoder using a cascade of two encoder units using different encoding algorithms. However, it will be appreciated that the invention is not limited to this application but may be applied to many other encoders using cascaded encoders and in particular to encoders using more than two cascaded encoder units.

Fig. 2 illustrates a transmission system for communication of an audio signal in accordance with some embodiments of the invention. The transmission system comprises a transmitter 201 which is coupled to a receiver 203 through a network 205 which specifically may be the Internet.

In the specific example, the transmitter 201 is a signal recording device and the receiver 203 is a signal player device but it will be appreciated that in other embodiments a transmitter and receiver may be used in other applications and for other purposes. For example, the transmitter 201 and/or the receiver 203 may be part of a transcoding functionality and may e.g. provide interfacing to other signal sources or destinations.

In the specific example where a signal recording function is supported, the transmitter 201 comprises a digitizer 207 which receives an analog signal that is converted to a digital PCM (Pulse Code Modulated) signal by sampling and analog-to-digital conversion.

The digitizer 207 is coupled to the encoder 209 of Fig. 2 which encodes the PCM signal in accordance with an encoding algorithm. In the specific example, the encoder 209 is a hybrid encoder comprising two cascaded encoder units as will be described in more detail later. The encoder 209 is coupled to a network transmitter 211 which receives the encoded signal and interfaces to the Internet 205. The network transmitter may transmit the encoded signal to the receiver 203 through the Internet 205.

The receiver 203 comprises a network receiver 213 which interfaces to the Internet 205 and which is arranged to receive the encoded signal from the transmitter 201.

The network receiver 213 is coupled to a decoder 215. The decoder 215 receives the encoded signal and decodes it in accordance with a decoding algorithm. In the specific example, the decoder 215 is a hybrid decoder comprising two cascaded decoding units as will be described in more detail later.

In the specific example where a signal playing function is supported, the receiver 203 further comprises a signal player 217 which receives the decoded audio signal from the decoder 215 and presents this to the user. Specifically, the signal player 217 may comprise a digital-to-analog converter, amplifiers and speakers as required for outputting the decoded audio signal.

In the example, the transmission system supports a real time service where the delay from the input of the transmitter 201 to the output of the receiver 203 must be less than a given threshold. The threshold may specifically be in the order of 50-100 msec.

Fig. 3 illustrates the encoder 209 in more detail. The encoder 209 comprises a frame processor 301 which receives the audio signal from the digitizer 207. The frame processor 301 then divides the received signal into consecutive frames where each frame corresponds to a given time interval. The time intervals are selected such that consecutive frames have overlapping time intervals, i.e. some samples will belong to two consecutive frames. The frame processor 301 then applies a window function to each frame in order to provide gradual frame transitions. The windows specifically correspond to a central section with a constant amplitude of unity and a begin transitional section and an end transitional section wherein the amplitude is gradually reduced from unity to zero. In the example of Fig. 3, the windows are symmetrical such that the transition at the beginning of the frame is mirrored by the transition at the end of the frame. Furthermore, the window and frame transitions are selected such that the combined amplitude of the window or windows applied to a sample is always unity. Thus, for a sample falling within the overlap region, a weighting by the window of a given frame of x is matched by a weighting by the window of the neighbor overlapping frame of 1-x. However, it will be appreciated that in other embodiments, non-symmetric windows may be used and/or overlapping windows with a total weighting different from unity may be applied. Indeed, the total weight may in some embodiments vary within the overlap time interval.

In the example, the frame processor 301 applies a simple symmetric linear transition in the overlap time intervals resulting in frames having linear gradual frame transitions in an overlap time interval δT as illustrated in Fig. 4.
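The window shape described above, a unity central section with linear transitions whose weights sum to unity in the overlap, can be sketched as follows. The function names `linear_window` and `overlap_sum`, the sample-based parameters, and the half-sample offset of the ramps are assumptions made for this illustration and are not taken from the described embodiment.

```python
def linear_window(frame_len, overlap):
    """Symmetric window: linear ramp up, flat unity section, linear ramp down.

    `overlap` is the number of samples shared with each neighboring frame
    (the overlap time interval expressed in samples). The ramps use a
    half-sample offset (an assumption) so that no sample has weight zero.
    """
    if not 0 < overlap <= frame_len // 2:
        raise ValueError("overlap must be in 1..frame_len // 2")
    window = []
    for n in range(frame_len):
        if n < overlap:
            window.append((n + 0.5) / overlap)              # begin transition
        elif n >= frame_len - overlap:
            window.append((frame_len - n - 0.5) / overlap)  # end transition
        else:
            window.append(1.0)                              # central section
    return window


def overlap_sum(frame_len, overlap, num_frames):
    """Sum the windows of `num_frames` consecutive overlapping frames."""
    hop = frame_len - overlap  # interval between consecutive frame starts
    total = [0.0] * (hop * (num_frames - 1) + frame_len)
    window = linear_window(frame_len, overlap)
    for k in range(num_frames):
        for n, weight in enumerate(window):
            total[k * hop + n] += weight
    return total


total = overlap_sum(frame_len=16, overlap=4, num_frames=3)
```

Away from the very first and very last transition, where there is no neighboring frame, every entry of `total` equals unity: a weight x in one frame is matched by a weight 1-x in the neighboring frame.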

The frame processor 301 is coupled to a first encoder unit 303 which is further coupled to a second encoder unit 305. The encoder units 303, 305 are coupled in a cascaded configuration wherein the audio signal is first encoded by the first encoder unit 303 to generate first encoding data. The first encoder unit 303 furthermore generates a residual signal which reflects the difference between the encoded signal and the audio signal. Specifically, the first encoder unit 303 can synthesize a signal based on the first encoding data. This signal will correspond to that signal that would be generated by a decoder based on the first encoding data. The residual signal can then be determined as an error signal given by subtracting the synthesized signal from the audio signal (or from the windowed audio signal).

The residual signal is fed to the second encoder unit 305 which proceeds to encode the residual signal to generate second encoding data.
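The cascade of the two encoder units can be illustrated with a deliberately simple stand-in. In the sketch below both encoder units are uniform quantizers with different step sizes; this is an assumption made purely to keep the example short and is not the sinusoidal, transform or CELP coding of the embodiment. All function names and step sizes are hypothetical.

```python
def quantize(signal, step):
    """Toy 'encoder unit': uniform quantization to integer codes."""
    return [round(x / step) for x in signal]


def dequantize(codes, step):
    """Toy synthesis: reconstruct a signal from integer codes."""
    return [c * step for c in codes]


def encode_frame(frame_signal, coarse_step=0.5, fine_step=0.05):
    """Cascaded encoding of one (windowed) frame signal."""
    # First encoder unit: generate the first encoding data.
    first_data = quantize(frame_signal, coarse_step)
    # Synthesize the signal a decoder would generate from the first data.
    synthesized = dequantize(first_data, coarse_step)
    # Residual: frame signal minus synthesized signal.
    residual = [x - s for x, s in zip(frame_signal, synthesized)]
    # Second encoder unit: encode the residual signal.
    second_data = quantize(residual, fine_step)
    return first_data, second_data


def decode_frame(first_data, second_data, coarse_step=0.5, fine_step=0.05):
    """Add the decoded residual to the first decoded signal."""
    first = dequantize(first_data, coarse_step)
    second = dequantize(second_data, fine_step)
    return [a + b for a, b in zip(first, second)]


frame = [0.0, 0.3, -0.72, 1.1]
first_data, second_data = encode_frame(frame)
decoded = decode_frame(first_data, second_data)
```

In this toy cascade the error remaining after both stages is bounded by half the fine step size, whereas decoding the first encoding data alone leaves an error of up to half the coarse step size.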

The first encoder unit 303 and the second encoder unit 305 operate on individual frames. Thus, when a frame is generated by the frame processor 301 it is fed to the first encoder unit 303 which generates the first encoding data for the frame. It then proceeds to generate the residual error signal for the frame. However, the first encoding data is generated to accurately represent the windowed signal in the frame, referred to as the frame signal. Due to the overlapping frames, the audio signal in the overlap time interval δT is represented by the encoding of two frames. Thus, the first encoder unit 303 generates first encoding data for a frame which represents the frame signal including the effect of the transitional windowing in the overlap time interval. Hence, for the overlap time intervals the audio signal is only partially encoded by the individual frame and a decoder recreates the audio signal by decoding two frames and adding the signals in the overlap time intervals.

The residual signal which is generated for a given frame is an indication of the error introduced by the encoding and in the example is specifically the error signal obtained by subtracting the synthesized signal for the frame from the windowed frame signal. Thus, the residual signal represents a difference between the frame signal and a signal synthesized from the first encoding data. The residual signal for the frame is fed to the second encoder unit 305 which encodes this signal to generate the second encoding data for the frame.

In the example, the encoding of the frame by the first encoder unit 303 and the second encoder unit 305 is performed while the audio signal for the following frame is received. Typically, the encoding of a given frame is completed before the following frame is completely received. Thus, the delay introduced by the encoder is roughly equivalent to the frame duration. In a typical real time communication application, frame sizes of around 20-200 msec are used to ensure that the delay is sufficiently small while still achieving an efficient implementation and encoding.

The first encoder unit 303 is coupled to a combine processor 307 which receives the first encoding data and the second encoding data. The combine processor 307 generates output encoded data comprising at least the first encoding data and the second encoding data. In particular, the combine processor 307 generates a data packet for each frame which comprises first encoding data and second encoding data as well as potentially other relevant data, including for example header data, control data etc.

The combine processor 307 is coupled to a transmitting unit 309 which receives the data packets from the combine processor 307 and transmits them to the receiver 203 via the network transmitter 211 and the Internet 205. The transmitting unit 309 is arranged to transmit the encoding data for one frame before the first encoder unit 303 and/or the second encoder unit 305 has finished generating encoding data for the following frame. Indeed, in some examples, the encoding data for one frame may be transmitted before the audio signal for the following frame has been fully received by the encoder 209. In the specific example, the data packet for a given frame is transmitted as soon as it has been generated by the combine processor 307 and before the first encoder unit 303 has completed the generation of the encoding data for the next frame. Thus, the delay of the encoder 209 is kept close to a frame duration.

In the example, the first and second encoder unit 303, 305 may be a sinusoidal encoder and/or a transform encoder and/or a Code Excited Linear Predictive encoder which may generate encoding parameters such as sinusoidal amplitude, frequency and phase (sinusoidal encoder) and/or transform coefficients and scale factors (transform coder), as will be well known to the person skilled in the art.

In the specific example, the combination of the first and the second encoder units 303, 305 is performed using rate distortion techniques. Specifically, the encoder may perform a rate distortion optimization which decides how to distribute the total bit budget over the different encoder units 303, 305. In addition, the individual encoder units (sinusoidal, transform, or CELP) are also rate-distortion controlled to determine aspects like segmentation, bit allocation over segments, total number of sinusoids, quantization etc. Further description of rate distortion techniques may e.g. be found in the article "Rate-Distortion Optimized Hybrid Sound Coding" by Nicolle H. van Schijndel and Steven van de Par; 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; IEEE; October 16-19, 2005, New Paltz, NY.

In the example, the same windowing and frame transitions are used for the first and second encoder 303, 305 but it will be appreciated that in other embodiments, the first and second encoder may apply different windowing and frame transitions. For example, the second encoder unit 305 may apply longer overlaps than the first encoder unit 303. A problem in a cascaded encoder arrangement such as that of Fig. 3 is that the overlapping frames result in the residual signal generated by the first encoder not being complete in the overlap time interval δT. Specifically, in the overlap time interval, the first encoder unit 303 generates encoding data for two partial signals corresponding to the windowed signals in the two neighboring frames. Accordingly, the residual signal generated for a single frame will not represent the difference between the audio signal and the synthesized signal that will be generated by the decoder but only between the (e.g. windowed) audio signal and the synthesized signal component for the partial signal in the current frame. The complete residual signal for the overlap time interval may be generated once the encoding of the following frame has been completed but this would add a delay corresponding to a frame duration which would be unacceptable in many real time applications.

It should be noted that generating second encoding data based on a complete residual signal results in an improved quality compared to an encoding of the partial residual signals of the overlap time interval in each of the overlapping frames. In particular, applying an overlap and add approach to the residual signals results in audible encoding artifacts e.g. due to aliasing at frame boundaries etc.

In the encoder of Fig. 3, a time interval which corresponds to the gradual frame transitions (and thus the overlap time interval δT) is determined for each frame. For example, the time interval may be determined as the overlap time interval wherein the signal is also encoded in subsequent frames by the first encoder unit 303. The time interval can specifically be determined to start at the earliest time for which the residual signal is incomplete i.e. in the example of Fig. 4 to start at the earliest time where the next frame (n+1) overlaps the current frame (n). The end of the time interval can be at the end of the frame (n) for the second encoder unit (this can be the same as the end of the frame for the first encoder unit 303 but could be e.g. later if a longer overlap is used for the second encoder unit than for the first encoder unit).

For example, if the first encoder unit 303 does not use any overlap, the residual signal input to the second encoder unit 305 is still not completely known for the entire frame of the second encoder unit 305 if this uses an overlap at the frame boundaries. Thus, if the second encoder unit 305 applies an overlap at the end of the frame this would fall into the subsequent frame for the first encoder unit 303 and would therefore not be known. Thus, second encoding data for this overlap region should be delayed until the next frame. The overlap time interval would in this specific example start at the end of the frame of the first encoder unit 303 and end at the end of the frame for the second encoder unit 305. The inclusion of at least some of the second encoding data for this time interval from the second encoder unit is then delayed to a subsequent frame. Thus, second encoding data corresponding to the overlap interval in which a complete residual signal is not yet available is delayed until the next frame. However, second encoding data is generated for the remaining part of the frame.

In the specific example, the encoder 209 comprises a delay processor 311 which is coupled to the second encoder unit 305 and the combine processor 307. The delay processor 311 receives the second encoding data for the frame and divides it into two categories. The first category corresponds to encoding data for the frame which does not fall within the overlap time interval. This data is forwarded to the combine processor 307 and included in the data packet with the first encoding data from the first encoder unit 303. The second category corresponds to encoding data for the frame which does fall within the overlap time interval. This data is not forwarded to the combine processor 307 but is stored in the delay processor 311 until the next frame has been encoded by the second encoder unit 305. It is then forwarded to the combine processor 307 together with the second encoding data from the second encoder unit 305 for the subsequent frame and is included in the data packet generated for the subsequent frame.
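The division of the second encoding data into forwarded and held-back categories may be sketched as follows. This is a minimal illustration only: the class and function names, the representation of second encoding data as (segment start, payload) pairs, and the dictionary packet layout are all assumptions and are not taken from the described embodiment.

```python
class DelayProcessor:
    """Holds back second encoding data of the overlap interval for one frame."""

    def __init__(self):
        self._held = []  # data held back from the previous frame

    def process(self, second_data, delay_start):
        """Split one frame's data at `delay_start` (time relative to the frame start).

        Returns (forward_now, released): data to send with the current frame,
        and data of the previous frame's overlap interval that is now released.
        """
        forward_now = [item for item in second_data if item[0] < delay_start]
        hold_back = [item for item in second_data if item[0] >= delay_start]
        released, self._held = self._held, hold_back
        return forward_now, released


def build_packet(frame_index, first_data, forward_now, released):
    """Hypothetical packet layout standing in for the output of the combine processor."""
    return {
        "frame": frame_index,
        "first": first_data,
        "second": forward_now,
        "second_delayed_from_previous": released,
    }


# Two consecutive frames, each with three segments; the overlap starts at time 16.
delay_processor = DelayProcessor()
now0, rel0 = delay_processor.process([(0, "a0"), (8, "a1"), (16, "a2")], delay_start=16)
packet0 = build_packet(0, "first-data-0", now0, rel0)
now1, rel1 = delay_processor.process([(0, "b0"), (8, "b1"), (16, "b2")], delay_start=16)
packet1 = build_packet(1, "first-data-1", now1, rel1)
```

In this sketch the packet for the first frame can be built without waiting for the second frame, and the held-back item "a2" travels in the packet of the second frame.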

It will be appreciated that in other embodiments the delay of the inclusion of this data may be achieved in other ways. For example, the residual signal may effectively be delayed at the input to the second encoder unit 305. Thus, for a given frame, the second encoder unit 305 may encode only the residual signal in the overlapping time interval at the beginning of the frame and in the non-overlapping central time interval but not in the overlapping time interval at the end of the frame. Thus the delay processor 311 may control the encoding of the second encoder unit 305 to not encode (or to only partially encode) the residual signal in the overlap time interval until the next frame is being encoded.

The (partial) residual signal in the end overlapping time interval may be stored for the following frame where it is included in the encoding. The residual signal used for the beginning overlapping time interval can be generated by combining the stored residual signal from the previous frame and the residual signal for the interval received for the current frame and the resulting residual signal can be encoded. Thus, at least some second encoding data for the overlap time interval of a previous frame can be determined in response to the residual signal of a subsequent frame. In particular, the residual signal that is used to generate second encoding data may be generated by a combination of a residual signal for the time interval from the two frames.
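The combination of the stored and the current partial residual signals can be illustrated with a small numerical sketch. The uniform quantiser used here is only a hypothetical stand-in for the first encoder unit and its synthesis, and the window weights and sample values are assumed; the point shown is that, when the two frame windows sum to one in the overlap interval, adding the two partial residuals yields the difference between the audio signal and the decoder's overlap-added synthesis.

```python
def coarse_synthesis(frame_signal, step=0.5):
    """Hypothetical stand-in for first encoding plus synthesis: uniform quantisation."""
    return [round(value / step) * step for value in frame_signal]


def partial_residual(frame_signal, step=0.5):
    """Residual of one frame: windowed frame signal minus its synthesized version."""
    synthesized = coarse_synthesis(frame_signal, step)
    return [x - s for x, s in zip(frame_signal, synthesized)]


def complete_overlap_residual(stored_tail, current_head):
    """Combine the stored residual of frame n with the residual of frame n+1."""
    return [a + b for a, b in zip(stored_tail, current_head)]


# Audio samples in the overlap interval and assumed complementary window weights.
audio = [1.3, 2.1]
fade_out = [0.75, 0.25]   # end of frame n
fade_in = [0.25, 0.75]    # start of frame n+1

tail_signal = [x * w for x, w in zip(audio, fade_out)]
head_signal = [x * w for x, w in zip(audio, fade_in)]

stored_tail = partial_residual(tail_signal)      # available when frame n is encoded
current_head = partial_residual(head_signal)     # available only with frame n+1
complete = complete_overlap_residual(stored_tail, current_head)

# What a decoder would synthesize in the overlap interval by overlap-add.
decoder_synthesis = [a + b for a, b in
                     zip(coarse_synthesis(tail_signal), coarse_synthesis(head_signal))]
```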

It will be appreciated that a combination of these approaches may also be used. For example, some encoding parameters such as segmentations and bit allocations may be generated for the entire frame and the parameters relating to the end overlapping interval may be stored for the following frame whereas other encoding parameters are not generated until the next frame is encoded (and thus are not generated until the entire signal is present). In some examples, some or all encoding parameters may be generated twice. E.g. a set of parameters may be generated for a given frame including the delay time interval. This data may then be used e.g. for rate distortion optimization. When the next frame is processed, new values for some or all of the encoding parameters for the delay time interval of the previous frame may be generated as these now can be based on the complete residual signal in this time interval. The new values are then included in the output data stream whereas the previous values are discarded.

In the encoder 209, the time interval is determined by a time processor 313 coupled to the delay processor 311. In the example, the time processor 313 determines the delay time interval (being the interval for which second encoding data is delayed until the next frame) to be the same as the overlap time interval. Thus, the delay time interval is determined to have a start time which is the earliest time for an overlap of the current and subsequent frame of the first encoder unit. From this start time, the encoding of the audio signal is performed in two frames and accordingly only a partial residual signal is initially available. The second encoding data from this time on is therefore delayed to the next frame. The end time of the delay interval is set to the frame end time for the current frame for the second encoder unit (which can be the same as the end time of the frame for the first encoder unit 303).

In the combine processor 307, some or all of the second encoding data for the delay time interval of a frame is thus grouped together with some or all of the second encoding data of the subsequent frame. In the example, the grouping is by generating a data packet for each frame which includes the second encoding parameters for the frame except for the delay time interval together with the second encoding parameters for the delay time interval of the previous frame. This allows the data packet for a frame n to be transmitted before frame n+1 has been received and encoded thereby substantially reducing the delay. Specifically, the approach may allow a proper encoding and decoding of overlapping frame transitions while introducing only a delay which corresponds to the overlap time. This time is typically much smaller than the frame duration thereby resulting in a much reduced overall delay and a suitability for communication, streaming and real-time applications.

Fig. 5 illustrates an example of framing and generation of data packets in accordance with such an embodiment.

In the specific exemplary encoder 209 illustrated in Fig. 3, the first and second encoder units 303, 305 use segmentation of the frames into encoding segments. Specifically, both the first encoder unit 303 and second encoder unit 305 are arranged to determine one or more encoding segments (i.e. some but not necessarily all frames will comprise a plurality of encoding segments) in response to the characteristics of the signal being encoded (i.e. the digitized audio signal or the residual signal from the first encoder unit 303). The encoding segments may, for example, be determined for the first encoder unit 303 by fixed segmentation, using a pre-determined segment length to split the frame. In the specific example both the first encoder unit 303 and the second encoder unit 305 use rate-distortion optimization to determine the segmentation. Thus, the rate distortion optimization investigates different segmentations and selects the segmentation with lowest perceptual distortion at a given bitrate. The segmentation is used to split the frame into smaller segments, which are more easily manageable by encoding algorithms and which provide the advantage of increased coding efficiency.

In the embodiment, the time interval determination and delaying of second encoding data is performed based on the segmentation. Thus, the delay time interval for which the second encoding data is delayed is selected as one or more whole segments.

Specifically, the time processor 313 first receives information from the first encoder unit 303 of the segmentation of the frame. The time processor 313 then proceeds to determine one or more of the encoding segments at the end of the frame which overlap the overlap time interval δT. The start time of the delay time interval is then optionally set to the start time of the encoding segment which comprises the start of the overlap time interval δT. As another example, it may be set to the actual start time of the overlap time interval. The end time is set to the end of the frame (which in the example is the same for the first and second encoder unit 303, 305).
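A minimal sketch of this segment alignment is given below. Segments are represented as (start, end) pairs in arbitrary time units; the function names and the example segmentations are assumptions for illustration. The second helper applies the resulting delay time interval to a segmentation of the second encoder unit, selecting the segments which fall within or partially overlap it.

```python
def aligned_delay_start(first_segments, overlap_start):
    """Start of the first-encoder segment that contains the start of the overlap interval."""
    for seg_start, seg_end in first_segments:
        if seg_start <= overlap_start < seg_end:
            return seg_start
    raise ValueError("overlap start lies outside the frame segmentation")


def segments_to_delay(second_segments, delay_start, frame_end):
    """Second-encoder segments that fall within or partially overlap [delay_start, frame_end)."""
    return [(start, end) for start, end in second_segments
            if end > delay_start and start < frame_end]


# Hypothetical frame of 20 time units whose overlap interval starts at 16.
first_segments = [(0, 6), (6, 12), (12, 20)]
second_segments = [(0, 5), (5, 10), (10, 14), (14, 20)]

delay_start = aligned_delay_start(first_segments, overlap_start=16)
delayed = segments_to_delay(second_segments, delay_start, frame_end=20)
```

In this example the delay time interval is extended from 16 back to 12, the start of the enclosing segment, and the last two segments of the second segmentation are held back.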

For the encoding segments overlapping the overlap time interval, the complete residual signal is not generated for the part of the segment that falls in the overlap time interval. Accordingly, the second encoding data relating to the residual signal for this interval should be delayed and in some examples the delay time interval is set to the duration of these segments. Thus, the time processor 313 may extend the original overlap time interval to include the whole encoding segment at the beginning of overlap time interval. The time processor 313 then proceeds to receive information of the segmentation performed by the second encoder unit 305. It then determines the encoding segments which fall within or at least partially overlap the delay time interval determined from the segmentation made by the first encoder unit 303. The appropriate second encoding data for all these encoding segments are accordingly delayed until the following frame. Hence, effectively, the time processor 313 extends the delay time interval such that the beginning of this coincides with a beginning of the encoding segment which includes the previously determined start time. This ensures alignment of the delay with the segmentation of both the first and second encoder unit 303, 305 thereby ensuring that the appropriate second encoding data affected by the incomplete residual signal is delayed.

In the encoder 209 of Fig. 3, the second encoder unit 305 is a rate distortion optimized encoder which is arranged to determine some encoding parameters based on a rate distortion parameter determination algorithm using the residual signal as the input signal. For a given frame, the rate distortion algorithm evaluates a perceptual distortion measure for different parameter sets and selects the parameter set that results in the lowest perceptual distortion as will be known to the skilled person. Further details of rate distortion algorithms may for example be found in van Schijndel, N. H., and van de Par, S., "Rate-Distortion Optimized Hybrid Sound Coding," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2005), New Paltz, New York, USA, Oct. 16-19, 2005, pp. 235-238.
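The selection step of such a rate distortion algorithm can be sketched in a highly simplified form as follows. The candidate parameter sets, their bit costs and their distortion values are invented for illustration, the perceptual distortion measure itself is not modelled, and the constraint of a fixed bit budget is an assumption; the sketch is not the algorithm of the cited article.

```python
def select_parameter_set(candidates, bit_budget):
    """Pick the candidate with the lowest distortion among those that fit the bit budget."""
    feasible = [c for c in candidates if c["bits"] <= bit_budget]
    if not feasible:
        raise ValueError("no candidate fits the bit budget")
    return min(feasible, key=lambda c: c["distortion"])


# Hypothetical candidates, e.g. different segmentations of one frame.
candidates = [
    {"name": "one long segment", "bits": 400, "distortion": 0.30},
    {"name": "two segments", "bits": 520, "distortion": 0.18},
    {"name": "four segments", "bits": 700, "distortion": 0.11},
]

chosen = select_parameter_set(candidates, bit_budget=600)
```

With the assumed budget of 600 bits the four-segment candidate is excluded despite its lower distortion, so the two-segment candidate is selected.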

In the encoder 209, the second encoder unit 305 encodes a residual signal which is incomplete in the overlap time interval and for which some encoding data is delayed as a result. Thus, for a rate distortion optimization and parameter selection based on an accurate residual signal, the perceptual distortion minimized by the algorithm would depend on the encoding of the following frame.

However, in the encoder 209 of Fig. 3, the rate distortion parameter determination for each frame is performed taking into account only the residual signal of the current frame and potentially of the previous frame(s) but without consideration of the following frame. Thus, in the encoder 209 the residual signal for the subsequent frame is not taken into account when performing the rate distortion optimization. Rather, the rate distortion algorithm is executed based only on a set of frames which includes only the current frame and possibly one or more previous frames but which does not include the subsequent frame.

It will be appreciated that the same considerations can be applied to the rate optimization of the combination of the first encoder unit 303 and the second encoder unit 305 (e.g. allocation of a total number of bits between the first encoder unit 303 and the second encoder unit 305) and to the first encoder unit 303.

Thus, in the encoder 209, the distortion calculations are performed on a per frame basis, assuming that the distortion is going towards zero outside the end frame boundary or by predicting the distortion in the neighborhood of the end frame boundary.

Specifically, the distortion in frame n is determined by the encoding results of frame n-1 and frame n, but not by encoding results of frame n+1, which has not yet been processed. As a result, the rate distortion optimization can be performed for the current frame and the encoding delay can be substantially reduced. The use of an incomplete residual signal for part of the frame when performing the rate distortion optimization may result in some degradation of the performance. However, in most embodiments and scenarios, this degradation will be acceptable and substantially outweighed by the reduced delay.

The rate distortion optimization may specifically be used to determine parameters such as segmentation and bit allocations for the encoded signal. In the encoder 209 of Fig. 3, these encoding parameters are not only applied to the part of the frame for which a complete residual signal is present but also to the entire frame including the overlap time interval.

Thus, encoder parameter settings such as segmentation and bit allocation are determined for a complete frame at a time and are applied to the entire frame. Hence, this data may be included in the data packet for the current frame. When the residual signal for the next frame is received, the second encoder unit 305 can proceed to determine specific encoding data, such as sinusoidal parameters (amplitudes, frequencies and phases) or transform parameters (MDCT coefficients, scale factors), which describes the complete residual signal during the overlap time interval of the previous frame. However, the bit allocation and segmentation determined by the rate distortion optimization of the previous frame is still maintained for the overlap time interval. Thus, in summary, in the encoder 209, the data packet for a given frame does not comprise all encoding data for the frame. Rather, the data packet(s) for the current frame includes the encoding data which has been derived based on a complete residual signal, i.e. all of the first encoding data and the second encoding data for the frame prior to the delay time interval. In addition, some second encoding data, such as segmentation and bit allocation data, for the delay time interval is included in the data packet as is some second encoding data for the delay time interval of the previous frame. However, some second encoding data for the delay time interval of the current frame is delayed until the data packet(s) for the subsequent frame.

It will be appreciated that various techniques may be used for encoding the first and last frames of the input audio signal. For example, an extra initial and final zero signal frame may be added and used to start and finalize the described encoding approach.

Fig. 6 illustrates the decoder 215 in more detail. The decoder 215 corresponds to the encoder 209 and comprises complementary functionality to this.

Specifically, the decoder 215 comprises a receive processor 601 which receives the data packets generated by the encoder 209. Thus, the received data comprises the first encoding data and second encoding data from the first encoder unit 303 and second encoder unit 305 respectively.

The receive processor 601 is coupled to a first decoding unit 603 and a second decoding unit 605. The first decoding unit 603 receives the first encoding data from the receive processor 601 and in response it generates a first decoded signal. The first decoded signal corresponds to a synthesis of the audio signal based on the encoding of the first encoder unit 303 and this is equivalent to the signal synthesized in the first encoder unit 303 and used to generate the residual signal. The second decoding unit 605 receives the second encoding data from the receive processor 601 and proceeds to generate a second decoded signal which corresponds to the residual signal encoded by the second encoder unit 305. The first and second decoded signals, i.e. the signals corresponding to the encoded version of the input signals of the first encoder unit 303 and second encoder unit 305, are fed to a merge processor 607 which combines the two signals into a decoded signal that corresponds to the audio signal input to the encoder 209. Specifically, the first and second decoded signals may be added together by the merge processor 607.

In addition, the decoder 215 comprises a delay timing processor 609 which is coupled to the second decoding unit 605 and the merge processor 607. The delay timing processor 609 determines the delay time interval which has been used by the encoder 209. Specifically, the encoder 209 can include an indication of the start time of the delay time interval from which the second encoding data has been delayed to the next frame. This information can be extracted from the data packet by the delay timing processor 609. The delay timing processor 609 then proceeds to control the second decoding unit 605 and the merge processor 607 to compensate for the delay introduced in the encoder 209.

Specifically, while the first decoding unit 603 simply proceeds to generate a first decoded signal for the entire frame of the received data packet, the second decoding unit 605 first proceeds to decode the residual signal from the beginning of the delay time interval of the previous frame based on the second encoding data relating to this time interval but comprised in the current data packet. This decoding may include second encoding data (such as segmentation and bit allocation information) that was received in the data packet(s) for the previous frame.

The second decoding unit 605 then proceeds to decode the residual signal for the part of the current frame which does not fall within the delay interval, i.e. for the part of the frame for which complete second encoding data is available. The decoded residual signal for the delay time interval of the previous frame is then fed to the merge processor 607 where it is combined with the corresponding section of the first decoded signal which has been stored from the previous frame. The decoded residual signal for the part of the current frame which is complete is then fed to the merge processor 607 where this is combined with the corresponding part of the first decoded signal. Thus, a complete decoded signal starting at the beginning of the delay time interval of the previous frame and ending at the beginning of the delay time interval of the current frame is generated when a new data packet is received. However, the residual signal for the delay time interval of the current frame is not decoded as some encoding data related to this will not be received until the next data packet(s). Thus, any received second encoding data for this interval is stored in the second decoding unit 605 until the next frame. Similarly, the merge processor 607 stores the corresponding part of the first decoded signal from the first decoding unit 603.
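The decoder-side compensation may be sketched as follows. For simplicity the sketch assumes that the first decoded signal and the decoded residual signals are already available as lists of samples, and it ignores the overlap-add of the first decoded signal between frames; the class name, method name and sample values are hypothetical.

```python
class DelayAwareDecoder:
    """Merges first decoded signal and residual while compensating the encoder delay."""

    def __init__(self):
        # Part of the first decoded signal stored for the previous frame's delay interval.
        self._held_first_tail = []

    def decode_packet(self, first_decoded, residual_body, residual_prev_tail):
        """Return samples from the previous delay-interval start to the current one.

        first_decoded:      first decoded signal for the whole current frame
        residual_body:      decoded residual for the current frame before its delay interval
        residual_prev_tail: decoded residual for the delay interval of the previous frame
        """
        body_len = len(residual_body)
        out = [f + r for f, r in zip(self._held_first_tail, residual_prev_tail)]
        out += [f + r for f, r in zip(first_decoded[:body_len], residual_body)]
        # Keep the first decoded signal of the current delay interval for the next packet.
        self._held_first_tail = first_decoded[body_len:]
        return out


decoder = DelayAwareDecoder()
out0 = decoder.decode_packet([1.0, 2.0, 3.0, 4.0], [0.1, 0.1, 0.1], [])
out1 = decoder.decode_packet([5.0, 6.0, 7.0, 8.0], [0.2, 0.2, 0.2], [0.5])
```

In this sketch the last sample of the first frame is only output once the second packet, which carries its residual, has arrived.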

Thus, in the system a decoded audio signal is generated for the second time interval of each frame which does not include the delay interval before complete second encoding data is received for the delay interval. Thus, apart from communication delays, a total delay corresponding to only a frame duration and a duration of the overlap time interval (which is typically much lower than a frame duration) is achieved.

It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.

The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.

Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.

Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.