

Title:
BUFFER MANAGEMENT
Document Type and Number:
WIPO Patent Application WO/2008/084179
Kind Code:
A1
Abstract:
A system for processing packets of a media stream, the packets being associated with a plurality of timestamps, the packets being at least partially transferred via a bandwidth constrained network, the system including a receiver to receive the packets, a buffer, operationally connected to the receiver, to store the packets, a decoder, operationally connected to the buffer, to receive the packets from the buffer and to decode the packets, a fill-level determiner, operationally connected to the buffer, to determine a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer and a newest timestamp of the timestamps of the packets currently stored in the buffer, and a fill-level manager, operationally connected to the buffer, to perform an action based on the determined fill-level. Related apparatus and methods are also described.

Inventors:
TAYLOR RAY (GB)
Application Number:
PCT/GB2007/000032
Publication Date:
July 17, 2008
Filing Date:
January 08, 2007
Assignee:
NDS LTD (GB)
TAYLOR RAY (GB)
International Classes:
H04N7/24
Domestic Patent References:
WO2004062291A1, 2004-07-22
Foreign References:
US5822537A, 1998-10-13
US6665751B1, 2003-12-16
Attorney, Agent or Firm:
WHITE, Duncan, Rohan (90 Long Acre, London WC2E 9RA, GB)
Claims:

What is claimed is:

CLAIMS

1. A system for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the system comprising: a receiver to receive the packets; a buffer, operationally connected to the receiver, to store the packets; a decoder, operationally connected to the buffer, to receive the packets from the buffer and decode the packets; a fill-level determiner, operationally connected to the buffer, to determine a fill-level of the buffer based on a time difference between: an oldest timestamp of the timestamps of the packets currently stored in the buffer; and a newest timestamp of the timestamps of the packets currently stored in the buffer; and a fill-level manager, operationally connected to the buffer, to perform an action based on the determined fill-level.

2. The system according to claim 1, wherein the fill-level manager is operative to adjust a playback speed of the media stream by the decoder based on the determined fill-level of the buffer.

3. The system according to claim 2, wherein the fill-level manager is operative to adjust the playback speed of the media stream by the decoder according to a plurality of thresholds of the buffer fill-level.


4. The system according to claim 2 or claim 3, further comprising an audio module to adjust the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed of the decoder.

5. The system according to claim 4, wherein the audio module is operative to adjust the pitch using pitch shifting.

6. The system according to claim 1, wherein the fill-level manager is operative to reestablish a connection with the network.

7. The system according to any of claims 1-6, wherein the timestamps are assigned by the encoder at a time of encoding of each of the packets.

8. The system according to claim 7, wherein the timestamps are program clock references.

9. The system according to any of claims 1-8, wherein the buffer is a receive buffer, the decoder including a decode buffer.

10. The system according to any of claims 1-8, wherein the buffer includes a receive buffer and a decode buffer.

11. The system according to any of claims 1-10, the packets being at least partially transferred via a bandwidth constrained network.

12. A method for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the method comprising: receiving the packets; storing the packets in a buffer; decoding the packets; determining a fill-level of the buffer based on a time difference between: an oldest timestamp of the timestamps of the packets currently stored in the buffer; and a newest timestamp of the timestamps of the packets currently stored in the buffer; and performing an action based on the determined fill-level.

13. The method according to claim 12, wherein performing the action includes adjusting a playback speed of the media stream based on the determined fill-level of the buffer.

14. The method according to claim 13, wherein the adjusting of the playback speed is performed according to a plurality of thresholds of the buffer fill-level.

15. The method according to claim 13 or claim 14, further comprising adjusting the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed.

16. The method according to claim 15, wherein the adjusting the pitch includes pitch shifting.

17. The method according to claim 12, wherein performing the action includes reestablishing a connection with the network.

18. The method according to any of claims 12-17, wherein the timestamps are assigned by the encoder at a time of encoding of the packets.

19. The method according to claim 18, wherein the timestamps are program clock references.


20. The method according to any of claims 12-19, wherein the buffer is a receive buffer, the decoder including a decode buffer.

21. The method according to any of claims 12-19, wherein the buffer includes a receive buffer and a decode buffer.

22. The method according to any of claims 12-21, the packets being at least partially transferred via a bandwidth constrained network.


Description:

BUFFER MANAGEMENT

FIELD OF THE INVENTION

The present invention relates to video decoders, and in particular, to buffer fill-level management in video decoders.

BACKGROUND OF THE INVENTION

By way of introduction, bandwidth constrained networks, for example, but not limited to, wireless networks such as wireless home networks, or wired networks such as wired home networks and the Internet, may suffer periods of reduced throughput. Therefore, the available bandwidth for a video service, transmitted via a bandwidth-constrained network to receivers, may be less than that required.

During periods of reduced throughput, a buffer in a receiver may start to empty until, at some point, the buffer runs out of data. Whilst the buffer fullness remains above zero, there is no noticeable effect on the video or audio output. However, if the buffer empties, the output will suffer, usually noticeably as a glitch or freeze in the video and audio output.

Increasing the buffer size allows longer periods of reduced network throughput to be tolerated. However, there is then more data in the buffer, which means that the time delay between entering the buffer and leaving the buffer is increased. This has two disadvantages. First, the cost of the extra memory cannot always be justified. Second, the response time to key presses such as channel change, pause and rewind will always be greater than the delay through the buffer, typically making the system too unresponsive for the user.

Another technique to cope with periods of reduced throughput is to recode the video stream in order to reduce the bitrate in line with the available network bandwidth. However, this technique has disadvantages. First, an additional decoder and encoder are required in the transmission line, increasing costs. Second, the resulting video is of a lower quality than the originally broadcast video.

The following references are also believed to represent the state of the art:

US Patent 4,413,289 to Weaver, et al.;

US Patent 5,396,497 to Veltman;

US Patent 5,526,362 to Thompson, et al.;

US Patent 5,644,677 to Park, et al.;

US Patent 5,805,602 to Cloutier, et al.;

US Patent 6,108,286 to Eastty;

US Patent 6,493,298 to Youn;

US Patent 6,553,455 to Asano, et al.;

US Patent 6,970,895 to Vaidyanathan, et al.;

US Patent 6,999,447 to D'Amico, et al.;

US Published Patent Application 2003/0208359 of Kang, et al.;

US Published Patent Application 2003/0165326 of Blair, et al.;

US Published Patent Application 2004/0204945 of Okuda, et al.;

US Published Patent Application 2004/0019491 of Rhee;

US Published Patent Application 2005/0100054 of Scott, et al.;

US Published Patent Application 2005/0265334 of Koguchi;

US Published Patent Application 2006/0092282 of Herley, et al.;

US Published Patent Application 2006/0015348 of Cooper, et al.;

PCT Published Patent Application WO 2004/064301 of Thompson Licensing S.A.;

PCT Published Patent Application WO 2004/062291 of Koninklijke Philips Electronics N.V.;

European Published Patent Application EP 1 394 975 of Zarlink Semiconductor Limited; and

Abstract of Japanese Published Patent Application JP2005322995 of Nippon Telegraph and Telephone.

The disclosures of all references mentioned above and throughout the present specification, as well as the disclosures of all references mentioned in those references, are hereby incorporated herein by reference.

SUMMARY OF THE INVENTION

The present invention seeks to provide an improved buffer fill-level management system.

The present invention, in preferred embodiments thereof, monitors a fill-level of a receive buffer. If the receive buffer starts to empty, an action is performed to correct the situation.

Typically, the fill-level of the receive buffer is managed by varying the decoder playback speed based on the fill-level of the receive buffer. If the receive buffer has an adequate fill-level, then playback proceeds at normal play speed. If the fill-level of the receive buffer falls below a threshold, the play speed is reduced in order to replenish the receive buffer. Further play speed reductions may be necessary. If the receive buffer is very depleted, an action such as re-establishing the network connection may be performed.

In a variable bitrate (VBR) service, sometimes referred to as a statistically multiplexed or stat-muxed service, the amount of data sent per second changes quite dramatically, usually between 1 megabit per second and 10 megabits per second, by way of example only. Therefore, it is very difficult to set a data threshold for the receive buffer for use in controlling the variable speed playback. For example, a threshold requiring the video to be played slower than real-time for a 1 megabit per second stream would typically be far lower than that for a 10 megabit per second stream. However, as the stream bitrate changes frame by frame in an unpredictable way, there is generally no stable data threshold which would be appropriate.

The fill-level of the receive buffer is preferably measured using timing information in the stream in order to determine the amount of playback time in the receive buffer, thereby controlling the playback speed based on time thresholds rather than data thresholds. It will be appreciated by those ordinarily skilled in the art that controlling the playback speed based on timing information can be used for VBR services and non-VBR services alike.

If playback speed is reduced by simply playing video frames and audio samples more slowly, then the audio is pitch-shifted down, giving the actors deeper voices; for instance, female voices would start to sound like male voices. Although playing the video slightly slower is generally not noticed by viewers, a small change in audio pitch is readily heard.

Therefore, in accordance with preferred embodiments of the present invention, pitch shifting technology is employed to pitch-shift the audio during the decoding process to cancel out the pitch-shifting introduced by varying the playback speed of the stream. The actors have the same tone of voice, but are delivering the lines at a slower or faster rate.
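The size of the compensating shift can be worked out directly. A minimal sketch, assuming the slowdown stretches the audio in time so that every frequency is scaled by the playback speed; the pitch shifter then applies the inverse factor. The function name `compensating_pitch_shift` is hypothetical and not taken from the specification.

```python
import math

def compensating_pitch_shift(playback_speed: float) -> tuple[float, float]:
    """Return (frequency_factor, semitones) that cancel a speed-induced pitch change.

    Playing audio at `playback_speed` (1.0 = real time) scales every frequency
    by `playback_speed`, so the pitch shifter applies the inverse factor.
    """
    if playback_speed <= 0:
        raise ValueError("playback speed must be positive")
    factor = 1.0 / playback_speed
    semitones = 12.0 * math.log2(factor)   # 12 semitones per doubling of frequency
    return factor, semitones

# At 80% speed voices drop by roughly 3.86 semitones; shift them back up.
factor, semis = compensating_pitch_shift(0.8)
print(f"factor={factor:.2f}, shift={semis:+.2f} semitones")  # factor=1.25, shift=+3.86 semitones
```

Expressing the correction in semitones is a convenience for pitch-shifting libraries that take that unit; the underlying quantity is simply the reciprocal of the playback speed.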

There is thus provided in accordance with a preferred embodiment of the present invention a system for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the packets being at least partially transferred via a bandwidth constrained network, the system including a receiver to receive the packets, a buffer, operationally connected to the receiver, to store the packets, a decoder, operationally connected to the buffer, to receive the packets from the buffer and decode the packets, a fill-level determiner, operationally connected to the buffer, to determine a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer, and a newest timestamp of the timestamps of the packets currently stored in the buffer, and a fill-level manager, operationally connected to the buffer, to perform an action based on the determined fill-level. Further in accordance with a preferred embodiment of the present invention the fill-level manager is operative to adjust a playback speed of the media stream by the decoder based on the determined fill-level of the buffer.

Still further in accordance with a preferred embodiment of the present invention the fill-level manager is operative to adjust the playback speed of the media stream by the decoder according to a plurality of thresholds of the buffer fill-level.

Additionally in accordance with a preferred embodiment of the present invention, the system includes an audio module to adjust the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed of the decoder. Moreover in accordance with a preferred embodiment of the present invention the audio module is operative to adjust the pitch using pitch shifting.

Further in accordance with a preferred embodiment of the present invention the fill-level manager is operative to reestablish a connection with the network. Still further in accordance with a preferred embodiment of the present invention the timestamps are assigned by the encoder at a time of encoding of each of the packets.

Additionally in accordance with a preferred embodiment of the present invention the timestamps are program clock references. Moreover in accordance with a preferred embodiment of the present invention the buffer is a receive buffer, the decoder including a decode buffer.

Further in accordance with a preferred embodiment of the present invention the buffer includes a receive buffer and a decode buffer.

There is also provided in accordance with still another preferred embodiment of the present invention a method for processing a plurality of packets of a media stream encoded by an encoder, the packets being associated with a plurality of timestamps such that each of the packets is associated with one of the timestamps, the packets being at least partially transferred via a bandwidth constrained network, the method including receiving the packets, storing the packets in a buffer, decoding the packets, determining a fill-level of the buffer based on a time difference between an oldest timestamp of the timestamps of the packets currently stored in the buffer, and a newest timestamp of the timestamps of the packets currently stored in the buffer, and performing an action based on the determined fill-level.

Still further in accordance with a preferred embodiment of the present invention performing the action includes adjusting a playback speed of the media stream based on the determined fill-level of the buffer.

Additionally in accordance with a preferred embodiment of the present invention the adjusting of the playback speed is performed according to a plurality of thresholds of the buffer fill-level.

Moreover in accordance with a preferred embodiment of the present invention, the method includes adjusting the pitch of an audio element of the media stream in order to compensate for the adjustment of the playback speed. Further in accordance with a preferred embodiment of the present invention the adjusting the pitch includes pitch shifting.

Still further in accordance with a preferred embodiment of the present invention performing the action includes reestablishing a connection with the network. Additionally in accordance with a preferred embodiment of the present invention the timestamps are assigned by the encoder at a time of encoding of the packets.

Moreover in accordance with a preferred embodiment of the present invention the timestamps are program clock references. Further in accordance with a preferred embodiment of the present invention the buffer is a receive buffer, the decoder including a decode buffer.

Still further in accordance with a preferred embodiment of the present invention the buffer includes a receive buffer and a decode buffer.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:

Fig. 1 is a partly pictorial, partly block diagram view of a media stream system constructed and operative in accordance with a preferred embodiment of the present invention;

Fig. 2 is a partly pictorial, partly block diagram view of a set-top box for use in the system of Fig. 1 constructed and operative in accordance with a preferred embodiment of the present invention;

Fig. 3 is a flow chart showing a preferred method of operation of the set-top box of Fig. 2; and

Fig. 4 is a partly pictorial, partly block diagram view of a media stream system constructed and operative in accordance with another preferred embodiment of the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

Reference is now made to Fig. 1, which is a partly pictorial, partly block diagram view of a media stream system 10 constructed and operative in accordance with a preferred embodiment of the present invention. The system 10 preferably includes a broadcaster Headend 12 having an encoder 14 for encoding a media stream 16. The media stream 16 generally includes a plurality of packets 18. The Headend 12 also typically includes a clock 22, preferably operationally associated with the encoder 14. The packets 18 are typically associated with a plurality of timestamps 20. The timestamps 20 are preferably assigned by the encoder 14 based on the time provided by the clock 22, such that each packet 18 is associated with one timestamp 20 assigned at the time of encoding each packet 18. By way of example, in an MPEG system, the timestamps 20 are typically program clock references (PCRs).
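Because the fill-level calculation described below subtracts timestamps, it helps to see how an MPEG-2 transport stream PCR maps to time. Per ISO/IEC 13818-1, a PCR carries a 33-bit base counted at 90 kHz and a 9-bit extension counted at 27 MHz (values 0-299), so the full PCR is base × 300 + extension ticks of a 27 MHz clock. The sketch below converts PCRs to a time difference that tolerates a single wraparound; the function and constant names are illustrative only.

```python
PCR_CLOCK_HZ = 27_000_000        # PCR ticks per second (27 MHz system clock)
PCR_WRAP = (1 << 33) * 300       # the 33-bit base wraps, and the full PCR wraps with it

def pcr_ticks(base: int, extension: int) -> int:
    """Combine the 33-bit base (90 kHz units) and 9-bit extension (27 MHz units)."""
    if not (0 <= base < (1 << 33) and 0 <= extension < 300):
        raise ValueError("PCR field out of range")
    return base * 300 + extension

def pcr_delta_seconds(older: int, newer: int) -> float:
    """Elapsed stream time from `older` to `newer` PCR ticks, allowing one wrap."""
    return ((newer - older) % PCR_WRAP) / PCR_CLOCK_HZ

# 90,000 base ticks at 90 kHz is exactly one second of stream time.
print(pcr_delta_seconds(pcr_ticks(0, 0), pcr_ticks(90_000, 0)))  # 1.0
```

The modulo step matters because the PCR wraps roughly every 26.5 hours; a plain subtraction across the wrap would yield a large negative fill-level.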

The Headend 12 also typically includes a transmitter 24 for broadcasting the media stream 16 to a plurality of subscribers 28 (only one shown for the sake of clarity) via a satellite 26. However, it will be appreciated by those ordinarily skilled in the art that the media stream 16 may be transmitted by any suitable transmission method, for example, but not limited to, cable, terrestrial communication or Internet Protocol (IP). The media stream 16 is preferably received by a satellite dish 32 attached to a house 34 of the subscriber 28. The media stream 16 is then typically received by a receiver decoder 30 which is operationally connected to the satellite dish 32. In the example of Fig. 1, the receiver decoder 30 is a personal video recorder incorporating set-top box functionality with video recording functionality. However, it will be appreciated by those ordinarily skilled in the art that the receiver decoder 30 may be any suitable receiving device such as a set-top box or a suitable computer. The receiver decoder 30 is typically connected to other set-top boxes 36 in the house 34 via a home network 38. The home network 38 connects the set-top boxes 36 to the receiver decoder 30 enabling the set-top boxes 36 to play content received via the receiver decoder 30. The home network 38 is typically a bandwidth-constrained network which is a wired or wireless network. Therefore, the packets 18 of the media stream 16, received by the set-top boxes 36, are at least partially transferred via a bandwidth constrained network.

The media stream 16 is transmitted by a transmitter 40 of the home network 38 from the receiver decoder 30 to the set-top boxes 36. The term "transmitter" as used in the specification and claims is defined as an arrangement to send the media stream 16 from one device to another, via a wired or wireless network. Some systems use rate adaptive encoding in the transmitter 40 to adjust the service bitrate of the media stream 16 to the available bandwidth of the network 38. Adaptive encoding introduces an inherent reduction in quality, due to the non-perfect decode-encode stage. Moreover, picture quality may be reduced considerably when reducing the bitrate during a period of poor performance of the home network 38. The system 10 of the present invention, in preferred embodiments thereof, generally maintains the same video encoding as the original broadcast, preferably eliminating the encode-decode process, which is expensive. However, it will be appreciated by those ordinarily skilled in the art that adaptive encoding can be implemented with a preferred embodiment of the present invention.

Additionally, the media stream 16 is preferably transmitted by the transmitter 40 as fast as the network 38 allows. The transmitter 40 typically needs a buffer (not shown) to store data which cannot be sent immediately. The buffer of the transmitter 40 generally does not incorporate another delay into the system 10 as the default state of the buffer of the transmitter 40 is empty (whereas the default state of a buffer of a receiver is generally full or close to full).

Reference is now made to Figs. 2 and 3. Fig. 2 is a partly pictorial, partly block diagram view of one of the set-top boxes 36, namely, a set-top box 42, for use in the system 10 of Fig. 1, constructed and operative in accordance with a preferred embodiment of the present invention. Fig. 3 is a flow chart showing a preferred method of operation of the set-top box 42 of Fig. 2.

The set-top box 42 preferably includes a receiver 44, a receive buffer 46, a decoder 48, a fill-level determiner 50 and a fill-level manager 52.

The receiver 44 is preferably operative to receive the packets 18 from the home network 38 (block 74).

The receive buffer 46 is preferably operationally connected to the receiver 44. The receive buffer 46 is preferably operative to store the packets 18 (block 76).

The decoder 48, which is preferably operationally connected to the receive buffer 46, is preferably operative to receive the packets 18 from the receive buffer 46 and decode the packets 18 (block 78). The decoder 48 typically includes a decode buffer 45 to receive the packets 18 from the receive buffer 46 prior to decoding. In an MPEG system, the decode buffer 45 is typically the MPEG variable bitrate (VBR) buffer whose level may vary wildly. The decoder 48 is generally in complete control of the decode buffer 45 and typically makes sure that the decode buffer 45 never runs out of data. Obtaining any information about the level of the decode buffer 45 is generally impractical. The decode buffer 45 and receive buffer 46 are typically implemented in a single physical buffer, but are logically separate.

Alternatively, the decode buffer 45 and the receive buffer 46 may be physically and logically separate.

Alternatively, the decode buffer 45 and the receive buffer 46 may be included in a single physical and logical buffer (a hybrid buffer 47) requiring special treatment described below in more detail.

The fill-level determiner 50, preferably operationally connected to the receive buffer 46, is preferably operative to determine a fill-level of the receive buffer 46 as a time difference between: an oldest timestamp 56 of the timestamps 20 of the packets 18 currently stored in the receive buffer 46; and a newest timestamp 58 of the timestamps 20 of the packets 18 currently stored in the receive buffer 46 (block 80).
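A minimal sketch of the receive buffer 46 together with the fill-level determiner 50, assuming each buffered packet carries a single timestamp of one type, already converted to milliseconds and free of wraparound. The class and method names are hypothetical; the block numbers in the comments refer to Fig. 3.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    timestamp_ms: float      # e.g. a PCR converted to milliseconds (assumed unwrapped)
    payload: bytes = b""

class ReceiveBuffer:
    """FIFO receive buffer whose fill-level is measured in stream time, not bytes."""

    def __init__(self) -> None:
        self._packets = deque()              # Packet objects, oldest first

    def push(self, packet: Packet) -> None:  # store a received packet (block 76)
        self._packets.append(packet)

    def pop(self) -> Packet:                 # hand the oldest packet to the decoder (block 78)
        return self._packets.popleft()

    def fill_level_ms(self) -> float:        # fill-level determination (block 80)
        """Newest minus oldest timestamp of the packets currently stored."""
        if not self._packets:
            return 0.0
        stamps = [p.timestamp_ms for p in self._packets]
        return max(stamps) - min(stamps)

buf = ReceiveBuffer()
for ts in (0.0, 40.0, 80.0, 120.0):
    buf.push(Packet(ts))
print(buf.fill_level_ms())  # 120.0
buf.pop()
print(buf.fill_level_ms())  # 80.0
```

Because the measure is a timestamp difference, the same 120 ms reading results whether the packets carry 1 or 10 megabits per second of data, which is the property the VBR discussion relies on.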

The timestamps 20 used in the fill-level determination are preferably of the same type (for example, both the oldest timestamp 56 and the newest timestamp 58 are PCRs) and not a combination of different types of timestamps. It should be noted that each packet 18 may have more than one type of timestamp, for example, but not limited to, timestamps generated by the encoder 14, a multiplexer (not shown) and/or the transmitter 40, such as program clock references (PCRs), frame decode time stamps (DTSs), frame presentation time stamps (PTSs), time stamps of IP packets (e.g., reference time stamps (RTSs)), and timestamps or time codes originating from video sources (e.g., vertical interval time code (VITC)).

If the set-top box 42 is implemented with the hybrid buffer 47 combining the receive buffer 46 and the decode buffer 45, then the fill-level determiner 50 needs to take into account the decode timestamps (not shown) of the packets 18, allowing calculation of what the decode buffer 45 fill-level would have been if the decode buffer 45 were separate (at least logically) from the receive buffer 46. The decode buffer 45 fill-level is then subtracted from the total fill-level of the hybrid buffer 47, leaving a logical receive buffer fill-level. The total fill-level of the hybrid buffer 47 is determined as a time difference between an oldest timestamp (not shown) of the timestamps 20 of the packets 18 currently stored in the hybrid buffer 47 and a newest timestamp (not shown) of the timestamps 20 of the packets 18 currently stored in the hybrid buffer 47.
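The hybrid-buffer subtraction can be sketched as follows. The specification does not say exactly how the decode timestamps are turned into a decode-buffer fill-level, so that estimate is treated here as an input computed elsewhere; the function name and the clamp at zero are assumptions for illustration.

```python
def logical_receive_fill_ms(hybrid_timestamps_ms: list[float],
                            decode_buffer_fill_ms: float) -> float:
    """Logical receive-buffer fill-level of a hybrid buffer 47.

    hybrid_timestamps_ms: timestamps (one type, in ms, unwrapped) of every
        packet in the combined buffer.
    decode_buffer_fill_ms: what the decode buffer 45 fill-level would have
        been if it were separate; assumed to be estimated elsewhere from the
        packets' decode timestamps.
    """
    if not hybrid_timestamps_ms:
        return 0.0
    total_ms = max(hybrid_timestamps_ms) - min(hybrid_timestamps_ms)
    # Assumed safeguard: never report a negative receive-buffer fill-level.
    return max(0.0, total_ms - decode_buffer_fill_ms)

print(logical_receive_fill_ms([0.0, 400.0, 1200.0], 300.0))  # 900.0
```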

The fill-level manager 52, which is preferably operationally connected to the receive buffer, is preferably operative to perform an action based on the determined fill-level. The action typically includes adjusting the playback speed of the media stream 16 by the decoder 48 based on the determined fill-level of the receive buffer 46 (block 82).

The playback speed is preferably adjusted in accordance with predefined threshold levels of the buffer fill-level, so that when the buffer fill-level falls below a certain level the playback speed is reduced. When the buffer fill-level falls below another threshold level, the playback speed is reduced again, and so on. Alternatively, the adjustment of the playback speed is performed such that the playback speed is proportional to fill-level, so that the lower the fill-level, the lower the playback speed.
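Both policies can be sketched as functions from the determined fill-level to a playback speed. The stepped thresholds reuse the example values of Table 1 of this description (500 ms down to 100 ms in 100 ms steps, each step removing 20 percentage points of speed); the 500 ms reference level of the proportional variant is an illustrative assumption, and the function names are hypothetical.

```python
# (threshold_ms, speed) pairs: at or below threshold_ms, play at `speed`.
# Values follow the Table 1 example; a real receiver would tune them.
STEP_THRESHOLDS = [(500, 0.8), (400, 0.6), (300, 0.4), (200, 0.2), (100, 0.0)]

def stepped_speed(fill_ms: float) -> float:
    """Reduce speed in steps as the fill-level crosses successive thresholds."""
    speed = 1.0
    for threshold_ms, step_speed in STEP_THRESHOLDS:
        if fill_ms <= threshold_ms:
            speed = step_speed   # later (lower) thresholds override earlier ones
    return speed

def proportional_speed(fill_ms: float, reference_ms: float = 500.0) -> float:
    """Speed proportional to fill-level below an assumed reference, capped at real time."""
    return max(0.0, min(1.0, fill_ms / reference_ms))

print(stepped_speed(450.0), proportional_speed(250.0))  # 0.8 0.5
```

The stepped form changes speed rarely, which keeps pitch correction simple; the proportional form avoids audible jumps at the cost of continuously varying speed.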

Therefore, the playback speed is generally decreased (either smoothly or in steps) as the fill-level of the receive buffer 46 drops, reducing the rate at which the receive buffer 46 empties. If the consumption rate of the media stream 16 is reduced to less than the current network 38 throughput, then the receive buffer 46 generally starts to fill. The receive buffer 46 fill-level in effect acts as a feedback mechanism that matches the rate of data consumption from the receive buffer 46 to the rate of data acquisition from the network 38. Ensuring that the buffer of the transmitter 40 (Fig. 1) does not continue to grow indefinitely generally requires the decoder 48 to play faster than real-time once network throughput is restored. This allows the transmitter 40 to transmit the media stream 16 faster than real-time, emptying the buffer of the transmitter 40 and restoring the system 10 to the default steady state of an empty buffer in the transmitter 40 and a full receive buffer 46.
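The feedback behaviour can be illustrated with a toy discrete-time simulation. Assumptions, none of which come from the specification: network throughput is expressed as milliseconds of stream delivered per millisecond of wall-clock time, the stepped thresholds of the Table 1 example are used, and the network drops to half throughput for five seconds before recovering fully.

```python
def stepped_speed(fill_ms: float) -> float:
    """Table 1 example thresholds: at or below each level, play at the given speed."""
    for threshold_ms, speed in ((100, 0.0), (200, 0.2), (300, 0.4), (400, 0.6), (500, 0.8)):
        if fill_ms <= threshold_ms:
            return speed
    return 1.0

def simulate(throughput, start_fill_ms=1000.0, step_ms=10.0):
    """Return (fill_ms, speed) per step; throughput[i] is stream-ms delivered per wall-ms."""
    fill_ms = start_fill_ms
    history = []
    for rate in throughput:
        speed = stepped_speed(fill_ms)
        # Data arrives at `rate` and is consumed at `speed` (both in stream time).
        fill_ms = max(0.0, fill_ms + (rate - speed) * step_ms)
        history.append((fill_ms, speed))
    return history

# 5 s at half throughput, then 5 s of full throughput, in 10 ms steps.
trace = simulate([0.5] * 500 + [1.0] * 500)
lowest = min(fill for fill, _ in trace)
print(f"lowest fill-level: {lowest:.0f} ms; final: {trace[-1][0]:.0f} ms")
```

In this run the fill-level settles near 300 ms during the outage, where the chosen speeds straddle the 50% delivery rate, so the buffer never underruns; after recovery it climbs back above 500 ms and normal speed resumes. Draining the transmitter's backlog would additionally need the faster-than-real-time step, which this toy omits.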

To allow the decoder 48 to determine when to play faster than real-time, the receive buffer 46 is typically maintained at a nominal full-level which is slightly less than the full capacity of the receive buffer 46. If the receive buffer 46 fill-level exceeds the nominal full-level, then the decoder 48 typically plays faster than real-time.
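Adding the nominal full-level to the speed policy yields a controller that slows down when the buffer drains and speeds up when it overfills. The 1000 ms nominal level, the 105% catch-up speed and the 500 ms slow-down threshold below are illustrative assumptions, not values given in the specification, and the proportional slow-down is only one of the two options described above.

```python
NOMINAL_FULL_MS = 1000.0   # assumed nominal full-level, slightly below buffer capacity
CATCH_UP_SPEED = 1.05      # assumed faster-than-real-time rate
SLOW_THRESHOLD_MS = 500.0  # assumed level at which slow-down begins

def playback_speed(fill_ms: float) -> float:
    """Pick a decoder speed from the receive-buffer fill-level."""
    if fill_ms > NOMINAL_FULL_MS:
        # Above nominal: play faster so the transmitter's buffer can drain.
        return CATCH_UP_SPEED
    if fill_ms > SLOW_THRESHOLD_MS:
        return 1.0
    # At or below the threshold: slow down in proportion to the remaining fill.
    return fill_ms / SLOW_THRESHOLD_MS

print(playback_speed(1100.0), playback_speed(800.0), playback_speed(250.0))  # 1.05 1.0 0.5
```

Keeping the nominal level below capacity leaves headroom for the surplus data that arrives while the transmitter catches up.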

By employing the above method of buffer fill-level control, the size and cost of the receive buffer 46, as well as the delay caused by the receive buffer 46, are generally minimized. Additionally, by maintaining the receive buffer 46 fill-level above zero, the noticeable glitches and stutters in the audio and video caused by reduced network throughput are generally eliminated.

Alternatively or additionally, the action performed by the fill-level manager 52 includes reestablishing a connection with the network 38, for example, if the fill-level of the receive buffer 46 drops below a predetermined level or if the fill-level is below a predetermined level for a predetermined time period (block 84).
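The reconnection trigger combines a level test and a duration test. A minimal sketch, assuming the caller supplies a monotonic clock reading in seconds; the class name and the 100 ms critical level, 300 ms low level and 2 s grace period are hypothetical values chosen for illustration.

```python
from typing import Optional

class ReconnectMonitor:
    """Signal a reconnect when the fill-level is critically low, or low for too long."""

    def __init__(self, critical_ms: float = 100.0, low_ms: float = 300.0,
                 max_low_duration_s: float = 2.0) -> None:
        self.critical_ms = critical_ms
        self.low_ms = low_ms
        self.max_low_duration_s = max_low_duration_s
        self._low_since: Optional[float] = None   # when the low period started

    def should_reconnect(self, fill_ms: float, now_s: float) -> bool:
        if fill_ms < self.critical_ms:
            return True                           # below the predetermined level
        if fill_ms < self.low_ms:
            if self._low_since is None:
                self._low_since = now_s           # start timing the low period
            return now_s - self._low_since >= self.max_low_duration_s
        self._low_since = None                    # recovered: reset the timer
        return False

monitor = ReconnectMonitor()
print(monitor.should_reconnect(250.0, 0.0), monitor.should_reconnect(250.0, 2.5))  # False True
```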

Some advantages of controlling the receive buffer 46 based on a time-related buffer fill-level are described below.

By way of introduction, in a variable bitrate (VBR) service, the amount of data sent per second typically changes quite dramatically, usually between 1 megabit per second and 10 megabits per second, by way of example only. Therefore, it is very difficult to set a data threshold for the receive buffer 46 for use in controlling the variable speed playback. For example, a threshold requiring the video to be played slower than real-time for a 1 megabit per second stream would typically be far lower than that for a 10 megabits per second stream.

However, as the stream bitrate is typically changing frame by frame in an unpredictable way, there is generally not a stable data-threshold which would be appropriate.

For example, if the network 38 usually introduces a delay of less than 500 milliseconds (ms), but occasionally introduces a delay of up to 1 second, then a sensible receive-buffer level in a set-top box with a fixed playback speed may be equal to the maximum expected delay (1 second) multiplied by the maximum bitrate (10 megabits per second) multiplied by a safety factor of 2, giving 20 megabits or 2.5 megabytes. A buffer of 2.5 megabytes therefore generally insulates against receive buffer under-runs, but typically at a considerable cost in terms of total delay. For example, if the client tunes to the service whilst the service is in the 1 megabit per second mode, then 20 seconds of data are buffered prior to decoding.
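The sizing arithmetic of this example, written out so the units are explicit (decimal megabits and megabytes, as in the text):

```python
max_delay_s = 1.0          # worst-case network delay from the example
max_bitrate_bps = 10e6     # peak VBR bitrate, 10 megabits per second
safety_factor = 2

buffer_bits = max_delay_s * max_bitrate_bps * safety_factor
buffer_megabytes = buffer_bits / 8 / 1e6
print(buffer_bits / 1e6, "Mbit =", buffer_megabytes, "MB")  # 20.0 Mbit = 2.5 MB

# The same data-sized buffer holds ten times more playback time at 1 Mbit/s.
low_bitrate_bps = 1e6
print(buffer_bits / low_bitrate_bps, "s buffered at 1 Mbit/s")  # 20.0 s buffered at 1 Mbit/s
```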

As described above, with the set-top box 42, the playback speed of the decoder 48 is adjustable, allowing the receive buffer 46 to refill. Therefore, in the above example, where the home network 38 usually introduces a delay of less than 500 ms, but occasionally introduces a delay of up to 1 second, the set-top box 42 only needs to delay the media stream 16 by slightly more than the usual network delay (of less than 500 ms), for example, to 1000 ms.

Reference is now made to row 1 of table 1. If a fill-level threshold is set at 500 ms then any usual delay (of less than 500 ms) would reduce the receive buffer 46 fill-level from 1000 ms to 501 ms or more and not affect the playback speed.

Reference is now made to row 2 of table 1. An occasional network delay of 500 ms or more reduces the determined fill-level of the receive buffer 46 (as determined by the fill-level determiner 50) to 500 ms or less, triggering slower than real-time playback at 80% of the real-time playback speed. The amount of effective time left in the receive buffer 46, known as the new effective fill-level, is inversely proportional to the playback speed. In other words, as the playback speed is reduced, the effective amount of data in the receive buffer 46 increases, as the data is used at a slower rate. Therefore, if there is 500 ms of data in the receive buffer 46 when the decoder 48 is playing at 100% playback speed, there is effectively 625 ms of data if played at 80% speed (500 ms divided by 80%). The effective fill-level at the new 80% speed is thus 625 ms.
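The effective fill-level is the determined fill-level divided by the playback speed. A one-function sketch (the name is hypothetical), treating a speed of zero as a pause, which matches the "infinity" entry of Table 1:

```python
def effective_fill_ms(determined_fill_ms: float, speed: float) -> float:
    """Playback time left in the buffer at the given speed (1.0 = real time)."""
    if speed <= 0:
        return float("inf")   # paused: the buffered data is never consumed
    return determined_fill_ms / speed

print(effective_fill_ms(500, 1.0))  # 500.0
print(effective_fill_ms(500, 0.8))  # 625.0
```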

Table 1

Network   Determined   Old     Effective fill-level   New     Effective fill-level
delay     fill-level   speed   at old speed           speed   at new speed
<500 ms   >500 ms      100%    >500 ms                100%    >500 ms
500 ms    500 ms       100%    500 ms                 80%     625 ms
625 ms    400 ms       80%     500 ms                 60%     667 ms
792 ms    300 ms       60%     500 ms                 40%     750 ms
1042 ms   200 ms       40%     500 ms                 20%     1000 ms
1542 ms   100 ms       20%     500 ms                 0%      infinity
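The numbers in rows 2 to 6 of table 1 can be reproduced with a short simulation. This is a sketch under stated assumptions rather than the actual logic of the set-top box 42: no packets arrive during a delay, so the determined fill-level drains at the current playback speed; the buffer starts at 1000 ms; and the decoder 48 steps down the speed ladder whenever the effective fill-level falls to 500 ms. Exact fractions are used so that rounding only happens in the reported values; the name simulate_table_1 is hypothetical.

```python
from fractions import Fraction

# Speed ladder of table 1, as exact fractions of real time.
SPEEDS = [Fraction(pct, 100) for pct in (100, 80, 60, 40, 20, 0)]


def simulate_table_1(initial_fill_ms=1000, trigger_ms=500):
    """Return one tuple per speed reduction in table 1 (rows 2 to 6)."""
    fill = Fraction(initial_fill_ms)  # determined fill-level, ms of media
    total_delay = Fraction(0)         # cumulative network delay, wall-clock ms
    rows = []
    for old, new in zip(SPEEDS, SPEEDS[1:]):
        # Determined fill-level at which fill / old speed equals the trigger.
        target = trigger_ms * old
        # With no arrivals, the buffer drains `old` ms of media per ms of delay.
        total_delay += (fill - target) / old
        fill = target
        effective_new = round(fill / new) if new else float("inf")
        rows.append((round(total_delay), round(fill),
                     int(old * 100), int(new * 100), effective_new))
    return rows


for row in simulate_table_1():
    print(row)
# (500, 500, 100, 80, 625)
# (625, 400, 80, 60, 667)
# (792, 300, 60, 40, 750)
# (1042, 200, 40, 20, 1000)
# (1542, 100, 20, 0, inf)
```

Each tuple is (total network delay in ms, determined fill-level in ms, old speed in %, new speed in %, effective fill-level at the new speed in ms). Row 1 of the table, where no speed change occurs, is not produced.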

Reference is now made to row 3 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay of, say, 125 ms is introduced (giving a total delay of 625 ms). At 80% speed, the additional delay of 125 ms reduces the determined fill-level (as determined by the fill-level determiner 50) by 100 ms, from 500 ms to 400 ms (100 ms divided by 80% equals 125 ms). The effective fill-level has also dropped, from 625 ms (row 2 of table 1) to 500 ms (row 3 of table 1), calculated by dividing the determined fill-level of 400 ms by the speed of 80%. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 400 ms) triggers a further reduction in the speed of the decoder 48 to 60% in order to compensate for the additional network delay, resulting in a new effective fill-level of 667 ms (400 ms divided by 60%).

Reference is now made to row 4 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay of, say, 167 ms is introduced (giving a total delay of 792 ms). At 60% speed, the additional delay of 167 ms reduces the determined fill-level (as determined by the fill-level determiner 50) by 100 ms, from 400 ms to 300 ms (100 ms divided by 60% equals approximately 167 ms). The effective fill-level has also dropped, from 667 ms (row 3 of table 1) to 500 ms (row 4 of table 1), calculated by dividing the determined fill-level of 300 ms by the speed of 60%. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 300 ms) triggers a further reduction in the speed of the decoder 48 to 40% in order to compensate for the additional network delay, resulting in a new effective fill-level of 750 ms (300 ms divided by 40%).
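The fill-level determiner 50 measures the fill-level from timestamps rather than from a byte count. A toy model of that measurement shows why a 125 ms delay at 80% speed removes exactly 100 ms of fill-level: during the delay the newest timestamp stays fixed while playback keeps consuming the oldest packets. It assumes timestamps in milliseconds, one packet every 10 ms, and no arrivals during the delay; the ReceiveBuffer class and its methods are hypothetical illustrations, not part of the described system.

```python
from collections import deque


class ReceiveBuffer:
    """Hypothetical receive buffer holding packet timestamps in ms, oldest first."""

    def __init__(self, timestamps):
        self.timestamps = deque(timestamps)

    def fill_level_ms(self):
        # Fill-level = newest timestamp minus oldest timestamp currently buffered.
        if not self.timestamps:
            return 0
        return self.timestamps[-1] - self.timestamps[0]

    def play(self, play_pos_ms, wall_ms, speed_pct):
        """Decode for wall_ms of wall-clock time at speed_pct percent of real
        time with no new arrivals; drop packets already played and return the
        new playback position in media ms."""
        play_pos_ms += wall_ms * speed_pct / 100
        while self.timestamps and self.timestamps[0] < play_pos_ms:
            self.timestamps.popleft()
        return play_pos_ms


buf = ReceiveBuffer(range(0, 1001, 10))  # 1000 ms of media, one packet per 10 ms
pos = 0
print(buf.fill_level_ms())     # 1000
pos = buf.play(pos, 500, 100)  # row 2: 500 ms delay at 100% speed
print(buf.fill_level_ms())     # 500
pos = buf.play(pos, 125, 80)   # row 3: a further 125 ms delay at 80% speed
print(buf.fill_level_ms())     # 400
```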

Reference is now made to row 5 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay of, for example, 250 ms is introduced (giving a total delay of 1042 ms). At 40% speed, the additional delay of 250 ms reduces the determined fill-level (as determined by the fill-level determiner 50) by 100 ms, from 300 ms to 200 ms (100 ms divided by 40% equals 250 ms). The effective fill-level has also dropped, from 750 ms (row 4 of table 1) to 500 ms (row 5 of table 1), calculated by dividing the determined fill-level of 200 ms by the speed of 40%. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 200 ms) triggers a further reduction in the speed of the decoder 48 to 20% in order to compensate for the additional network delay, resulting in a new effective fill-level of 1000 ms (200 ms divided by 20%).

Reference is now made to row 6 of table 1. If the throughput of the home network 38 deteriorates further, an additional delay of, for example, 500 ms is introduced (giving a total delay of 1542 ms). At 20% speed, the additional delay of 500 ms reduces the determined fill-level (as determined by the fill-level determiner 50) by 100 ms, from 200 ms to 100 ms (100 ms divided by 20% equals 500 ms). The effective fill-level has also dropped, from 1000 ms (row 5 of table 1) to 500 ms (row 6 of table 1), calculated by dividing the determined fill-level of 100 ms by the speed of 20%. The drop in the effective fill-level to 500 ms (and/or the drop in the determined fill-level to 100 ms) triggers a further reduction in the speed of the decoder 48 to 0%, effectively freezing the decoder 48 and also typically triggering an action to re-establish a connection with the network 38. It will be appreciated by those ordinarily skilled in the art that an action to re-establish a connection with the network 38 may be taken at any other suitable trigger point and/or after a predetermined period of reduced network throughput.
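The decision rule applied in rows 2 to 6 can be written as a small function of the kind a fill-level manager might run. This sketch assumes the speed ladder and 500 ms trigger of table 1 and, as one possible choice, requests re-establishment of the network connection when the speed reaches 0%; the text notes that other trigger points or a timeout could be used instead. The names next_action, SPEED_LADDER, and TRIGGER_MS are hypothetical.

```python
SPEED_LADDER = (100, 80, 60, 40, 20, 0)  # playback speeds from table 1, in percent
TRIGGER_MS = 500  # effective fill-level that triggers a speed reduction


def next_action(determined_fill_ms, speed_pct):
    """Return (new_speed_pct, reconnect) for the current buffer state.

    Only the step-down behaviour of table 1 is modelled; speed_pct must be
    one of the values in SPEED_LADDER.
    """
    if speed_pct == 0:
        return 0, True  # decoder already frozen: keep requesting a reconnect
    effective_ms = determined_fill_ms * 100 / speed_pct
    if effective_ms > TRIGGER_MS:
        return speed_pct, False  # enough effective playback time left
    new_speed = SPEED_LADDER[SPEED_LADDER.index(speed_pct) + 1]
    return new_speed, new_speed == 0  # reaching 0% also triggers a reconnect


print(next_action(600, 100))  # (100, False)
print(next_action(400, 80))   # (60, False)
print(next_action(100, 20))   # (0, True)
```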

Using the parameters of table 1 with the set-top box 42 typically ensures that the receive buffer 46 does not under-run, but the playback speed drops quite harshly as the delay increases. Choosing different trigger levels, and different playback speeds for those trigger levels, may achieve a more gradual effect. It will be appreciated by those ordinarily skilled in the art that other choices of trigger levels and playback speeds may instead achieve an even harsher effect.

In the above extreme example, the delay after a channel change is generally reduced from a variable 2 to 20 seconds to a fixed 1 second.

As the playback speed of the decoder 48 is generally adjusted based on the time-based fill-level of the receive buffer 46, the decoder 48 typically neither under-runs when the VBR coding rate is high (for example, but not limited to, 10 megabits per second) nor slows down unnecessarily when the VBR coding rate is low (for example, but not limited to, 1 megabit per second). Additionally, the set-top box 42 generally does not cause a short delay (for example, but not limited to, 2 seconds) when tuned to a service operating in a high coding phase, or a long delay (for example, but not limited to, 20 seconds) when tuned to a service operating in a low coding phase. Instead, a steady delay of 1 second is preferably achieved no matter what phase the VBR coding is in during a channel change. Reducing the delay through the system 10 is generally very beneficial, as less memory is typically required in the receive buffer 46 and the system 10 is generally far more responsive to the user. The parameters used in table 1 are only examples. In practice, values are typically chosen based on the statistical distribution of the delays in the system 10, the amount of memory available in the set-top box 42, and the amount of delay the user and/or operator considers acceptable.

Data may be delayed in the network 38 due to congestion. However, after the congestion clears, the components in the network 38 generally try to deliver the data as quickly as possible. In wireless networking, the post-congestion delivery may not be significantly faster, as the network 38 normally runs close to capacity, so there is little spare bandwidth with which to overwhelm the set-top boxes 36. In wired networks, and some wireless networks, data can be transferred many times faster than the normal rate. Therefore, following a period of congestion, the decoder 48 may receive more data than it can handle.

Therefore, during design of the receiver (for example, but not limited to, the set-top box 42 in Fig. 2), sufficient memory is preferably provisioned to meet the demands of the system 10, and suitable constraints are preferably placed on the data path in the network 38 so that bursts are controlled and the memory of the set-top box 42 is not exhausted. Longer-term throughput problems may result in "back-pressure" on the server of the data, which in the system 10 is the receiver decoder 30 of Fig. 1. The server then has several options to resolve the problem. First, it may provide a sufficiently large transmit queue to cater for the variable consumption of the receiver. Second, it may reduce the bitrate of the media stream 16 to compensate for the lower throughput (if transcoding or similar is available). If transcoding is used, the original timing of the media stream 16 is typically maintained. However, a transcoder (not shown) may generate new timing. If new timing is generated, the fill-level of the receive buffer 46 may be based on either the new timing or the original timing. Although basing the fill-level on the new timing is preferred, the clock drift between the original and new timestamps is generally so small as to have a negligible effect on the receive buffer 46.

Third, if the media stream 16 is a live stream, the media stream 16 can be recorded to disk and then played from the disk, so that the disk acts as a suitable buffer. Fourth, the network 38 may be reconfigured, or streams may be dropped, to provide the best service for the available network capacity.

By way of introduction, although the viewer generally does not perceive a slight reduction or increase in video speed, the same is typically not true for audio. If audio is played slower than real time, the pitch of the actors' voices is lowered; conversely, if the audio is played at a higher rate, the pitch goes up. This frequency shift is generally unacceptable to a listener.

Therefore, in accordance with a most preferred embodiment of the present invention, the set-top box 42 includes an audio module 54.

The audio module 54 is preferably operative to adjust the pitch of an audio element (not shown) of the media stream 16 in order to compensate for the adjustment of the playback speed of the decoder 48 using pitch shifting (block 86).

A preferred method to correct the pitch error caused by adjusting the playback speed is to apply a Fourier transform to convert the audio into the frequency domain. Then, the frequency domain values are shifted up or down in frequency, as appropriate. Finally, applying an inverse Fourier transform converts the audio back to the time domain. Playing the audio at the adjusted playback speed causes the pitch of the audio to be shifted, canceling out the effect of the pitch shift performed with the Fourier transforms so that the pitch of the actor's voice remains constant and the user typically does not perceive the change in playback speed. In effect, it generally appears as if the actor is talking slightly faster, or slower, but still with the same tone of voice.
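The Fourier-transform approach can be illustrated on a single block of samples. This is a deliberately simplified sketch: it uses a naive discrete Fourier transform written in plain Python, processes one block with no windowing or overlap (practical pitch shifters typically use overlapping windows, for example a phase vocoder), and implements the shift by scaling each frequency-bin index by 1/speed, since changing the playback speed multiplies every frequency by the same factor. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def preshift_pitch(samples, speed):
    """Move each positive-frequency bin k to round(k / speed) so that playing
    the result at `speed` restores the original pitch. Bins pushed to or past
    Nyquist are dropped."""
    n = len(samples)
    spectrum = dft(samples)
    shifted = [0j] * n
    shifted[0] = spectrum[0]  # DC component is unchanged
    for k in range(1, n // 2):
        new_k = round(k / speed)
        if 0 < new_k < n // 2:
            shifted[new_k] += spectrum[k]
            shifted[n - new_k] += spectrum[n - k]  # mirror bin keeps output real
    return [v.real for v in idft(shifted)]


def dominant_bin(samples):
    mags = [abs(v) for v in dft(samples)[: len(samples) // 2]]
    return max(range(len(mags)), key=mags.__getitem__)


n = 64
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # energy in bin 8
out = preshift_pitch(tone, 0.8)  # prepare for playback at 80% speed
print(dominant_bin(tone), dominant_bin(out))  # 8 10
```

Here a tone in bin 8 is moved up to bin 10 before playback at 80% speed; playing the result at 80% brings it back down to the pitch of bin 8 (10 multiplied by 0.8 equals 8).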

Pitch shifting is generally quite simple for digitally compressed audio decoders to achieve. First, in digital audio the compressed input data is typically already in the frequency domain, so the first Fourier transform is generally unnecessary. Second, audio decoding generally requires an inverse Fourier transform to be applied anyway. Typically, the only additional step is shifting the frequency samples up or down by the appropriate amount.

The pitch shifting technique is typically implemented in set-top boxes as part of the review buffer functionality, which allows viewers to start watching a live program from its beginning even after the program's start time. The content is generally played slightly faster than real time to enable the viewer to gradually catch up with the live action.
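How long the catch-up takes follows from simple arithmetic: while the viewer watches at a speed above real time, the live program keeps advancing at real time, so the lag shrinks only by the excess speed. A minimal sketch, with a hypothetical function name and illustrative numbers that are not taken from the text:

```python
def catch_up_time_s(lag_s, speed_pct):
    """Wall-clock seconds needed to close a lag behind the live program.

    Playing at speed_pct percent of real time removes (speed_pct - 100) / 100
    seconds of lag per wall-clock second, because live content keeps
    arriving at the real-time rate.
    """
    if speed_pct <= 100:
        raise ValueError("playback must be faster than real time to catch up")
    return lag_s * 100 / (speed_pct - 100)


print(catch_up_time_s(60, 105))  # 1200.0, i.e. 20 minutes to close a 1-minute lag
```

Even a modest speed-up therefore takes many times the original lag to catch up, which is consistent with the text's description of a gradual catch-up at a speed only slightly above real time.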

Reference is now made to Fig. 4, which is a partly pictorial, partly block diagram view of a media stream system 60 constructed and operative in accordance with another preferred embodiment of the present invention. The system 60 is substantially the same as the system 10 of Figs. 1-3 except for the following differences. The system 60 includes a Headend 62 with an encoder 64. The Headend 62 is preferably operative to broadcast programming via an Internet Protocol (IP) network 66 to subscribers, including a subscriber 68. The subscriber 68 receives programming from the Headend 62 via a set-top box 70 (or PVR). The set-top box 70 is connected to the IP network 66 via a residential gateway 72. The system 60 is typically an Internet Protocol Television (IPTV) system and/or a Video-on-demand (VOD) system, by way of example only.

Bandwidth may be restricted at various sections of the system 60. When the Headend 62 is located at the office of the content provider (not shown), the content first needs to be sent across the Internet 66 to the Internet Service Provider (ISP) (not shown) for broadcast to the subscribers. When the Headend 62 is located in the server room of the ISP, for example, the transfer to the ISP across the Internet 66 is avoided. Congestion may occur at any point in the Internet 66 as well as in the home network.

It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It will be appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination. It will also be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather the scope of the invention is defined only by the claims which follow.