GENERALISED HYPOTHETICAL REFERENCE DECODER FOR SCALABLE VIDEO CODING WITH BITSTREAM REWRITING

Title:

GENERALISED HYPOTHETICAL REFERENCE DECODER FOR SCALABLE VIDEO CODING WITH BITSTREAM REWRITING

Document Type and Number:

WIPO Patent Application WO/2008/084184

Kind Code:

Abstract:

In a generalised hypothetical reference decoder, an encoder provides an encoded video bitstream, using scalable video coding, which may be subject to bitstream rewriting. Bitstream rewriting enables the bitstream to be decoded by devices not compatible with scalable video coding e.g. legacy devices. The encoder further determines a set of parameters for use by a decoder in the event of bitstream rewriting to ensure effective decoding of the video bitstream by the decoder. The set of parameters is introduced in the encoded video bitstream.

Inventors:

CIEPLINSKI LESZEK (GB)

Application Number:

PCT/GB2007/004661

Publication Date:

July 17, 2008

Filing Date:

December 06, 2007

Assignee:

MITSUBISHI ELECTRIC INF TECH (GB)
MITSUBISHI ELECTRIC CORP (JP)
CIEPLINSKI LESZEK (GB)

International Classes:

H04N7/26

Other References:

SEGALL A: "Transcoding in Scalability Info SEI" VIDEO STANDARDS AND DRAFTS, XX, XX, no. JVT-U044, 22 October 2006 (2006-10-22), XP030006690
SEGALL A: "CE8: SVC-to-AVC bitstream rewriting for CGS" VIDEO STANDARDS AND DRAFTS, XX, XX, no. JVT-U043, 22 October 2006 (2006-10-22), XP030006689
YE-KUI WANG ET AL: "Storage of AVC buffering parameters in AVC file format" VIDEO STANDARDS AND DRAFTS, XX, XX, no. M12059, 29 April 2005 (2005-04-29), XP030040767

Attorney, Agent or Firm:

PICKER, Madeline, Margaret et al. (26 Caxton Street, London SW1H 0RJ, GB)

Claims:

CLAIMS:

1. A method for encoding a video bitstream, comprising: determining if a video bitstream, to be transmitted, may be subject to rewriting, and if so: determining at least one set of parameters for use by a decoder when bitstream rewriting has been carried out on a received bitstream.

2. A method as claimed in claim 1, further comprising: inserting the determined parameters in the encoded bitstream for transmission.

3. A method as claimed in claim 1 or claim 2, wherein the parameters comprise one or more of: buffer size, initial buffer fullness or initial delay and maximum bitrate.

4. A method as claimed in claim 1, 2 or 3, wherein the step of determining if a video bitstream, to be transmitted, may be subject to rewriting, comprises: determining if the video bitstream is scalable, and if so, determining that the video bitstream may be subject to rewriting.

5. A method as claimed in claim 4, wherein the step of determining if the video bitstream is scalable comprises determining if the bitstream encoding is in accordance with MPEG-4 SVC.

6. A method as claimed in any one of claims 1 to 5, wherein the method comprises: inserting in the bitstream a flag to indicate the presence of said determined at least one set of parameters in the bitstream.

7. A method as claimed in claim 6, wherein the flag can be set only for instances of Coarse Grain Scalability (CGS) layers.

8. A method as claimed in claim 6 or claim 7, further comprising: inserting in the bitstream maximum bit rate (R) and decoder buffer size (B) parameters, of said at least one determined set of parameters, the or each set of parameters for an instance (scalable layer) of scalable coding.

9. A method as claimed in claim 8, wherein the maximum bit rate and decoder buffer size parameters of the or each set of parameters is inserted into the Video Usability Information (VUI) message of an MPEG-4 SVC scalable bitstream.

10. A method as claimed in claim 9, wherein the or each set of parameters is inserted in vui_parameters as additional instances of hrd_parameters as specified in MPEG 4 SVC.

11. A method as claimed in claim 9, wherein the or each set of parameters is inserted as modified hrd_parameters as specified in MPEG 4 SVC, including, for the or each instance (scalable layer), a flag indicating the presence of said parameters together with said corresponding set of parameters.

12. A method as claimed in any one of claims 6 to 11, further comprising: inserting in the bitstream initial delay (D) parameters of said at least one determined set of parameters, the or each set of parameters for an instance (scalable layer) of scalable coding.

13. A method as claimed in claim 12, wherein the bitstream is encoded at a variable bitrate and the delay parameters further comprise an offset delay parameter.

14. A method as claimed in claim 12 or claim 13, further comprising: inserting in the bitstream removal delay parameters of said at least one determined set of parameters, the or each set of parameters for an instance (scalable layer) of scalable coding.

15. A method as claimed in claim 12 or claim 13, wherein the initial delay and removal delay parameters of the or each set of parameters is inserted into the Supplemental Enhancement Information (SEI) message of an MPEG-4 SVC scalable bitstream.

16. A method as claimed in any preceding claim, wherein the encoder determines the or each set of parameters for Context Adaptive Binary Arithmetic Coding (CABAC) entropy coding.

17. A method as claimed in any preceding claim, wherein the encoder determines the or each set of parameters for Context Adaptive Variable-Length Coding (CAVLC) entropy coding.

18. A method as claimed in any preceding claim, wherein the or each set of parameters includes a set of parameters for bitstream verification by all NAL units and a set of parameters for bitstream verification by only Video Coding Layer (VCL) NAL units.

19. An encoder comprising apparatus for performing a method of encoding a video bitstream as claimed in any one of claims 1 to 18.

20. A computer readable medium comprising instructions that, when executed by a computer, perform a method of encoding a video bitstream as claimed in any one of claims 1 to 18.

21. An apparatus for rewriting a video bitstream encoded using the method of any one of claims 1 to 18, the apparatus rewriting the bitstream with, or using, one of said at least one set of parameters.

22. An apparatus as claimed in claim 21, comprising a decoder or an intermediate device between an encoder and a decoder.

23. An apparatus for decoding a video bitstream encoded using the method of any one of claims 1 to 18, the apparatus adapted to detect the presence of said one or more sets of parameters in a received, encoded video bitstream.

24. A method for decoding a video bitstream encoded using the method of any one of claims 1 to 18, the method comprising: receiving the encoded video bitstream; determining if a video bitstream can be decoded by the decoder without rewriting, and if it is determined that the video bitstream can be decoded without rewriting, decoding the bitstream, or if it is determined that the video bitstream cannot be decoded without rewriting, performing rewriting on the video bitstream, and decoding the rewritten video bitstream using one of said at least one set of parameters.

25. A method as claimed in claim 24, further comprising, prior to said step of decoding the rewritten bitstream: selecting one of said at least one set of parameters for use in decoding based on the coded picture size.

26. An apparatus for decoding a video bitstream using a method as claimed in claim 24 or claim 25.

27. A computer readable medium comprising instructions that, when executed by a computer, perform a method of decoding a video bitstream as claimed in claim 24 or claim 25.

Description:

GENERALISED HYPOTHETICAL REFERENCE DECODER FOR SCALABLE VIDEO CODING WITH BITSTREAM REWRITING

BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION

The present invention relates to video coding, and more particularly to an apparatus and method for scalable video coding.

DESCRIPTION OF THE RELATED ART

Scalable video coding

Video coding has traditionally been employed to optimize video quality at a given size, frame rate and bitrate (single-layer coding). The emergence of increasingly complex networks has led to growing interest in the development of a video codec that can dynamically adapt to the network architecture and temporal variations in network conditions such as bandwidth and error probability. Channel bandwidth may easily vary by several orders of magnitude between different users on the same network. Furthermore, the rapid progression towards network inter-connectivity has meant that devices such as mobile phones, handheld personal digital assistants and desktop workstations, each of which has different display resolutions and processing capabilities, may all have access to the same digital media content. Recent progress in video coding research is enabling the development of codecs that achieve compression performance comparable to state-of-the-art non-scalable solutions, such as the MPEG-4 Advanced Video Coding standard (ISO/IEC 14496-10, also known as MPEG-4 AVC/H.264), while providing a wide range of adaptability in spatial resolution, frame rate and quality/bitrate.

The most distinctive feature of scalable video coding, as compared to traditional techniques, is the flexibility of the codec design, which allows for almost arbitrary combinations of bitstream layers in temporal, spatial and quality dimensions. Scalable video coding aims to address the diversity of video communications networks and end-user interests, by compressing the original video content in such a way that efficient reconstruction at different bit-rates, frame-rates and display resolutions from the same bitstream is supported. Bit-rate/quality scalability refers to the ability to reconstruct a compressed video over a fine gradation of bitrates, without loss of compression efficiency. This allows a single compressed bitstream to be accessed by multiple users, each user utilizing all of their available bandwidth. Without quality scalability, several versions of the same video data would have to be made available on the network, significantly increasing the storage and transmission burden. Other important forms of scalability include spatial resolution and frame-rate (temporal resolution) scalability. These allow the compressed video to be efficiently reconstructed at various display resolutions, thereby catering for the different capabilities of all sorts of end-user devices. This is particularly significant as the new audio-visual applications and products require adaptation of the content to complex network architectures, diverse user terminals and varying bandwidths and error conditions. Traditional, non-scalable codecs are not well-suited for such environments and this is where the benefits of scalability are most clearly visible.

These advantages of scalable video coding have resulted in significant interest from industry, which led to the joint standardization activity currently taking place in the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) under the auspices of the Joint Video Team (JVT), the outcome of which will become the MPEG-4 Scalable Video Coding standard (MPEG-4 SVC). The work on the new standard is in the final stages and its completion is planned for early 2007. The current draft of the standard (which will become ISO/IEC 14496-10/AMD2 and ITU-T Recommendation H.264 Annex F) is the Joint Draft 8, Joint Video Team document JVT-U201. An overview of the new codec can be found in a paper by H. Schwarz, D. Marpe and T. Wiegand entitled "Overview of the Scalable Extension of the H.264/MPEG-4 AVC Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, September 2007 (also published as draft JVT document JVT-U145).

The overall structure of the scalable coder is illustrated in Figure 1 for two levels of spatial scalability. Pyramid spatial decomposition is used so the video is downsampled the required number of times prior to further processing. If required, temporal downsampling can be performed at the same time by dropping frames that are not needed in the lower layers. The signal is subsequently passed to the "core encoders", which are similar to the non-scalable MPEG-4 AVC coders with extensions for inter-layer prediction and quality scalability.

Temporal scalability is achieved by the use of B frames (Bi-predicted pictures). In contrast to the previous coding standards, MPEG-4 AVC allows for the B frames to be used for further prediction. This feature is used to perform hierarchical temporal decomposition, which allows for multiple layers of temporal scalability while at the same time providing significant compression efficiency improvement.
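As a rough illustration of hierarchical temporal decomposition, the following Python sketch assigns a temporal layer to each frame and lists the frames that survive when the upper layers are dropped. The dyadic structure, the group size of 8 and the names temporal_layer and drop_to_layer are assumptions made for this example only; the text above does not prescribe them.

```python
def temporal_layer(frame_index: int, gop_size: int = 8) -> int:
    """Temporal layer of a frame in an assumed dyadic hierarchical-B structure.

    Layer 0 holds the key pictures (every gop_size-th frame); each higher
    layer doubles the frame rate.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1            # log2(gop_size)
    position = frame_index % gop_size
    if position == 0:
        return 0
    # Number of trailing zero bits of the position inside the group.
    trailing_zeros = (position & -position).bit_length() - 1
    return levels - trailing_zeros


def drop_to_layer(num_frames: int, max_layer: int, gop_size: int = 8) -> list:
    """Indices of the frames kept when layers above max_layer are discarded."""
    return [n for n in range(num_frames)
            if temporal_layer(n, gop_size) <= max_layer]
```

With these assumptions, keeping layers 0 and 1 of a nine-frame sequence leaves frames 0, 4 and 8, i.e. a quarter of the original frame rate.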

As in MPEG-4 AVC, the frames are processed macroblock by macroblock. The range of coding modes for macroblocks is extended by the addition of inter-layer prediction modes. For motion-compensated macroblocks, it is possible to re-use the motion vectors from lower layers if they provide sufficiently good prediction; otherwise new motion vectors are sent in the enhancement layer. The texture of intra macroblocks can be predicted from lower layer macroblocks instead of their spatial neighbourhood, as in MPEG-4 AVC's intra-frame prediction. A similar prediction process can also be applied to the motion compensation prediction residual for inter-prediction macroblocks. The selection of the scalable or non-scalable prediction modes is based on a rate-distortion optimisation process, which is a generalisation of a technique familiar from non-scalable coding. The use of the scalable macroblock modes is signalled using a set of flags designed for coding efficiency and minimisation of changes to the non-scalable decoder processes.

The prediction residual is encoded using one of the transforms specified in MPEG-4 AVC. The resulting transform coefficients are quantised and entropy coded to obtain the base quality level for the given spatial layer. Quality enhancements can be provided either at coarse (CGS) or fine granularity (FGS). Coarse granularity is achieved using prediction modes similar to those used for spatial scalability and is more efficient when a limited number of quality layers is required. FGS (called progressive refinement in the MPEG-4 SVC draft) is achieved using a variation of bitplane coding. The transform coefficients are coded in multiple passes (over the whole picture), with every pass containing a refinement of the representation of the coefficients sent in the previous pass.

The CGS approach is less flexible than FGS because of a more limited choice of available bitrates and because switching between the layers can only be performed at pre-determined points. On the other hand, the impact of switching at "illegal" points can be controlled in many applications and CGS has the advantage of being significantly simpler both in terms of implementation and computational complexity. It is therefore considered to be a more suitable choice for applications that do not require the extreme flexibility supported by FGS.

Generalised hypothetical reference decoder

An important aspect of the design of a practical video encoding system is the control of the operation of the encoder to ensure that it is possible to decode the generated bitstream smoothly and using limited resources. In order to achieve this, an idealised model of the decoder was introduced. This model is usually called Video Buffer Verifier (VBV in MPEG-2 Video, ISO/IEC 13818-2) or Hypothetical Reference Decoder (HRD in MPEG-4 AVC, ISO/IEC 14496-10). An overview of the operation of these models, as well as a description of a Generalised HRD, which is supported by the MPEG-4 AVC standard, is given by J. Ribas-Corbera, P. A. Chou and S. L. Regunathan in "A Generalised Hypothetical Reference Decoder for H.264/AVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 674 - 687, July 2003.

The operation of the HRD is based on the concept of the leaky bucket as a model of the operation of the encoder. It is assumed that the encoder instantaneously encodes frame i into $b_i$ bits at time $s_i$ and outputs the resulting bits into a buffer, which "leaks" the bits into the transmission channel at a certain rate R. The time instants correspond to the frame rate of the video, which is usually constant, i.e. the interval $s_{i+1} - s_i$ is constant. The buffer can be drained at either constant bitrate (CBR), in applications such as broadcasting, or variable bitrate (VBR), e.g. in storage applications such as DVD playback.

The behaviour of the HRD is similarly idealised. It is assumed that the decoder receives data bits at a certain rate, stores them in a memory buffer (called coded picture buffer, or CPB in MPEG-4 AVC) and at time instants $t_i = s_i + \delta$, instantaneously decodes frame i and removes the corresponding $b_i$ bits from the CPB. The correct operation of the HRD consists in ensuring that at any frame time $t_i$, the buffer contains enough bits to decode the picture (i.e. does not underflow) but not more bits than the total buffer size (i.e. does not overflow). If the former occurred, the decoder would not be able to decode the picture, resulting in a delay of its display. In the latter case, the decoder would be forced to remove some bits from the buffer before their decoding, which would lead to loss of at least one picture and, in the typical case of motion compensated coding, incorrect decoding of the following pictures until the next synchronisation point in the bitstream (typically an I picture, which does not use motion-compensated prediction) is reached. The correct operation of the HRD is illustrated in Figure 2.

It can be shown that the behaviour of the leaky bucket model determines the behaviour of the HRD coded picture buffer and that in order to ensure the correct operation of the HRD, the bitstream must obey certain constraints, which can be characterised by three parameters:

• The size B of the decoder memory buffer assigned to holding the coded data prior to decoding.

• The maximum rate R.

• The initial buffer fullness F or, equivalently, the initial delay D. Given maximum rate R, the delay can be derived from buffer fullness as D = F/R.

As illustrated in Figure 2, the decoder buffer is filled at rate R with bits for the time D (from $t_0 - D$ to $t_0$), at which point it reaches the initial buffer fullness F. At this point, $b_0$ bits are removed and instantaneously decoded to produce the first decoded picture. Then bits continue flowing into the buffer at rate R and at time $t_1$ another $b_1$ bits are removed to decode the next picture, etc. In order to ensure that the decoder's buffer does not underflow or overflow, the encoder needs to maintain a matching "leaky bucket" model and verify that its buffer does not overflow or underflow. If the encoding is performed offline, i.e. the whole stream is encoded and stored prior to transmission and decoding, it is also possible to calculate the HRD parameters for an existing bitstream from the sizes of individual encoded frames.
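The buffer behaviour described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the normative HRD: it assumes constant-bitrate delivery, a constant frame rate and instantaneous removal, and the function name check_cpb and all numeric values in the usage note are invented for illustration.

```python
from typing import List, Optional


def check_cpb(frame_bits: List[int], rate: float, frame_rate: float,
              buffer_size: float, initial_fullness: float) -> Optional[str]:
    """Return None if the buffer never underflows or overflows.

    frame_bits       -- coded size b_i of each picture, in bits
    rate             -- R, bits per second (assumed constant)
    frame_rate       -- pictures per second (assumed constant)
    buffer_size      -- B, in bits
    initial_fullness -- F, bits in the buffer just before picture 0 is removed
    """
    arrival = rate / frame_rate          # bits arriving between two removals
    fullness = initial_fullness
    for i, bits in enumerate(frame_bits):
        # The fullness peaks just before a removal, so overflow is tested here.
        if fullness > buffer_size:
            return f"overflow before frame {i}"
        # The whole picture must be present when it is removed.
        if bits > fullness:
            return f"underflow at frame {i}"
        fullness += arrival - bits       # remove b_i, then refill until t_{i+1}
    return None
```

For example, check_cpb([4000, 1000, 1000, 1000], 50000, 25, 6000, 4000) returns None, whereas lowering the initial fullness to 3000 bits reports an underflow at frame 0 because the first picture has not fully arrived.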

While the earlier video coding standards assume that the bitstream will be consumed under fixed constraints, MPEG-4 AVC provides for a variety of consumption options by generalising the HRD. Such a generalisation is particularly useful for video storage (e.g. DVD) or streaming applications, where a single video bitstream may be delivered in different ways (different network bandwidth and reliability, different decoder capabilities). The basic idea of the extension is to enable different tradeoffs between the constraints on the memory buffer size (B), the available bandwidth (R), and initial delay (D). The interesting aspect of this generalisation is that it is possible for a single bitstream to support multiple HRD parameter sets. As in the single HRD case, both offline and online modes of operation are possible.
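The offline mode and the trade-off between B, R and D can be illustrated with the sketch below, which derives one parameter set per candidate rate from the sizes of the coded pictures. It assumes constant-bitrate delivery and a constant frame rate; the names HrdParams, derive_hrd_params and generalised_hrd and the picture sizes in the usage note are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple


class HrdParams(NamedTuple):
    rate: float              # R, bits per second
    buffer_size: float       # B, bits
    initial_fullness: float  # F, bits
    initial_delay: float     # D = F / R, seconds


def derive_hrd_params(frame_bits: List[int], rate: float,
                      frame_rate: float) -> HrdParams:
    """Smallest F and B that avoid underflow and overflow at the given rate."""
    arrival = rate / frame_rate
    # F must cover the largest shortfall between bits removed and bits arrived.
    removed, fullness = 0, 0.0
    for i, bits in enumerate(frame_bits):
        removed += bits
        fullness = max(fullness, removed - i * arrival)
    # B must hold the peak fullness, which occurs just before a removal.
    removed, peak = 0, 0.0
    for i, bits in enumerate(frame_bits):
        peak = max(peak, fullness + i * arrival - removed)
        removed += bits
    return HrdParams(rate, peak, fullness, fullness / rate)


def generalised_hrd(frame_bits: List[int], rates: List[float],
                    frame_rate: float) -> List[HrdParams]:
    """One parameter set per candidate rate, in order of increasing rate."""
    return [derive_hrd_params(frame_bits, r, frame_rate) for r in sorted(rates)]
```

For the invented picture sizes [4000, 1000, 3000, 1000] at 25 pictures per second, a rate of 50000 bit/s needs B = F = 4000 bits (D = 0.08 s), while halving the rate to 25000 bit/s raises both to 6000 bits (D = 0.24 s): a lower rate is paid for with a larger buffer and a longer initial delay.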

In MPEG-4 AVC, the bulk of the HRD parameters (including the additional parameters for generalised HRD) are sent in two parts:

• the buffer size and maximum bitrate information is part of the Video Usability Information (VUI) message, which is expected to be fixed for a given channel and decoder. This is contained in hrd_parameters part of the VUI message.

• the delay information, which may change within the sequence and which is not required to be fixed for a given channel/decoder is sent in the buffering period and picture timing Supplemental Enhancement Information (SEI) messages.

Since the bitstream verification in the HRD can be performed either for all NAL units or only VCL (video coding layer) NAL units, two sets of parameters may be sent. As the checking procedure is the same in both cases we will only describe one set of parameters in the following.

The hrd_parameters are sent as a part of the vui_parameters, specified in section E of the MPEG-4 AVC standard, which in turn is a part of the sequence parameter set (SPS). The relevant part of the vui_parameters syntax table is as follows:

The syntax table for the hrd_parameters is as follows:

The most important elements of hrd_parameters are

• cpb_cnt_minus1: this element specifies the number of alternative decoder buffer (CPB) models supported by the generalised HRD, as discussed above. Up to 32 CPB specifications are allowed.

• bit_rate_value_minus1[SchedSelIdx]: this element is used to derive the maximum input bitrate for the instance of the generalised HRD specified by index SchedSelIdx. This corresponds to parameter R in the description above. The CPBs are ordered in the order of increasing bitrates.

• cpb_size_value_minus1[SchedSelIdx]: this element is used to derive the size of the decoder buffer (CPB) for the instance of the generalised HRD specified by index SchedSelIdx. This corresponds to parameter B in the description above. The CPBs are ordered in the order of decreasing buffer sizes.

• cbr_flag[SchedSelIdx]: this element specifies whether the instance of the generalised HRD specified by index SchedSelIdx operates in constant bitrate (CBR) mode.
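The derivation of R and B from these elements can be sketched as follows. The scaling rules used here (a bitrate in units of 2^(6 + bit_rate_scale) bits per second and a buffer size in units of 2^(4 + cpb_size_scale) bits) are quoted from memory of the MPEG-4 AVC VUI semantics and should be checked against the standard; the names CpbSpec and derive_cpb_specs are hypothetical.

```python
from typing import List, NamedTuple


class CpbSpec(NamedTuple):
    bit_rate: int   # R, bits per second
    cpb_size: int   # B, bits
    cbr: bool       # True if this schedule operates in CBR mode


def derive_cpb_specs(bit_rate_scale: int, cpb_size_scale: int,
                     bit_rate_value_minus1: List[int],
                     cpb_size_value_minus1: List[int],
                     cbr_flag: List[int]) -> List[CpbSpec]:
    """Turn decoded hrd_parameters fields into one (R, B) pair per SchedSelIdx."""
    count = len(bit_rate_value_minus1)           # equals cpb_cnt_minus1 + 1
    if not 1 <= count <= 32:
        raise ValueError("between 1 and 32 CPB specifications are allowed")
    if len(cpb_size_value_minus1) != count or len(cbr_flag) != count:
        raise ValueError("per-schedule lists must have the same length")
    specs = [
        CpbSpec((rate + 1) << (6 + bit_rate_scale),
                (size + 1) << (4 + cpb_size_scale),
                bool(flag))
        for rate, size, flag in zip(bit_rate_value_minus1,
                                    cpb_size_value_minus1, cbr_flag)
    ]
    # Ordering stated in the text: increasing bitrates, decreasing buffer sizes
    # (equal sizes are tolerated here).
    for prev, cur in zip(specs, specs[1:]):
        if cur.bit_rate <= prev.bit_rate or cur.cpb_size > prev.cpb_size:
            raise ValueError("CPB specifications are not correctly ordered")
    return specs
```

For instance, with both scale fields equal to 0, bit_rate_value_minus1 = 15624 gives R = 1 000 000 bit/s and cpb_size_value_minus1 = 62499 gives B = 1 000 000 bits.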

Additionally, low_delay_hrd_flag is sent when hrd_parameters are present in the bitstream. If this flag is equal to 1, the bitstream may occasionally contain access units which violate the nominal CPB removal time (e.g. if an intra coded frame required a high number of bits, the input frame immediately following it may be skipped). This is expected to be used in real-time visual communications (e.g. visual telephony), where low delay is more important than completely smooth playback.

The buffering_period SEI message is specified in sections D.1.1 and D.2.1 of the MPEG-4 AVC standard. The syntax table for this message is as follows:

The elements relevant to HRD behaviour are:

• initial_cpb_removal_delay[SchedSelIdx]: this syntax element is used to derive the initial delay D defined above for the instance of the generalised HRD specified by index SchedSelIdx.

• initial_cpb_removal_delay_offset[SchedSelIdx]: this syntax element is only used when the stream is encoded at variable bitrate. It is used to specify the initial arrival time for the following NAL unit when the decoder buffer (CPB) is full.

The picture_timing SEI message is specified in sections D.1.2 and D.2.2 of the MPEG-4 AVC standard. The relevant part of the syntax table is:

The element relevant to HRD operation is:

• cpb_removal_delay: this syntax element is used to specify the removal time of the associated access unit from the decoder buffer (CPB) with respect to the removal time of the first access unit in the current buffering period (or the first access unit of the previous buffering period for the first access unit in the new buffering period that does not initialise the HRD). In low-delay mode of operation, the actual removal time may have to be higher to ensure that the whole access unit has arrived in the CPB before it is removed.
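The way nominal removal times follow from these SEI elements can be sketched as below. The sketch covers a single buffering period whose first access unit initialises the HRD, and ignores the low-delay case. It assumes, from recollection of the MPEG-4 AVC HRD annex rather than from the text above, that initial_cpb_removal_delay is expressed in units of a 90 kHz clock and that cpb_removal_delay counts clock ticks of num_units_in_tick / time_scale seconds (two VUI timing fields not discussed here); the function name is hypothetical.

```python
from fractions import Fraction
from typing import List


def cpb_removal_times(initial_cpb_removal_delay: int,
                      cpb_removal_delays: List[int],
                      num_units_in_tick: int,
                      time_scale: int) -> List[Fraction]:
    """Nominal CPB removal time, in seconds, of each access unit in one period.

    cpb_removal_delays[n] is the cpb_removal_delay of access unit n, counted
    from the removal of the first access unit of the buffering period.
    """
    clock_tick = Fraction(num_units_in_tick, time_scale)      # seconds per tick
    first_removal = Fraction(initial_cpb_removal_delay, 90000)
    return [first_removal + clock_tick * delay for delay in cpb_removal_delays]
```

Exact fractions are used so that the times do not accumulate rounding error. With an initial delay of 9000 (0.1 s) and a tick of 1/50 s, delays of 0, 2 and 4 ticks give removal times of 0.10 s, 0.14 s and 0.18 s.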

For the new MPEG-4 SVC standard, the same principles can be applied to each scalable layer (as identified by the dependency_id syntax element). In JVT document JVT-U111 entitled "SVC Hypothetical Reference Decoder", an extension of the HRD is proposed, which implements this idea by adding a set of HRD parameters for each dependency_id in the scalable bitstream. The semantics of the buffering_period and picture_timing SEI messages are also extended to allow for the timing information to be provided for each dependency_id.

Bitstream rewriting

In JVT document JVT-T061 entitled "SVC-to-AVC Bit-stream Rewriting for Coarse Grain Scalability", A. Segall introduces the concept of bitstream rewriting for CGS-scalable bitstreams. The goal of bitstream rewriting is to allow lossless modification of a bitstream conforming to the MPEG-4 SVC standard into a bitstream conforming to the MPEG-4 AVC standard. Such functionality is expected to be very useful for so-called "legacy" devices, i.e. those that only support the existing MPEG-4 AVC standard and cannot be easily modified to support the MPEG-4 SVC standard. The bitstream rewriting, which can be performed by an intermediate device, and which is much less complex than full decoding, can be used to adapt the incoming scalable bitstream into a form that can be consumed by such a legacy device.

In the general case of scalability (particularly spatial), such rewriting would not be possible without introducing a difference in the pixel values of the decoded picture. However, the author argues that it is possible to achieve this when only CGS is used if certain restrictions are placed on the syntax elements' values. Since the restrictions result in somewhat reduced compression performance of the scalable bitstream (without, however, compromising the compression efficiency for the single-layer bitstream resulting from rewriting), it is proposed to signal the use of these additional restrictions using a syntax element called svc_rewrite_flag in the bitstream. The decoding process also needs to be modified to deal with the changed semantics of the bitstream.

An additional possibility of changing the entropy coding mode during rewriting from CABAC (context-adaptive binary arithmetic coding) to CAVLC (context-adaptive variable-length coding) or vice versa is introduced in the JVT document JVT-U044 entitled "Transcoding in Scalability Info SEI". This document proposes the extension of the Scalability Info SEI message to support bitstream rewriting and also raises the possibility of a change in entropy coding mode during the rewriting. The author therefore proposes the inclusion of the bitrate information corresponding to the bitstream being rewritten using both CABAC and CAVLC entropy coding.

The generalised hypothetical reference decoder (GHRD) developed for the single-layer codec MPEG-4 AVC/H.264 is generally considered sufficient for MPEG-4 SVC as it can be applied on a layer-by-layer basis. While this is true in the usual application of MPEG-4 SVC, when "bitstream rewriting" is used, the assumptions made for the calculation of the HRD parameters become invalid, thus making the resulting values of the parameters invalid as well. Namely, the coded picture sizes in bits, which are used in the derivation of buffer sizes, the initial buffer fullness values and the maximum rates, change. This may result in incorrect decoder operation, resulting from buffer overflows (when the number of bits in the coded pictures decreases as a result of rewriting) and buffer underflows (when the number of bits in the coded pictures increases). It should be noted that this information cannot be generated by the device performing the rewriting.

This problem exists in both operating modes of the encoder, i.e. when the parameters are used to control the encoder and when they are computed after the bitstream has been created.
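The effect can be illustrated with a small numerical sketch. It assumes constant-bitrate delivery and a constant frame rate, and the picture sizes are invented: one hypothetical rewritten stream is 10% larger than the original and another 20% smaller. Parameters derived from the original picture sizes fail for both, whereas parameters derived from the rewritten sizes hold. The function names derive_params and conforms are illustrative only.

```python
from typing import List, Tuple


def derive_params(frame_bits: List[int], arrival: int) -> Tuple[int, int]:
    """Smallest (B, F) for the given pictures; arrival = bits per frame interval."""
    removed, fullness = 0, 0
    for i, bits in enumerate(frame_bits):
        removed += bits
        fullness = max(fullness, removed - i * arrival)   # worst shortfall
    removed, peak = 0, 0
    for i, bits in enumerate(frame_bits):
        peak = max(peak, fullness + i * arrival - removed)  # level before removal
        removed += bits
    return peak, fullness


def conforms(frame_bits: List[int], arrival: int,
             buffer_size: int, initial_fullness: int) -> bool:
    """True if the pictures pass through the buffer without under- or overflow."""
    fullness = initial_fullness
    for bits in frame_bits:
        if fullness > buffer_size or bits > fullness:
            return False
        fullness += arrival - bits
    return True


original = [4000, 1000, 3000, 1000, 3000]           # invented sizes, in bits
rewritten_larger = [4400, 1100, 3300, 1100, 3300]   # hypothetical, +10%
rewritten_smaller = [3200, 800, 2400, 800, 2400]    # hypothetical, -20%

B, F = derive_params(original, 2000)
ok_original = conforms(original, 2000, B, F)             # parameters are valid
ok_larger = conforms(rewritten_larger, 2000, B, F)       # underflow
ok_smaller = conforms(rewritten_smaller, 2000, B, F)     # overflow
B2, F2 = derive_params(rewritten_larger, 2000)
ok_rederived = conforms(rewritten_larger, 2000, B2, F2)  # valid again
```

Only a party that knows the rewritten picture sizes in advance can produce the second parameter set.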

SUMMARY OF THE INVENTION

According to a first aspect, the present invention provides a method for encoding a video bitstream as defined in accompanying claim 1.

Other aspects of the present invention include an encoder adapted to perform the encoding method of the first aspect of the present invention, and a computer readable medium comprising instructions that, when executed, perform the encoding method of the first aspect of the present invention.

According to another aspect, the present invention provides an apparatus, for example a decoder or an intermediate device, for rewriting a video bitstream encoded using the method of the first aspect of the present invention, the apparatus rewriting the bitstream with, or using, one of said at least one set of parameters.

According to another aspect, the present invention provides a method for decoding a video bitstream encoded using the method of the first aspect as defined in accompanying claim 23.

Yet another aspect of the present invention provides a decoder for performing the above decoding method.

The invention relates to use of the features (for example, additional HRD parameters and syntax) as set out below. The invention relates especially to a method of encoding a sequence of images representing a video, and similarly, a method of decoding a sequence of images representing a video, using features as set out below. Furthermore, the invention relates to an apparatus for encoding and/or decoding a sequence of images representing an image using features as set out below.

Similarly, the invention relates to the use of modifications of prior art techniques (including encoding, decoding, coder/decoder apparatus) as set out below.

In general terms, the inventive idea is as follows.

In order to ensure correct operation of the (G)HRD when bitstream rewriting is used, additional HRD parameters are defined, derived from the coded picture sizes resulting from bitstream rewriting.

Other preferred and optional features of embodiments of the invention will be apparent from the following description and accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be described with reference to the accompanying drawings, of which:

Fig. 1 is a block diagram of a scalable video codec; and

Fig. 2 is a graph illustrating decoder buffer fullness as a function of time.

DESCRIPTION OF THE EMBODIMENTS

An embodiment of the present invention comprises a method for providing additional HRD parameters (or equivalent) when encoding a video bitstream. The additional HRD parameters are for use by a decoder, or intermediate device, when bitstream rewriting is performed by a device receiving the encoded bitstream. The preferred method of providing this information consists in extending the VUI (video usability information) message of the MPEG-4 AVC/SVC standard for scalable bitstreams by adding to the usual HRD parameters a complementary set of parameters for each CGS layer present in the bitstream. A similar set of changes is also made for the SEI messages relating to the functioning of the HRD, i.e. the buffering_period and picture_timing SEI messages. In order to support the option of a change in entropy coder during the rewriting, this information is optionally provided separately for the CAVLC and CABAC entropy coders.

The additional HRD parameters are provided by reusing the existing hrd_parameters structure, and providing additional instances of it in the vui_parameters using the following syntax:

where hrd_parameters() has the same syntax and semantics as in MPEG-4 AVC. The new flags (nal_hrd_parameters_rewriting_cabac_present_flag, etc.) can only be equal to 1 for CGS layers (extended_spatial_scalability = 0) and when bitstream rewriting is in use (avc_rewrite_flag = 1). This implementation has the advantage of requiring the least modification in the syntax tables but the disadvantage that some of the fields in hrd_parameters are unnecessarily repeated.

An alternative approach therefore leaves the vui_parameters unchanged but modifies the hrd_parameters resulting in the following syntax table:

Note that in this case we have a flag for every dependency_id (i.e. scalable layer), which can be used to restrict the provision of additional HRD parameters to only those layers which are required to be verified. As in the first case, text in the semantics section is then needed to clarify that these flags can only be equal to 1 for CGS layers (extended_spatial_scalability = 0) and when svc_rewrite_flag is equal to 1. Alternatively, the presence of the flags can be conditioned on the value of extended_spatial_scalability in the syntax table, resulting in slightly lower overhead. The new flags (hrd_parameters_rewriting_cabac_flag[i] and hrd_parameters_rewriting_cavlc_flag[i]) simply signal the presence of the additional parameters. The semantics of all the new parameters, namely cabac_bit_rate_value_minus1[ i ][ SchedSelIdx ], cabac_cpb_size_value_minus1[ i ][ SchedSelIdx ], cabac_cbr_flag[ i ][ SchedSelIdx ], cavlc_bit_rate_value_minus1[ i ][ SchedSelIdx ], cavlc_cpb_size_value_minus1[ i ][ SchedSelIdx ] and cavlc_cbr_flag[ i ][ SchedSelIdx ], are the same as in the original MPEG-4 AVC specification, with the exception that the additional parameters apply to the corresponding CGS layers after bitstream rewriting.
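Since the new parameters keep the MPEG-4 AVC semantics, the bit rate and CPB size they convey can be derived in the same way as for the ordinary HRD parameters. The sketch below follows the derivation in Annex E of H.264/AVC and assumes that the cabac_ and cavlc_ prefixed values share the bit_rate_scale and cpb_size_scale fields of the enclosing hrd_parameters; the function names are illustrative only.

```python
def hrd_bit_rate(bit_rate_value_minus1: int, bit_rate_scale: int) -> int:
    """Bit rate in bits per second: (value + 1) * 2^(6 + bit_rate_scale)."""
    return (bit_rate_value_minus1 + 1) << (6 + bit_rate_scale)


def hrd_cpb_size(cpb_size_value_minus1: int, cpb_size_scale: int) -> int:
    """CPB size in bits: (value + 1) * 2^(4 + cpb_size_scale)."""
    return (cpb_size_value_minus1 + 1) << (4 + cpb_size_scale)
```

For example, a value of 15624 with bit_rate_scale equal to 0 corresponds to 15625 * 64 = 1,000,000 bits per second.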

This second approach has the advantage of reducing the number of bits used for signalling the additional HRD parameters, as fields such as bit_rate_scale, cpb_size_scale, etc. are not repeated. In contrast, the first approach repeats these fields in every instance of hrd_parameters, which not only increases the overhead, but also requires that the redundant fields be constrained to be the same in all the instances of hrd_parameters. On the other hand, allowing different values of these fields for different layers does add flexibility to the design, although at the expense of additional complexity.
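The difference in overhead between the two approaches can be estimated by counting bits. The sketch below assumes the hrd_parameters() layout of H.264/AVC Annex E (cpb_cnt_minus1 as ue(v), two 4-bit scale fields, one ue(v), ue(v), u(1) triplet per schedule, and four 5-bit fields at the end) and assumes that the second approach adds only the per-schedule triplets for each additional set. Presence flags are left out because both approaches spend one. All function names are hypothetical.

```python
def ue_bits(value: int) -> int:
    """Length in bits of the Exp-Golomb ue(v) code for a non-negative value."""
    return 2 * ((value + 1).bit_length() - 1) + 1


def schedule_bits(schedules) -> int:
    """Bits for the (bit_rate_value_minus1, cpb_size_value_minus1, cbr_flag) triplets.

    `schedules` is a list of (bit_rate_value_minus1, cpb_size_value_minus1) pairs;
    each cbr_flag costs one bit regardless of its value.
    """
    return sum(ue_bits(rate) + ue_bits(size) + 1 for rate, size in schedules)


def full_hrd_bits(schedules) -> int:
    """Bits for one complete hrd_parameters() instance (first approach)."""
    header = ue_bits(len(schedules) - 1) + 4 + 4  # cpb_cnt_minus1 and two scale fields
    trailer = 4 * 5                               # four 5-bit length fields
    return header + schedule_bits(schedules) + trailer


def bits_saved_per_extra_set(schedules) -> int:
    """Bits saved per additional set when only the triplets are sent (second approach)."""
    return full_hrd_bits(schedules) - schedule_bits(schedules)
```

Under these assumptions the saving is a fixed 28 bits plus the length of cpb_cnt_minus1 for every additional parameter set, independent of the rate and buffer values themselves.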

The implementation of the buffering_period SEI message needs to be extended to include the delay information for the rewritten bitstreams. The new syntax table is as follows.

The flags signalling the presence of the additional parameters in the above text are taken to be the corresponding flags in vui_parameters. If the second implementation of the HRD parameters syntax is used, they are replaced with the corresponding values of hrd_parameters_rewriting_cabac_flag[i] and hrd_parameters_rewriting_cavlc_flag[i]. The semantics of all the new parameters, namely cabac_initial_cpb_removal_delay[ SchedSelIdx ], cabac_initial_cpb_removal_delay_offset[ SchedSelIdx ], cavlc_initial_cpb_removal_delay[ SchedSelIdx ] and cavlc_initial_cpb_removal_delay_offset[ SchedSelIdx ], are the same as in the original MPEG-4 AVC specification, with the exception that the additional parameters apply to the corresponding CGS layers after bitstream rewriting.

The syntax table of the picture_timing SEI message is amended as follows:

The interpretation of the flags is similar to that for the buffering_period SEI message, with the exception that the flags are set equal to 1 when at least one of the corresponding NAL-related or VCL-related flags is equal to 1. The semantics of all the new parameters, namely cabac_cpb_removal_delay, cabac_dpb_removal_delay, cavlc_cpb_removal_delay and cavlc_dpb_removal_delay, are the same as in the original MPEG-4 AVC specification, with the exception that the additional parameters apply to the corresponding CGS layers after bitstream rewriting.

In the description above, the additional parameters for both CABAC and CAVLC entropy coders are provided. In an alternative implementation, the change of entropy coder during the bitstream rewriting is not allowed and, consequently, only one flag is necessary. This results in all the syntax tables above being reduced to a single case and only one flag called hrd_parameters_rewriting_present_flag being used.

Thus, an embodiment of the present invention enables an encoder to provide additional information, in the form of one or more sets of HRD parameters, in the encoded bitstream. The information is provided when the video bitstream may be subject to bitstream rewriting by a decoder or intermediate device receiving the encoded bitstream. Each set of HRD parameters, which corresponds to an instance or scalable layer of the scalable coding, is determined by the encoder based on the coded picture sizes that would result from bitstream rewriting, for example using the techniques outlined below.

With this information available in the encoded video bitstream, in accordance with an embodiment of the present invention, the device that performs the bitstream rewriting detects the presence of the one or more sets of HRD parameters in the encoded bitstream. The device then selects the set of HRD parameters corresponding to the bitstream resulting from the rewriting, and discards other sets of HRD parameters. Alternatively, the parameter sets corresponding to all the available CGS layers are sent to the receiver and are used in the negotiation of the highest rewritten CGS layer that can be used given the constraints on the bitrate and the corresponding HRD parameters.
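The negotiation of the highest usable rewritten CGS layer can be sketched as a simple selection over the signalled parameter sets. The LayerHrd record, its fields and the function name are hypothetical, and the sketch assumes that a layer is usable when both its bit rate and its CPB size fit within the limits offered by the channel and the receiver.

```python
from typing import NamedTuple, Optional, Sequence


class LayerHrd(NamedTuple):
    """Hypothetical summary of one signalled HRD parameter set."""
    dependency_id: int  # identifies the CGS layer
    bit_rate: int       # bits per second after rewriting
    cpb_size: int       # bits


def highest_usable_layer(layers: Sequence[LayerHrd],
                         max_bit_rate: int,
                         max_cpb_size: int) -> Optional[LayerHrd]:
    """Return the usable layer with the largest dependency_id, or None if none fits."""
    usable = [layer for layer in layers
              if layer.bit_rate <= max_bit_rate and layer.cpb_size <= max_cpb_size]
    return max(usable, key=lambda layer: layer.dependency_id, default=None)
```

A rewriting device that does not negotiate would instead keep only the set for the layer it outputs and discard the rest.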

As in the case of a single-layer coder, the HRD parameters can be used in two ways. In the first case, the bitstream is generated first and the calculation of the HRD parameters, by the encoder, is performed using an algorithm similar to that proposed in "A Generalised Hypothetical Reference Decoder for H.264/AVC" supra. Alternatively, the HRD parameters are used to drive the encoder to ensure that pre-set conditions are met using a rate control algorithm such as the one described in the JVT document JVT-H017: "Proposed Draft of Adaptive Rate Control". In the case of real-time operation, the encoder has to ensure that the bitstream obeys the constraints for both (all three when both CABAC and CAVLC are considered for bitstream rewriting) coded picture sizes for each leaky buffer.
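The requirement that every variant of the bitstream obeys its leaky-bucket constraints can be sketched as follows. This is a simplified decoder-buffer model, not the normative HRD: it assumes a constant frame rate, instantaneous removal of each picture, and a buffer whose fullness is capped at its size. The function names and the dictionary layout are hypothetical.

```python
def leaky_bucket_contains(picture_sizes, rate, buffer_size,
                          initial_fullness, frame_rate):
    """Return True if the bucket (rate, buffer_size, initial_fullness) never underflows.

    picture_sizes are coded picture sizes in bits, rate is in bits per second.
    """
    if initial_fullness > buffer_size:
        return False  # the initial fullness cannot exceed the buffer
    fullness = float(initial_fullness)
    bits_per_interval = rate / frame_rate
    for size in picture_sizes:
        if size > fullness:
            return False  # picture not fully received at its removal time
        # Remove the picture, then fill until the next removal time.
        fullness = min(buffer_size, fullness - size + bits_per_interval)
    return True


def all_variants_conform(variants, buckets, frame_rate):
    """Check each variant (e.g. original, CABAC-rewritten, CAVLC-rewritten).

    variants maps a name to its picture sizes; buckets maps the same name to
    a (rate, buffer_size, initial_fullness) tuple.
    """
    return all(leaky_bucket_contains(sizes, *buckets[name], frame_rate)
               for name, sizes in variants.items())
```

A rate control loop could call all_variants_conform() after coding each picture and adjust the quantiser if any variant would violate its bucket.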

The invention can be implemented using a system similar to a prior art system with suitable modifications.

The invention is preferably implemented by processing electrical signals using a suitable apparatus.

The invention can be implemented, for example, in a computer system, with suitable software and/or hardware modifications. For example, the invention can be implemented using a computer or similar having control or processing means such as a processor or control device, data storage means, including image storage means, such as memory, magnetic storage, CD, DVD etc., data output means such as a display or monitor or printer, data input means such as a keyboard, and image input means such as a scanner, or any combination of such components together with additional components. Aspects of the invention can be provided in software and/or hardware form, or in an application-specific apparatus or application-specific modules can be provided, such as chips. In other words, aspects of the invention may be provided in the form of a computer-readable storage medium storing computer-executable steps for executing the aspects of the invention. Components of a system in an apparatus according to an embodiment of the invention may be provided remotely from other components, for example, over the internet. A suitable coder and a corresponding decoder (or codec) may have, for example, corresponding components for performing the inverse coding and decoding operations.

As the skilled person will appreciate, many variations and modifications can be made to the described embodiments. It is intended to include all such variations, modifications and equivalents to the described embodiments, that fall within the scope of the present invention, as defined in the accompanying claims.
