

Title:
SCALABLE VIDEO CODING AND DECODING
Document Type and Number:
WIPO Patent Application WO/2008/007337
Kind Code:
A3
Abstract:
A system and method for conveying supplemental enhancement information related to different pictures in one access unit. Different pictures in one access unit may be pictures of each layer of a scalable video, pictures of each view of a multiview video, or pictures of each description in a multiple description coding (MDC) video. An indication is included in the coded bitstream indicating the pictures with which the supplemental enhancement information message is associated. During a subsequent decoding process, the device performing the decoding recognizes this information and uses it appropriately.

Inventors:
HANNUKSELA MISKA (FI)
WANG YE-KUI (FI)
Application Number:
PCT/IB2007/052752
Publication Date:
April 10, 2008
Filing Date:
July 10, 2007
Assignee:
NOKIA CORP (FI)
NOKIA INC (US)
HANNUKSELA MISKA (FI)
WANG YE-KUI (FI)
International Classes:
H04N7/26
Domestic Patent References:
WO2005074295A12005-08-11
WO2007042916A12007-04-19
Foreign References:
US20040006575A12004-01-08
US20050175098A12005-08-11
Other References:
See also references of EP 2041978A4
Attorney, Agent or Firm:
ALBERT, G. Peter (11250 El Camino Real, Suite 20, San Diego CA, US)
Claims:

WHAT IS CLAIMED IS:

1. A method for encoding video into a scalable video bitstream, comprising: encoding a plurality of pictures into an access unit; and encoding a first indication associated with the access unit; and encoding at least one message associated with the first indication, the first indication indicating which encoded pictures in the access unit the at least one message applies to.

2. The method of claim 1, wherein the video bitstream is a scalable video bitstream.

3. The method of claim 2, wherein the first indication includes dependency_id and quality_level values.

4. The method of claim 1, wherein the first indication comprises a nesting supplemental enhancement information (SEI) message, and wherein the at least one message comprises any SEI message other than a nesting SEI message.

5. The method of claim 4, wherein the nesting SEI message further includes a third indication associating the nested SEI message to at least one redundant coded picture.

6. The method of claim 1, wherein the first indication comprises a second indication indicating that the at least one message applies to all encoded pictures in the access unit.

7. The method of claim 1, wherein, if the first indication indicates whether the at least one message applies to either only some of the encoded pictures in the access unit or to all of the encoded pictures in the access unit.

8. The method of claim 1, wherein the video bitstream comprises a multi-view video bitstream.

9. The method of claim 8, wherein the first indication includes a view_id value.

10. The method of claim 1, wherein the video bitstream comprises a multiple description bitstream.

11. The method of claim 10, wherein the first indication includes a description_id value.

12. The method of claim 1, wherein the video bitstream comprises a combination scalable video and multiview video bitstream.

13. The method of claim 1, wherein the video bitstream comprises a combination scalable video, multiview video and multiple description bitstream.

14. A computer program product, embodied in a computer-readable medium, for encoding video into a scalable video bitstream, comprising computer code configured to perform the process of claim 1.

15. An apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code for encoding a plurality of pictures into an access unit; and computer code for encoding a first indication associated with the access unit; and computer code for encoding at least one message associated with the first indication, the first indication indicating which encoded pictures in the access unit the at least one message applies to.

16. The apparatus of claim 15, wherein the video bitstream is a scalable video bitstream.

17. The apparatus of claim 15, wherein the first indication comprises a nesting supplemental enhancement information (SEI) message, and wherein the at least one message comprises any SEI message other than a nesting SEI message.

18. The apparatus of claim 15, wherein the first indication comprises a second indication indicating whether the at least one message applies to either only some encoded pictures in the access unit or all encoded pictures in the access unit.

19. The apparatus of claim 15, wherein, if the first indication indicates whether the at least one message applies to at least some of the encoded pictures in the access unit.

20. The apparatus of claim 15, wherein the video bitstream comprises a multiple description bitstream.

21. The apparatus of claim 15, wherein the video bitstream comprises a combination scalable video and multiview video bitstream.

22. The apparatus of claim 15, wherein the video bitstream comprises a multi-view video bitstream.

23. The apparatus of claim 15, wherein the video bitstream comprises a combination scalable video, multiview video and multiple description bitstream.

24. A method for processing video from a scalable video bitstream, comprising: decoding a first indication associated with an access unit; processing at least one message associated with the first indication, the first indication indicating which of an encoded plurality of pictures in the access unit the at least one message applies to; and processing from the bitstream the plurality of pictures from the access unit in accordance with the processed first indication.

25. The method of claim 24, wherein the video bitstream is a scalable video bitstream.

26. The method of claim 24, wherein the first indication comprises a nesting supplemental enhancement information (SEI) message, and wherein the at least one message comprises any SEI message other than a nesting SEI message.

27. The method of claim 26, wherein the nesting SEI message further includes a third indication associating the nested SEI message to at least one redundant coded picture.

28. The method of claim 24, wherein the first indication indicates whether the at least one message applies to either only some encoded pictures in the access unit or all of the encoded pictures in the access unit.

29. The method of claim 24, wherein the first indication further indicates which encoded pictures within the access unit the at least one message is associated with.

30. The method of claim 24, wherein the first indication includes dependency_id and quality_level values.

31. The method of claim 24, wherein the video bitstream comprises a multi-view video bitstream.

32. The method of claim 31, wherein the first indication includes a view_id value.

33. The method of claim 24, wherein the video bitstream comprises a multiple description bitstream.

34. The method of claim 33, wherein the first indication includes a description_id value.

35. The method of claim 24, wherein the video bitstream comprises a combination scalable video and multiview video bitstream.

36. The method of claim 24, wherein the video bitstream comprises a combination scalable video, multiview video and multiple description bitstream.

37. A computer program product, embodied in a computer-readable medium, for processing video from a scalable video bitstream, comprising computer code configured to perform the process of claim 24.

38. An apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code for processing a first indication associated with an access unit; computer code for processing at least one message associated with the first indication, the first indication indicating which of an encoded plurality of pictures in the access unit the at least one message applies to; and computer code for processing from the bitstream the plurality of pictures from the access unit in accordance with the processed first indication.

39. The apparatus of claim 38, wherein the video bitstream is a scalable video bitstream.

40. The apparatus of claim 38, wherein the first indication comprises a nesting supplemental enhancement information (SEI) message, and wherein the at least one message comprises any SEI message other than a nesting SEI message.

41. The apparatus of claim 38, wherein the first indication indicates whether the at least one message applies to either only some encoded pictures in the access unit or all of the encoded pictures in the access unit.

42. The apparatus of claim 38, wherein the first indication further indicates which encoded pictures within the access unit the at least one message is associated with.

43. The apparatus of claim 38, wherein the video bitstream comprises a multi-view video bitstream.

44. The apparatus of claim 38, wherein the video bitstream comprises a multiple description bitstream.

45. The apparatus of claim 38, wherein the video bitstream comprises a combination scalable video and multiview video bitstream.

46. The apparatus of claim 38, wherein the video bitstream comprises a combination scalable video, multiview video and multiple description bitstream.

Description:

SCALABLE VIDEO CODING AND DECODING

FIELD OF THE INVENTION

[0001] The present invention relates generally to Scalable Video Coding and Decoding. More particularly, the present invention relates to the use of Supplemental Enhancement Information messages in Scalable Video Coding and Decoding.

BACKGROUND OF THE INVENTION

[0002] This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.

[0003] Multimedia applications include local playback, streaming or on-demand, conversational and broadcast/multicast services. Technologies involved in multimedia applications include, among others, media coding, storage and transmission. Different standards have been specified for different technologies.

[0004] Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC). In addition, there are currently efforts underway with regard to the development of new video coding standards. One such standard under development is the scalable video coding (SVC) standard, which will become the scalable extension to the H.264/AVC standard (H.264/AVC). Another such effort involves the development of China video coding standards. Another such standard under development is the multi-view video coding (MVC) standard, which will become another extension to H.264/AVC.

[0005] SVC can provide scalable video bitstreams. In SVC, a video sequence can be coded in multiple layers, and each layer is one representation of the video sequence at a certain spatial resolution or temporal resolution or at a certain quality level or some combination of the three. A portion of a scalable video bitstream can be extracted and decoded at a desired spatial resolution, temporal resolution, a certain quality level or some combination of these resolutions. A scalable video bitstream contains a non-scalable base layer and one or more enhancement layers. An enhancement layer may enhance the temporal resolution (i.e. the frame rate), the spatial resolution, or simply the quality of the video content represented by the lower layer or part thereof. In some cases, data of an enhancement layer can be truncated after a certain location, even at arbitrary positions, and each truncation position can include some additional data representing increasingly enhanced visual quality. Such scalability is referred to as fine-grained (granularity) scalability (FGS). In contrast to FGS, the scalability provided by a quality enhancement layer that does not provide fine-grained scalability is referred to as coarse-grained scalability (CGS). Base layers can be designed to be FGS scalable as well. SVC is one example of scalable coding of video. A draft of the SVC standard is described in JVT-S202, "Joint Scalable Video Model JSVM-6: Joint Draft 6 with proposed changes," 19th JVT Meeting, Geneva, Switzerland, April 2006.

[0006] In multiple description coding (MDC), an input media sequence is encoded into more than one sub-stream, each of which is referred to as a description. Each description is independently decodable and represents a certain media quality. However, based on the decoding of one or more descriptions, additional decoding of another description can result in an improved media quality. MDC is discussed in detail in Y. Wang, A. Reibman, and S. Lin, "Multiple description coding for video delivery," Proceedings of the IEEE, vol. 93, no. 1, Jan. 2005.

[0007] In multi-view video coding, video sequences output from different cameras, each corresponding to a view, are encoded into one bitstream. After decoding, to display a certain view, the decoded pictures belonging to that view are displayed. A draft of the MVC standard is described in JVT-T208, "Joint multiview video model (JMVM 1.0)," 20th JVT meeting, Klagenfurt, Austria, July 2006.

[0008] The H.264/AVC standard and its extensions include the support of supplemental enhancement information (SEI) signaling through SEI messages. SEI messages are not required by the decoding process to generate correct sample values in output pictures. Rather, they are helpful for other purposes, e.g., error resilience and display. H.264/AVC contains the syntax and semantics for the specified SEI messages, but no process for handling the messages in the recipient is defined. Consequently, encoders are required to follow the H.264/AVC standard when they create SEI messages, and decoders conforming to the H.264/AVC standard are not required to process SEI messages for output order conformance. One of the reasons to include the syntax and semantics of SEI messages in H.264/AVC is to allow system specifications, such as 3GPP multimedia specifications and DVB specifications, to interpret the supplemental information identically and hence interoperate. It is intended that system specifications can require the use of particular SEI messages both at the encoding end and at the decoding end, and the process for handling SEI messages in the recipient may be specified for the application in a system specification.

[0009] The mechanism for providing temporal scalability in the latest SVC specification is referred to as the "hierarchical B pictures" coding structure. This feature is fully supported by H.264/AVC, and the signaling portion can be performed by using sub-sequence-related SEI messages.

[0010] The SEI messages in H.264/AVC are described without any references to the scalable extension annex. Consequently, H.264/AVC encoders generate and H.264/AVC decoders interpret the messages as described and suggested by the semantics of the messages in the H.264/AVC standard, respectively, and the messages cannot be used as such for signaling the properties of pictures above the base layer in an SVC bitstream. The access units and pictures to which H.264/AVC SEI messages pertain are specified in the semantics of each SEI message. For example, the information in a sub-sequence layer information SEI message is valid from the access unit that contains the SEI message until the next access unit containing a sub-sequence layer information SEI message, exclusive, or the end of the bitstream if no succeeding sub-sequence layer information SEI message is present. The pan-scan rectangle SEI message contains a syntax element (pan_scan_rect_repetition_period), specifying for which pictures the message is valid. The sub-sequence information SEI message contains data that is valid only for the access unit that contains it.

[0011] An access unit according to the H.264/AVC coding standard comprises zero or more SEI messages, one primary coded picture, zero or more redundant coded pictures, and zero or more auxiliary coded pictures. In some systems, detection of access unit boundaries can be simplified by inserting an access unit delimiter into the bitstream. An access unit according to SVC comprises at least one coded picture that is not a redundant or auxiliary coded picture. For example, an SVC access unit may comprise one primary coded picture for the base layer and multiple enhancement coded pictures. A coded picture as described herein refers to all of the network abstraction layer (NAL) units within an access unit having particular values of dependency_id and quality_level.
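The definition of a coded picture in paragraph [0011] can be illustrated with a minimal Python sketch. The `NalUnit` class and the `group_coded_pictures` function are hypothetical names introduced only for this illustration; a real NAL unit carries many more fields. The sketch assumes the dependency_id and quality_level values have already been read for each NAL unit and simply groups the NAL units of one access unit by that pair.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    # Hypothetical, simplified view of a NAL unit: only the fields needed here.
    dependency_id: int
    quality_level: int
    payload: bytes


def group_coded_pictures(access_unit):
    """Group the NAL units of one access unit by (dependency_id, quality_level).

    Each resulting group corresponds to one coded picture in the sense of [0011].
    """
    pictures = defaultdict(list)
    for nal in access_unit:
        pictures[(nal.dependency_id, nal.quality_level)].append(nal)
    return dict(pictures)


# Illustrative access unit: a base layer picture in two slices plus two enhancements.
access_unit = [
    NalUnit(0, 0, b"base slice 0"),
    NalUnit(0, 0, b"base slice 1"),
    NalUnit(1, 0, b"spatial enhancement"),
    NalUnit(1, 1, b"quality enhancement"),
]
pictures = group_coded_pictures(access_unit)
print(sorted(pictures))  # [(0, 0), (1, 0), (1, 1)]
```

In this toy access unit the two base layer NAL units fall into the same coded picture, while each enhancement NAL unit forms its own.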

[0012] There are a number of different possibilities for the scope of an SEI message. When an SEI message contains data that pertain to more than one access unit (for example, when the SEI message has a coded video sequence as its scope), it is contained in the first access unit to which it applies. SEI messages that contain data which pertain to a single access unit, such as scene information, stereo video, etc., equally apply to all of the pictures in the access unit. An SEI message may relate to filler data, user data, etc. and not be associated with any particular access unit.

[0013] In scalable video coding, multiple description coding, multiview video coding, and other video coding methods, an access unit may comprise multiple coded pictures, wherein each picture is one representation of the video sequence or sequences at a certain spatial resolution, temporal resolution, certain quality level, view, description or some combination thereof. In certain applications, for example, it may be desirable to apply the method of pan and scan only to the pictures with the same picture size so that they can be shown on one type of display while a different type of pan and scan may be desired at a different picture size. In this situation, it would be desirable to have a mechanism for specifying which pictures within an access unit a particular SEI message applies to. For example, it would be helpful to have a mechanism for specifying a pan-scan rectangle for each picture size present in an access unit and have as many reference picture marking repetition SEI messages as needed to represent each possibility for memory management control operations.

[0014] Lastly, the semantics of an SEI message according to the H.264/AVC standard may apply only to the AVC picture (e.g., the base layer picture in SVC) in the access unit. In this situation, it may be desirable to extend the scope and semantics of the SEI message to any other picture in the access unit. This is the case for SEI messages indicating items such as spare pictures, sub-sequence layer characteristics, sub-sequence characteristics, motion-constrained slice group sets, film grain characteristics, deblocking filter display preferences, etc. For example, the current semantics of the statistical data in sub-sequence layer characteristics and sub-sequence characteristics are for the H.264/AVC base layer only, but similar statistics could also be meaningful for pictures with particular values of dependency_id and quality_level. In another example, the indicated spare picture is sufficient for the base layer, but when the picture quality improves in enhancement layers, the corresponding picture in an enhancement layer may not be sufficient as a spare picture.

SUMMARY OF THE INVENTION

[0015] Various embodiments of the present invention provide systems and methods for conveying supplemental enhancement information related to encoded pictures in one access unit. In particular, the encoded pictures in one access unit may be pictures of each layer of a scalable video, pictures of each view of a multiview video, or pictures of each description in a multiple description coding (MDC) video. The system and method of various embodiments involve an indication in the coded bitstream indicating the pictures with which the supplemental enhancement information message is associated. The description herein is provided in terms of various embodiments using H.264/AVC SEI messages as examples, with the addition of syntax and semantics that are used specifically for this purpose, with the appropriate syntax being added during the encoding process. During a subsequent decoding process, the device performing the decoding recognizes this information and uses it appropriately. The system and method enable the reuse of H.264/AVC SEI messages for scalability layers, views and/or MDC descriptions, with minor additions in the semantics of the messages.

[0016] Various embodiments of the present invention also provide a system and method for conveying supplemental enhancement information related to pictures of each layer of a scalable video, redundant coded pictures or auxiliary coded pictures in the coded bitstream. The system and method involve an indication in the coded bitstream indicating the pictures of a scalable video with which the supplemental enhancement information message is associated.

[0017] Various embodiments of the present invention comprise a method, computer program product and apparatus for encoding video into an encoded video bitstream. After a plurality of pictures are encoded into an access unit, a first indication associated with the access unit is encoded. At least one message associated with the first indication is then encoded. The first indication indicates which encoded pictures in the access unit the at least one message applies to.

[0018] Other embodiments of the present invention comprise a method, computer program product and apparatus for decoding video from an encoded video bitstream. Before, during, or after the decoding of a first indication associated with an access unit, at least one message associated with the first indication is decoded. The first indication indicates which of an encoded plurality of pictures in the access unit the at least one message applies to.

[0019] The plurality of pictures from the access unit are then processed from the bitstream in accordance with the decoded first indication.
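The relationship between the first indication, the at least one message, and the pictures of an access unit summarized in paragraphs [0017] to [0019] can be sketched in Python as follows. This is only an illustrative data model under stated assumptions: the class name `NestingIndication`, its field names, and the string placeholders standing in for SEI messages are hypothetical and are not the syntax defined by this application. Pictures are identified here by a (dependency_id, quality_level) pair, one of the identifier choices named in the claims.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed picture identifier for this sketch: (dependency_id, quality_level).
PictureId = Tuple[int, int]


@dataclass
class NestingIndication:
    # Hypothetical container; field names are illustrative, not the patent's syntax.
    all_pictures: bool                                        # applies to every picture
    picture_ids: List[PictureId] = field(default_factory=list)
    nested_messages: List[str] = field(default_factory=list)  # placeholder SEI messages

    def applies_to(self, picture_id: PictureId) -> bool:
        """Return True if the nested messages apply to the given picture."""
        return self.all_pictures or picture_id in self.picture_ids


def messages_for_picture(indications, picture_id):
    """Collect, in order, every nested message that applies to one picture."""
    selected = []
    for indication in indications:
        if indication.applies_to(picture_id):
            selected.extend(indication.nested_messages)
    return selected


# One message for the whole access unit, plus a pan-scan message per picture size.
indications = [
    NestingIndication(all_pictures=True, nested_messages=["scene_info"]),
    NestingIndication(False, [(0, 0)], ["pan_scan_small"]),
    NestingIndication(False, [(1, 0), (1, 1)], ["pan_scan_large"]),
]
print(messages_for_picture(indications, (1, 0)))  # ['scene_info', 'pan_scan_large']
```

A decoder-side component could call `messages_for_picture` once per picture it intends to output, so that a picture only receives the messages whose indication names it or covers the whole access unit.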

[0020] These and other advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] Figure 1 shows a generic multimedia communications system for use with the present invention;

[0022] Figure 2 is a perspective view of a mobile telephone that can be used in the implementation of the present invention;

[0023] Figure 3 is a schematic representation of the circuitry of the mobile telephone of Figure 2;

[0024] Figure 4 is an illustration of the structure and order of elements of access units according to H.264/AVC; and

[0025] Figure 5 is a representation of a scalable nesting SEI message constructed in accordance with one embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0026] Various embodiments of the present invention provide systems and methods for conveying supplemental enhancement information related to encoded pictures in one access unit. In particular, the encoded pictures in one access unit may be pictures of each layer of a scalable video, pictures of each view of a multiview video, or pictures of each description in a multiple description coding (MDC) video. The system and method of various embodiments involve an indication in the coded bitstream indicating the pictures with which the supplemental enhancement information message is associated. The description herein is provided in terms of various embodiments using H.264/AVC SEI messages as examples, with the addition of syntax and semantics that are used specifically for this purpose, with the appropriate syntax being added during the encoding process. During a subsequent decoding process, the device performing the decoding recognizes this information and uses it appropriately. The system and method enable the reuse of H.264/AVC SEI messages for scalability layers, views and/or MDC descriptions, with minor additions in the semantics of the messages.

[0027] Various embodiments of the present invention also provide a system and method for conveying supplemental enhancement information related to pictures of each layer of a scalable video, redundant coded pictures or auxiliary coded pictures in the coded bitstream. The system and method involve an indication in the coded bitstream indicating the pictures of a scalable video with which the supplemental enhancement information message is associated.

[0028] Figure 1 shows a generic multimedia communications system for use with the present invention. As shown in Figure 1, a data source 100 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats. An encoder 110 encodes the source signal into a coded media bitstream. The encoder 110 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 110 may be required to code different media types of the source signal. The encoder 110 may also get synthetically produced input, such as graphics and text, or it may be capable of producing coded bitstreams of synthetic media. In the following, only processing of one coded media bitstream of one media type is considered to simplify the description. It should be noted, however, that typically real-time broadcast services comprise several streams (typically at least one audio, video and text sub-titling stream). It should also be noted that the system may include many encoders, but in the following only one encoder 110 is considered to simplify the description without a lack of generality.

[0029] Some systems also include an editor (not shown in Figure 1) that receives the coded bitstream as input from the storage 120 and writes the result of the editing operations back to the storage 120. The editor modifies the coded media bitstream by removing and/or inserting data elements. For example, the editor may remove access units for which the temporal_level is higher than a determined threshold.

[0030] Supplemental information may be generated either in the encoder 110 (during the encoding process of the video sequence) or in the editor (during an "editing" process of the coded video). For example, the pan-and-scan information may be generated after the encoding process.
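The editing operation given as an example in paragraph [0029], removing access units whose temporal_level exceeds a threshold, can be sketched in Python as follows. The `AccessUnit` class and the function name `drop_high_temporal_levels` are hypothetical and introduced only for illustration; the sketch assumes that the temporal_level of each access unit is already known to the editor.

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    # Hypothetical simplified access unit: a position in the stream and its temporal_level.
    index: int
    temporal_level: int


def drop_high_temporal_levels(access_units, max_temporal_level):
    """Keep only the access units whose temporal_level does not exceed the threshold."""
    return [au for au in access_units if au.temporal_level <= max_temporal_level]


# Illustrative pattern of temporal levels across nine access units.
levels = [0, 2, 1, 2, 0, 2, 1, 2, 0]
stream = [AccessUnit(i, level) for i, level in enumerate(levels)]

kept = drop_high_temporal_levels(stream, max_temporal_level=1)
print([au.index for au in kept])  # [0, 2, 4, 6, 8]
```

With this illustrative pattern, dropping the highest temporal level removes every second access unit, which is the kind of frame-rate reduction that temporal scalability is meant to allow.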

[0031] The coded media bitstream is transferred to a storage 120. The storage 120 may comprise any type of mass memory to store the coded media bitstream. The format of the coded media bitstream in the storage 120 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. Some systems operate "live", i.e. omit storage and transfer the coded media bitstream from the encoder 110 directly to the sender 130. The coded media bitstream is then transferred to the sender 130, also referred to as the server, as needed. The format used in the transmission may be an elementary self-contained bitstream format, a packet stream format, or one or more coded media bitstreams may be encapsulated into a container file. The encoder 110, the storage 120, and the sender 130 may reside in the same physical device or they may be included in separate devices. The encoder 110 and sender 130 may operate with live real-time content, in which case the coded media bitstream is typically not stored permanently, but rather buffered for small periods of time in the content encoder 110 and/or in the sender 130 to smooth out variations in processing delay, transfer delay, and coded media bitrate. It is also possible to use different communication protocols for different parts of the coded media bitstream. For example, the parameter set NAL units may be conveyed using the Session Description Protocol (SDP), while the remainder of the data is conveyed using RTP.

[0032] The sender 130 sends the coded media bitstream using a communication protocol stack. The stack may include but is not limited to Real-Time Transport Protocol (RTP), User Datagram Protocol (UDP), and Internet Protocol (IP). When the communication protocol stack is packet-oriented, the sender 130 encapsulates the coded media bitstream into packets. For example, when RTP is used, the sender 130 encapsulates the coded media bitstream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format. It should again be noted that a system may contain more than one sender 130, but for the sake of simplicity, the following description only considers one sender 130.

[0033] The sender 130 may or may not be connected to a gateway 140 through a communication network. The gateway 140 may perform different types of functions, such as translation of a packet stream according to one communication protocol stack to another communication protocol stack, merging and forking of data streams, and manipulation of data streams according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions. Examples of gateways 140 include multipoint conference control units (MCUs), gateways between circuit-switched and packet-switched video telephony, Push-to-talk over Cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, or set-top boxes that forward broadcast transmissions locally to home wireless networks. When RTP is used, the gateway 140 is called an RTP mixer and acts as an endpoint of an RTP connection.

[0034] Alternatively, the coded media bitstream may be transferred from the sender 130 to the receiver 150 by other means, such as storing the coded media bitstream to a portable mass memory disk or device when the disk or device is connected to the sender 130 and then connecting the disk or device to the receiver 150.

[0035] The system includes one or more receivers 150, typically capable of receiving, de-modulating, and de-capsulating the transmitted signal into a coded media bitstream. De-capsulating may include the removal of data that receivers are incapable of decoding or that is not desired to be decoded. The coded media bitstream is typically processed further by a decoder 160, whose output is one or more uncompressed media streams. Finally, a renderer 170 may reproduce the uncompressed media streams with a loudspeaker or a display, for example. The receiver 150, decoder 160, and renderer 170 may reside in the same physical device or they may be included in separate devices.

[0036] Scalability in terms of bitrate, decoding complexity, and picture size is a desirable property for heterogeneous and error-prone environments. This property is desirable in order to counter limitations such as constraints on bit rate, display resolution, network throughput, and computational power in a receiving device.

[0037] In the H.264/AVC and SVC standards, an elementary coded unit for transmission is referred to as a Network Abstraction Layer (NAL) unit. In packet-oriented communication systems and storage formats, NAL units are encapsulated into transport packets and storage units, respectively. In stream-oriented communication systems and storage formats, a self-contained byte stream is formed from the NAL units of the coded video sequences by preceding each NAL unit with a start code. Each NAL unit comprises a NAL unit header and a NAL unit payload. The NAL unit header indicates the type of the NAL unit and other information. The NAL unit payload includes a raw byte sequence payload (RBSP) which is modified not to contain any emulations of start codes. The structure of the RBSP is determined by the NAL unit type that contains the RBSP. An SEI RBSP contains one or more SEI messages. Each SEI message contains three fields: an SEI payload type, an SEI payload size, and an SEI payload. The SEI payload type indicates the syntax and semantics of the SEI payload. The SEI payload size indicates the size of the SEI payload (in terms of bytes). The SEI payload contains the syntax elements for the SEI payload.

[0038] Figure 4 illustrates the structure and order of elements of access units according to H.264/AVC. As can be seen in Figure 4, an access unit delimiter 400 is followed by an SEI 410, a primary coded picture 420 and a coded picture (not primary) 430, which are followed by an end of sequence 440 and end of stream 450, respectively. The block coded picture (not primary) 430 refers to any combination of redundant coded pictures, auxiliary coded pictures, or enhancement coded pictures. The structure for an SVC access unit is identical to the structure illustrated in Figure 4, except for the fact that the primary coded picture (for the base layer) may not be present.
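
As a rough illustration of the three-field SEI message layout described in paragraph [0037], the following Python sketch splits one SEI message into its payload type, payload size, and payload. The byte-level coding of the two header fields (a run of 0xFF bytes followed by one final byte, all summed) is taken from H.264/AVC and is not stated in this description. The names parse_sei_message and _read_ff_coded are hypothetical, and the input is assumed to be RBSP data from which emulation prevention bytes have already been removed.

```python
from typing import Tuple


def _read_ff_coded(data: bytes, pos: int) -> Tuple[int, int]:
    """Read a value coded as zero or more 0xFF bytes plus one final byte."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    value += data[pos]
    return value, pos + 1


def parse_sei_message(data: bytes, pos: int = 0) -> Tuple[int, int, bytes, int]:
    """Return (payload_type, payload_size, payload, next_pos) for one message."""
    payload_type, pos = _read_ff_coded(data, pos)
    payload_size, pos = _read_ff_coded(data, pos)
    end = pos + payload_size
    if end > len(data):
        raise ValueError("SEI payload runs past the end of the buffer")
    return payload_type, payload_size, data[pos:end], end


# Example input: payload type 1, payload size 3, then three payload bytes.
message = bytes([0x01, 0x03, 0xAA, 0xBB, 0xCC])
payload_type, payload_size, payload, next_pos = parse_sei_message(message)
```

The returned next_pos lets a caller continue with the following SEI message in the same SEI RBSP.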

[0039] In scalable video coding, an access unit may comprise multiple coded pictures, wherein each picture is one representation of the video sequence at a certain spatial resolution, temporal resolution, quality level, or some combination of the three. An access unit according to the SVC standard comprises one primary coded picture for the base layer and may contain multiple enhancement coded pictures, but at most one enhancement coded picture that is not a redundant coded picture for each unique combination of dependency_id, temporal_level, and quality_level. In other words, a combination of dependency_id, temporal_level and quality_level uniquely determines a layer in the scalable video bitstream. The unique combination of dependency_id and quality_level associated with an access unit in a scalable video bitstream according to the SVC standard uniquely determines the coded pictures of the access unit. Other unique identifiers of the pictures may also be used. Such unique identifiers may be a combination of multiple values and such values may be either explicitly conveyed in the coded bitstream or implicitly determined using any other information conveyed in the bitstream.

[0040] Various embodiments of the present invention involve the introduction of a signal in the form of a new SEI message, referred to herein as a scalable nesting SEI message. It should be understood, however, that the present invention is not intended to be limited to the term used to describe the nesting SEI message, and that a different term may also be used. A representation of a scalable nesting SEI message is depicted in Figure 5. In one embodiment, the scalable nesting SEI message contains a nested SEI message of any type, including a further nested SEI message. In another embodiment, the nested SEI message may be restricted to an SEI message of any type other than another scalable nesting SEI message. The size of the contained message matches the indicated payload size of the scalable nesting SEI message (taking into account the "header" of the scalable nesting SEI payload). The scalable nesting SEI message may include an indication, e.g., a flag such as pictures_in_au_flag, which indicates whether the nested SEI message applies to all coded pictures of an access unit at issue. If the message applies only to some of the coded pictures of the access unit, then the scalable nesting SEI message contains the indication of the pictures within the access unit with which the nested SEI message is associated, e.g., values of dependency_id and quality_level of the pictures. Other unique identifiers of the pictures may also be used. Such unique identifiers may be a combination of multiple values and such values may be either explicitly conveyed in the coded bitstream or implicitly determined using any other information conveyed in the bitstream. This information, which is encoded by the encoder 110 of Figure 1, is decoded by the decoder 160 and used to assist in processes related to decoding, display or other purposes, depending on the semantics of the nested SEI message.

[0041] The relevant syntax structures of the present invention, according to one particular embodiment, are as follows.

sei_payload( payloadType, payloadSize ) {                        Descriptor
    if( payloadType = = 0 )
        buffering_period( payloadSize )
    else if( payloadType = = 1 )
        pic_timing( payloadSize )
    ...
    else if( payloadType = = SCALABLE_NESTING )
        // SCALABLE_NESTING is the next unallocated constant
        scalable_nesting( payloadSize )
}

[0042] In addition to the above, the scalable nesting SEI message may also contain a redundant_pic_cnt, if it is desired to associate SEI messages with redundant pictures, and an indication for auxiliary pictures.
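
The scoping rule described in paragraphs [0040] to [0042] can be sketched as follows. This is a minimal Python model under stated assumptions, not the normative syntax: the class and function names (CodedPicture, ScalableNesting, pictures_in_scope) are hypothetical, bit-level parsing is left out, and redundant or auxiliary pictures are not modeled. It only shows how the flag and the list of dependency_id and quality_level values select the pictures of an access unit to which a nested SEI message applies.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class CodedPicture:
    """One coded picture of an access unit (hypothetical model)."""
    dependency_id: int
    quality_level: int


@dataclass
class ScalableNesting:
    """Scope carried by a scalable nesting SEI message (hypothetical model)."""
    # True: the nested message applies to all coded pictures of the access unit.
    pictures_in_au_flag: bool
    # Consulted only when the flag is False: (dependency_id, quality_level) pairs.
    targets: List[Tuple[int, int]] = field(default_factory=list)

    def applies_to(self, picture: CodedPicture) -> bool:
        if self.pictures_in_au_flag:
            return True
        return (picture.dependency_id, picture.quality_level) in self.targets


def pictures_in_scope(nesting: ScalableNesting,
                      access_unit: List[CodedPicture]) -> List[CodedPicture]:
    """Return the pictures of the access unit that the nested message covers."""
    return [picture for picture in access_unit if nesting.applies_to(picture)]


# Example: a base-layer picture plus two enhancement pictures.
access_unit = [CodedPicture(0, 0), CodedPicture(0, 1), CodedPicture(1, 0)]
nesting = ScalableNesting(pictures_in_au_flag=False, targets=[(0, 1), (1, 0)])
selected = pictures_in_scope(nesting, access_unit)
```

With these example values, the nested message covers the two enhancement pictures and leaves the base-layer picture out of scope.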

[0043] In another embodiment of the invention, the nesting SEI message may contain multiple nested SEI messages, each having the same scope, i.e., all of the multiple nested SEI messages are associated with the same pictures. In this embodiment, the syntax of the scalable_nesting syntax structure is specified in the following table. more_sei_message_data( ) is a function returning TRUE if there is more data to be parsed in the SEI message that contains the call to the function (i.e., the scalable nesting SEI message in this case). Otherwise, more_sei_message_data( ) returns FALSE.
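
The loop implied by more_sei_message_data( ) in paragraph [0043] can be sketched in Python as below. The sketch assumes that the scalable nesting "header" has already been consumed and that the remaining bytes hold whole nested SEI messages back to back. The class NestedSeiReader and the function parse_nested_messages are hypothetical names, and the coding of the type and size fields (0xFF bytes plus a final byte) follows H.264/AVC, not this description.

```python
from typing import List, Tuple


class NestedSeiReader:
    """Walks the nested messages that follow the scalable nesting header."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def more_sei_message_data(self) -> bool:
        # TRUE while unparsed bytes remain in the nesting SEI payload.
        return self.pos < len(self.data)

    def _read_ff_coded(self) -> int:
        # Sum a run of 0xFF bytes and the first byte that is not 0xFF.
        value = 0
        while self.data[self.pos] == 0xFF:
            value += 255
            self.pos += 1
        value += self.data[self.pos]
        self.pos += 1
        return value

    def read_message(self) -> Tuple[int, bytes]:
        payload_type = self._read_ff_coded()
        payload_size = self._read_ff_coded()
        payload = self.data[self.pos:self.pos + payload_size]
        if len(payload) != payload_size:
            raise ValueError("truncated nested SEI message")
        self.pos += payload_size
        return payload_type, payload


def parse_nested_messages(data: bytes) -> List[Tuple[int, bytes]]:
    """Collect every nested (payload_type, payload) pair; all share one scope."""
    reader = NestedSeiReader(data)
    messages = []
    while reader.more_sei_message_data():
        messages.append(reader.read_message())
    return messages


# Example: two nested messages (type 0 with 1 byte, type 1 with 2 bytes).
nested = parse_nested_messages(bytes([0x00, 0x01, 0x10, 0x01, 0x02, 0x20, 0x21]))
```

Because every nested message shares the scope of the enclosing scalable nesting SEI message, the caller can attach the whole returned list to the same set of pictures.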

[0044] In addition to the scalable nesting SEI message, the semantics of each SEI message of H.264/AVC may be redefined to include semantics of the message when used within the scalable nesting SEI message. The semantics of some messages, such as the pan-scan rectangle SEI message, can be generalized to be associated with certain enhancement coded pictures too. In general, the semantics of any H.264/AVC SEI message may be redefined to address all of the pictures in the scope identified by the scalable nesting SEI message. This is in contrast to the semantics specified by H.264/AVC, which typically address only the primary coded pictures to which the message pertains. For example, a scalable nesting SEI message may contain an SEI message, such as the motion-constrained slice-group set SEI message, that pertains to all access units in a coded video sequence but only to certain pictures of these access units having the same unique indication, e.g., dependency_id and quality_level; such a message would typically be contained in the first access unit that contains the pictures to which it applies. If it is desirable that a particular SEI message pertains to pictures having a certain value of temporal_level, the SEI message would typically be contained in the first access unit to which the message is applied and which contains pictures having the desired temporal_level. According to the SVC specification, the primary coded picture and the enhancement coded pictures in an access unit have the same value of temporal_level. If it is desirable that a particular SEI message pertains to pictures having a certain range of temporal_level values, the SEI message syntax may contain a mechanism to indicate the range of temporal_level values, or the range can be derived from the temporal_level of the pictures contained in the same access unit as the SEI message.

[0045] In one embodiment of the invention, the semantics of a particular SEI message of H.264/AVC, namely the spare picture SEI message, may be defined as follows when used within the scalable nesting SEI message: When the spare picture SEI message is included in a scalable nesting SEI message, the message is applicable when the target picture and the spare picture defined below are decoded from the NAL units with nal_unit_type in the range of 1 to 5, inclusive, and the NAL units with nal_unit_type equal to 20 or 21 and for which the dependency_id is among dependency_id[ i ] and quality_level is among quality_level[ i ]. This SEI message indicates that certain slice group map units, referred to as spare slice group map units, in one or more decoded reference pictures resemble the co-located slice group map units in a specified decoded picture called the target picture. A spare slice group map unit may be used to replace a co-located, incorrectly decoded slice group map unit in the target picture. A decoded picture containing spare slice group map units is referred to as a spare picture.
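
One possible reading of the applicability condition in paragraph [0045] is sketched below in Python: a NAL unit contributes to the target and spare pictures if its nal_unit_type is 1 to 5, or if its nal_unit_type is 20 or 21 and its dependency_id and quality_level appear among the values listed in the scalable nesting SEI message. The function name nal_unit_in_scope is hypothetical. The two membership tests are applied independently, as the wording literally states; a pairwise match on the i-th entries is another plausible interpretation that this sketch does not implement.

```python
from typing import Optional, Sequence


def nal_unit_in_scope(nal_unit_type: int,
                      dependency_id: Optional[int],
                      quality_level: Optional[int],
                      nested_dependency_ids: Sequence[int],
                      nested_quality_levels: Sequence[int]) -> bool:
    """Check whether a NAL unit falls under a nested spare picture SEI message."""
    # NAL unit types 1 to 5 are always included.
    if 1 <= nal_unit_type <= 5:
        return True
    # Types 20 and 21 are included only for the listed layer identifiers.
    if nal_unit_type in (20, 21):
        return (dependency_id in nested_dependency_ids
                and quality_level in nested_quality_levels)
    return False


# Example: the nesting message lists dependency_id 1 and 2, quality_level 0.
in_scope = nal_unit_in_scope(20, 1, 0, [1, 2], [0])
out_of_scope = nal_unit_in_scope(20, 3, 0, [1, 2], [0])
```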

[0046] Relevant syntax structures according to another embodiment of the invention involving MVC are as follows:

[0047] In this syntax, view_id[i] indicates the identifier of the view to which the i-th picture belongs. Other syntax elements have the same semantics as in the embodiments discussed previously. Other unique identifiers of the pictures belonging to a view may also be used.

[0048] Relevant syntax structures according to yet another embodiment of the invention involving MDC are as follows:

[0049] In the above syntax, description_id[i] indicates the identifier of the MDC description to which the i-th picture belongs. Other syntax elements have the same semantics as in the first embodiment described previously. Other unique identifiers of the pictures belonging to an MDC description may also be used.

[0050] Relevant syntax structures according to a further embodiment of the invention involving a combination of SVC and MVC are as follows:

[0051] In the above syntax, view_id[i] indicates the identifier of the view to which the i-th picture belongs. Other syntax elements have the same semantics as in the first embodiment described previously. Other unique identifiers of the pictures belonging to a scalable layer and view may also be used.

[0052] Relevant syntax structures according to yet another embodiment of the invention involving a combination of SVC, MVC and MDC are as follows:

[0053] In the above syntax, view_id[i] indicates the identifier of the view to which the i-th picture belongs. description_id[i] indicates the identifier of the MDC description to which the i-th picture belongs. Other syntax elements have the same semantics as in the first embodiment described previously. Other unique identifiers of the pictures belonging to a scalable layer, view and MDC description may also be used.
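
For the combined SVC, MVC and MDC embodiment, the picture identifier becomes a combination of several values. The Python sketch below illustrates that idea with a hypothetical PictureId record and a hypothetical select_pictures helper; the field names mirror the syntax elements named in the text, while the default values and the use of a set for the scope are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PictureId:
    """Combined identifier of one coded picture in an access unit."""
    dependency_id: int = 0
    quality_level: int = 0
    view_id: int = 0
    description_id: int = 0


def select_pictures(scope: FrozenSet[PictureId],
                    access_unit: Iterable[PictureId]) -> List[PictureId]:
    """Keep the pictures whose full identifier is listed in the scope."""
    return [picture for picture in access_unit if picture in scope]


# Example: two views, each with a base layer and one enhancement layer.
access_unit = [
    PictureId(dependency_id=0, view_id=0),
    PictureId(dependency_id=1, view_id=0),
    PictureId(dependency_id=0, view_id=1),
    PictureId(dependency_id=1, view_id=1),
]
# The nested message is scoped to the enhancement layer of view 1 only.
scope = frozenset({PictureId(dependency_id=1, view_id=1)})
selected = select_pictures(scope, access_unit)
```

Because the record is frozen and therefore hashable, a pure SVC, MVC or MDC embodiment can reuse it by leaving the unused fields at their defaults.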

[0054] Figures 2 and 3 show one representative mobile telephone 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of mobile telephone 12 or other electronic device. Some or all of the features depicted in Figures 2 and 3 could be incorporated into any or all of the devices represented in Figure 1.

[0055] The mobile telephone 12 of Figures 2 and 3 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an ear-piece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56 and a memory 58. Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.

[0056] Communication devices of the present invention may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, IEEE 802.11, etc. A communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.

[0057] The present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

[0058] Software and web implementations of the present invention could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps. It should also be noted that the words "component" and "module," as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.

[0059] The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments were chosen and described in order to explain the principles of the present invention and its practical application to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated.