

Title:
CONVEYING AUXILIARY INFORMATION IN A MULTIPLEXED STREAM
Document Type and Number:
WIPO Patent Application WO/2009/027923
Kind Code:
A1
Abstract:
A system (650) receives a multiplexed stream. The system comprises a demultiplexer (654) for extracting at least a video elementary stream and an audio elementary stream from the multiplexed stream. The system further comprises a decoder (656) for extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream. The auxiliary information comprises depth information relating to at least one video frame encoded in the video elementary stream.

Inventors:
NEWTON PHILIP S (NL)
BRULS WILHELMUS H A (NL)
Application Number:
PCT/IB2008/053401
Publication Date:
March 05, 2009
Filing Date:
August 25, 2008
Assignee:
KONINKL PHILIPS ELECTRONICS NV (NL)
NEWTON PHILIP S (NL)
BRULS WILHELMUS H A (NL)
International Classes:
H04N13/00
Foreign References:
AU 738692 B2 (2001-09-27)
EP 1519582 A1 (2005-03-30)
Other References:
AVARO O ET AL: "The MPEG-4 systems and description languages: A way ahead in audio visual information representation", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 9, no. 4, 1 May 1997 (1997-05-01), pages 385 - 431, XP004075337, ISSN: 0923-5965
Attorney, Agent or Firm:
UITTENBOGAARD, Frank et al. (AE Eindhoven, NL)
Claims:

CLAIMS:

1. A system (650) for handling a multiplexed stream, the system comprising: an input (652) for receiving the multiplexed stream; a demultiplexer (654) for extracting at least a video elementary stream and an audio elementary stream from the multiplexed stream; and a decoder (656) for extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.

2. The system according to claim 1, wherein the auxiliary information comprises depth information relating to at least one video frame encoded in the video elementary stream.

3. The system according to claim 1, further comprising: means (658) for removing the auxiliary information from the audio elementary stream to obtain a second audio elementary stream free from auxiliary information; and an output for providing the second audio elementary stream.

4. The system according to claim 2, wherein the depth information comprises at least part of a depth elementary stream comprising depth and/or disparity values complementing the video elementary stream.

5. The system according to claim 1, wherein the audio elementary stream comprises at least 7 audio channels corresponding to at least 6 surround channels and at least one bass channel; and the decoder is arranged for extracting the auxiliary information from a number of the surround channels, wherein the number is at most equal to the number of surround channels minus five.

6. The system according to claim 1, wherein the input comprises a standardized digital audio/video interface.

7. The system according to claim 1, wherein the input comprises a reader (660) for reading a media carrier to supply the multiplexed stream.

8. The system according to claim 1, further comprising means for establishing whether the auxiliary information has been included in the audio elementary stream and activating the decoder upon detection of the auxiliary information.

9. A system (600) for providing a multiplexed stream, the system comprising: an encoder (602) for encoding auxiliary information together with audio data into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream; a multiplexer (604) for combining at least the video elementary stream and the audio elementary stream into a multiplexed stream; and an output (606) for providing the multiplexed stream.

10. The system according to claim 9, further comprising a reader (608) for obtaining the video elementary stream, the auxiliary information, and the audio data from a content data carrier, wherein the auxiliary information has been encoded on the content data carrier in a format different from an elementary audio stream.

11. A content data carrier (660) comprising a multiplexed stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream.

12. The content data carrier according to claim 11, wherein the auxiliary information is stored in a first audio data stream and audio information is stored in a second audio data stream, and further comprising: a playlist comprising an instruction to mix the first audio data stream and the second audio data stream into a single audio elementary stream.

13. A signal (640) representing a multiplexed stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream.

14. A method of handling a multiplexed stream (640), the method comprising: receiving the multiplexed stream; extracting at least a video elementary stream and an audio elementary stream from the multiplexed stream; and extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.

15. A method of providing a multiplexed stream (640), the method comprising: encoding auxiliary information together with audio data into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream; combining at least the video elementary stream and the audio elementary stream into a multiplexed stream; and providing the multiplexed stream.

16. A computer program product comprising instructions for causing a processor to perform the method according to claim 14 or 15.

Description:

Conveying auxiliary information in a multiplexed stream

FIELD OF THE INVENTION

The invention relates to handling a multiplexed stream. More particularly, the invention relates to conveying auxiliary information in a multiplexed stream. The invention also relates to conveying depth information.

BACKGROUND OF THE INVENTION

3D video is facilitated by recent advancements in auto-stereoscopic displays and improvement of existing techniques, such as beamers and monitors with high refresh rates. Such displays are capable of rendering a video scene with a depth impression. By ensuring that the two eyes of the viewer perceive two slightly different images, some rendered objects in the image are perceived as closer to the viewer and some rendered objects are perceived as being further away from the viewer. This enhances the viewing experience.

In the design of 3D video systems, one aspect that needs to be addressed is how to convey the depth information, which is needed to create the depth impression, to the display. Such depth information may have the form of additional information which supplements a conventional 2D video scene. This way, conventional 2D displays may use the video data by simply rendering the 2D video scene, whereas 3D displays may process the 2D video scene based on the depth information to realize the depth impression.

However, many devices operate according to standards defining how to handle 2D video. Consequently, not all existing hardware and software are well prepared for handling the depth information.

SUMMARY OF THE INVENTION

It would be advantageous to have an improved way of conveying auxiliary information in a multiplexed stream. To better address this concern, in a first aspect of the invention a system is presented that comprises: an input for receiving the multiplexed stream; a demultiplexer for extracting at least a video elementary stream and an audio elementary stream from the multiplexed stream; and a decoder for extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.

Such a system may receive the multiplexed stream with auxiliary information from an interface, a device, or a program that is capable of handling video elementary streams and audio elementary streams, even if the interface, device, or program does not specifically support the handling of that particular kind of auxiliary information. No bandwidth reserved for the video is used, which allows the video frames to keep their full resolution. In an embodiment, the auxiliary information comprises depth information. The depth information may relate to at least one video frame encoded in the video elementary stream. The depth information may comprise occlusion information and/or disparity information.

An embodiment comprises: means for removing the auxiliary information from the audio elementary stream to obtain a second audio elementary stream free from auxiliary information; and an output for providing the second audio elementary stream.

This allows the system to forward the pure audio information to a device that handles audio data, for example an audio renderer such as an amplifier or a surround system. This allows the multiplexed stream including auxiliary information to be used in conjunction with an amplifier or surround system that is not adapted to handle an audio stream that contains that auxiliary information.

In an embodiment, the depth information comprises at least part of a depth elementary stream comprising depth and/or disparity values complementing the video elementary stream.

In an embodiment, the audio elementary stream comprises at least 7 audio channels corresponding to at least 6 surround channels and at least one bass channel; and the decoder is arranged for extracting the auxiliary information from a number of the surround channels, wherein the number is at most equal to the number of surround channels minus five.

Advantage may be taken of the fact that some media standards can accommodate more than five, for example seven, surround audio channels (and often one bass audio channel), whereas most households do not use more than five surround speakers (and the bass speaker). In such a situation, at least one, for example two, audio channels are unused. This embodiment uses the unused capacity to convey auxiliary information without any compromise to the video and audio rendering.
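The channel-count condition of this embodiment can be stated as a small helper function; the function name is illustrative only, not part of the specification.

```python
def max_aux_channels(n_surround):
    """At most (n_surround - 5) surround channels may carry auxiliary
    information, so that at least five channels remain available for
    genuine surround audio."""
    return max(0, n_surround - 5)
```

With the seven-surround-channel example from the text, at most two channels (for example Lb and Rb) are available for auxiliary information.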

In an embodiment, the input comprises a standardized digital audio/video interface. Some standardized digital audio/video interfaces, for example HDMI or DVI, provide a standardized way to exchange 2D video and audio data between devices, but do not provide a standardized way to exchange some particular types of auxiliary information. This embodiment provides one way to transport the auxiliary information via the audio/video interface without making any modifications to the audio/video interface or the corresponding standard.

In an embodiment, the input comprises a reader for reading a media carrier to supply the multiplexed stream. This embodiment allows the auxiliary information to be stored as at least part of an audio track on a media carrier, for example a Blu-ray disc or an HD-DVD.

An embodiment comprises means for establishing whether the auxiliary information has been included in the audio elementary stream and activating the decoder upon detection of the auxiliary information. This allows for flexible handling of multiplexed streams with and without auxiliary information in an audio elementary stream.

An embodiment comprises a system for providing a multiplexed stream, the system comprising: an encoder for encoding auxiliary information together with audio data into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream; a multiplexer for combining at least the video elementary stream and the audio elementary stream into a multiplexed stream; and an output for providing the multiplexed stream.

This allows a combined audio/video/auxiliary stream to be prepared for transmission or storage, so that it may be used or conveyed by devices, interfaces, or programs that have not been adapted to accommodate the auxiliary information.

An embodiment comprises a reader for obtaining the video elementary stream, the auxiliary information, and the audio data from a content data carrier, wherein the auxiliary information has been encoded on the content data carrier in a format different from an elementary audio stream. A reader and a television may be prepared for 3D while the interface connecting the two is not. This embodiment allows the reader to read the auxiliary information from a media carrier and encode it in an audio channel for transmission via the unprepared interface.

An embodiment comprises a content data carrier comprising a multiplexed stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream.

This allows the content data carrier to be used on a reader that is not prepared for the auxiliary information. The reader may simply forward the audio/video stream to a display, where the display may extract the auxiliary information from the audio stream and render the video, using the auxiliary information, together with the audio.

In an embodiment, the auxiliary information is stored in a first audio data stream and audio information is stored in a second audio data stream, and a playlist is provided comprising an instruction to mix the first audio data stream and the second audio data stream into a single audio elementary stream.

By storing the auxiliary information and audio information in separate streams, different playlists may be defined for renderings with and without enhancement by means of the auxiliary information. This embodiment refers to a playlist that mixes the audio information with the depth information into a single audio output which may be rendered as a 3D depth impression. The playlist may be associated with a menu item that may be selected by the user. Another menu item may be provided for rendering only the audio information, without including the auxiliary information in the audio elementary stream.

An embodiment comprises a signal representing a multiplexed stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream. The signal may be transmitted for example via DVB broadcasts or via the Internet.

An embodiment comprises a method of handling a multiplexed stream, the method comprising: receiving the multiplexed stream; extracting at least a video elementary stream and an audio elementary stream from the multiplexed stream; and extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of the video elementary stream.

An embodiment comprises a method of providing a multiplexed stream, the method comprising: encoding auxiliary information together with audio data into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience of a video elementary stream; combining at least the video elementary stream and the audio elementary stream into a multiplexed stream; and providing the multiplexed stream.

An embodiment comprises a computer program product comprising instructions for causing a processor to perform any of the methods set forth.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of the invention will be further elucidated and described with reference to the drawing, in which

Fig. 1 shows an example image with a depth map;

Fig. 2 shows an example image with a depth map and occlusion information;

Fig. 3 illustrates an embodiment;

Fig. 4 illustrates a surround system;

Fig. 5 illustrates an embodiment; and

Fig. 6 illustrates an embodiment.

DETAILED DESCRIPTION OF EMBODIMENTS

3D video is experiencing a revival with the introduction of new autostereoscopic displays and improvement of existing techniques, such as beamers and monitors with high refresh rates. Introducing 3D video and 3D graphics requires changes to the MPEG standards for the video. Work is ongoing in MPEG to standardize various 3D video formats; one of these is based on 2D plus depth. This approach is based on 3D display technology which can calculate multiple views using a 2D image and an additional image, called a depth map. The depth map conveys information about the depth of objects in the 2D image. The grey-scale values in the depth map indicate the depth of the associated pixel in the 2D image. A stereo display can calculate the additional view required for stereo by using the depth value from the depth map and by calculating the required pixel transformation. Methods to do this are known in the art; see for example "An image-based approach to three-dimensional computer graphics", L. McMillan, Ph.D. thesis, UNC Computer Science, TR97-013, 1997. Figure 1 shows an example of a 2D image and an associated depth map. A problem that occurs when generating multiple views from 2D + depth input is that some objects which are occluded by other objects in the transmitted 2D image may become visible in a certain view. This is referred to as the de-occlusion problem. Currently this is solved by using hole-filling techniques to cover up the missing parts. However, for objects with a high level of depth this causes visible artifacts. A solution is to send additional data to the display, which can be used to fill in the de-occluded areas.

Current autostereoscopic displays sacrifice resolution to render multiple views on the screen. In the Philips 9-view lenticular display, the resolution of the stereo image is 960x540. This reduction in resolution makes it possible to send all the data required in a 1920x1080 frame.

Figure 2 shows a 1920x1080 frame with four 960x540 quadrants. The top left quadrant carries 2D image data. The top right quadrant carries the depth information. The bottom left and bottom right quadrants carry occlusion data. In this text, "depth information" shall refer to any information that is required or that may help to turn a 2D image into a 3D image. Accordingly, depth information may or may not include occlusion information.

Introducing stereo video based on 2D + depth + occlusion data has the problem that adding the additional data for stereo video increases the required bandwidth significantly: not only the reading bandwidth from a disc, but also the decoding bandwidth and the amount of data that has to be sent over the interface from the player to a display. Therefore a solution has been sought to reduce the size of the additional data to be sent. One solution found is to reduce the resolution of the depth map. The resolution may be reduced to as little as 1/8 of the 2D content (1/8 of the vertical and horizontal resolution). For an HD-resolution 2D video (1920x1080) this means that the depth map may be as small as 240x135. Current 3D displays may sacrifice resolution (pixels) to render multiple views to a user. These displays take as input format for example a frame of 1920x1080 whereby the 2D video takes up a size of 960x540. Based on the reduction in resolution to 1/8 of the 2D screen size, the size of the depth map may be as little as 120x67.5 whilst still maintaining the same quality of the 3D video perception to the user.

A problem is how to carry the additional depth information using existing movie publishing formats such as DVD, Blu-ray disc and HD-DVD. Because of the low bit rate of the depth map and because of the fact that the depth map uses only a limited set of depth values, it is possible to code the depth map information as an audio stream, even in the case of high-resolution video such as HDTV. This "audio" stream could be sent to a 3D display together with the "real" audio stream, where the "real" audio stream comprises the audio waveforms to be rendered through the speakers. The 3D display can then decode the audio stream containing the depth map separately and use it together with the 2D video stream to generate the 3D impression. This would allow 3D video using existing standards such as Blu-ray disc and HD-DVD, as well as existing interfaces such as HDMI.

Figure 3 illustrates an audio mixing model supported by Blu-ray disc. A similar audio mixing model is supported by HD-DVD. This model allows a secondary audio stream 322 and/or an interactive audio stream 323 to be mixed with a primary stream 321. The secondary stream 322 may consist of a lower bit rate audio stream that is synchronized and mixed with the primary audio at mixer 312. It may consist for example of a DTS-HD or DOLBY DIGITAL (Plus) stream. The interactive audio stream 323 may consist of LPCM audio which is activated by an application on disc and mixed at mixer 314 with the primary stream 321 after the primary stream 321 and secondary stream 322 have been mixed. It is typically used for dynamic sounds associated with events related to the application. B1, B2, and B3 represent audio buffers for buffering the received streams 321, 322, 323 to ensure a continuous availability of data to be processed. The volume of primary stream 321 and secondary stream 322 may be adapted with gains D1 and D2, respectively. Gain D2 sends secondary audio mixing metadata 302 to a switch that has the positions "metadata on" and "metadata off". In the case of "metadata on", the secondary audio mixing metadata is forwarded to converter X1, which takes care of conversion to an X1 mix matrix in dependence on BD-J gain control 306. BD-J stands for Blu-ray disc Java, which is the interactive portion of the Blu-ray disc specification. A BD-J application may control the value of the gain and the value of the panning using the appropriate BD-J controls. In the case of "metadata off", BD-J pan control signal 304 is forwarded to the converter X1. The switch is controlled by BD-J Metadata API 318. The converter X1 provides the X1 mix matrix to mixer 312. Converter X2 provides an X2 mix matrix to mixer 314 in dependence on BD-J pan control 308 and BD-J gain control 310. Mixer 314 mixes the interactive audio 323 with the output of mixer 312 in dependence on the BD-J pan control 308 and BD-J gain control 310.

The interactive audio channel 323 and the secondary stream 322 could be used to mix in an audio stream that carries the depth map. By adjusting the mixing parameters of mixer 312 and mixer 314, the audio containing the depth map can be mixed such that it ends up in audio channels of output 316. Preferably the depth map is mixed into one or more channels which are not being used in 3D mode or are empty. These empty or unused channels are mixed such that they contain the depth map "audio" streams and are sent to the 3D display through, for example, the HDMI interface. These mixing parameters can be set by a Java application or can be located in the secondary audio stream as metadata. The switch selecting between the two is operated by the BD-J Metadata API 318. For example, the output audio elementary stream 316 may contain up to 7.1 channels: 7 surround channels and a bass channel. Figure 4 shows an example of such a channel setup with a central speaker C behind a display, and speakers on the left L, right R, left back Lb, right back Rb, left side Ls, right side Rs, and bass speaker (not shown). In many audio systems, the left back Lb and right back Rb speakers are missing. The Lb and Rb channels in this channel setup may be replaced by 2 channels which carry the depth map information audio stream. This replacement can be done by mixing with a secondary audio stream 322 which carries the depth map information audio stream. The mixing metadata 302 for the secondary audio stream 322 instructs the mixer 312 via converter X1 to mix the secondary audio stream 322 with the primary audio stream 321 as indicated in Figure 3. The mixing metadata 302 prescribes that the gains for the Lb and Rb channels of the primary audio stream are set to mute and the gains for those same channels in the secondary stream are set to 1. This basically replaces the Lb and Rb channels in the primary audio stream with those from the secondary audio stream.
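The channel replacement described above can be sketched as follows. The 7.1 channel order and the function name are assumptions made for illustration; the mixing model itself only prescribes the gains.

```python
# Assumed 7.1 channel order: L, R, C, LFE, Ls, Rs, Lb, Rb
LB, RB = 6, 7

def mix_in_depth(primary_frame, secondary_frame):
    """Mix one 7.1 sample frame as the mixing metadata prescribes:
    the Lb and Rb gains of the primary stream are set to mute (0) and
    the Lb and Rb gains of the secondary, depth-carrying stream are set
    to 1, so those two channels end up carrying the depth map "audio"."""
    mixed = list(primary_frame)      # all other channels pass through
    mixed[LB] = secondary_frame[LB]
    mixed[RB] = secondary_frame[RB]
    return mixed
```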
The whole 7.1 audio stream in this example is either re-encoded (which may cause some distortion in the depth map stream) and sent over the HDMI interface, or is left as LPCM and sent out, as is shown in Figure 5. The 3D TV extracts the depth map "audio" channels and forwards the other channels to an audio receiver. Figure 5 shows that the Blu-ray disc player BD sends the mixed 7.1 audio stream 502 to a 3D TV. The 3D TV extracts the depth information from the Lb and Rb channels and forwards the remaining 5.1 audio stream 504 to a surround system 506.
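On the receiving side, the extraction step of Figure 5 amounts to splitting each frame into depth samples and forwardable audio; the channel order is again an assumption.

```python
def split_71_frame(frame):
    """Split one mixed 7.1 frame (assumed order L, R, C, LFE, Ls, Rs,
    Lb, Rb) into the depth samples carried in Lb/Rb and the remaining
    5.1 audio that is forwarded to the surround system."""
    depth_samples = (frame[6], frame[7])   # depth map "audio"
    audio_51 = frame[:6]                   # forwarded 5.1 stream
    return depth_samples, audio_51
```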

Other ways of mixing the depth map information audio stream with the primary audio stream are also possible. For example, in the case of a 7.1 Dolby Digital Plus stream, the depth map information could be embedded in the extension packets of the stream. Dolby Digital Plus uses this extension mechanism to provide compatibility with 5.1 decoders: the bitstream contains the original 5.1 core packets, with extension packets that provide the additional channels. This way the 7.1 Dolby Digital Plus stream may be sent directly to a 5.1 surround system without passing through the 3D TV, because the 5.1 surround system does not process the extension packets. This is an advantage because usually the disc player is connected directly to the surround system.

Other channels can be used instead of the Lb and Rb channels. Also, the interactive audio channel 323 may be used to carry the depth information instead of the secondary channel 322. Audio matrixing may also be used, as defined in Dolby Pro Logic; however, this might degrade the audio quality to some extent. When playing in normal 2D mode, no audio channels need be sacrificed, as the full 7.1 stream can be used as-is without mixing in the depth map streams. For 3D mode the content author can include the secondary audio stream on the disc and reference it through a playlist (playlists are defined in the Blu-ray disc specification; a similar concept is defined in the HD-DVD specification) which indicates that this secondary stream should be presented together with the primary stream. This playlist can be selected by the user as an additional title in the disc menu, indicating that this will play the movie in 3D mode. As explained earlier, this then mixes the secondary and primary stream such that two channels of the 7.1 stream are replaced by the depth map "audio" stream. A disadvantage is that in 3D mode some audio channels are lost: instead of 7.1, only 5.1 is available. It is also possible to use only a portion of the bandwidth of one or more audio channels for depth information, so that all audio channels are available to the audio system, albeit possibly at a reduced quality. In another configuration, separate elementary audio streams are mixed for the "3D TV" output of the player and for the "surround system" output of the player. The depth information is only mixed in for the stream supplied to the "3D TV" output of the player. The full, original audio stream is provided to the "surround system" output of the player.

The mixing of the audio stream with the depth map "audio" stream is done at LPCM level and can be re-encoded before being sent out. LPCM samples may have a sample frequency of up to 192 kHz (48, 96, or 192 kHz) and may have up to 24 bits per sample. A depth map of 120x67.5 requires a bit rate of 1.6 Mb/s, so using 16-bit samples at a sample frequency of 96 kHz would almost be sufficient (1.5 Mb/s). With a slight reduction in the number of bits used to represent the depth values this would then suffice. Of course multiple variations are possible: a higher sample frequency (192 kHz) could be used and/or a larger sample size (up to 24 bits). Alternatively, multiple audio channels could be used; for example, half of the depth map could be carried in a left channel and the other half in a right channel.
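The bandwidth figures above can be checked with a rough calculation. The 8 bits per depth value and the 25 depth frames per second are assumptions; the text does not state a frame rate.

```python
# Depth-map bit rate: a 120 x 67.5 map, 8 bits per depth value (assumed),
# 25 depth frames per second (assumed).
depth_bits_per_frame = int(120 * 67.5 * 8)
depth_bitrate = depth_bits_per_frame * 25    # about 1.6 Mb/s, as in the text
print(f"depth map: {depth_bitrate / 1e6:.2f} Mb/s")

# Capacity of a single LPCM channel at 16 bits per sample and 96 kHz.
lpcm_capacity = 16 * 96_000                  # about 1.5 Mb/s, as in the text
print(f"LPCM channel: {lpcm_capacity / 1e6:.2f} Mb/s")
```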

Converting the depth map into an audio stream can be done bit by bit and row by row. A new frame may be indicated by a marker, for example one or more zero bytes. Also, the most significant bit is alternated so that the resulting signal looks more like a genuine audio signal. This helps to prevent damage to audio receivers when the signal inadvertently ends up in such a receiver, for example because of wrong settings or cable connections.
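One possible packing is sketched below, assuming 8-bit depth values, a single zero marker byte per frame, and MSB alternation as described; the exact layout is not fixed by the text.

```python
def depth_frame_to_samples(depth_rows, marker=b"\x00"):
    """Serialize one depth-map frame row by row into pseudo-audio bytes.
    Each frame starts with a marker byte, and the most significant bit
    of successive samples is alternated so the stream resembles a real
    audio waveform (each depth value keeps its lower 7 bits)."""
    out = bytearray(marker)               # frame boundary marker
    msb = 0
    for row in depth_rows:
        for value in row:
            out.append((value & 0x7F) | (msb << 7))
            msb ^= 1                      # alternate the MSB
    return bytes(out)
```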

Using the embodiments described, 3D video may be provided using existing formats such as Blu-ray disc and HD-DVD. Preferably, care is taken that the manipulated audio is sent out through the HDMI interface to the display and not to an external audio decoder, since in that case certain channels used to carry the depth information will contain noise. Preferably a separate audio output is provided with the "clean", non-manipulated audio. An alternative solution is to provide a digital audio output on the 3D display and notify the user that an external audio decoder, such as a receiver, should be connected to this output rather than directly to the output of the Blu-ray disc player. The 3D display filters out the depth information audio streams and forwards the remaining channels to the external decoder. In yet another embodiment, the audio decoder is arranged to detect the presence of depth information and subsequently ignore it. The concepts disclosed herein may be extended to other types of information than depth information. For example, any auxiliary information may be included in an audio stream instead of depth information. Such information might relate to immersive experience control data; for example, control commands to control light sources in the room may be provided in the auxiliary information. Such light sources may be able to produce light of which the color and/or intensity may be controlled. For example, Ambilight is provided by light sources at the sides of the display. Including such control commands in the audio stream, instead of or in addition to the depth information, provides similar advantages as including the depth information in the audio stream.

Figure 6 illustrates an embodiment of the invention. Data, for example available at a broadcast company or a content provider, is provided to a first system 600 that creates a signal 640 representing a multiplexed stream. The multiplexed stream comprises at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information. The auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream.

For example, the auxiliary information comprises depth information. The depth information comprises depth maps and/or disparity maps and/or occlusion information. The depth information may be encoded as a depth elementary stream.

The first system 600 transforms the original data into the multiplexed stream 640. The first system 600 comprises an encoder 602 for encoding auxiliary information together with audio data into an audio elementary stream. For example, one or more of the audio channels are filled with the auxiliary information. The first system 600 comprises a multiplexer 604 for combining the video elementary stream and the audio elementary stream into multiplexed stream 640. Output 606 provides the multiplexed stream for transportation to a receiver, for example via a media carrier or via a broadcast or video-on-demand transmission (schematically indicated by dashed arrow 640).
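The operation of encoder 602 and multiplexer 604 can be sketched as follows. The model is deliberately simplified: an audio frame is represented as a list of per-channel sample lists, the auxiliary data replaces one spare channel (index 7 is an assumed convention, not taken from the text), and the multiplex is a toy stand-in for a real transport or program stream.

```python
def embed_aux(audio_frame, aux_samples, aux_channel=7):
    """Fill one spare channel of a multi-channel audio frame with
    auxiliary samples (channel index 7 is an assumed convention)."""
    frame = [list(ch) for ch in audio_frame]  # copy; do not mutate the input
    frame[aux_channel] = list(aux_samples)
    return frame

def mux(video_packets, audio_packets):
    """Interleave video and audio packets into one tagged stream
    (a toy stand-in for an MPEG-2 transport/program multiplex)."""
    stream = []
    for v, a in zip(video_packets, audio_packets):
        stream.append(('video', v))
        stream.append(('audio', a))
    return stream
```

In a real system the auxiliary channel would of course be packed at the audio codec's sample rate and bit depth, but the structure of the operation is the same.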

In an embodiment which is suitable for implementation in a video player, the system 600 may also comprise a reader 608 for obtaining the video elementary stream, the auxiliary information, and the audio data from a content data carrier such as a DVD, Blu-ray disc, or HD-DVD, wherein the auxiliary information has been encoded on the content data carrier in a format different from an elementary audio stream. One advantage of this is that a standard interface may be used between the player and the television.

The second system 650 transforms the multiplexed stream 640 back into separate video, auxiliary, and/or audio signals. It may be part of a 3D television, or, for example, be implemented as a separate set-top box receiving an input from a disc player and providing outputs to a television and an audio system. The second system comprises an input 652 for receiving the multiplexed stream. For example, it receives the stream from another device, from a broadcast transmission, or from reader 660. The second system comprises a demultiplexer 654 for extracting the video elementary stream and the audio elementary stream from the multiplexed stream. The second system 650 comprises a decoder 656 for extracting the auxiliary information from the audio elementary stream.
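The receiving side mirrors the provider side. A minimal sketch of demultiplexer 654 and the auxiliary-extraction step of decoder 656, using the same simplified tagged-stream and per-channel-list representation assumed above (the channel index is again an illustrative convention):

```python
def demux(stream):
    """Split a tagged multiplexed stream back into its elementary streams."""
    video = [p for tag, p in stream if tag == 'video']
    audio = [p for tag, p in stream if tag == 'audio']
    return video, audio

def extract_aux(audio_frame, aux_channel=7):
    """Read the auxiliary samples back out of the designated channel
    of a multi-channel audio frame (channel index is an assumption)."""
    return list(audio_frame[aux_channel])
```

A real demultiplexer would dispatch on packet identifiers rather than tags, but the division of labour between 654 and 656 is as shown.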

The second system 650 comprises a means 658 for removing the auxiliary information from the audio elementary stream to obtain a second audio elementary stream free from auxiliary information. This second audio elementary stream is provided via an output.
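One way means 658 could produce the auxiliary-free second stream is to mute the channels that carried the auxiliary data rather than delete them, so that a downstream decoder still sees a normal multi-channel layout. This muting strategy and the channel indices are assumptions for illustration:

```python
def strip_aux(audio_frame, aux_channels=(6, 7)):
    """Silence the channels that carried auxiliary data, yielding a frame
    of a second audio elementary stream free from auxiliary information.
    Muting (rather than removing) the channels preserves the channel
    layout for external decoders. Channel indices are an assumed convention."""
    return [[0] * len(ch) if i in aux_channels else list(ch)
            for i, ch in enumerate(audio_frame)]
```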

The auxiliary information may comprise a depth elementary stream comprising depth and/or disparity values complementing the video elementary stream.

In an embodiment, the audio elementary stream comprises at least 7 audio channels corresponding to at least 6 (usually 7) surround channels and at least one bass channel. Up to six channels are used for surround sound (i.e., five surround channels and a bass channel). The decoder 656 is arranged for extracting the auxiliary information from the remaining channels, i.e., from a number of the surround channels, wherein the number is at most equal to the number of surround channels minus five.

In an embodiment, the input 652 comprises a standardized digital audio/video interface. In another embodiment, the input 652 comprises a reader 660 for reading a media carrier. A content data carrier may be inserted in the reader 660. The content data carrier comprises a multiplexed stream comprising at least a video elementary stream and an audio elementary stream, wherein the audio elementary stream comprises auxiliary information, and wherein the auxiliary information comprises information for enhancing a visual experience of a rendering of the video elementary stream. The auxiliary information may be stored in a first audio data stream and audio information may be stored in a second audio data stream, and a playlist may be provided comprising an instruction to mix the first audio data stream and the second audio data stream into a single audio elementary stream. An embodiment comprises means for establishing whether the auxiliary information has been included in the audio elementary stream and enabling the decoder 656 upon detection of the auxiliary information.
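The channel arithmetic above, and a possible detection mechanism for enabling decoder 656, can be sketched as follows. The marker value is purely hypothetical; the text only requires that the presence of auxiliary information be detectable somehow.

```python
AUX_MARKER = 0x3D3D  # hypothetical magic value flagging auxiliary data

def aux_channel_budget(surround_channels):
    """At most (surround_channels - 5) channels may carry auxiliary data,
    since five surround channels (plus the bass channel) remain reserved
    for ordinary 5.1 surround sound."""
    return max(surround_channels - 5, 0)

def has_aux(channel_samples):
    """Enable the auxiliary decoder only when the marker is present
    at the start of the channel (marker scheme is an assumption)."""
    return bool(channel_samples) and channel_samples[0] == AUX_MARKER
```

For a 7.1 stream (7 surround channels) this leaves up to two channels for depth information; a plain 5.1 stream leaves none.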

According to a method of handling a multiplexed stream 640, the following steps are performed: receiving the multiplexed stream; extracting at least a video elementary stream and an audio elementary stream from the multiplexed stream; and extracting auxiliary information from the audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience.

According to a method of providing a multiplexed stream 640, the following steps are performed: encoding auxiliary information together with audio data into an audio elementary stream, wherein the auxiliary information comprises information for enhancing a visual experience; combining at least a video elementary stream and the audio elementary stream into a multiplexed stream; and providing the multiplexed stream.
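The two methods are mirror images, which a compact round-trip sketch makes explicit. The dictionary-based representation of an audio elementary stream carrying auxiliary information is an illustrative assumption, not a claimed format.

```python
def provide(video_es, audio_es, aux):
    """Providing method: encode the auxiliary information together with the
    audio data (here as an extra field, an assumed representation), then
    combine the video and audio elementary streams into one multiplexed stream."""
    audio_with_aux = {'channels': list(audio_es), 'aux': aux}
    return [('video', video_es), ('audio', audio_with_aux)]

def handle(multiplexed):
    """Handling method: extract the elementary streams from the multiplexed
    stream, then extract the auxiliary information from the audio stream."""
    video_es = next(p for tag, p in multiplexed if tag == 'video')
    audio_es = next(p for tag, p in multiplexed if tag == 'audio')
    return video_es, audio_es['channels'], audio_es['aux']
```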

It will be appreciated that the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source code, object code, code intermediate between source and object code such as a partially compiled form, or in any other form suitable for use in the implementation of the method according to the invention. It will also be appreciated that such a program may have many different architectural designs. For example, program code implementing the functionality of the method or system according to the invention may be subdivided into one or more subroutines. Many different ways to distribute the functionality among these subroutines will be apparent to the skilled person. The subroutines may be stored together in one executable file to form a self-contained program. Such an executable file may comprise computer-executable instructions, for example processor instructions and/or interpreter instructions (e.g. Java interpreter instructions). Alternatively, one or more or all of the subroutines may be stored in at least one external library file and linked with a main program either statically or dynamically, e.g. at run-time. The main program contains at least one call to at least one of the subroutines. The subroutines may also comprise function calls to each other. An embodiment relating to a computer program product comprises computer-executable instructions corresponding to each processing step of at least one of the methods set forth. These instructions may be subdivided into subroutines and/or stored in one or more files that may be linked statically or dynamically. Another embodiment relating to a computer program product comprises computer-executable instructions corresponding to each means of at least one of the systems and/or products set forth. These instructions may be subdivided into subroutines and/or stored in one or more files that may be linked statically or dynamically.

The carrier of a computer program may be any entity or device capable of carrying the program. For example, the carrier may include a storage medium, such as a ROM, for example a CD-ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disk. Further, the carrier may be a transmissible carrier such as an electrical or optical signal, which may be conveyed via electrical or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such a cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant method.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.