

Title:
AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR VOLUMETRIC VIDEO
Document Type and Number:
WIPO Patent Application WO/2024/079383
Kind Code:
A1
Abstract:
A method comprising: obtaining a binary image comprising a grid of binary data blocks (400); encoding pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function (402); determining a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state (404); adjusting said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image (406); and signalling said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation (408).

Inventors:
SCHWARZ SEBASTIAN (DE)
KONDRAD LUKASZ (DE)
RONDAO ALFACE PATRICE (BE)
ILOLA LAURI ALEKSI (FI)
Application Number:
PCT/FI2023/050549
Publication Date:
April 18, 2024
Filing Date:
September 28, 2023
Assignee:
NOKIA TECH OY (FI)
International Classes:
H04N1/413; H04N13/161; H04N19/147; H04N19/192; H04N19/597
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:
CLAIMS

1. An apparatus comprising: means for obtaining a binary image comprising a grid of binary data blocks; means for encoding pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; means for determining a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state; means for adjusting said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image; and means for signalling said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

2. The apparatus according to claim 1, wherein said rate distortion cost function is adjusted as

J = λ·R, for D_R < 1
J = ∞, for D_R ≥ 1

where D_R denotes the number of incorrectly reconstructed binary pixels for a given threshold.

3. The apparatus according to claim 1 or 2, comprising means for providing a plurality of threshold values to be used for adjusting said rate distortion cost function; and means for selecting one of said plurality of threshold values resulting in correct reconstruction of the decoded pixel values of said continuous value range to be sent to the receiver.

4. The apparatus according to claim 3, comprising means for determining, for each of the plurality of threshold values, a condition whether a minimum decoded pixel value reconstructed to the first binary state is smaller than a maximum decoded pixel value reconstructed to the second binary state; and means for discarding said threshold value upon meeting the condition.

5. The apparatus according to any preceding claim, comprising means for signalling said threshold value dynamically per frame or subpicture or tile.

6. The apparatus according to any of claims 1 - 4, comprising means for signalling said threshold value on a unit level.

7. The apparatus according to any preceding claim, wherein said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) occupancy information syntax structure.

8. The apparatus according to any preceding claim, wherein said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) atlas frame parameter set RBSP syntax structure.

9. The apparatus according to any preceding claim, wherein said signalling of the threshold value is carried out as included in a Supplemental Enhancement Information (SEI) message.

10. A method comprising: obtaining a binary image comprising a grid of binary data blocks; encoding pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; determining a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state; adjusting said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image; and signalling said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

11. An apparatus comprising: means for obtaining a binary image comprising pixels encoded with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; means for obtaining a threshold value for decoded pixel values of said continuous value range of the binary image; and means for adjusting said rate distortion cost function based on said threshold value such that the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state.

12. The apparatus according to claim 11, comprising means for receiving said threshold value in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

13. The apparatus according to claim 12, wherein signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) occupancy information syntax structure.

14. The apparatus according to claim 12, wherein signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) atlas frame parameter set RBSP syntax structure.

15. A method comprising: obtaining a binary image comprising pixels encoded with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; obtaining a threshold value for decoded pixel values of said continuous value range of the binary image; and adjusting said rate distortion cost function based on said threshold value such that the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state.

Description:
AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR

VOLUMETRIC VIDEO

TECHNICAL FIELD

[0001] The present invention relates to an apparatus, a method and a computer program for volumetric video coding.

BACKGROUND

[0002] Visual volumetric video-based coding (V3C; defined in ISO/IEC DIS 23090-5) provides a generic syntax and mechanism for volumetric video coding. The generic syntax can be used by applications targeting volumetric content, such as point clouds, immersive video with depth, and mesh representations of volumetric frames. The purpose of the specification is to define how to decode and interpret the associated data (atlas data in ISO/IEC 23090-5) which tells a renderer how to interpret two-dimensional (2D) frames for reconstructing volumetric frames.

[0003] An occupancy component is used to inform a V3C decoding and/or rendering system about which samples in the 2D components are associated with data in the final 3D representation. The occupancy information may be encapsulated in the geometry image as a pre-defined depth value, or it may be sent separately as an occupancy map. The occupancy map is a binary 2D representation where each cell of the grid indicates whether the corresponding cell of another 2D representation (e.g. of geometry or attributes) contains an invalid or a valid value relevant to the reconstructed 3D representation.
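As a non-normative illustration of the occupancy map described above, a binary grid can be used to mask out invalid samples of a co-located 2D component; the function name and data layout here are hypothetical, not taken from the V3C specification:

```python
# Hypothetical sketch: an occupancy map as a binary 2D grid marking which
# cells of a geometry (or attribute) image carry valid 3D data.

def apply_occupancy(values, occupancy):
    """Keep only samples whose occupancy cell is 1; invalid cells become None."""
    return [
        [v if occ == 1 else None for v, occ in zip(row_v, row_o)]
        for row_v, row_o in zip(values, occupancy)
    ]

geometry = [[12, 7], [3, 9]]    # example depth values
occupancy = [[1, 0], [0, 1]]    # 1 = valid, 0 = invalid
print(apply_occupancy(geometry, occupancy))  # [[12, None], [None, 9]]
```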

[0004] However, transmitting the occupancy information as an additional occupancy map increases the bit rate, and neither of the above-mentioned approaches allows for lossless transmission of the occupancy map at reasonable bitrates.

SUMMARY

[0005] Now, an improved method and technical equipment implementing the method has been invented, by which the above problems are alleviated. Various aspects include a method, an apparatus and a computer readable medium comprising a computer program, or a signal stored therein, which are characterized by what is stated in the independent claims. Various details of the embodiments are disclosed in the dependent claims and in the corresponding images and description.

[0006] The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.

[0007] According to a first aspect, there is provided an apparatus comprising means for obtaining a binary image comprising a grid of binary data blocks; means for encoding pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; means for determining a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state; means for adjusting said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image; and means for signalling said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

[0008] According to an embodiment, said rate distortion cost function is adjusted as

J = λ·R, for D_R < 1
J = ∞, for D_R ≥ 1

[0009] where D_R denotes the number of incorrectly reconstructed binary pixels for a given threshold.
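The adjusted cost function above can be sketched as follows. This is an illustrative reading of the embodiment, not the patented encoder: the symbol λ, the function name, and the flat 1D pixel layout are assumptions.

```python
import math

def adjusted_rd_cost(rate, decoded, original, threshold, lam=1.0):
    """λ·R if the threshold reconstructs every binary pixel correctly, else ∞.

    D_R counts decoded pixels whose thresholded binary state differs from the
    original binary image; D_R < 1 therefore means a lossless reconstruction.
    """
    d_r = sum(
        (d >= threshold) != (o == 1)
        for d, o in zip(decoded, original)
    )
    return lam * rate if d_r < 1 else math.inf

# A coding mode whose decoded values all land on the correct side of the
# threshold keeps a finite cost; a single mismatch makes the cost infinite.
print(adjusted_rd_cost(100, [200, 30], [1, 0], threshold=128))  # 100.0
print(adjusted_rd_cost(100, [100, 30], [1, 0], threshold=128))  # inf
```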

[0010] According to an embodiment, the apparatus comprises means for providing a plurality of threshold values to be used for adjusting said rate distortion cost function; and means for selecting one of said plurality of threshold values resulting in correct reconstruction of the decoded pixel values of said continuous value range to be sent to the receiver.

[0011] According to an embodiment, the apparatus comprises means for determining, for each of the plurality of threshold values, a condition whether a minimum decoded pixel value reconstructed to the first binary state is smaller than a maximum decoded pixel value reconstructed to the second binary state; and means for discarding said threshold value upon meeting the condition.
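A minimal sketch of the threshold selection described in the two embodiments above, under the assumption that a candidate threshold is kept only when every decoded pixel lands on the correct side of it; the function and parameter names are hypothetical:

```python
def select_threshold(candidates, decoded, original):
    """Return the first candidate threshold reconstructing every pixel of the
    original binary image correctly, or None if no candidate does.

    A candidate t is discarded when some decoded value that should map to the
    first binary state falls below t, or one that should map to the second
    state reaches t, i.e. when a pixel lands on the wrong side of t.
    """
    for t in candidates:
        if all((d >= t) == (o == 1) for d, o in zip(decoded, original)):
            return t
    return None

print(select_threshold([64, 128, 192], [200, 220, 30, 10], [1, 1, 0, 0]))  # 64
```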

[0012] According to an embodiment, the apparatus comprises means for signalling said threshold value dynamically per frame or subpicture or tile.

[0013] According to an embodiment, the apparatus comprises means for signalling said threshold value on a unit level.

[0014] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) occupancy information syntax structure.

[0015] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) atlas frame parameter set RBSP syntax structure.

[0016] According to an embodiment, said signalling of the threshold value is carried out as included in a Supplemental Enhancement Information (SEI) message.

[0017] A method according to a second aspect comprises obtaining a binary image comprising a grid of binary data blocks; encoding pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; determining a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state; adjusting said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image; and signalling said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

[0018] An apparatus according to a third aspect comprises means for obtaining a binary image comprising pixels encoded with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; means for obtaining a threshold value for decoded pixel values of said continuous value range of the binary image; and means for adjusting said rate distortion cost function based on said threshold value such that the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state.

[0019] A method according to a fourth aspect comprises obtaining a binary image comprising pixels encoded with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; obtaining a threshold value for decoded pixel values of said continuous value range of the binary image; and adjusting said rate distortion cost function based on said threshold value such that the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state.
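The receiver-side thresholding of the fourth aspect can be sketched as below; a minimal, hypothetical illustration rather than a normative decoding process:

```python
def reconstruct_binary(decoded, threshold):
    """Decoder-side reconstruction: decoded values equal to or greater than
    the signalled threshold become the first binary state (1), values below
    it the second state (0)."""
    return [1 if d >= threshold else 0 for d in decoded]

# Threshold received in or along the bitstream, e.g. 128:
print(reconstruct_binary([200, 127, 128, 5], 128))  # [1, 0, 1, 0]
```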

[0020] Computer readable storage media according to further aspects comprise code for use by an apparatus, which when executed by a processor, causes the apparatus to perform the above methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] For a more complete understanding of the example embodiments, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:

[0022] Figs. 1a and 1b show an encoder and decoder for encoding and decoding 2D pictures;

[0023] Figs. 2a and 2b show a compression and a decompression process for 3D volumetric video;

[0024] Figs. 3a - 3d illustrate an example of a typical lossy encoding and reconstruction of a binary image;

[0025] Fig. 4 shows a flow chart for an encoding method according to an embodiment;

[0026] Figs. 5a and 5b show an example of reconstruction of a binary image according to an embodiment;

[0027] Figs. 6a - 6d show an example of encoding and reconstruction of a binary image according to another embodiment; and

[0028] Fig. 7 shows a flow chart for a decoding method according to an embodiment.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

[0029] A video codec comprises an encoder that transforms the input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form. An encoder may discard some information in the original video sequence in order to represent the video in a more compact form (i.e. at lower bitrate).

[0030] Volumetric video may be captured using one or more three-dimensional (3D) cameras. When multiple cameras are in use, the captured footage is synchronized so that the cameras provide different viewpoints to the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and observe different parts of the world.

[0031] Volumetric video enables the viewer to move in six degrees of freedom (6DOF): in contrast to common 360° video, where the user has from 2 to 3 degrees of freedom (yaw, pitch, and possibly roll), a volumetric video represents a 3D volume of space rather than a flat image plane. Volumetric video frames contain a large amount of data because they model the contents of a 3D volume instead of just a two-dimensional (2D) plane. However, only a relatively small part of the volume changes over time. Therefore, it may be possible to reduce the total amount of data by only coding information about an initial state and changes which may occur between frames. Volumetric video can be rendered from synthetic 3D animations, reconstructed from multi-view video using 3D reconstruction techniques such as structure from motion, or captured with a combination of cameras and depth sensors such as LiDAR (Light Detection and Ranging), for example.

[0032] Volumetric video data represents a three-dimensional scene or object, and thus such data can be viewed from any viewpoint. Volumetric video data can be used as an input for augmented reality (AR), virtual reality (VR) and mixed reality (MR) applications. Such data describes geometry (shape, size, position in 3D space) and respective attributes (e.g. color, opacity, reflectance), together with any possible temporal changes of the geometry and attributes at given time instances (e.g. frames in 2D video). Volumetric video is either generated from 3D models, i.e. computer-generated imagery (CGI), or captured from real-world scenes using a variety of capture solutions, e.g. a multi-camera setup, a laser scan, or a combination of video and dedicated depth sensors. A combination of CGI and real-world data is also possible. Examples of representation formats for such volumetric data are triangle meshes, point clouds, or voxels. Temporal information about the scene can be included in the form of individual capture instances, i.e. "frames" as in 2D video, or by other means, e.g. the position of an object as a function of time.

[0033] Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the 3D data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds, on the other hand, are well suited for applications such as capturing real-world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is coding this 3D data as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps and multi-level surface maps.

[0034] In 3D point clouds, each point of each 3D surface is described as a 3D point with color and/or other attribute information such as surface normal or material reflectance. A point cloud is a set of data points in a coordinate system, for example a three-dimensional coordinate system defined by X, Y, and Z coordinates. The points may represent an external surface of an object in the scene space, e.g. in a three-dimensional space.

[0035] In dense point clouds or voxel arrays, the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between entities, then efficient compression of the representations becomes fundamental. Standard volumetric video representation formats, such as point clouds, meshes and voxels, suffer from poor temporal compression performance. Identifying correspondences for motion compensation in 3D space is an ill-defined problem, as both geometry and respective attributes may change. For example, temporally successive "frames" do not necessarily have the same number of meshes, points or voxels. Therefore, compression of dynamic 3D scenes is inefficient. 2D-video based approaches for compressing volumetric data, i.e. multiview with depth, have much better compression efficiency, but rarely cover the full scene. Therefore, they provide only limited 6DOF capabilities.

[0036] Instead of the above-mentioned approach, a 3D scene, represented as meshes, points, and/or voxel, can be projected onto one, or more, geometries. These geometries may be “unfolded” or packed onto 2D planes (two planes per geometry: one for texture, one for depth), which are then encoded using standard 2D video compression technologies. Relevant projection geometry information may be transmitted alongside the encoded video files to the decoder. The decoder decodes the video and performs the inverse projection to regenerate the 3D scene in any desired representation format (not necessarily the starting format).

[0037] Projecting volumetric models onto 2D planes allows for using standard 2D video coding tools with highly efficient temporal compression. Thus, coding efficiency can be increased greatly. Using geometry projections instead of 2D-video based approaches based on multiview and depth provides a better coverage of the scene (or object). Thus, 6DOF capabilities are improved. Using several geometries for individual objects improves the coverage of the scene further. Furthermore, standard video encoding hardware can be utilized for real-time compression/decompression of the projected planes. The projection and the reverse projection steps are of low complexity.

[0038] Figs. 1a and 1b show an encoder and decoder for encoding and decoding the 2D texture pictures, geometry pictures and/or auxiliary pictures. A video codec consists of an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. Typically, the encoder discards and/or loses some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate). An example of an encoding process is illustrated in Figure 1a. Figure 1a illustrates an image to be encoded (I_n); a predicted representation of an image block (P'_n); a prediction error signal (D_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); a transform (T) and inverse transform (T^-1); a quantization (Q) and inverse quantization (Q^-1); entropy encoding (E); a reference frame memory (RFM); inter prediction (P_inter); intra prediction (P_intra); mode selection (MS) and filtering (F).

[0039] An example of a decoding process is illustrated in Figure 1b. Figure 1b illustrates a predicted representation of an image block (P'_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); an inverse transform (T^-1); an inverse quantization (Q^-1); an entropy decoding (E^-1); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).

[0040] Many hybrid video encoders encode the video information in two phases. Firstly, pixel values in a certain picture area (or "block") are predicted, for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. the Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate). Video codecs may also provide a transform skip mode, which the encoders may choose to use. In the transform skip mode, the prediction error is coded in a sample domain, for example by deriving a sample-wise difference value relative to certain adjacent samples and coding the sample-wise difference value with an entropy coder.
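The quantization fidelity trade-off described above can be illustrated with a uniform scalar quantizer; this toy sketch omits the transform and entropy coding stages, and the step sizes are arbitrary examples:

```python
def quantize(residual, step):
    """Uniform scalar quantization of a prediction-error block; a larger
    step lowers fidelity (and bitrate), as described in the text."""
    return [round(r / step) for r in residual]

def dequantize(levels, step):
    """Inverse quantization: scale the levels back to the sample domain."""
    return [q * step for q in levels]

residual = [7, -3, 12, 0]
for step in (1, 4, 8):
    reconstructed = dequantize(quantize(residual, step), step)
    print(step, reconstructed)  # larger step -> coarser reconstruction
```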

[0041] Many video encoders partition a picture into blocks along a block grid. For example, in the High Efficiency Video Coding (HEVC) standard, the following partitioning and definitions are used. A coding block may be defined as an NxN block of samples for some value of N such that the division of a coding tree block into coding blocks is a partitioning. A coding tree block (CTB) may be defined as an NxN block of samples for some value of N such that the division of a component into coding tree blocks is a partitioning. A coding tree unit (CTU) may be defined as a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples. A coding unit (CU) may be defined as a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples. A CU with the maximum allowed size may be named as LCU (largest coding unit) or coding tree unit (CTU) and the video picture is divided into non-overlapping LCUs.

[0042] In HEVC, a picture can be partitioned in tiles, which are rectangular and contain an integer number of LCUs. In HEVC, the partitioning to tiles forms a regular grid, where heights and widths of tiles differ from each other by one LCU at the maximum. In HEVC, a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit. In HEVC, a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan and contained in a single NAL unit. The division of each picture into slice segments is a partitioning. In HEVC, an independent slice segment is defined to be a slice segment for which the values of the syntax elements of the slice segment header are not inferred from the values for a preceding slice segment, and a dependent slice segment is defined to be a slice segment for which the values of some syntax elements of the slice segment header are inferred from the values for the preceding independent slice segment in decoding order. In HEVC, a slice header is defined to be the slice segment header of the independent slice segment that is a current slice segment or is the independent slice segment that precedes a current dependent slice segment, and a slice segment header is defined to be a part of a coded slice segment containing the data elements pertaining to the first or all coding tree units represented in the slice segment. The CUs are scanned in the raster scan order of LCUs within tiles or within a picture, if tiles are not in use. Within an LCU, the CUs have a specific scan order.

[0043] Entropy coding/decoding may be performed in many ways. For example, context-based coding/decoding may be applied, where in both the encoder and the decoder modify the context state of a coding parameter based on previously coded/decoded coding parameters. Context-based coding may for example be context adaptive binary arithmetic coding (CABAC) or context-adaptive variable length coding (CAVLC) or any similar entropy coding. Entropy coding/decoding may alternatively or additionally be performed using a variable length coding scheme, such as Huffman coding/decoding or Exp-Golomb coding/decoding. Decoding of coding parameters from an entropy-coded bitstream or codewords may be referred to as parsing.
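As a concrete example of one of the variable length schemes mentioned above, an unsigned order-0 Exp-Golomb codeword (the ue(v) descriptor used for many syntax elements in H.26x-family bitstreams) can be generated as follows; a minimal sketch, not production bitstream code:

```python
def exp_golomb_encode(n):
    """Unsigned order-0 Exp-Golomb codeword for n >= 0: a prefix of
    leading zeros followed by the binary representation of n + 1."""
    v = n + 1
    prefix_len = v.bit_length() - 1
    return "0" * prefix_len + format(v, "b")

print([exp_golomb_encode(n) for n in range(5)])
# ['1', '010', '011', '00100', '00101']
```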

[0044] The phrase along the bitstream (e.g. indicating along the bitstream) may be defined to refer to out-of-band transmission, signaling, or storage in a manner that the out- of-band data is associated with the bitstream. The phrase decoding along the bitstream or alike may refer to decoding the referred out-of-band data (which may be obtained from out-of-band transmission, signaling, or storage) that is associated with the bitstream. For example, an indication along the bitstream may refer to metadata in a container file that encapsulates the bitstream.

[0045] A first texture picture may be encoded into a bitstream, and the first texture picture may comprise a first projection of texture data of a first source volume of a scene model onto a first projection surface. The scene model may comprise a number of further source volumes.

[0046] In the projection, data on the position of the originating geometry primitive may also be determined, and based on this determination, a geometry picture may be formed. This may happen for example so that depth data is determined for each or some of the texture pixels of the texture picture. Depth data is formed such that the distance from the originating geometry primitive such as a point to the projection surface is determined for the pixels. Such depth data may be represented as a depth picture, and similarly to the texture picture, such geometry picture (such as a depth picture) may be encoded and decoded with a video codec. This first geometry picture may be seen to represent a mapping of the first projection surface to the first source volume, and the decoder may use this information to determine the location of geometry primitives in the model to be reconstructed. In order to determine the position of the first source volume and/or the first projection surface and/or the first projection in the scene model, there may be first geometry information encoded into or along the bitstream. It is noted that encoding a geometry (or depth) picture into or along the bitstream with the texture picture is only optional and arbitrary for example in the cases where the distance of all texture pixels to the projection surface is the same or there is no change in said distance between a plurality of texture pictures. Thus, a geometry (or depth) picture may be encoded into or along the bitstream with the texture picture, for example, only when there is a change in the distance of texture pixels to the projection surface.

[0047] An attribute picture may be defined as a picture that comprises additional information related to an associated texture picture. An attribute picture may for example comprise surface normal, opacity, or reflectance information for a texture picture. A geometry picture may be regarded as one type of an attribute picture, although a geometry picture may be treated as its own picture type, separate from an attribute picture.

[0048] Texture picture(s) and the respective geometry picture(s), if any, and the respective attribute picture(s) may have the same or different chroma format.

[0049] Terms texture (component) image and texture (component) picture may be used interchangeably. Terms geometry (component) image and geometry (component) picture may be used interchangeably. A specific type of a geometry image is a depth image. Embodiments described in relation to a geometry (component) image equally apply to a depth (component) image, and embodiments described in relation to a depth (component) image equally apply to a geometry (component) image. Terms attribute image and attribute picture may be used interchangeably. A geometry picture and/or an attribute picture may be treated as an auxiliary picture in video/image encoding and/or decoding.

[0050] Figures 2a and 2b illustrate an overview of exemplary compression/decompression processes. The processes may be applied, for example, in MPEG visual volumetric video-based coding (V3C), defined currently in ISO/IEC DIS 23090-5: "Visual Volumetric Video-based Coding and Video-based Point Cloud Compression", 2nd Edition.

[0051] The V3C specification enables the encoding and decoding of a variety of volumetric media by using video and image coding technologies. This is achieved by first converting such media from their corresponding 3D representation to multiple 2D representations, also referred to as V3C components, before coding such information. Such representations may include occupancy, geometry, and attribute components. The occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation. The geometry component contains information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g. texture or material information, of such 3D data. An example of volumetric media conversion at an encoder is shown in Figure 2a and an example of 3D reconstruction at a decoder is shown in Figure 2b.

[0052] There are alternative ways to capture and represent a volumetric frame. The format used to capture and represent the volumetric frame depends on the process to be performed on it and the target application using the volumetric frame. As a first example, a volumetric frame can be represented as a point cloud. A point cloud is a set of unstructured points in 3D space, where each point is characterized by its position in a 3D coordinate system (e.g., Euclidean) and some corresponding attributes (e.g., color information provided as an RGBA value, or normal vectors). As a second example, a volumetric frame can be represented as images, with or without depth, captured from multiple viewpoints in 3D space. In other words, the volumetric video can be represented by one or more view frames, where a view is a projection of a volumetric scene on to a plane (the camera plane) using a real or virtual camera with known/computed extrinsic and intrinsic parameters. Each view may be represented by a number of components (e.g., geometry, color, transparency, and occupancy picture), which may be part of the geometry picture or represented separately. As a third example, a volumetric frame can be represented as a mesh. A mesh is a collection of points, called vertices, and connectivity information between vertices, called edges. Vertices along with edges form faces. The combination of vertices, edges and faces can uniquely approximate shapes of objects.

[0053] Depending on the capture, a volumetric frame can provide viewers the ability to navigate a scene with six degrees of freedom, i.e., both translational and rotational movement of their viewing pose (which includes yaw, pitch, and roll). The data to be coded for a volumetric frame can also be significant, as a volumetric frame can contain a large number of objects, and the positioning and movement of these objects in the scene can result in many dis-occluded regions. Furthermore, the interaction of the light and materials in objects and surfaces in a volumetric frame can generate complex light fields that can produce texture variations for even a slight change of pose.

[0054] A sequence of volumetric frames is a volumetric video. Due to the large amount of information, storage and transmission of a volumetric video requires compression. A way to compress a volumetric frame can be to project the 3D geometry and related attributes into a grid of 2D images along with additional associated metadata. The projected 2D images can then be coded using 2D video and image coding technologies, for example ISO/IEC 14496-10 (H.264/AVC) and ISO/IEC 23008-2 (H.265/HEVC). The metadata can be coded with technologies specified in specifications such as ISO/IEC 23090-5. The coded images and the associated metadata can be stored or transmitted to a client that can decode and render the 3D volumetric frame.

[0055] In the following, a short overview of ISO/IEC 23090-5 Visual Volumetric Video-based Coding (V3C) and Video-based Point Cloud Compression (V-PCC) 2nd Edition is given. ISO/IEC 23090-5 specifies the syntax, semantics, and process for coding volumetric video. The specified syntax is designed to be generic, so that it can be reused for a variety of applications. Point clouds, immersive video with depth, and mesh representations can all use the ISO/IEC 23090-5 standard with extensions that deal with the specific nature of the final representation. The purpose of the specification is to define how to decode and interpret the associated data (for example atlas data in ISO/IEC 23090-5) which tells a renderer how to interpret 2D frames to reconstruct a volumetric frame.

[0056] Two applications of V3C (ISO/IEC 23090-5) have been defined: V-PCC (ISO/IEC 23090-5) and MPEG Immersive Video (MIV) (ISO/IEC 23090-12). MIV and V-PCC use a number of V3C syntax elements with slightly modified semantics.

[0057] The MPEG 3DG (ISO SC29 WG7) group has started work on a third application of V3C: mesh compression. It is also envisaged that mesh coding will re-use V3C syntax as much as possible and may also slightly modify the semantics.

[0058] A V3C bitstream is a sequence of bits that forms the representation of coded volumetric frames and the associated data, making one or more coded V3C sequences (CVS). A V3C bitstream is composed of V3C units that contain V3C video sub-bitstreams, V3C atlas sub-bitstreams, or a V3C Parameter Set (VPS). Fig. 3 illustrates an example of a V3C bitstream. Video sub-bitstreams and atlas sub-bitstreams can be referred to as V3C sub-bitstreams. Each V3C unit has a V3C unit header and a V3C unit payload. A V3C unit header in conjunction with VPS information identifies which V3C sub-bitstream a V3C unit contains and how to interpret it.

[0059] A V3C bitstream can be stored according to Annex C of ISO/IEC 23090-5, which specifies the syntax and semantics of a sample stream format to be used by applications that deliver some or all of the V3C unit stream as an ordered stream of bytes or bits, within which the locations of V3C unit boundaries need to be identifiable from patterns in the data.

[0060] The occupancy information or occupancy picture, typically referred to as the occupancy map, consists of a binary 2D representation where each cell of the grid indicates whether the corresponding cell of another 2D representation (e.g. of geometry or attribute) contains an invalid or valid value that is relevant to the reconstructed 3D representation. In V3C, the occupancy map can be downscaled to minimise bitrate requirements. The occupancy map is then compressed using lossless video compression and is upscaled to the nominal resolution at the decoder. The nearest neighbour method is applied for upscaling. Additional filtering tools to improve the reconstruction quality for both the nominal-resolution occupancy map and the 3D representation are optional in the V3C standard.

[0061] Thus, for an accurate 2D-3D reconstruction, the decoder must be aware which 2D values are valid and which values stem from interpolation/padding. This requires the transmission of additional data, i.e. occupancy information. In MPEG V3C MIV, this data may be encapsulated in the geometry image as a pre-defined depth value (e.g. 0) or a predefined range of depth values. In MPEG V3C V-PCC, this data is sent separately as the occupancy map.
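As an illustration (not part of the specification), the nearest-neighbour upscaling applied to the downscaled occupancy map at the decoder might be sketched as follows; the function name and example values are assumptions for illustration only:

```python
def upscale_nearest(occ, factor):
    """Upscale a downscaled binary occupancy map to nominal resolution
    by nearest-neighbour replication: each low-resolution cell is copied
    into a factor x factor block of the full-resolution map."""
    return [[occ[y // factor][x // factor]
             for x in range(len(occ[0]) * factor)]
            for y in range(len(occ) * factor)]

# A 2x2 downscaled occupancy map upscaled by a factor of 2 becomes 4x4.
small = [[1, 0],
         [0, 1]]
full = upscale_nearest(small, 2)
```

Each bit of the downscaled map simply expands into a square block; no interpolation of intermediate values takes place, which keeps the result binary.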

[0062] However, both approaches have distinct drawbacks. In MPEG V3C MIV, the geometry image is not blurred/padded, whereby the increase in coding efficiency is only achieved on the texture image. Furthermore, encoding artefacts at the object boundaries of the geometry image may create severe artefacts, which require post-processing and may not be concealable.

[0063] In MPEG V3C V-PCC, the occupancy information as an additional occupancy map is costly to transmit. To minimize the cost, the occupancy map is spatially downsampled. Nevertheless, it still requires 8-18% of the overall bit rate budget. In addition, 3D resampling and additional motion information is required to reconstruct the occupancy map. Without downsampling, the cost of lossless transmission becomes even higher.

[0064] Moreover, neither of the above-mentioned approaches allows for lossless transmission of the occupancy map at reasonable bitrates. Figures 3a - 3d illustrate an example of a typical lossy encoding of a binary image, in this case an occupancy map, where a threshold for pixel values of the underlying 2D representation is set and transmitted to a decoder for reconstruction, wherein the decoder reconstructs the binary values of the occupancy map to 1 for the corresponding pixels having a value equal to or greater than the threshold and to 0 for the corresponding pixels having a value less than the threshold.

[0065] Figure 3a shows the original binary occupancy map, which is coded using lossy video coding, generating blurring at the occupancy edges. Figure 3b shows the 8-bit decoded pixel values of the lossy encoded binary occupancy map with the aforementioned blurring at the edges. To reconstruct the binary image, i.e. the occupancy information, from the blurred/continuous image, a threshold is typically sent in or along the V3C bitstream to the decoder. The decoder then reconstructs the binary values of the occupancy map to 1 for the corresponding pixels having a value equal to or greater than the threshold.

[0066] Figure 3c shows the reconstruction of the occupancy map, where a threshold value of 160 has been applied. As shown, the occupancy map is not reconstructed correctly, as it covers an additional pixel having the value of 160. Figure 3d, in turn, shows the reconstruction of the occupancy map, where a threshold value of 180 has been applied. Again, the occupancy map is not reconstructed correctly, as it lacks an underlying pixel having the value of 160 in the upper-right corner. Accordingly, there is typically no threshold that would allow for 100% correct lossless reconstruction of the occupancy information.
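This failure mode can be illustrated with a small sketch (illustrative Python; the pixel values are hypothetical and not those of the figures): when an occupied pixel and an unoccupied pixel decode to the same value, no threshold can reproduce the original binary map.

```python
def reconstruct(decoded, threshold):
    """Reconstruct the binary occupancy map: 1 where the decoded pixel
    value is equal to or greater than the threshold, 0 otherwise."""
    return [[1 if v >= threshold else 0 for v in row] for row in decoded]

# Hypothetical original occupancy map and its lossy-decoded 8-bit values;
# one occupied pixel and one unoccupied pixel both decode to 160.
original = [[1, 1, 0],
            [1, 1, 0],
            [0, 0, 0]]
decoded = [[200, 160, 40],
           [210, 190, 160],
           [30, 20, 10]]

# No threshold in 0..255 recovers the original map exactly.
lossless_thresholds = [t for t in range(256)
                       if reconstruct(decoded, t) == original]
```

Here `lossless_thresholds` comes out empty, mirroring the situation of Figures 3c and 3d where every tested threshold either adds or drops a pixel.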

[0067] In the following, an enhanced method for binary image data encoding will be described in more detail, in accordance with various embodiments.

[0068] The method, which is disclosed in Figure 4 and illustrates the operation of an encoder, comprises: obtaining (400) a binary image comprising a grid of binary data blocks; encoding (402) pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; determining (404) a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state; adjusting (406) said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image; and signalling (408) said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

[0069] Thus, the method ensures lossless reconstruction of binary image data while using lossy encoding to minimize bit rate requirements. Herein, the rate distortion (RD) cost function is modified to ensure that upon reconstruction, all pixel values fall in the correct range according to a given reconstruction threshold. That is, if a threshold A is given, all occupied pixels (having a value of 1) in the original binary image, such as an occupancy map (OM), will be reconstructed to a value of A or larger, and all unoccupied pixels (having a value of 0) in the original binary image will be reconstructed to a value of A-1 or lower.
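The encoder-side guarantee described above can be expressed as a simple check (illustrative Python; the function name and example values are assumptions, not part of the specification):

```python
def reconstruction_is_lossless(original, decoded, threshold):
    """Encoder-side constraint for threshold A: every occupied pixel
    (value 1 in the original binary image) must decode to >= threshold,
    and every unoccupied pixel (value 0) must decode to < threshold."""
    for orow, drow in zip(original, decoded):
        for o, d in zip(orow, drow):
            if o == 1 and d < threshold:
                return False
            if o == 0 and d >= threshold:
                return False
    return True
```

An encoder adjusting its RD decisions would accept a coding mode only when this predicate holds for the chosen threshold.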

[0070] The example of Figures 5a and 5b illustrates how the method addresses the drawback of the example shown in Figures 3a - 3d. The determined threshold value and the accordingly adjusted rate distortion cost function result in the upper-right corner pixel value being decoded as 165 instead of 160, as shown in Figure 5a. Thus, any threshold value above 160 results in the occupancy map shown in Figure 5b being coded using lossy video encoding, while still ensuring correct reconstruction according to the original OM shown in Figure 3a.

[0071] It is noted that the binary image processing as described in the embodiments may refer to an occupancy map (OM). However, the embodiments are applicable to any other form of binary image requiring lossless reconstruction.

[0072] In video coding, the rate distortion cost function may be defined, for example, as the cost K for encoding a unit with the distortion D at rate R:

K = D + λR

[0073] wherein the distortion D is given by the error between original and reconstructed data, for example by the sum of absolute differences between original and reconstructed data, and the rate R describes the cost of coding the current unit. The weighting factor λ controls the allowed quality degradation and is given by the quantization parameter QP.

[0074] According to an embodiment, said rate distortion cost function is adjusted as

K = λR, for D_R < 1
K = ∞, for D_R ≥ 1

where D_R denotes the number of incorrectly reconstructed binary pixels for a given threshold.

[0075] As a result, all reconstructed pixel values will fall in the correct range according to a given reconstruction threshold.
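A sketch of this adjusted cost in illustrative Python (the grid values, rate and λ used here are assumptions for demonstration; the two-case structure follows the adjusted cost function of the embodiment):

```python
import math

def adjusted_rd_cost(original, decoded, threshold, rate, lam):
    """Adjusted rate distortion cost: the usual lambda * R term applies
    only when the candidate reconstruction is binary-lossless (D_R == 0);
    any incorrectly reconstructed pixel makes the cost infinite, so the
    encoder can never select such a mode."""
    d_r = sum(1 for orow, drow in zip(original, decoded)
                for o, d in zip(orow, drow)
                if o != (1 if d >= threshold else 0))
    return lam * rate if d_r < 1 else math.inf
```

With an infinite cost attached to any mode that flips a binary pixel, standard rate distortion optimisation automatically steers the lossy encoder toward bitstreams that reconstruct losslessly at the given threshold.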

[0076] According to an embodiment, the method comprises providing a plurality of threshold values to be used for adjusting said rate distortion cost function; and selecting one of said plurality of threshold values resulting in correct reconstruction of the decoded pixel values of said continuous value range to be sent to the receiver.

[0077] Thus, a flexible adaption of the threshold value is provided, wherein a plurality of threshold values is evaluated by the encoder, similar to any other mode decision. The cost function may remain the same as above. One of the threshold values resulting in correct reconstruction of the decoded pixel values represented by said continuous value range is selected and sent in or along the V3C bitstream, wherein the bitstream may comprise the binary image, such as the occupancy map, and/or an associated 2D image data representation, such as V3C sub-bitstreams. This modification allows for a more flexible adjustment of the reconstruction threshold, thus reducing bitrate requirements. However, compared to the embodiment with only one determined threshold value, the encoding complexity is increased due to the additional mode decisions.

[0078] An example of this embodiment is illustrated in Figures 6a - 6d. Figure 6a shows the original occupancy map, similar to Figure 3a. Figure 6b shows the 8-bit decoded pixel values of the underlying 2D image representation, similar to Figure 3b. As the first step of the mode decision process, the encoder tests the reconstruction at a given threshold value, for example A equal to 160, resulting in the incorrect reconstruction shown in Figure 6c. In the next step of the mode decision process, the encoder increases the threshold value, for example to A equal to 161, which results in the lossless reconstruction shown in Figure 6d. Thus, this mode is selected, and the new reconstruction threshold is signalled in or along the bitstream.
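This mode decision over candidate thresholds might be sketched as follows (illustrative Python; the function name, pixel values and candidate range are assumptions mirroring the Figure 6 walkthrough):

```python
def select_threshold(original, decoded, candidates):
    """Mode-decision sketch: test candidate reconstruction thresholds in
    order and select the first one that yields a lossless binary
    reconstruction of the original image; None if no candidate works."""
    for t in candidates:
        rec = [[1 if v >= t else 0 for v in row] for row in decoded]
        if rec == original:
            return t
    return None

# Hypothetical values echoing Figures 6a-6d: a threshold of 160 fails
# (an unoccupied pixel decoded as 160 would be set), but 161 succeeds.
chosen = select_threshold([[1, 1], [1, 0]],
                          [[200, 165], [190, 160]],
                          range(160, 170))
```

The selected threshold is then what would be signalled in or along the bitstream.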

[0079] The threshold values may be determined in various ranges, for example, including but not limited to the following:

- The full range of the available values, e.g. 0 to 255 for 8-bit content

- Any sub-range of the full range, e.g. 120 to 155, wherein said sub-range may be indicated to the encoder (no signalling to the decoder is required)

- Around a given threshold, e.g. 20 values up and down around 100, wherein the value and optionally the range may be indicated to the encoder (no signalling to the decoder is required).

[0080] According to an embodiment, the method comprises determining, for each of the plurality of threshold values, a condition whether a minimum decoded pixel value reconstructed to the first binary state is smaller than a maximum decoded pixel value reconstructed to the second binary state; and discarding said threshold value upon meeting the condition.

[0081] Accordingly, the encoder may test for each mode decision that the maximal value of the pixels corresponding to an occupancy value equal to zero is not higher than the minimal value of the pixels corresponding to an occupancy value equal to one. If this condition is not met for the tested encoding mode, said encoding mode is discarded, and another encoding mode is tested.
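The early-discard test described above (equivalently: a lossless threshold can exist only if the maximum decoded value among unoccupied pixels lies below the minimum decoded value among occupied pixels) might be sketched as follows; the function name and values are illustrative assumptions:

```python
def threshold_is_feasible(original, decoded):
    """Early-discard check for a candidate coding mode: a lossless
    reconstruction threshold exists only when the maximum decoded value
    of unoccupied pixels is below the minimum decoded value of occupied
    pixels. Otherwise the mode can be discarded without testing
    individual thresholds."""
    occupied = [d for orow, drow in zip(original, decoded)
                  for o, d in zip(orow, drow) if o == 1]
    unoccupied = [d for orow, drow in zip(original, decoded)
                    for o, d in zip(orow, drow) if o == 0]
    if not occupied or not unoccupied:
        return True  # trivially separable
    return max(unoccupied) < min(occupied)
```

Any threshold strictly above `max(unoccupied)` and at most `min(occupied)` then reconstructs the map losslessly.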

[0082] According to an embodiment, the method comprises signalling said threshold value dynamically per frame or subpicture or tile.

[0083] According to an embodiment, the method comprises signalling said threshold value on a unit level.

[0084] Some video coding decisions are typically made per frame or a sub-frame, such as subpicture or tile. Video coding decisions may also be made on a unit level (CTU, CU, PU, TU). Threshold decisions on a unit level, for example at the lowest unit level, may therefore be more precise, ensuring better compression and lower bit rate requirements. Each unit-level reconstruction threshold may be signalled along the respective unit.

[0085] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) occupancy information syntax structure.

[0086] Thus, the reconstruction threshold may be signalled globally, e.g. in the V3C occupancy information syntax, using a syntax element, which may be referred to as, for example, oi_occupancy_reconstruction_threshold. An example of the syntax is given below:

[0087] Herein, oi_occupancy_reconstruction_threshold[ j ] indicates the value at or above which an occupancy pixel shall be reconstructed to 1, and otherwise reconstructed to 0. oi_occupancy_reconstruction_threshold[ j ] shall be in the range of 0 to 2^oi_occupancy_2d_bit_depth_minus1, inclusive.

[0088] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) atlas frame parameter set RBSP syntax structure.

[0089] Thus, the reconstruction threshold may be signalled on frame level, e.g. in the V3C atlas frame parameter set RBSP syntax, using at least a syntax element, which may be referred to as, for example, afps_occupancy_reconstruction_threshold. An example of the syntax is given below:

[0090] Herein, afps_reconstruction_threshold_present_flag equal to 1 specifies that the syntax element afps_occupancy_reconstruction_threshold is present in the atlas_frame_parameter_set_rbsp( ). afps_reconstruction_threshold_present_flag equal to 0 specifies that the syntax element afps_occupancy_reconstruction_threshold is not present in the atlas_frame_parameter_set_rbsp( ).

[0091] afps_occupancy_reconstruction_threshold[ j ] indicates the value at or above which an occupancy pixel shall be reconstructed to 1, and otherwise reconstructed to 0. afps_occupancy_reconstruction_threshold[ j ] shall be in the range of 0 to 2^oi_occupancy_2d_bit_depth_minus1, inclusive.

[0092] According to an embodiment, said signalling of the threshold value is carried out as an extension to any V3C parameter set (VPS, ASPS, AFPS, CASPS, CAF).

[0093] According to an embodiment, said signalling of the threshold value is carried out as included in a video coding standard used for occupancy map compression.

[0094] Especially indicating the coding-unit-level reconstruction threshold values requires a finer granularity, which cannot be easily signalled within the V3C atlas data. Thus, the signalling of the threshold values may be included in the video coding standard used for occupancy map compression. An example of the syntax for H.266/VVC is given below:

[0095] Herein, cu_binary_coding_flag[ x0 ][ y0 ] equal to 1 specifies that for the current coding unit the syntax element cu_binary_coding_threshold is present. cu_binary_coding_flag[ x0 ][ y0 ] equal to 0 specifies that for the current coding unit the syntax element cu_binary_coding_threshold is not present.

[0096] cu_binary_coding_threshold indicates the value at or above which a pixel shall be reconstructed to the maximum value, and otherwise reconstructed to the minimum value.

[0097] According to an embodiment, said signalling of the threshold value is carried out as included in a Supplemental Enhancement Information (SEI) message.

[0098] Thus, a new SEI message may be defined, with similar syntax and semantics, that would carry the occupancy reconstruction threshold as part of a V3C atlas sub-bitstream.

[0099] The method disclosed in Figure 7 illustrates the operation of a decoder, wherein the method comprises: obtaining (700) a binary image comprising pixels encoded with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; obtaining (702) a threshold value for decoded pixel values of said continuous value range of the binary image; and adjusting (704) said rate distortion cost function based on said threshold value such that the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state.

[0100] Thus, the decoder obtains a binary image, such as an occupancy map, e.g. receives the binary image in a bitstream or retrieves it from a memory, wherein the binary image comprises pixels encoded with a continuous value range, wherein the continuous value range is determined by a predefined rate distortion cost function. The decoder also obtains a threshold value, e.g. receives it as signalling in or along the bitstream comprising the binary image. The decoder then uses the threshold value to adjust the rate distortion cost function to ensure that all decoded pixel values will fall in the correct binary state upon reconstruction. In other words, the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state (such as to value 1), and the decoded pixel values less than the threshold are reconstructed to a second binary state (such as to value 0).
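The decoder-side reconstruction described above can be sketched as follows (illustrative Python; the function name and the fallback default are hypothetical, not taken from the specification):

```python
DEFAULT_THRESHOLD = 128  # hypothetical fallback when no threshold is signalled

def decode_binary_image(decoded_pixels, signalled_threshold=None):
    """Decoder-side sketch: apply the reconstruction threshold received
    in or along the bitstream (or a fallback default) to the lossy-decoded
    pixel values, recovering the binary states: 1 for values at or above
    the threshold, 0 for values below it."""
    t = signalled_threshold if signalled_threshold is not None else DEFAULT_THRESHOLD
    return [[1 if v >= t else 0 for v in row] for row in decoded_pixels]
```

Because the encoder has already constrained the decoded values around the signalled threshold, this single comparison per pixel recovers the original binary image exactly.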

[0101] The embodiments relating to the encoding aspects may be implemented in an apparatus comprising: means for obtaining a binary image comprising a grid of binary data blocks; means for encoding pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; means for determining a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state; means for adjusting said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image; and means for signalling said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

[0102] According to an embodiment, said rate distortion cost function is adjusted as

K = λR, for D_R < 1
K = ∞, for D_R ≥ 1

[0103] where D_R denotes the number of incorrectly reconstructed binary pixels for a given threshold.

[0104] According to an embodiment, the apparatus comprises means for providing a plurality of threshold values to be used for adjusting said rate distortion cost function; and means for selecting one of said plurality of threshold values resulting in correct reconstruction of the decoded pixel values of said continuous value range to be sent to the receiver.

[0105] According to an embodiment, the apparatus comprises means for determining, for each of the plurality of threshold values, a condition whether a minimum decoded pixel value reconstructed to the first binary state is smaller than a maximum decoded pixel value reconstructed to the second binary state; and means for discarding said threshold value upon meeting the condition.

[0106] According to an embodiment, the apparatus comprises means for signalling said threshold value dynamically per frame or subpicture or tile.

[0107] According to an embodiment, the apparatus comprises means for signalling said threshold value on a unit level.

[0108] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) occupancy information syntax structure.

[0109] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) atlas frame parameter set RBSP syntax structure.

[0110] According to an embodiment, said signalling of the threshold value is carried out as included in a Supplemental Enhancement Information (SEI) message.

[0111] The embodiments relating to the encoding aspects may likewise be implemented in an apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: obtain a binary image comprising a grid of binary data blocks; encode pixels of said binary image with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; determine a threshold for decoded pixel values of said continuous value range of the binary image, wherein the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state; adjust said rate distortion cost function such that all decoded pixel values of said continuous value range are reconstructed to the same binary states as in the obtained binary image; and signal said threshold to a receiver in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

[0112] According to an embodiment, said rate distortion cost function is adjusted as

K = λR, for D_R < 1
K = ∞, for D_R ≥ 1

[0113] where D_R denotes the number of incorrectly reconstructed binary pixels for a given threshold.

[0114] According to an embodiment, the apparatus comprises code causing the apparatus to provide a plurality of threshold values to be used for adjusting said rate distortion cost function; and select one of said plurality of threshold values resulting in correct reconstruction of the decoded pixel values of said continuous value range to be sent to the receiver.

[0115] According to an embodiment, the apparatus comprises code causing the apparatus to determine, for each of the plurality of threshold values, a condition whether a minimum decoded pixel value reconstructed to the first binary state is smaller than a maximum decoded pixel value reconstructed to the second binary state; and discard said threshold value upon meeting the condition.

[0116] According to an embodiment, the apparatus comprises code causing the apparatus to signal said threshold value dynamically per frame or subpicture or tile.

[0117] According to an embodiment, the apparatus comprises code causing the apparatus to signal said threshold value on a unit level.

[0118] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) occupancy information syntax structure.

[0119] According to an embodiment, said signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) atlas frame parameter set RBSP syntax structure.

[0120] The embodiments relating to the decoding aspects may be implemented in an apparatus comprising: means for obtaining a binary image comprising pixels encoded with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; means for obtaining a threshold value for decoded pixel values of said continuous value range of the binary image; and means for adjusting said rate distortion cost function based on said threshold value such that the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state.

[0121] According to an embodiment, the apparatus comprises means for receiving said threshold value in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

[0122] According to an embodiment, signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) occupancy information syntax structure.

[0123] According to an embodiment, signalling of the threshold value is carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) atlas frame parameter set RBSP syntax structure.

[0124] The embodiments relating to the decoding aspects may likewise be implemented in an apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: obtain a binary image comprising pixels encoded with a continuous value range, wherein said continuous value range is determined by a predefined rate distortion cost function; obtain a threshold value for decoded pixel values of said continuous value range of the binary image; and adjust said rate distortion cost function based on said threshold value such that the decoded pixel values equal to or greater than the threshold are reconstructed to a first binary state, and the decoded pixel values less than the threshold are reconstructed to a second binary state.

[0125] According to an embodiment, the apparatus comprises code causing the apparatus to receive said threshold value in or along a bitstream comprising said binary image and/or an associated 2D image data representation.

[0126] Such apparatuses may comprise e.g. the functional units disclosed in any of the Figures 1a, 1b, 2a and 2b for implementing the embodiments.

[0127] In the above, some embodiments have been described with reference to encoding. It needs to be understood that said encoding may comprise one or more of the following: encoding source image data into a bitstream, encapsulating the encoded bitstream in a container file and/or in packet(s) or stream(s) of a communication protocol, and announcing or describing the bitstream in a content description, such as the Media Presentation Description (MPD) of ISO/IEC 23009-1 (known as MPEG-DASH) or the IETF Session Description Protocol (SDP). Similarly, some embodiments have been described with reference to decoding. It needs to be understood that said decoding may comprise one or more of the following: decoding image data from a bitstream, decapsulating the bitstream from a container file and/or from packet(s) or stream(s) of a communication protocol, and parsing a content description of the bitstream.

[0128] In the above, where the example embodiments have been described with reference to an encoder or an encoding method, it needs to be understood that the resulting bitstream and the decoder or the decoding method may have corresponding elements in them. Likewise, where the example embodiments have been described with reference to a decoder, it needs to be understood that the encoder may have structure and/or computer program for generating the bitstream to be decoded by the decoder.

[0129] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

[0130] Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

[0131] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

[0132] The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended examples. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.




 