

Title:
AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR VOLUMETRIC VIDEO
Document Type and Number:
WIPO Patent Application WO/2023/037040
Kind Code:
A1
Abstract:
A method comprising: receiving a volumetric media frame comprising three-dimensional (3D) data content (700); encoding the 3D data content into 3D spatial regions using a mesh compression algorithm (702); identifying patch boundaries and vertex density among the mesh-compressed 3D spatial regions (704); determining at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density (706); and signaling said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream (708).

Inventors:
ILOLA LAURI ALEKSI (DE)
KONDRAD LUKASZ (DE)
BACHHUBER CHRISTOPH (DE)
SCHWARZ SEBASTIAN (DE)
Application Number:
PCT/FI2022/050507
Publication Date:
March 16, 2023
Filing Date:
July 28, 2022
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
H04N19/597; G06T7/10; G06T9/00; G06T15/08; G06T17/20; H04N13/178; H04N19/192
Foreign References:
US20210090301A12021-03-25
US20120086720A12012-04-12
US20200286261A12020-09-10
Other References:
DANILLO GRAZIOSI (SONY), ALEXANDRE ZAGHETTO (SONY), ALI TABATABAI (SONY): "[V-PCC][EE2.6-related] Mesh Patch Data", 132. MPEG MEETING; 20201012 - 20201016; ONLINE; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), 7 October 2020 (2020-10-07), XP030292889
BERNARDINI, F. ET AL.: "The ball-pivoting algorithm for surface reconstruction", IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, vol. 5, no. 4, 1 October 1999 (1999-10-01), pages 349 - 359, XP000908551, Retrieved from the Internet [retrieved on 20220509], DOI: 10.1109/2945.817351
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:
CLAIMS

1. An apparatus comprising: means for receiving a volumetric media frame comprising three-dimensional (3D) data content; means for encoding the 3D data content into 3D spatial regions using a mesh compression algorithm; means for identifying patch boundaries and vertex density among the mesh-compressed 3D spatial regions; means for determining at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and means for signaling said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

2. The apparatus according to claim 1, comprising means for performing a test-rendering using a surface reconstruction algorithm, said at least one surface reconstruction parameter and information about the mesh-compressed 3D spatial regions; and means for re-determining, based on the test-rendering, said at least one surface reconstruction parameter.

3. The apparatus according to claim 2, comprising means for repeating the test-rendering using the re-determined at least one surface reconstruction parameter until a predetermined threshold value descriptive of image quality is reached.

4. The apparatus according to any preceding claim, comprising means for signaling, in or along the at least one bitstream, an indication about one or more spatial regions where the surface reconstruction is to be applied.


5. The apparatus according to any preceding claim, wherein the signaling of spatial regions and sub-regions comprises offsets and sizes of the spatial regions.

6. The apparatus according to any preceding claim, wherein the surface reconstruction parameters include an identification of at least one relevant spatial region.

7. The apparatus according to any preceding claim, comprising means for signaling the spatial regions and the surface reconstruction parameters individually.

8. The apparatus according to any preceding claim, wherein the signaling of the parameters for spatially optimized mesh reconstruction is configured to be carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) parameter set extension syntax structure.

9. The apparatus according to claim 8, wherein the surface reconstruction parameters and the spatial regions are signaled in one of the following: an atlas sequence parameter set extension, an atlas frame parameter set extension, one or more SEI messages.

10. A method comprising: receiving a volumetric media frame comprising three-dimensional (3D) data content; encoding the 3D data content into 3D spatial regions using a mesh compression algorithm; identifying patch boundaries and vertex density among the mesh-compressed 3D spatial regions; determining at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and signaling said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.


11. The method according to claim 10, comprising: performing a test-rendering using a surface reconstruction algorithm, said at least one surface reconstruction parameter and information about the mesh-compressed 3D spatial regions; and re-determining, based on the test-rendering, said at least one surface reconstruction parameter.

12. The method according to claim 11, comprising: repeating the test-rendering using the re-determined at least one surface reconstruction parameter until a predetermined threshold value descriptive of image quality is reached.

13. The method according to any of claims 10 - 12, comprising: signaling an indication about the used mesh compression algorithm in or along the at least one bitstream.

14. An apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: receive a volumetric media frame comprising three-dimensional (3D) data content; encode the 3D data content into 3D spatial regions using a mesh compression algorithm; identify patch boundaries and vertex density among the mesh-compressed 3D spatial regions; determine at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and signal said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

15. An apparatus comprising: means for receiving a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; means for receiving, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; means for generating a mesh representation from the 3D volumetric representation data; means for applying the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and means for rendering the mesh with the reconstructed surfaces.

16. A method comprising: receiving a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; receiving, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; generating a mesh representation from the 3D volumetric representation data; applying the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and rendering the mesh with the reconstructed surfaces.

17. An apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: receive a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; receive, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; generate a mesh representation from the 3D volumetric representation data; apply the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and render the mesh with the reconstructed surfaces.


Description:
AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR

VOLUMETRIC VIDEO

TECHNICAL FIELD

[0001] The present invention relates to an apparatus, a method and a computer program for volumetric video coding.

BACKGROUND

[0002] Visual volumetric video-based coding (V3C; defined in ISO/IEC DIS 23090-5) provides a generic syntax and mechanism for volumetric video coding. The generic syntax can be used by applications targeting volumetric content, such as point clouds, immersive video with depth, and mesh representations of volumetric frames. The purpose of the specification is to define how to decode and interpret the associated data (atlas data in ISO/IEC 23090-5), which tells a renderer how to interpret 2D frames for reconstructing volumetric frames.

[0003] The current definition of V3C (ISO/IEC 23090-5) comprises two applications, i.e. video-based point cloud compression (V-PCC; defined in ISO/IEC 23090-5) and MPEG immersive video (MIV; defined in ISO/IEC 23090-12). Moreover, the MPEG 3DG (ISO SC29 WG7) group has started work on a third volumetric video coding application, i.e. V3C mesh compression.

[0004] The current development in MPEG 3DG for integration of mesh compression into the V3C family of standards typically relies on mesh reconstruction at the decoder, which allows reducing the amount of information required for compressing traditional mesh information. While the reconstruction of meshes at the decoder side works well in some situations, e.g. when a mesh is constructed based on pixels of a single geometry patch, some inherent properties of video coding cause the mesh reconstruction to be problematic at the decoder in some other situations, especially if the surface for the mesh generation originates from vertices connected from two or more patches.

SUMMARY

[0005] Now, an improved method and technical equipment implementing the method have been invented, by which the above problems are alleviated. Various aspects include a method, an apparatus and a computer readable medium comprising a computer program, or a signal stored therein, which are characterized by what is stated in the independent claims. Various details of the embodiments are disclosed in the dependent claims and in the corresponding images and description.

[0006] The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.

[0007] According to a first aspect, there is provided a method comprising: receiving a volumetric media frame comprising three-dimensional (3D) data content; encoding the 3D data content into 3D spatial regions using a mesh compression algorithm; identifying patch boundaries and vertex density among the mesh-compressed 3D spatial regions; determining at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and signaling said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

[0008] An apparatus according to a second aspect comprises: means for receiving a volumetric media frame comprising three-dimensional (3D) data content; means for encoding the 3D data content into 3D spatial regions using a mesh compression algorithm; means for identifying patch boundaries and vertex density among the mesh-compressed 3D spatial regions; means for determining at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and means for signaling said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

[0009] According to an embodiment, the apparatus comprises: means for performing a test-rendering using a surface reconstruction algorithm, said at least one surface reconstruction parameter and information about the mesh-compressed 3D spatial regions; and means for re-determining, based on the test-rendering, said at least one surface reconstruction parameter.

[0010] According to an embodiment, the apparatus comprises: means for repeating the test-rendering using the re-determined at least one surface reconstruction parameter until a predetermined threshold value descriptive of image quality is reached.

[0011] According to an embodiment, the apparatus comprises: means for signaling, in or along the at least one bitstream, an indication about one or more spatial regions where the surface reconstruction is to be applied.

[0012] According to an embodiment, the signaling of spatial regions and sub-regions comprises offsets and sizes of the spatial regions.

[0013] According to an embodiment, the surface reconstruction parameters include an identification of at least one relevant spatial region.

[0014] According to an embodiment, the apparatus comprises: means for signaling the spatial regions and the surface reconstruction parameters individually.

[0015] According to an embodiment, the signaling of the parameters for spatially optimized mesh reconstruction is configured to be carried out by at least one syntax element included in a visual volumetric video-based coding (V3C) parameter set extension syntax structure.

[0016] According to an embodiment, the surface reconstruction parameters and the spatial regions are signaled in one of the following: an atlas sequence parameter set extension, an atlas frame parameter set extension, one or more SEI messages.

[0017] An apparatus according to a third aspect comprises at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: receive a volumetric media frame comprising three-dimensional (3D) data content; encode the 3D data content into 3D spatial regions using a mesh compression algorithm; identify patch boundaries and vertex density among the mesh-compressed 3D spatial regions; determine at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and signal said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

[0018] An apparatus according to a fourth aspect comprises: means for receiving a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; means for receiving, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; means for generating a mesh representation from the 3D volumetric representation data; means for applying the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and means for rendering the mesh with the reconstructed surfaces.

[0019] A method according to a fifth aspect comprises: receiving a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; receiving, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; generating a mesh representation from the 3D volumetric representation data; applying the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and rendering the mesh with the reconstructed surfaces.

[0020] An apparatus according to a sixth aspect comprises at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: receive a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; receive, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; generate a mesh representation from the 3D volumetric representation data; apply the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and render the mesh with the reconstructed surfaces.

[0021] Computer readable storage media according to further aspects comprise code for use by an apparatus, which when executed by a processor, causes the apparatus to perform the above methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022] For a more complete understanding of the example embodiments, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:

[0023] Figs. 1a and 1b show an encoder and decoder for encoding and decoding 2D pictures;

[0024] Figs. 2a and 2b show a compression and a decompression process for 3D volumetric video;

[0025] Fig. 3 shows an example of block-to-patch mapping with 4 projected patches onto an atlas;

[0026] Figs. 4a - 4c show an illustrative example of a patch projection into 2D domain for atlas data;

[0027] Figs. 5a and 5b show extensions to the V-PCC encoder and decoder to support mesh encoding and mesh decoding;

[0028] Figs. 6a and 6b show an example illustrating the problem of connecting vertices from two or more patches;

[0029] Fig. 7 shows a flow chart for an encoding method according to an embodiment;

[0030] Figs. 8a and 8b show an example illustrating shortcomings of a constant ball-radius surface reconstruction algorithm;

[0031] Fig. 9 shows an exemplified block chart of an apparatus according to an embodiment;

[0032] Fig. 10 shows a flow chart for a decoding method according to an embodiment; and

[0033] Fig. 11 shows an exemplified block chart of an apparatus according to an embodiment.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

[0034] In the following, several embodiments of the invention will be described in the context of point cloud models for volumetric video coding. It is to be noted, however, that the invention is not limited to specific scene models or specific coding technologies. In fact, the different embodiments have applications in any environment where coding of volumetric scene data is required.

[0035] A video codec comprises an encoder that transforms the input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form. An encoder may discard some information in the original video sequence in order to represent the video in a more compact form (i.e. at lower bitrate).

[0036] Volumetric video may be captured using one or more three-dimensional (3D) cameras. When multiple cameras are in use, the captured footage is synchronized so that the cameras provide different viewpoints to the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and observe different parts of the world.

[0037] Volumetric video enables the viewer to move in six degrees of freedom (6DOF): in contrast to common 360° video, where the user has from 2 to 3 degrees of freedom (yaw, pitch, and possibly roll), a volumetric video represents a 3D volume of space rather than a flat image plane. Volumetric video frames contain a large amount of data because they model the contents of a 3D volume instead of just a two-dimensional (2D) plane. However, only a relatively small part of the volume changes over time. Therefore, it may be possible to reduce the total amount of data by only coding information about an initial state and changes which may occur between frames. Volumetric video can be rendered from synthetic 3D animations, reconstructed from multi-view video using 3D reconstruction techniques such as structure from motion, or captured with a combination of cameras and depth sensors such as LiDAR (Light Detection and Ranging), for example.

[0038] Volumetric video data represents a three-dimensional scene or object, and thus such data can be viewed from any viewpoint. Volumetric video data can be used as an input for augmented reality (AR), virtual reality (VR) and mixed reality (MR) applications. Such data describes geometry (shape, size, position in 3D space) and respective attributes (e.g. color, opacity, reflectance, ...), together with any possible temporal changes of the geometry and attributes at given time instances (e.g. frames in 2D video). Volumetric video is either generated from 3D models, i.e. computer-generated imagery (CGI), or captured from real-world scenes using a variety of capture solutions, e.g. a multi-camera, a laser scan, a combination of video and dedicated depth sensors, etc. Also, a combination of CGI and real-world data is possible. Examples of representation formats for such volumetric data are triangle meshes, point clouds, or voxels. Temporal information about the scene can be included in the form of individual capture instances, i.e. "frames" in 2D video, or other means, e.g. position of an object as a function of time.

[0039] Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the 3D data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds, on the other hand, are well suited for applications such as capturing real-world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is coding this 3D data as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps and multi-level surface maps.

[0040] In 3D point clouds, each point of each 3D surface is described as a 3D point with color and/or other attribute information such as surface normal or material reflectance. A point cloud is a set of data points in a coordinate system, for example a three-dimensional coordinate system defined by X, Y, and Z coordinates. The points may represent an external surface of an object in the scene space, e.g. in a three-dimensional space.

[0041] In dense point clouds or voxel arrays, the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between entities, then efficient compression of the representations becomes fundamental. Standard volumetric video representation formats, such as point clouds, meshes and voxels, suffer from poor temporal compression performance. Identifying correspondences for motion-compensation in 3D space is an ill-defined problem, as both geometry and respective attributes may change. For example, temporally successive “frames” do not necessarily have the same number of meshes, points or voxels. Therefore, compression of dynamic 3D scenes is inefficient. 2D-video based approaches for compressing volumetric data, i.e. multiview with depth, have much better compression efficiency, but rarely cover the full scene. Therefore, they provide only limited 6DOF capabilities.

[0042] Instead of the above-mentioned approach, a 3D scene, represented as meshes, points, and/or voxel, can be projected onto one, or more, geometries. These geometries may be “unfolded” or packed onto 2D planes (two planes per geometry: one for texture, one for depth), which are then encoded using standard 2D video compression technologies. Relevant projection geometry information may be transmitted alongside the encoded video files to the decoder. The decoder decodes the video and performs the inverse projection to regenerate the 3D scene in any desired representation format (not necessarily the starting format).

[0043] Projecting volumetric models onto 2D planes allows for using standard 2D video coding tools with highly efficient temporal compression. Thus, coding efficiency can be increased greatly. Using geometry-projections instead of 2D-video based approaches based on multiview and depth, provides a better coverage of the scene (or object). Thus, 6DOF capabilities are improved. Using several geometries for individual objects improves the coverage of the scene further. Furthermore, standard video encoding hardware can be utilized for real-time compression/decompression of the projected planes. The projection and the reverse projection steps are of low complexity.

[0044] Figs. 1a and 1b show an encoder and decoder for encoding and decoding the 2D texture pictures, geometry pictures and/or auxiliary pictures. A video codec consists of an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. Typically, the encoder discards and/or loses some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate). An example of an encoding process is illustrated in Figure 1a. Figure 1a illustrates an image to be encoded (I_n); a predicted representation of an image block (P'_n); a prediction error signal (D_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); a transform (T) and inverse transform (T^-1); a quantization (Q) and inverse quantization (Q^-1); entropy encoding (E); a reference frame memory (RFM); inter prediction (P_inter); intra prediction (P_intra); mode selection (MS) and filtering (F).

[0045] An example of a decoding process is illustrated in Figure 1b. Figure 1b illustrates a predicted representation of an image block (P'_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); an inverse transform (T^-1); an inverse quantization (Q^-1); an entropy decoding (E^-1); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).

[0046] Many hybrid video encoders encode the video information in two phases. Firstly, pixel values in a certain picture area (or "block") are predicted, for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate). Video codecs may also provide a transform skip mode, which the encoders may choose to use. In the transform skip mode, the prediction error is coded in a sample domain, for example by deriving a sample-wise difference value relative to certain adjacent samples and coding the sample-wise difference value with an entropy coder.
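As a minimal illustration of this two-phase idea (a toy sketch, not the HEVC or V3C pipeline itself), the following Python snippet predicts an 8x8 block, transforms and quantizes the residual, and reconstructs the block; the uniform quantizer step qstep is an arbitrary assumption.

```python
import numpy as np
from scipy.fft import dctn, idctn

def code_block(orig: np.ndarray, pred: np.ndarray, qstep: float = 10.0):
    """Toy two-phase coding of one 8x8 block: residual -> transform -> quantize."""
    residual = orig.astype(np.float64) - pred                  # prediction error D_n
    levels = np.round(dctn(residual, norm="ortho") / qstep)    # T, then Q (the lossy step)
    # decoder side: inverse quantization and inverse transform
    rec_residual = idctn(levels * qstep, norm="ortho")         # Q^-1, then T^-1
    recon = np.clip(pred + rec_residual, 0, 255)               # preliminary reconstruction I'_n
    return levels, recon

rng = np.random.default_rng(0)
pred = np.full((8, 8), 128.0)                                  # e.g. a flat intra prediction
orig = pred + rng.integers(-20, 21, (8, 8))
levels, recon = code_block(orig, pred)
print("max reconstruction error:", float(np.abs(recon - orig).max()))
```

A larger qstep produces fewer non-zero levels (smaller bitstream) at the cost of a larger reconstruction error, which is exactly the quality/bitrate balance described above.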

[0047] Many video encoders partition a picture into blocks along a block grid. For example, in the High Efficiency Video Coding (HEVC) standard, the following partitioning and definitions are used. A coding block may be defined as an NxN block of samples for some value of N such that the division of a coding tree block into coding blocks is a partitioning. A coding tree block (CTB) may be defined as an NxN block of samples for some value of N such that the division of a component into coding tree blocks is a partitioning. A coding tree unit (CTU) may be defined as a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples. A coding unit (CU) may be defined as a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples. A CU with the maximum allowed size may be named as LCU (largest coding unit) or coding tree unit (CTU) and the video picture is divided into non-overlapping LCUs.

[0048] In HEVC, a picture can be partitioned into tiles, which are rectangular and contain an integer number of LCUs. In HEVC, the partitioning to tiles forms a regular grid, where heights and widths of tiles differ from each other by one LCU at the maximum. In HEVC, a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit. In HEVC, a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan and contained in a single NAL unit. The division of each picture into slice segments is a partitioning. In HEVC, an independent slice segment is defined to be a slice segment for which the values of the syntax elements of the slice segment header are not inferred from the values for a preceding slice segment, and a dependent slice segment is defined to be a slice segment for which the values of some syntax elements of the slice segment header are inferred from the values for the preceding independent slice segment in decoding order. In HEVC, a slice header is defined to be the slice segment header of the independent slice segment that is a current slice segment or is the independent slice segment that precedes a current dependent slice segment, and a slice segment header is defined to be a part of a coded slice segment containing the data elements pertaining to the first or all coding tree units represented in the slice segment. The CUs are scanned in the raster scan order of LCUs within tiles or within a picture, if tiles are not in use. Within an LCU, the CUs have a specific scan order.

[0049] Entropy coding/decoding may be performed in many ways. For example, context-based coding/decoding may be applied, where both the encoder and the decoder modify the context state of a coding parameter based on previously coded/decoded coding parameters. Context-based coding may for example be context adaptive binary arithmetic coding (CABAC) or context-adaptive variable length coding (CAVLC) or any similar entropy coding. Entropy coding/decoding may alternatively or additionally be performed using a variable length coding scheme, such as Huffman coding/decoding or Exp-Golomb coding/decoding. Decoding of coding parameters from an entropy-coded bitstream or codewords may be referred to as parsing.
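As an illustration of one such variable length scheme, here is a sketch of unsigned Exp-Golomb (ue(v)) encoding and decoding; the bit-string representation is for readability only, not a real bitstream writer.

```python
def exp_golomb_encode(v: int) -> str:
    """Unsigned Exp-Golomb: write v+1 in binary, prefixed by (len-1) zero bits."""
    code = bin(v + 1)[2:]
    return "0" * (len(code) - 1) + code

def exp_golomb_decode(bits: str) -> tuple[int, str]:
    """Decode one codeword; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":          # count the zero prefix
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2) - 1
    return value, bits[2 * zeros + 1:]

stream = "".join(exp_golomb_encode(v) for v in (0, 3, 7))   # "1" + "00100" + "0001000"
v, rest = exp_golomb_decode(stream)                         # v == 0
```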

[0050] The phrase along the bitstream (e.g. indicating along the bitstream) may be defined to refer to out-of-band transmission, signaling, or storage in a manner that the out-of-band data is associated with the bitstream. The phrase decoding along the bitstream or alike may refer to decoding the referred out-of-band data (which may be obtained from out-of-band transmission, signaling, or storage) that is associated with the bitstream. For example, an indication along the bitstream may refer to metadata in a container file that encapsulates the bitstream.

[0051 ] A first texture picture may be encoded into a bitstream, and the first texture picture may comprise a first projection of texture data of a first source volume of a scene model onto a first projection surface. The scene model may comprise a number of further source volumes.

[0052] In the projection, data on the position of the originating geometry primitive may also be determined, and based on this determination, a geometry picture may be formed. This may happen for example so that depth data is determined for each or some of the texture pixels of the texture picture. Depth data is formed such that the distance from the originating geometry primitive such as a point to the projection surface is determined for the pixels. Such depth data may be represented as a depth picture, and similarly to the texture picture, such geometry picture (such as a depth picture) may be encoded and decoded with a video codec. This first geometry picture may be seen to represent a mapping of the first projection surface to the first source volume, and the decoder may use this information to determine the location of geometry primitives in the model to be reconstructed. In order to determine the position of the first source volume and/or the first projection surface and/or the first projection in the scene model, there may be first geometry information encoded into or along the bitstream. It is noted that encoding a geometry (or depth) picture into or along the bitstream with the texture picture is only optional and arbitrary for example in the cases where the distance of all texture pixels to the projection surface is the same or there is no change in said distance between a plurality of texture pictures. Thus, a geometry (or depth) picture may be encoded into or along the bitstream with the texture picture, for example, only when there is a change in the distance of texture pixels to the projection surface.

[0053] An attribute picture may be defined as a picture that comprises additional information related to an associated texture picture. An attribute picture may for example comprise surface normal, opacity, or reflectance information for a texture picture. A geometry picture may be regarded as one type of an attribute picture, although a geometry picture may be treated as its own picture type, separate from an attribute picture.

[0054] Texture picture(s) and the respective geometry picture(s), if any, and the respective attribute picture(s) may have the same or different chroma format.

[0055] Terms texture (component) image and texture (component) picture may be used interchangeably. Terms geometry (component) image and geometry (component) picture may be used interchangeably. A specific type of a geometry image is a depth image. Embodiments described in relation to a geometry (component) image equally apply to a depth (component) image, and embodiments described in relation to a depth (component) image equally apply to a geometry (component) image. Terms attribute image and attribute picture may be used interchangeably. A geometry picture and/or an attribute picture may be treated as an auxiliary picture in video/image encoding and/or decoding.

[0056] Figures 2a and 2b illustrate an overview of exemplified compression/decompression processes. The processes may be applied, for example, in MPEG visual volumetric video-based coding (V3C), defined currently in ISO/IEC DIS 23090-5: "Visual Volumetric Video-based Coding and Video-based Point Cloud Compression", 2nd Edition.

[0057] Visual volumetric video, a sequence of visual volumetric frames, if uncompressed, may be represented by a large amount of data, which can be costly in terms of storage and transmission. This has led to the need for a high coding efficiency standard for the compression of visual volumetric data.

[0058] The V3C specification enables the encoding and decoding processes of a variety of volumetric media by using video and image coding technologies. This is achieved through first a conversion of such media from their corresponding 3D representation to multiple 2D representations, also referred to as V3C components, before coding such information. Such representations may include occupancy, geometry, and attribute components. The occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation. The geometry component contains information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g. texture or material information, of such 3D data. An example of volumetric media conversion at an encoder is shown in Figure 2a and an example of a 3D reconstruction at a decoder is shown in Figure 2b.
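To make the role of the occupancy component concrete, the following toy sketch (not from the specification) uses an occupancy map to select which geometry samples contribute points to the reconstructed 3D representation:

```python
import numpy as np

def reconstruct_points(occupancy: np.ndarray, geometry: np.ndarray) -> np.ndarray:
    """Keep only samples flagged by occupancy; the geometry component stores a
    depth value d per pixel. Returns (u, v, d) triples in patch coordinates;
    attribute values would be sampled at the same (v, u) positions."""
    v, u = np.nonzero(occupancy)      # 2D positions of occupied samples
    d = geometry[v, u]                # depth carried by the geometry component
    return np.stack([u, v, d], axis=1)

occupancy = np.zeros((4, 4), dtype=np.uint8)
occupancy[1:3, 1:3] = 1               # a 2x2 occupied region of a patch
geometry = np.full((4, 4), 7, dtype=np.uint16)
print(reconstruct_points(occupancy, geometry))   # four (u, v, 7) points
```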

[0059] Additional information that allows associating all these subcomponents and enables the inverse reconstruction, from a 2D representation back to a 3D representation is also included in a special component, referred to in this document as the atlas. An atlas consists of multiple elements, named as patches. Each patch identifies a region in all available 2D components and contains information necessary to perform the appropriate inverse projection of this region back to the 3D space. The shape of such regions is determined through a 2D bounding box associated with each patch as well as their coding order. The shape of these regions is also further refined after the consideration of the occupancy information.

[0060] Atlases are partitioned into patch packing blocks of equal size. The 2D bounding boxes of patches and their coding order determine the mapping between the blocks of the atlas image and the patch indices. Figure 3 shows an example of block-to-patch mapping with 4 projected patches onto an atlas when asps_patch_precedence_order_flag is equal to 0. Projected points are represented with dark grey. The area that does not contain any projected points is represented with light grey. Patch packing blocks are represented with dashed lines. The number inside each patch packing block represents the patch index of the patch to which it is mapped.
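A simplified version of this block-to-patch mapping can be sketched as below; here later patches in the list simply overwrite earlier ones, whereas in V3C the precedence direction depends on asps_patch_precedence_order_flag, and the occupancy-based refinement is omitted:

```python
import numpy as np

def block_to_patch_map(atlas_w: int, atlas_h: int, block: int, patches) -> np.ndarray:
    """patches: list of (x, y, w, h) 2D bounding boxes in atlas samples, in
    coding order. Each patch packing block covered by a bounding box is
    assigned that patch's index; -1 marks blocks belonging to no patch."""
    bmap = np.full((atlas_h // block, atlas_w // block), -1, dtype=int)
    for idx, (x, y, w, h) in enumerate(patches):
        bmap[y // block:(y + h + block - 1) // block,
             x // block:(x + w + block - 1) // block] = idx
    return bmap

patches = [(0, 0, 32, 16), (32, 0, 16, 16), (0, 16, 48, 16)]
print(block_to_patch_map(64, 32, 16, patches))
# [[ 0  0  1 -1]
#  [ 2  2  2 -1]]
```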

[0061] Axes orientations are specified for internal operations. For instance, the origin of the atlas coordinates is located on the top-left corner of the atlas frame. For the reconstruction step, an intermediate axes definition for a local 3D patch coordinate system is used. The 3D local patch coordinate system is then converted to the final target 3D coordinate system using appropriate transformation steps.

[0062] Figure 4a shows an example of a single patch packed onto an atlas image. This patch is then converted to a local 3D patch coordinate system (U, V, D) defined by the projection plane with origin O', tangent (U), bi-tangent (V), and normal (D) axes. For an orthographic projection, the projection plane is equal to the sides of an axis-aligned 3D bounding box, as shown in Figure 4b. The location of the bounding box in the 3D model coordinate system, defined by a left-handed system with axes (X, Y, Z), can be obtained by adding offsets TilePatch3dOffsetU, TilePatch3dOffsetV, and TilePatch3dOffsetD, as illustrated in Figure 4c.
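The following toy sketch (assumed arithmetic, not the normative reconstruction process) illustrates the idea of converting a patch-local (U, V, D) sample to model coordinates by applying the signalled offsets and an axis mapping:

```python
def patch_to_model(u: int, v: int, d: int,
                   off_u: int, off_v: int, off_d: int,
                   axes=(0, 1, 2)) -> tuple:
    """Toy inverse projection for an orthographic patch: add the signalled 3D
    offsets (TilePatch3dOffsetU/V/D in the text) and route the tangent,
    bi-tangent and normal axes onto the model's X/Y/Z axes. Axis mirroring
    and scaling from the real specification are omitted."""
    local = (u + off_u, v + off_v, d + off_d)
    xyz = [0, 0, 0]
    for src, dst in enumerate(axes):
        xyz[dst] = local[src]
    return tuple(xyz)

# a sample at patch coordinates (u=3, v=5, d=9) with offsets (10, 20, 30),
# projected on a plane whose normal is the model's X axis:
print(patch_to_model(3, 5, 9, 10, 20, 30, axes=(1, 2, 0)))   # -> (39, 13, 25)
```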

[0063] The generic mechanism of V3C may be used by applications targeting volumetric content. One of such applications is MPEG immersive video (MIV; defined in ISO/IEC 23090-12).

[0064] MIV enables volumetric video coding for applications in which a scene is recorded with multiple RGB(D) (red, green, blue, and optionally depth) cameras with overlapping fields of view (FoVs). One example setup is a linear array of cameras pointing towards a scene. This multi-scopic view of the scene allows a 3D reconstruction and therefore 6DoF/3DoF+ consumption.

[0065] MIV uses the patch data unit concept from V3C and extends it by using camera views for reprojection.

[0066] Coded V3C video components are referred to in this document as video bitstreams, while an atlas component is referred to as the atlas bitstream. Video bitstreams and atlas bitstreams may be further split into smaller units, referred to here as video and atlas sub-bitstreams, respectively, and may be interleaved together, after the addition of appropriate delimiters, to construct a V3C bitstream.

[0067] V3C patch information is contained in the atlas bitstream, atlas_sub_bitstream(), which contains a sequence of NAL units. A NAL unit is specified to format data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are contained in NAL units, each of which contains an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for both packet-oriented transport and sample streams is identical except that in the sample stream format specified in Annex D of ISO/IEC 23090-5 each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.

[0068] NAL units in the atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units. The former are dedicated to carrying patch data, while the latter carry data necessary to properly parse the ACL units or any additional auxiliary data.

[0069] In the nal_unit_header() syntax, nal_unit_type specifies the type of the RBSP data structure contained in the NAL unit as specified in Table 4 of ISO/IEC 23090-5. nal_layer_id specifies the identifier of the layer to which an ACL NAL unit belongs or the identifier of a layer to which a non-ACL NAL unit applies. The value of nal_layer_id shall be in the range of 0 to 62, inclusive. The value of 63 may be specified in the future by ISO/IEC. Decoders conforming to a profile specified in Annex A of ISO/IEC 23090-5 shall ignore (i.e., remove from the bitstream and discard) all NAL units with values of nal_layer_id not equal to 0.
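As an illustration, a sketch of parsing the two-byte nal_unit_header() is given below; the field widths (1-bit forbidden zero bit, 6-bit nal_unit_type, 6-bit nal_layer_id, 3-bit nal_temporal_id_plus1) reflect the ISO/IEC 23090-5 syntax as understood here and should be checked against the specification:

```python
def parse_nal_unit_header(b0: int, b1: int) -> dict:
    """Unpack the assumed two-byte V3C atlas NAL unit header."""
    hdr = (b0 << 8) | b1
    return {
        "nal_forbidden_zero_bit": (hdr >> 15) & 0x1,
        "nal_unit_type":          (hdr >> 9) & 0x3F,   # RBSP type, Table 4
        "nal_layer_id":           (hdr >> 3) & 0x3F,   # 0..62; 63 reserved
        "nal_temporal_id_plus1":  hdr & 0x7,
    }

hdr = parse_nal_unit_header(0x44, 0x01)
# a decoder conforming to Annex A discards NAL units with nal_layer_id != 0
keep = hdr["nal_layer_id"] == 0
```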

[0070] Thus, the visual volumetric video-based coding (V3C; ISO/IEC DIS 23090-5) as described above specifies a generic syntax and mechanism for volumetric video coding. The generic syntax can be used by applications targeting volumetric content, such as point clouds, immersive video with depth, and mesh representations of volumetric frames. The purpose of the specification is to define how to decode and interpret the associated data (atlas data in ISO/IEC 23090-5), which tells a renderer how to interpret 2D frames for reconstructing volumetric frames.

[0071] In addition to the two applications of V3C (ISO/IEC 23090-5), i.e. V-PCC (ISO/IEC 23090-5) and MIV (ISO/IEC 23090-12), the MPEG 3DG (ISO SC29 WG7) group has started work on a third application, i.e. V3C mesh compression.

[0072] A polygon mesh is a collection of vertices, edges and faces that defines the shape of a polyhedral object in 3D computer graphics and solid modelling. The faces usually consist of triangles (triangle mesh), quadrilaterals (quads), or other simple convex polygons (n-gons), since this simplifies rendering, but may also be more generally composed of concave polygons, or even polygons with holes.

[0073] Objects created with polygon meshes are represented by different types of elements. These include vertices, edges, faces, polygons and surfaces with the following definitions:

- Vertex: A position in 3D space defined as (x,y,z) along with other information such as color (r,g,b), normal vector and texture coordinates.

- Edge: A connection between two vertices.

- Face: A closed set of edges, in which a triangle face has three edges, and a quad face has four edges. A polygon is a coplanar set of faces. In systems that support multi-sided faces, polygons and faces are equivalent. Mathematically a polygonal mesh may be considered an unstructured grid, or undirected graph, with additional properties of geometry, shape and topology.

- Surfaces: or smoothing groups; useful, but not required, for grouping smooth regions.

- Groups: Some mesh formats contain groups, which define separate elements of the mesh, and are useful for determining separate sub-objects for skeletal animation or separate actors for non-skeletal animation.

- Materials: defined to allow different portions of the mesh to use different shaders when rendered.

- UV coordinates: Most mesh formats also support some form of UV coordinates which are a separate 2D representation of the mesh "unfolded" to show what portion of a 2-dimensional texture map to apply to different polygons of the mesh. It is also possible for meshes to contain other such vertex attribute information such as color, tangent vectors, weight maps to control animation, etc. (sometimes also called channels).

[0074] Figures 5a and 5b show the extensions to the V-PCC encoder and decoder to support mesh encoding and mesh decoding, respectively, as proposed in MPEG M47608.

[0075] In the encoder extension, the input mesh data is demultiplexed into vertex coordinate+attributes data and vertex connectivity data. The vertex coordinate+attributes data is coded using MPEG-I V-PCC, whereas the vertex connectivity data is coded as auxiliary data. Both of said data are multiplexed to create the final compressed output bitstream. Vertex ordering is carried out on the reconstructed vertex coordinates at the output of MPEG-I V-PCC to reorder the vertices for optimal vertex connectivity encoding.

[0076] In the decoder, the input bitstream is demultiplexed to generate the compressed bitstreams for vertex coordinates+attributes data and vertex connectivity data. The vertex coordinates+attributes data is decompressed using the MPEG-I V-PCC decoder. Vertex ordering is carried out on the reconstructed vertex coordinates at the output of the MPEG-I V-PCC decoder to match the vertex order at the encoder. The vertex connectivity data is also decompressed, and everything is multiplexed to generate the reconstructed mesh.

[0077] Reconstructing a surface from a set of points has been extensively studied during recent years. Most of the algorithms for surface reconstruction are quite expensive in terms of required processing and not suitable for real-time frame-by-frame reconstruction. The following three algorithms currently dominate surface reconstruction (a usage sketch follows the list):

Alpha shapes generalize convex hulls. From an initial volume containing all vertices in the 3D space, sub-volumes are iteratively removed. Intuitively, the size of the carved out sub-volumes can be adjusted by a parameter.

Ball pivoting rolls a ball over the points’ surface. As soon as it hits 3 points, it creates a triangle from these. The ball’s radius is a vital parameter for this algorithm, as a too small ball ‘falls through’ the surface, while a too big ball conceals details from the point set.

Poisson surface reconstruction solves a regularized optimization problem to generate a smooth surface. The surface’s amount of detail (level of smoothing) can be adjusted with a parameter.
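For a concrete feel of these three algorithms and their key parameters, the following sketch uses the open-source Open3D library (which is not referenced in this application); the radius, alpha and depth values are arbitrary:

```python
import numpy as np
import open3d as o3d

# toy point set: unit-sphere samples standing in for a decoded vertex cloud
pts = np.random.default_rng(1).normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
pcd.estimate_normals()   # ball pivoting and Poisson need oriented normals

# ball pivoting: the radii list is the vital parameter discussed above
radii = o3d.utility.DoubleVector([0.05, 0.1, 0.2])
bpa = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(pcd, radii)

# alpha shapes: alpha controls how much volume is carved away
alpha_mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha=0.3)

# Poisson: depth controls the level of detail / smoothing of the surface
poisson, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=7)
```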

[0078] Hole filling algorithms tackle the simpler problem of closing holes in already partially connected meshes. While there is still disagreement about the optimal solutions, the majority of the proposed algorithms require parametrization depending on the content type and vertex density, e.g. by defining the number of iterations for a refinement algorithm.

[0079] The current development in MPEG 3DG for integration of mesh compression into the V3C family of standards typically relies on mesh reconstruction at the decoder, which allows reducing the amount of information required for compressing traditional mesh information. The input meshes may contain vertices with varying spatial density over different regions of the input model, which means that applying a surface reconstruction globally to the model with the same parameters might be sub-optimal. For example, input sequences containing human figures typically demonstrate such differences in vertex density: the vertex density around the facial region is significantly higher than in other parts of the model.

[0080] Reconstruction of meshes at the decoder side seems to work well within patches, as the mesh is constructed based on the information that the pixels in the geometry patch are expected to result in a solid mesh. However, connecting vertices from two or more patches to generate a surface for the mesh remains problematic. The problem of connecting vertices from two or more patches is illustrated in an example of Figures 6a and 6b, where Figure 6a illustrates gaps and Figure 6b illustrates patch edges on the mesh after initial reconstruction.

[0081] There may be several origins for the problem of connecting vertices from two or more patches. Firstly, as a consequence of quantization of 3D information, a shift of vertex positions is expected. Furthermore, video coding introduces additional artefacts when mesh information is compressed with a video codec. When re-projected back into 3D, the vertex positions for the same vertex between two patches no longer match, and as a result, gaps in the model start to appear.

[0082] Accordingly, the lack of region-based configuration data for surface reconstruction algorithms causes degraded quality of reconstruction and performance at the decoder.

[0083] In the following, an enhanced method for spatially optimized mesh reconstruction will be described in more detail, in accordance with various embodiments.

[0084] The method, which is disclosed in Figure 7, comprises receiving (700) a volumetric media frame comprising three-dimensional (3D) data content; encoding (702) the 3D data content into 3D spatial regions using a mesh compression algorithm; identifying (704) patch boundaries and vertex density among the mesh-compressed 3D spatial regions; determining (706) at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and signaling (708) said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

[0085] Thus, by providing mechanisms for determining region-based configuration data for surface reconstruction algorithms at the encoder side and storing them to be carried in or along the V3C bitstream, the quality of reconstruction and performance at the decoder can be improved. More complex calculations and analysis may be performed at the encoder side, thus avoiding expensive computation during decoding. The encoder-side analysis may include, for example, identification of different vertex densities in different regions of the model, which are typical for at least current development trends of MPEG-I mesh compression.

[0086] According to an embodiment, the method comprises performing a test-rendering using a surface reconstruction algorithm, said at least one surface reconstruction parameter and information about the mesh-compressed 3D spatial regions; and re-determining, based on the test-rendering, said at least one surface reconstruction parameter.

[0087] Hence, the encoder-side reconstruction analysis can compare the reconstruction result to the ground truth and thus further optimize reconstruction parameters.

[0088] According to an embodiment, the method comprises repeating the test-rendering using the re-determined at least one surface reconstruction parameter until a predetermined threshold value descriptive of image quality is reached. Consequently, such in-loop test-rendering may be carried out in the encoder to ensure a sufficient image quality at the reconstruction and to shift the heavy processing to the encoder.
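A hypothetical encoder-side loop corresponding to paragraphs [0086]-[0088] might look as follows; render, quality and refine are placeholders for an actual surface-reconstruction renderer, an image-quality metric (e.g. PSNR) and a parameter-update rule, none of which are specified in the text:

```python
def refine(params, rendering, ground_truth):
    """Placeholder update rule: e.g. shrink the ball radius in regions that
    still show gaps after the test-rendering (purely illustrative)."""
    return {rid: p * 0.9 for rid, p in params.items()}

def optimize_reconstruction_params(regions, ground_truth, render, quality,
                                   init_params, threshold, max_iters=10):
    """Test-render with the current per-region parameters, score the result
    against the ground truth, and re-determine the parameters until the
    image-quality threshold is reached."""
    params = dict(init_params)
    for _ in range(max_iters):
        rendering = render(regions, params)              # test-rendering (cf. 908)
        if quality(rendering, ground_truth) >= threshold:
            break                                        # acceptable quality reached
        params = refine(params, rendering, ground_truth) # re-determine parameters
    return params
```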

[0089] The ball-pivoting algorithm mentioned above is one practical example which benefits from signaling according to the embodiments, where at least a ball radius, or multiple ball-radii depending on the spatial region of the model, are signalled. The shortcomings of a constant ball-radius are illustrated in an example of Figures 8a and 8b, where Figure 8a illustrates the reconstructed mesh without ball-pivoting and Figure 8b illustrates the reconstructed mesh after gap filling by ball-pivoting with a constant ball-radius. As can be seen in Figure 8b, the selected constant ball-radius performs well in filling the gaps on the shoulder and in some parts on top of the head of the model of the human figure. However, the facial region still shows significant gaps, which is due to the higher vertex density typically applied for the facial region of a human figure.

[0090] Thus, smaller ball-radii could be used for areas of the model in which vertex density is higher, and larger radii would be used for areas where the density is lower. Depending on the type of surface reconstruction algorithm, similar parameter optimization can be considered. For this purpose, the type of mesh reconstruction algorithm may be signalled, together with flexible means for signaling the parameters for spatial regions of the model. Parameters for the ball-pivoting algorithm are mentioned herein merely as an example of the surface reconstruction parameters, and parameters for other surface reconstruction algorithms could be provided in a similar manner.
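One possible (non-normative) heuristic for deriving such per-region radii from vertex density is sketched below, using the mean nearest-neighbour distance within each spatial region:

```python
import numpy as np
from scipy.spatial import cKDTree

def radius_per_region(regions: dict[int, np.ndarray], scale: float = 2.0) -> dict[int, float]:
    """Pick a ball radius per spatial region as a multiple of the region's
    mean nearest-neighbour distance, so dense regions (e.g. a face) get a
    smaller ball. The scale factor is an assumption, not from the text."""
    radii = {}
    for rid, verts in regions.items():
        tree = cKDTree(verts)
        d, _ = tree.query(verts, k=2)    # d[:, 1] is each vertex's nearest neighbour
        radii[rid] = scale * float(d[:, 1].mean())
    return radii

rng = np.random.default_rng(2)
regions = {0: rng.random((500, 3)),          # sparse region
           1: rng.random((500, 3)) * 0.2}    # ~5x denser region
print(radius_per_region(regions))            # region 1 gets the smaller radius
```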

[0091] An apparatus suitable for implementing the method comprises means for receiving a volumetric media frame comprising three-dimensional (3D) data content; means for encoding the 3D data content into 3D spatial regions using a mesh compression algorithm; means for identifying patch boundaries and vertex density among the mesh-compressed 3D spatial regions; means for determining at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and means for signaling said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

[0092] Figure 9 shows an exemplified block chart of such an apparatus illustrating encoder operations related to spatially optimized surface reconstruction. In the example of Figure 9, the apparatus receives one or more volumetric frames (900) describing a 3D object or a scene. A mesh encoder (902), such as a V3C-compatible mesh encoder, is used to compress the 3D information in a mesh format. Spatial mesh analysis (904) is performed to identify patch boundaries and vertex density in the compressed scene. As a result of the analysis, one or more parameters for surface reconstruction (906) are defined. As described above, an optional iterative step of performing a test rendering may be carried out on the generated mesh along with the identified surface reconstruction parameters (908). Test rendering may be used to refine surface reconstruction parameters until acceptable quality is reached with the in-loop test rendering. The resulting V3C encoded mesh and the related surface reconstruction parameters are encapsulated in one or more V3C bitstreams (910).

[0093] According to an embodiment, the apparatus comprises means for performing a test-rendering using a surface reconstruction algorithm, said at least one surface reconstruction parameter and information about the mesh-compressed 3D spatial regions; and means for re-determining, based on the test-rendering, said at least one surface reconstruction parameter.

[0094] According to an embodiment, the apparatus comprises means for signaling, in or along the at least one bitstream, an indication about one or more spatial regions where the surface reconstruction is to be applied. Thus, this allows the decoder to prioritize the surface reconstruction on regions of the model where either gaps exist or where they are most visible.

[0095] Parameters for spatially optimized mesh reconstruction may be carried on multiple levels of the V3C bitstream, and new signalling can be provided accordingly.

[0096] According to an embodiment, the signaling of the parameters for spatially optimized mesh reconstruction is configured to be carried out by at least one syntax element included in a V3C parameter set extension. At least one syntax element may be used, for example, to signal information about the used surface reconstruction algorithm. The type of surface reconstruction may be signaled as a look-up table enumerating typical surface reconstruction algorithms, wherein the look-up table enables an easy introduction of possible new algorithms. Additionally, the number, type and size of the parameters for each parameter set may be signaled as one or more syntax elements in the parameter set extension.

[0097] According to an embodiment, the signaling of spatial regions and sub-regions comprises offsets and sizes of the spatial regions. Also, explicit identifications and nesting structures of the regions may be signaled. The offsets and sizes may be delta coded to save space in the bitstream.

[0098] According to an embodiment, the surface reconstruction parameters include an identification of at least one relevant spatial region. Additionally, the number of parameters for the given (sub-)region may be signaled, if the V3C parameter set extension does not provide said information.

[0099] According to an embodiment, the spatial regions and the surface reconstruction parameters may be signaled individually.

[0100] According to an embodiment, the surface reconstruction parameters and the spatial regions are signaled in one of the following: an atlas sequence parameter set extension, an atlas frame parameter set extension, or one or more SEI messages.

[0101] Among the various options for carrying out the signalling of the parameters for spatially optimized mesh reconstruction, the following embodiments illustrate examples of encapsulating the signalling into one or more V3C bitstreams as VPS (V3C parameter set) level signaling. In the examples, different types of parameter sets are defined, which may be assigned to specific regions of the model. VPS level signaling provides the advantage of being fairly static, wherein the parameters are valid for the entire duration of the sequence.

[0102] According to an embodiment, the surface reconstruction parameters and spatial regions are stored in different syntax structures, which are linked to each other by indexing. The indexing may be explicit or implicit, depending on the presence of an indicator or flag. The benefit of having two separate syntax structures is that the same parameter set may be reused by multiple spatial regions and vice versa. This may reduce the number of bits required to signal parameter sets for the entire model.
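
As a non-normative illustration of such index-based linking, the following Python snippet shows one parameter set being reused by several spatial regions. The region ids, set ids and the ball_radius values are invented for this example only.

    # Illustrative only: parameter set 1 is shared by two spatial regions.
    parameter_sets = {
        1: {"ball_radius": 0.01},   # dense regions (e.g. face, hands)
        2: {"ball_radius": 0.05},   # sparse regions (e.g. torso)
    }
    region_to_parameter_set = {     # spatial region id -> parameter set id
        10: 1,
        11: 1,
        12: 2,
    }

    def parameters_for_region(region_id):
        return parameter_sets[region_to_parameter_set[region_id]]

    print(parameters_for_region(11))   # {'ball_radius': 0.01}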

[0103] Table 1 shows an example of attributes to be included in the spatial region syntax structure.

Table 1. An example of spatial region syntax structure

[0104] In the above syntax, srr_explicit_spatial_ids_present_flag equal to 1 specifies that each spatial region has an explicit index assigned to it. srr_explicit_spatial_ids_present_flag equal to 0 specifies that region ids are assigned implicitly in the order of the spatial regions in the list, i.e. srr_spatial_region_id = i. Implicit signalling reduces the amount of bits to be signalled, but provides less flexibility for updating information. When not present, the value of srr_explicit_spatial_ids_present_flag is inferred to be equal to 0.

[0105] srr_hierarchical_structure_present_flag equal to 1 specifies that nesting of spatial regions is enabled. Nesting allows generating hierarchical tree structures that enable signalling of parameter sets based on a parent-child paradigm. All parameter sets signalled for a parent are also valid for its child nodes. Additional parameter sets may be signalled per child node to improve reconstruction quality. Enabling srr_hierarchical_structure_present_flag requires srr_explicit_spatial_ids_present_flag to be enabled as well. When not present, the value of srr_hierarchical_structure_present_flag is inferred to be equal to 0.

[0106] srr_num_spatial_regions indicates the number of spatial regions in the surface_reconstruction_regions() syntax structure.

[0107] srr_offset[ i ][ j ] indicates the x, y and z offsets of the i-th region in surface_reconstruction_regions().

[0108] srr_size[ i ][ j ] indicates the x-, y- and z-dimensions of the i-th region in surface_reconstruction_regions().

[0109] srr_spatial_region_id[ i ] indicates the index of the i-th region in the surface_reconstruction_regions() syntax structure. It shall only be present when srr_explicit_spatial_ids_present_flag is equal to 1. srr_spatial_region_id shall be a unique index greater than 0.

[0110] srr_parent_id[ i ] indicates the parent index of the i-th region in the surface_reconstruction_regions() syntax structure. It shall only be present when srr_hierarchical_structure_present_flag is equal to 1. For a region with no parent, srr_parent_id shall be 0.
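
To make the semantics of paragraphs [0104]-[0110] concrete, the following non-normative Python sketch parses a surface_reconstruction_regions() structure. The BitReader helper and all field bit-widths are illustrative assumptions, as they are not prescribed here.

    class BitReader:
        """Minimal MSB-first bit reader over bytes (illustrative only)."""
        def __init__(self, data):
            self.data, self.pos = data, 0
        def u(self, n):
            val = 0
            for _ in range(n):
                byte = self.data[self.pos // 8]
                val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
                self.pos += 1
            return val

    def parse_surface_reconstruction_regions(br):
        explicit_ids = br.u(1)      # srr_explicit_spatial_ids_present_flag
        hierarchical = br.u(1)      # srr_hierarchical_structure_present_flag
        num_regions = br.u(8)       # srr_num_spatial_regions (width assumed)
        regions = []
        for i in range(num_regions):
            offset = [br.u(16) for _ in range(3)]  # srr_offset[i][j], x/y/z
            size = [br.u(16) for _ in range(3)]    # srr_size[i][j], x/y/z
            region_id = br.u(8) if explicit_ids else i  # implicit: id = i
            parent_id = br.u(8) if hierarchical else 0  # 0 = no parent
            regions.append({"id": region_id, "parent": parent_id,
                            "offset": offset, "size": size})
        return regions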

[0111] According to an embodiment, the surface reconstruction offsets and dimensions are delta coded, i.e. only the difference to the previous region in the list is encoded. This may involve defining the offsets and dimensions as signed integers instead of unsigned integers.
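
A non-normative sketch of such delta coding of region offsets follows (the same applies to the dimensions); the three-component layout is an assumption for illustration.

    def delta_encode(triples):
        """First region absolute, subsequent regions as signed differences."""
        prev, out = [0, 0, 0], []
        for t in triples:
            out.append([t[j] - prev[j] for j in range(3)])
            prev = t
        return out

    def delta_decode(deltas):
        prev, out = [0, 0, 0], []
        for d in deltas:
            prev = [prev[j] + d[j] for j in range(3)]
            out.append(list(prev))
        return out

    offsets = [[0, 0, 0], [10, 0, 0], [10, 20, 0]]
    assert delta_decode(delta_encode(offsets)) == offsets
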
[0112] Table 2 shows an example of attributes to be included in the surface reconstruction parameters syntax structure.
Table 2. An example of surface reconstruction parameters syntax structure

[0113] In Table 2, srp_num_parameter_sets indicates the number of parameter sets in the surface_reconstruction_parameters() syntax structure.

[0114] srp_parameter_set_id[ i ] indicates the parameter set id of the i-th parameter set in surface_reconstruction_parameters() syntax structure.

[0115] srp_spatial_region_id[ i ] indicates the spatial region id, as defined in the surface_reconstruction_regions() syntax structure, of the i-th parameter set in the surface_reconstruction_parameters() syntax structure.

[0116] srp_parameter[ i ][ j ] contains the j-th parameter of the i-th parameter set in the surface_reconstruction_parameters() syntax structure. The type of the parameter is defined in vps_surface_reconstruction_extension() as vps_sr_parameter_type[ srp_parameter_set_id[ i ] ][ j ]. The size is defined by vps_sr_parameter_size[ srp_parameter_set_id[ i ] ][ j ], respectively.
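
Continuing the earlier parsing sketch, a non-normative reader for surface_reconstruction_parameters() could look as follows. It assumes the BitReader from the sketch above and a parsed VPS extension (see the sketch further below) supplying per-set parameter counts and sizes; all field widths are again assumed.

    def parse_surface_reconstruction_parameters(br, vps_ext):
        num_sets = br.u(8)              # srp_num_parameter_sets (width assumed)
        sets = []
        for i in range(num_sets):
            set_id = br.u(8)            # srp_parameter_set_id[i]
            region_id = br.u(8)         # srp_spatial_region_id[i]
            params = []
            for j in range(vps_ext["parameter_count"][set_id]):
                # the size (in bits) of each parameter comes from the VPS extension
                params.append(br.u(vps_ext["parameter_size"][set_id][j]))
            sets.append({"set_id": set_id, "region_id": region_id,
                         "parameters": params})
        return sets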

[0117] According to an embodiment, the surface reconstruction parameters and regions are stored in a single list where spatial regions contain parameter sets or where parameter sets contain spatial regions. Herein, the indexing is implicit.

[0118] According to an embodiment, existing signalling in ISO/IEC 23090-5 related to the scene object information SEI message may be re-used, in which case spatial regions may be signalled using soi_3d_bounding_box_present_flag and surface_reconstruction_parameters() is added to the scene_object_information( payloadSize ) SEI message.

[0119] According to an embodiment, the V3C parameter set may be extended to signal surface reconstruction parameters. An example of the addition to the syntax is described below in Table 3.

Table 3. An example of V3C parameter set extension

[0120] In the above syntax, vps_surface_reconstruction_extension_present_flag equal to 1 specifies that the vps_surface_reconstruction_extension( ) syntax structure is present in the v3c_parameter_set( ) syntax structure. vps_surface_reconstruction_extension_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of vps_surface_reconstruction_extension_present_flag is inferred to be equal to 0.

[0121] Table 4 shows an example of the vps_surface_reconstruction_extension( ) syntax structure.

Table 4. An example of VPS surface reconstruction extension

[0122] In the above syntax, vps_surface_reconstruction_algorithm identifies the surface reconstruction algorithm which should be used with the signalled parameters. The values of vps_surface_reconstruction_algorithm may be mapped to surface reconstruction algorithms according to Table 5.

Table 5. An example of surface reconstruction algorithms

[0123] vps_surface_reconstruction_parameter_set_count defines the number of surface reconstruction parameter sets for the content. Different spatial regions can have a different number of surface reconstruction parameters or different types of parameters; thus, multiple parameter sets may be used.

[0124] vps_sr_parameter_set_id[ i ] defines the unique index of the i-th parameter set.

[0125] vps_sr_parameter_count[ i ] defines the number of parameters for the i-th parameter set.

[0126] vps_sr_parameter_type[ i ][ j ] defines the type of the j-th surface reconstruction parameter in the i-th parameter set. The types of the surface reconstruction parameters are defined in Table 6.

Table 6. An example of surface reconstruction parameter types

[0127] vps_sr_parameter_size[ i ][ j ] defines the size, in bits, of the j-th surface reconstruction parameter in the i-th parameter set.
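
The semantics of paragraphs [0122]-[0127] can be summarized in the following non-normative sketch. The enumeration values, field widths and field order are assumptions; only ball-pivoting is named explicitly in this description, so the remaining entries of Tables 5 and 6 are placeholders.

    # Hypothetical enumerations standing in for Tables 5 and 6.
    SR_ALGORITHMS = {0: "ball-pivoting"}
    SR_PARAMETER_TYPES = {0: "unsigned integer", 1: "float"}

    def parse_vps_surface_reconstruction_extension(br):
        ext = {"parameter_count": {}, "parameter_type": {}, "parameter_size": {}}
        ext["algorithm"] = br.u(8)   # vps_surface_reconstruction_algorithm
        num_sets = br.u(8)           # vps_surface_reconstruction_parameter_set_count
        for i in range(num_sets):
            set_id = br.u(8)         # vps_sr_parameter_set_id[i]
            count = br.u(8)          # vps_sr_parameter_count[i]
            ext["parameter_count"][set_id] = count
            ext["parameter_type"][set_id] = [br.u(8) for _ in range(count)]  # Table 6
            ext["parameter_size"][set_id] = [br.u(8) for _ in range(count)]  # in bits
        return ext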

[0128] It is noted that various flags indicating the presence of surface reconstruction parameters at different levels of the V3C bitstream, like ASPS or AFPS, are to be considered. For example, there could be a flag indicating that the same reconstruction parameters should be applied to the whole model. The priority of such flags should be hierarchical, for example, in a hierarchy of VPS > ASPS > AFPS, etc.
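
One possible, non-normative reading of such a hierarchy is that the highest-priority level carrying parameters wins, as sketched below; the level ordering merely follows the VPS > ASPS > AFPS example above and is an assumption.

    def effective_parameters(vps_params, asps_params, afps_params):
        """Return the parameters of the highest-priority level that carries any.

        Follows the illustrative VPS > ASPS > AFPS ordering; an empty dict
        means that no level signalled parameters.
        """
        for level_params in (vps_params, asps_params, afps_params):
            if level_params:
                return level_params
        return {}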

[0129] vps_surface_reconstruction_parameters_present_flag equal to 1 specifies that the surface_reconstruction( ) syntax structure is present in the vps_surface_reconstruction_extension( ) syntax structure. vps_surface_reconstruction_parameters_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of vps_surface_reconstruction_parameters_present_flag is inferred to be equal to 0. This flag allows signalling surface reconstruction parameters for the entire content or on other levels of the bitstream, such as ASPS or AFPS.

[0130] Table 7 shows an example of the attributes contained in the surface reconstruction syntax structure.

Table 7. An example of surface reconstruction syntax structure

[0131] In the above syntax, sr_surface_reconstruction_regions_present_flag equal to 1 specifies that the surface_reconstruction_regions( ) syntax structure is present in the surface_reconstruction( ) syntax structure. sr_surface_reconstruction_regions_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of sr_surface_reconstruction_regions_present_flag is inferred to be equal to 0.

[0132] sr_surface_reconstruction_parameters_present_flag equal to 1 specifies that the surface_reconstruction_parameters( ) syntax structure is present in the surface_reconstruction( ) syntax structure. sr_surface_reconstruction_parameters_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of sr_surface_reconstruction_parameters_present_flag is inferred to be equal to 0.
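
Combining the two presence flags, a non-normative reader for the surface_reconstruction( ) container could look as follows, reusing the parsing sketches above; the flag-before-structure ordering is an assumption.

    def parse_surface_reconstruction(br, vps_ext):
        regions, params = None, None
        if br.u(1):   # sr_surface_reconstruction_regions_present_flag
            regions = parse_surface_reconstruction_regions(br)
        if br.u(1):   # sr_surface_reconstruction_parameters_present_flag
            params = parse_surface_reconstruction_parameters(br, vps_ext)
        return regions, params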

[0133] The above embodiments illustrate the signalling of parameter sets as VPS level signalling. While signalling the different parameter sets on the VPS level is reasonable in terms of minimizing the signalling overhead, the values may change more frequently than the VPS. On such occasions, more flexible signalling may be considered at lower levels of the V3C bitstream.

[0134] According to an embodiment, the surface_reconstruction( ) syntax structure may be signalled in an extension to the atlas_sequence_parameter_set_rbsp( ) syntax structure in ISO/IEC 23090-5. This level of signalling would enable updating the parameters on a per-sequence basis.

[0135] According to an embodiment, the surface_reconstruction( ) syntax structure may be signalled in an extension to the atlas_frame_parameter_set_rbsp( ) syntax structure in ISO/IEC 23090-5. This level of signalling would enable updating the parameters on a frame-by-frame basis. For this type of signalling, additional flags indicating updated regions and parameters may be considered.

[0136] According to an embodiment, the surface_reconstruction( ) syntax structure may be defined as a new SEI message surface_reconstruction( payloadSize ) with a specific payloadType and payloadSize. In this case, a similar SEI message may be added for the parameter sets to carry the information from vps_surface_reconstruction_extension( ).

[0137] Another aspect relates to the operation of a decoder. Figure 10 shows an example of a decoding method comprising: receiving (1000) a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; receiving (1002), either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; generating (1004) a mesh representation from the 3D volumetric representation data; applying (1006) the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and rendering (1008) the mesh with the reconstructed surfaces.

[0138] Thus, the decoder receives the V3C bitstream comprising the encoded 3D volumetric representation data, which includes the mesh-compressed 3D spatial regions, as well as the signaling elements indicating the surface reconstruction algorithm applied by the encoder and the one or more surface reconstruction parameters. The decoder generates a mesh representation from the 3D volumetric representation data, wherein the mesh may have gaps on a plurality of spatial regions, originating for example over patch boundaries. For filling the gaps, the decoder applies the signaled surface reconstruction algorithm according to the signaled surface reconstruction parameters on the spatial regions of interest, thereby creating one or more reconstructed surfaces for said spatial regions. Finally, the mesh can be rendered by providing the spatial regions of interest with the reconstructed surfaces.

[0139] Figure 11 shows an exemplary block chart of such an apparatus, illustrating the decoder operations related to spatially optimized surface reconstruction. The receiver acquires the V3C bitstream (1100), which contains the encoded 3D volumetric representation data. The receiver then acquires, from or along said V3C bitstream, the surface reconstruction parameters (1102). A V3C bitstream parser (1104) extracts the mesh information (1106) from the encoded 3D volumetric representation data and generates per-patch meshes (1108). The surface reconstruction algorithm (1110) is then executed according to the signaled surface reconstruction parameters to reconstruct surfaces at patch boundaries. Finally, the mesh is rendered (1112) by providing the patch boundaries with the reconstructed surfaces.
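
The decoder flow of Figures 10 and 11 can likewise be summarized by a non-normative Python sketch. Every helper below (extract_parameters, parse_v3c, build_patch_meshes, reconstruct_surface, render) is a hypothetical stand-in for blocks 1100-1112 and is not defined by this description.

    def decode_and_render(v3c_bitstream):
        # All helpers are hypothetical stand-ins for Figure 11 blocks.
        algorithm, regions, param_sets = extract_parameters(v3c_bitstream)  # (1102)
        mesh_info = parse_v3c(v3c_bitstream)             # parser (1104), mesh (1106)
        patch_meshes = build_patch_meshes(mesh_info)     # per-patch meshes (1108)
        for region, params in zip(regions, param_sets):  # reconstruction (1110)
            reconstruct_surface(patch_meshes, region, params, algorithm)
        return render(patch_meshes)                      # rendering (1112)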

[0140] The embodiments relating to the encoding aspects may be implemented in an apparatus comprising: means for receiving a volumetric media frame comprising three-dimensional (3D) data content; means for encoding the 3D data content into 3D spatial regions using a mesh compression algorithm; means for identifying patch boundaries and vertex density among the mesh-compressed 3D spatial regions; means for determining at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and means for signaling said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

[0141] The embodiments relating to the encoding aspects may likewise be implemented in an apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive a volumetric media frame comprising three-dimensional (3D) data content; encode the 3D data content into 3D spatial regions using a mesh compression algorithm; identify patch boundaries and vertex density among the mesh-compressed 3D spatial regions; determine at least one surface reconstruction parameter per 3D spatial region based on the patch boundaries and the vertex density; and signal said at least one surface reconstruction parameter and the mesh-compressed 3D spatial regions in at least one bitstream.

[0142] The embodiments relating to the decoding aspects may be implemented in an apparatus comprising: means for receiving a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; means for receiving, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; means for generating a mesh representation from the 3D volumetric representation data; means for applying the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and means for rendering the mesh with the reconstructed surfaces.

[0143] The embodiments relating to the decoding aspects may likewise be implemented in an apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive a bitstream in a decoder, said bitstream comprising encoded 3D volumetric representation data comprising mesh-compressed 3D spatial regions; receive, either in said bitstream or in a further bitstream, one or more signaling elements indicating an applied surface reconstruction algorithm and one or more surface reconstruction parameters; generate a mesh representation from the 3D volumetric representation data; apply the signaled surface reconstruction algorithm according to the one or more surface reconstruction parameters on one or more spatial regions of the mesh so as to create one or more reconstructed surfaces; and render the mesh with the reconstructed surfaces.

[0144] Such apparatuses may comprise e.g. the functional units disclosed in any of the Figures 1a, 1b, 2a and 2b for implementing the embodiments.

[0145] In the above, some embodiments have been described with reference to encoding. It needs to be understood that said encoding may comprise one or more of the following: encoding source image data into a bitstream, encapsulating the encoded bitstream in a container file and/or in packet(s) or stream(s) of a communication protocol, and announcing or describing the bitstream in a content description, such as the Media Presentation Description (MPD) of ISO/IEC 23009-1 (known as MPEG-DASH) or the IETF Session Description Protocol (SDP). Similarly, some embodiments have been described with reference to decoding. It needs to be understood that said decoding may comprise one or more of the following: decoding image data from a bitstream, decapsulating the bitstream from a container file and/or from packet(s) or stream(s) of a communication protocol, and parsing a content description of the bitstream.

[0146] In the above, where the example embodiments have been described with reference to an encoder or an encoding method, it needs to be understood that the resulting bitstream and the decoder or the decoding method may have corresponding elements in them. Likewise, where the example embodiments have been described with reference to a decoder, it needs to be understood that the encoder may have a structure and/or computer program for generating the bitstream to be decoded by the decoder.

[0147] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

[0148] Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

[0149] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

[0150] The foregoing description has provided, by way of exemplary and non-limiting examples, a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended examples. Nevertheless, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.