HIERARCHICAL TREE ATTRIBUTE CODING IN POINT CLOUD CODING

Title:

HIERARCHICAL TREE ATTRIBUTE CODING IN POINT CLOUD CODING

Document Type and Number:

WIPO Patent Application WO/2020/072665

Kind Code:

Abstract:

A media coding mechanism is disclosed. The mechanism includes storing an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points. An octree is applied to the points to encode a geometry bitstream describing positions of the points. A hierarchical tree is applied to the points to encode an attribute bitstream describing attribute values of the points. A point cloud coding (PCC) bitstream containing the geometry bitstream and the attribute bitstream is transmitted to support reconstructing the media sequence.

Inventors:

KATHARIYA BIRENDRA (US)
ZAKHARCHENKO VLADYSLAV (US)
CHEN JIANLE (US)

Application Number:

PCT/US2019/054321

Publication Date:

April 09, 2020

Filing Date:

October 02, 2019

Export Citation:

Click for automatic bibliography generation Help

Assignee:

FUTUREWEI TECHNOLOGIES INC (US)

International Classes:

G06T9/40; G06T15/20; H04N19/103

Foreign References:

US20120176381A1	2012-07-12
US20060031915A1	2006-02-09
US20140184430A1	2014-07-03

Attorney, Agent or Firm:

DIETRICH, William H. et al. (US)

Download PDF:

View/Download PDF PDF Help

Claims:

CLAIMS

What is claimed is:

1. A method implemented in an encoder, the method comprising:

storing, in a memory of the encoder, an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points;

applying, by a processor of the encoder, an octree to the points to encode a geometry bitstream describing positions of the points;

applying, by the processor, a hierarchical tree to the points to encode an attribute bitstream describing attribute values of the points; and

transmitting, by a transmitter of the encoder, a point cloud coding (PCC) bitstream containing the geometry bitstream and the attribute bitstream to support reconstructing the media sequence.

2. The method of claim 1, wherein the hierarchical tree is a binary tree.

3. The method of any of claims 1-2, wherein applying the hierarchical tree to the points comprises:

assigning, by the processor, the points to a parent node in a first layer of a plurality of layer of details (LODs);

recursively dividing, by the processor, the parent node into child nodes in the plurality of LODs based on the positions of the points;

encoding, by the processor, an attribute value for the parent node in the first layer of the LOD in the attribute bitstream; and

encoding, by the processor, attribute values of child nodes as residuals based on a predictor from a preceding LOD, wherein the residuals are encoded in the attribute bitstream.

4. The method of any of claims 1-3, wherein applying the hierarchical tree to the points further comprises selecting a centroid for each current node, and wherein recursively dividing the parent node into child nodes further comprises dividing each current node by a plane traversing the centroid for the current node.

5. The method of any of claims 1-4, wherein attribute values of each preceding LOD are selected as attribute values at a corresponding centroid.

6. The method of any of claims 1-5, wherein a depth of the hierarchical tree is determined according to:

where d is the depth of a binary tree, round is a rounding function, log is a logarithm function, and N is a number of the points.

7. The method of any of claims 1-6, wherein the attribute values of the parent node and the child nodes are encoded in refinement layers, and wherein the refinement layers are determined according to:

R(j) = I_j - LOD (j - l)

where R are the refinement layers, j specifies a refinement layer R(j), I_j specifies a current layer of the binary tree, and LOD(j-l) specifies the preceding LOD.

8. The method of any of claims 1-7, further comprising encoding a use binary tree for LOD (useinTreeLoD) variable in the attribute bitstream to indicate the binary tree is used to determine the LODs and attributes values.

9. The method of claim 1, wherein the hierarchical tree is an octree.

10. The method of claim 1, wherein the hierarchical tree is a hybrid binary tree.

11. A method implemented in a decoder, the method comprising:

receiving, by a receiver of the decoder, a point cloud coding (PCC) bitstream containing a geometry bitstream and an attribute bitstream that include an encoding of an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points;

applying, by a processor of the decoder, an octree to the geometry bitstream to decode positions of the points in the image;

applying, by the processor, a hierarchical tree to the points based on the positions of the points to decode attribute values of the points; and reconstructing, by the processor, the sparse point cloud to reconstruct the image as part of the media sequence, wherein the sparse point cloud is reconstructed based on the positions of the points and the attribute values of the points.

12. The method of claim 11, wherein the hierarchical tree is a binary tree.

13. The method of any of claims 11-12, wherein applying the hierarchical tree to the points comprises:

assigning, by the processor, the points to a parent node in a first layer of a plurality of layer of details (LODs);

recursively dividing, by the processor, the parent node into child nodes in the plurality of LODs based on the positions of the points;

setting, by the processor, an attribute value for the parent node as an attribute value for a first layer of the LODs; and

determining, by the processor, attribute values for the child nodes based on residuals for the child nodes and a predictor from a preceding LOD.

14. The method of any of claims 11-13, wherein applying the hierarchical tree to the points further comprises selecting a centroid for each current node, and wherein recursively dividing the parent node into child nodes further comprises dividing each current node by a plane traversing the centroid for the current node.

15. The method of any of claims 11-14, wherein attribute values of each preceding LOD describe attribute values at a corresponding centroid.

16. The method of any of claims 11-15, wherein a depth of the hierarchical tree is determined according to:

where d is the depth of a binary tree, round is a rounding function, log is a logarithm function, and N is a number of the points.

17. The method of any of claims 11-16, wherein the attribute values of the parent node and the child nodes are stored in the attribute bitstream in refinement layers, and wherein the refinement layers are determined according to:

R(j = I_j - LOD 0 - 1)

where R are the refinement layers, j specifies a refinement layer R(j), I_j specifies a current layer of the binary tree, and LOD(j-l) specifies the preceding LOD.

18. The method of any of claims 11-17, further comprising obtaining a use binary tree for LOD (useinTreeLoD) variable in the attribute bitstream to determine the binary tree is used to determine the LODs and attributes values.

19. The method of claim 11, wherein the hierarchical tree is an octree.

20. The method of claim 11, wherein the hierarchical tree is a hybrid binary tree.

21. A media coding device comprising:

a processor, a receiver coupled to the processor, and a transmitter coupled to the processor, the processor, receiver, and transmitter configured to perform the method of any of claims 1-20.

22. A non-transitory computer readable medium comprising a computer program product for use by a media coding device, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium such that when executed by a processor cause the media coding device to perform the method of any of claims 1-20.

23. An encoder comprising:

a storing means for storing an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points;

a geometry bitstream means for applying an octree to the points to encode a geometry bitstream describing positions of the points;

an attribute bitstream means for applying a hierarchical tree to the points to encode an attribute bitstream describing attribute values of the points; and a transmitting means for transmitting a point cloud coding (PCC) bitstream containing the geometry bitstream and the attribute bitstream to support reconstructing the media sequence.

24. The encoder of claim 23, wherein the encoder is further configured to perform the method of any of claims 1-10.

25. An decoder comprising:

a receiving means for receiving a point cloud coding (PCC) bitstream containing a geometry bitstream and an attribute bitstream that include an encoding of an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points; a geometry bitstream means for applying an octree to the geometry bitstream to decode positions of the points in the image;

an attribute bitstream means for applying a hierarchical tree to the points based on the positions of the points to decode attribute values of the points; and

a reconstruction means for reconstructing the sparse point cloud to reconstruct the image as part of the media sequence, wherein the sparse point cloud is reconstructed based on the positions of the points and the attribute values of the points.

26. The decoder of claim 25, wherein the decoder is further configured to perform the method of any of claims 11-20.

Description:

Hierarchical Tree Attribute Coding In Point Cloud Coding

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This patent application claims the benefit of U.S. Provisional Patent Application No. 62/740,233, filed October 2, 2018 by Birendra Kathariya, et. al., and titled“Point Cloud Attributes Coding Using Binary Tree,” which is hereby incorporated by reference.

TECHNICAL FIELD

[0002] The present disclosure is generally related to media coding, and is specifically related to coding attribute values of points in a sparse point cloud.

BACKGROUND

[0003] The amount of data needed to depict even a relatively short media presentation can be substantial, which may result in difficulties when the data is to be streamed or otherwise communicated across a communications network with limited bandwidth capacity. Thus, media data is generally compressed before being communicated across modern day telecommunications networks. The size of a media presentation could also be an issue when the media is stored on a storage device because memory resources may be limited. Media compression devices often use software and/or hardware at the source to code the media data prior to transmission or storage, thereby decreasing the quantity of data needed to represent frames of the media. The compressed data is then received at the destination by a media decompression device that decodes the media data. With limited network resources and ever increasing demands of higher media quality, improved compression and decompression techniques that improve compression ratio with little to no sacrifice in image quality are desirable.

SUMMARY

[0004] In an embodiment, the disclosure includes a method implemented in an encoder, the method comprising: storing, in a memory of the encoder, an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points; applying, by a processor of the encoder, an octree to the points to encode a geometry bitstream describing positions of the points; applying, by the processor, a hierarchical tree to the points to encode an attribute bitstream describing attribute values of the points; and transmitting, by a transmitter of the encoder, a point cloud coding (PCC) bitstream containing the geometry bitstream and the attribute bitstream to support reconstructing the media sequence. PCC is used to code images including clouds of points. The points each have a geometry, which is the position of the point in three dimensional (3D) space. The points also have one or more attributes, such as a color and/or reflectance (e.g., color and light). Some systems employ a distance based algorithm to code the attributes. For example, the distance based algorithm may select a random set of root points. The algorithm may then classify non-root points into layers of detail (LODs) based on the positions of the non-root points relative to the root points and other non-root points. The distance based algorithm iterates through the points by recursively searching through the cloud for other points that are above and below a threshold distance from the current point. Accordingly the distance based algorithm is highly processor intensive. The present disclosure includes a mechanism for assigning points into LODs based on position by applying a hierarchical tree, such as a binary tree. This disclosed mechanism increases coding speed by a factor of four times to five times (e.g., coding time is twenty five percent to twenty percent of the coding time of the distance based algorithm. Further, the disclosed mechanism increases compression by about fifteen percent. As such, the disclose attribute coding mechanism substantially increases the processing speed and reduces memory and processing resource usage at both an encoder and a corresponding decoder.

[0005] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the hierarchical tree is a binary tree.

[0006] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein applying the hierarchical tree to the points comprises: assigning, by the processor, the points to a parent node in a first layer of a plurality of layer of details (LODs); recursively dividing, by the processor, the parent node into child nodes in the plurality of LODs based on the positions of the points; encoding, by the processor, an attribute value for the parent node in the first layer of the LOD in the attribute bitstream; and encoding, by the processor, attribute values of child nodes as residuals based on a predictor from a preceding LOD, wherein the residuals are encoded in the attribute bitstream.

[0007] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein applying the hierarchical tree to the points further comprises selecting a centroid for each current node, and wherein recursively dividing the parent node into child nodes further comprises dividing each current node by a plane traversing the centroid for the current node. In this example, the disclosed attribute coding mechanism operates by placing the points into a node and selecting a point as a center point, denoted as a centroid. The node is divided into child nodes by one or more planes passing through the centroid. The centroid is selected as a predictor and coded. This process is recursive so that each centroid is coded as a residual between an attribute value and the attribute value of the centroid from the previous LOD. This continues until the nodes can no longer be split and the remaining points are coded based on the previous centroids. In this way, the centroid for each current LOD is used as a predictor for lower LODs.

[0008] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein attribute values of each preceding LOD are selected as attribute values at a corresponding centroid.

[0009] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein a depth of the hierarchical tree is determined according to: d = round

where d is the depth of a binary tree, round is a rounding function, log is a logarithm function, and N is a number of the points.

[0010] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the attribute values of the parent node and the child nodes are encoded in refinement layers, and wherein the refinement layers are determined according to:

R(j) = I _j - L0D(j - 1)

where R are the refinement layers, j specifies a refinement layer R(j), I _j specifies a current layer of the binary tree, and LOD(j-l) specifies the preceding LOD.

[0011] Optionally, in any of the preceding aspects, another implementation of the aspect provides, further comprising encoding a use binary tree for LOD (useinTreeLoD) variable in the attribute bitstream to indicate the binary tree is used to determine the LODs and attributes values.

[0012] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the hierarchical tree is an octree.

[0013] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the hierarchical tree is a hybrid binary tree. In some examples, octrees and hybrid binary trees can be used to code attributes in a similar manner to a binary tree by altering the relevant algorithms. An octree splits nodes into eight sub-nodes instead of two by splitting a node with three planes. A hybrid binary tree alternates selecting a predictor at every other node split to create fewer LODs than a binary tree. [0014] In an embodiment, the disclosure includes a method implemented in a decoder. The method comprises: receiving, by a receiver of the decoder, a PCC bitstream containing a geometry bitstream and an attribute bitstream that include an encoding of an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points; applying, by a processor of the decoder, an octree to the geometry bitstream to decode positions of the points in the image; applying, by the processor, a hierarchical tree to the points based on the positions of the points to decode attribute values of the points; and reconstructing, by the processor, the sparse point cloud to reconstruct the image as part of the media sequence, wherein the sparse point cloud is reconstructed based on the positions of the points and the attribute values of the points. PCC is used to code images including clouds of points. The points each have a geometry, which is the position of the point in 3D space. The points also have one or more attributes, such as a color and/or reflectance (e.g., color and light). Some systems employ a distance based algorithm to code the attributes. For example, the distance based algorithm may select a random set of root points. The algorithm may then classify non root points into LODs based on the positions of the non-root points relative to the root points and other non-root points. The distance based algorithm iterates through the points by recursively searching through the cloud for other points that are above and below a threshold distance from the current point. Accordingly the distance based algorithm is highly processor intensive. The present disclosure includes a mechanism for assigning points into LODs based on position by applying a hierarchical tree, such as a binary tree. This disclosed mechanism increases coding speed by a factor of four times to five times (e.g., coding time is twenty five percent to twenty percent of the coding time of the distance based algorithm. Further, the disclosed mechanism increases compression by about fifteen percent. As such, the disclose attribute coding mechanism substantially increases the processing speed and reduces memory and processing resource usage at both an encoder and a corresponding decoder.

[0015] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the hierarchical tree is a binary tree.

[0016] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein applying the hierarchical tree to the points comprises: assigning, by the processor, the points to a parent node in a first layer of a plurality of LODs; recursively dividing, by the processor, the parent node into child nodes in the plurality of LODs based on the positions of the points; setting, by the processor, an attribute value for the parent node as an attribute value for a first layer of the LODs; and determining, by the processor, attribute values for the child nodes based on residuals for the child nodes and a predictor from a preceding LOD.

[0017] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein applying the hierarchical tree to the points further comprises selecting a centroid for each current node, and wherein recursively dividing the parent node into child nodes further comprises dividing each current node by a plane traversing the centroid for the current node. In this example, the disclosed attribute coding mechanism operates by placing the points into a node and selecting a point as a center point, denoted as a centroid. The node is divided into child nodes by one or more planes passing through the centroid. The centroid is selected as a predictor and coded. This process is recursive so that each centroid is coded as a residual between an attribute value and the attribute value of the centroid from the previous LOD. This continues until the nodes can no longer be split and the remaining points are coded based on the previous centroids. In this way, the centroid for each current LOD is used as a predictor for lower LODs.

[0018] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein attribute values of each preceding LOD describe attribute values at a corresponding centroid.

[0019] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein a depth of the hierarchical tree is determined according to: d = round

where d is the depth of a binary tree, round is a rounding function, log is a logarithm function, and N is a number of the points.

[0020] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the attribute values of the parent node and the child nodes are stored in the attribute bitstream in refinement layers, and wherein the refinement layers are determined according to:

R(j) = I, - L0D(j - 1)

where R are the refinement layers, j specifies a refinement layer R(j), I _j specifies a current layer of the binary tree, and LOD(j-l) specifies the preceding LOD.

[0021] Optionally, in any of the preceding aspects, another implementation of the aspect provides, further comprising obtaining a useinTreeLoD variable in the attribute bitstream to determine the binary tree is used to determine the LODs and attributes values. [0022] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the hierarchical tree is an octree.

[0023] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the hierarchical tree is a hybrid binary tree. In some examples, octrees and hybrid binary trees can be used to code attributes in a similar manner to a binary tree by altering the relevant algorithms. An octree splits nodes into eight sub-nodes instead of two by splitting a node with three planes. A hybrid binary tree alternates selecting a predictor at every other node split to create fewer LODs than a binary tree.

[0024] In an embodiment, the disclosure includes a media coding device comprising: a processor, a receiver coupled to the processor, and a transmitter coupled to the processor, the processor, receiver, and transmitter configured to perform the method of any of the preceding aspects.

[0025] In an embodiment, the disclosure includes a non-transitory computer readable medium comprising a computer program product for use by a media coding device, the computer program product comprising computer executable instructions stored on the non- transitory computer readable medium such that when executed by a processor cause the media coding device to perform the method of any of the preceding aspects.

[0026] In an embodiment, the disclosure includes an encoder comprising: a storing means for storing an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points; a geometry bitstream means for applying an octree to the points to encode a geometry bitstream describing positions of the points; an attribute bitstream means for applying a hierarchical tree to the points to encode an attribute bitstream describing attribute values of the points; and a transmitting means for transmitting a PCC bitstream containing the geometry bitstream and the attribute bitstream to support reconstructing the media sequence.

[0027] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the encoder is further configured to perform the method of any of the preceding aspects.

[0028] In an embodiment, the disclosure includes a decoder comprising: a receiving means for receiving a PCC bitstream containing a geometry bitstream and an attribute bitstream that include an encoding of an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points; a geometry bitstream means for applying an octree to the geometry bitstream to decode positions of the points in the image; an attribute bitstream means for applying a hierarchical tree to the points based on the positions of the points to decode attribute values of the points; and a reconstruction means for reconstructing the sparse point cloud to reconstruct the image as part of the media sequence, wherein the sparse point cloud is reconstructed based on the positions of the points and the attribute values of the points.

[0029] Optionally, in any of the preceding aspects, another implementation of the aspect provides, wherein the decoder is further configured to perform the method of any of the preceding aspects.

[0030] For the purpose of clarity, any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.

[0031] These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0032] For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

[0033] FIG. 1 is a schematic diagram of an encoder for encoding sparse point clouds.

[0034] FIG. 2 is a schematic diagram of a decoder for decoding sparse point clouds.

[0035] FIG. 3 is a flowchart of an example method of coding point geometry in a sparse point cloud.

[0036] FIG. 4 is a flowchart of an example distance based method of coding point attributes in a sparse point cloud.

[0037] FIG. 5 is a schematic diagram of an example hierarchical tree based method of coding point attributes in a sparse point cloud.

[0038] FIG. 6 is a schematic diagram of an example mechanism for splitting a node at a centroid point according to a hierarchical tree based method of coding point attributes.

[0039] FIG. 7 is a schematic diagram of another example hierarchical tree based method of coding point attributes in a sparse point cloud.

[0040] FIG. 8 is a schematic diagram of another example hierarchical tree based method of coding point attributes in a sparse point cloud.

[0041] FIG. 9 is a schematic diagram of an example media coding device.

[0042] FIG. 10 is a flowchart of an example method of encoding point attributes in a sparse point cloud. [0043] FIG. 11 is a flowchart of an example method of applying a hierarchical tree to support encoding point attributes in a sparse point cloud.

[0044] FIG. 12 is a flowchart of an example method of decoding point attributes in a sparse point cloud.

[0045] FIG. 13 is a flowchart of an example method of applying a hierarchical tree to support decoding point attributes in a sparse point cloud.

[0046] FIG. 14 is a schematic diagram of an example system for coding point attributes in a sparse point cloud according to a hierarchical tree.

DETAILED DESCRIPTION

[0047] It should be understood at the outset that although an illustrative implementation of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

[0048] Media can include many types of data. One example type of media data is the point cloud. A point cloud is a group of points, typically in three dimensional (3D) space, that describes one or more objects. For example, a cloud of points can be used to represent a 3D object, such as a person, a machine component, a rendering of a room, etc. A point cloud can be represented across a sequence frames that may or may not be temporally related, depending on the example. Representations of object(s) may be generated by a 3D scanner with a high sampling rate. Such representations may employ a correspondingly high number of points. Such representations may be referred to as dense point clouds. Sparse point clouds are point clouds that are generated using a lower sampling rate, and hence fewer points. For example, a dense point cloud may be used to represent a detailed rendering of an object or space with textures and detailed outlines, while a sparse point cloud may be employed to represent the general outline of an object or space. As a specific example, light detection and ranging (LIDAR) may be used in autonomous vehicle applications to determine the general landscape and relative positions of potential obstructions. In such a case, detailed renderings are less important than a general understanding of the space being scanned (e.g., a vehicle should stop for a road obstruction regardless of the specific nature of the obstruction). Accordingly, sparse point clouds may be used to represent LIDAR data. Such a sparse point cloud can then be used to control an autonomous vehicle, stored and reviewed for testing purposes, etc. Sparse point clouds may include significant amounts of empty space, and dense point clouds may not. Accordingly, compression algorithms for sparse point clouds may be different than those applied to dense point clouds. The present disclosure is specifically related to sparse point clouds.

[0049] Points in a point cloud are described by geometry and one or more attributes. Compressing the geometry and attributes of the point cloud to reduce the file size is referred to as point cloud coding (PCC). Geometry describes the position of a point in space. An attribute describes something about the point. While many attributes are possible, color and reflectance (e.g., light intensity) are the most common use cases. The geometry of points in a spare point cloud may be encoded/decoded by applying an octree, which is discussed in greater detail below. The attributes of the points may be encoded/decoded according to a distance based algorithm, which is also discussed in detail below.

[0050] Generally, the distance based algorithm selects a set of random root points to encode at a first layer of detail (LOD). The attribute values for the root points are encoded in an attribute bitstream. The algorithm also sorts the remaining points into refinement levels that make up additional levels of detail. The attribute values of each level are encoded as the difference between the attribute value of the node and the attribute value of a predictor at a previous level of detail. By only encoding the difference in values instead of the actual values, the attribute data is compressed. Further, the resulting image can be displayed at varying levels of detail based on the particular needs of the user. The distance based algorithm places nodes into the refinement levels by recursively using search algorithms relative to the nodes. For example a search algorithm can be applied to a set of root nodes to determine predictors for a first refinement level and sort a subset of the nodes into the first refinement level. The search algorithm is then used iteratively on the second LOD to create a third LOD, on the third LOD to create a fourth LOD, etc. As used herein, media includes a sequence of pictures (also referred to as frames), with each picture including a point cloud to be compressed by the recursive search algorithms. Accordingly, the distance based search algorithm can take an immense amount of processor resources to complete. In computational complexity theory terms, the distance based algorithm for coding sparse point cloud attributes may be considered a non-deterministic polynomial-time hard (NPhard) problem. NP hard problems are known to be undesirable due to excessive computational resource usage.

[0051] Disclosed herein are mechanisms to encode and decode sparse point cloud attributes with a hierarchical tree. Such a mechanism is not NP hard, and hence significantly reduces computational resource usage during encoding and decoding. As examples, the disclosure describes attribute coding using a binary tree, an octree, and a hybrid binary tree. However, the disclosed mechanisms may be modified to support other hierarchical tree structures. As a specific example, attribute coding using a binary tree is shown to increase coding speed by a factor or four to a factor of five when compared to the distance based approach. This indicates binary tree attribute coding can reduce coding time to between twenty percent and twenty five percent of the time used to execute the distance based approach on the same hardware. Further, the binary tree attribute coding is shown to increase compression by fifteen percent when compared to the distance based approach. Accordingly, the hierarchical tree based attribute coding schemes of the present disclosure significantly improve the function of the corresponding encoders and decoders (codecs) in terms of completion speed, processor resource usage, memory usage, and network resource usage when communicating the compressed data. The hierarchical tree based attribute coding schemes are discussed in greater detail below. But generally, the hierarchical tree based attribute coding schemes assign the points in a sparse point cloud to a parent node. The parent node is then recursively sub-divided into child nodes to create multiple LODs. Hence, the points are divided into groups by position as indicated by the geometry and without employing recursive search algorithms. The attribute values in higher LODs are used as predictors for attribute values of lower LODs according to a lifting algorithm. Stated less formally, the attribute values for points in lower LODs are encoded as a difference between an actual attribute value of the point and the attribute value of the predictor from the immediately preceding higher LOD. As a specific example, the nodes may be split by selecting a point closest to the center of the node, which is designated as the centroid. The node can then be split by a plane passing through the centroid (e.g., an XY plane, an YZ plane, or an XZ plane). The centroid can be used as a predictor for the points in the next LOD created by such a split. The nodes can continue to split until a minimum node size is reached. For example, the minimum node size may be selected based on the maximum sampling capability of a LIDAR creating the sparse point cloud. Once the nodes are split, the lifting algorithm can encode the attribute value(s) at the first LOD (e.g., the first predictor/centroid), the predictors at each LOD based on the predictors of the previous LOD, and the points at the lowest LOD based on the predictors of the previous LOD. The preceding discussion generally describes the application of a binary tree. However, the principals described herein can apply to other hierarchical tree structures, such as octree, hybrid binary, or others known in the art, by altering the node splitting and predictor selection as desired. [0052] FIG. 1 is a schematic diagram of an encoder 100 for encoding sparse point clouds. The encoder 100 receives positions 101 and attributes 103 of the points in the sparse point cloud as inputs. The encoder 100 encodes the positions 101 and attributes 103 into a geometry bitstream 105 and an attribute bitstream 107, respectively, which are outputs of the encoder. The geometry bitstream 105 and the attribute bitstream 107 are combined into a combined bitstream for transmission toward a decoder. A sparse point cloud is a cloud of points that describes an object or space and is captured according to a process with a reduced number of samples (e.g., in comparison to a dense point cloud). It should be noted that resolution describes the precision of a sample, and sampling rate describes a frequency of samples. Accordingly, a dense point cloud and a sparse point cloud may have similar resolutions, but can be differentiated based on a sampling rate (e.g., number of samples over a distance value). The sampling rate may be selected/set based on the acquisition process used to generate the point cloud. In a particular example, a sparse point cloud may include a sampling density between 3D points that is an inverse quadratic proportion of the distance between the points. Such a sparse point cloud may density of:

where b is a number of degrees of rotation, N is a number of strobes per one degree of b, a is a range, and density is described in units of points per degree. As used above, a may be an increasing value with the remaining values set as constants. The positions 101 are the locations of such points in space (e.g., 3D space) and may be represented as coordinate points or other spatial representations. The attributes 103 are features of the points, such as color, reflectance, etc. For example, a color attribute 103 may be a red, green, and blue (RGB) value, light value, a red difference, and blue difference value (YCrCB), etc. A reflectance attribute 103 may be an amount/intensity of light that bounces back from the point. The geometry bitstream 105 is a series of compressed point positions 101 from one or more pictures (e.g., frames). The atribute bitstream 107 is a series of compressed point atributes 103 from one or more pictures (e.g., frames).

[0053] Conceptually, the encoder 100 includes a geometry section for encoding the positions 101 and an attribute section for encoding the atributes 103. The geometry section includes a transform coordinates module 111, a quantize points module 113, an analyze octree module 115, an analyze surface approximation module 119, and an arithmetic encode module 117. [0054] The transform coordinates 111 module is applied to the positions 101 upon entering the encoder 100. The positions 101 may not have any particular structure, may be floating point numbers, and may be presented relative to some coordinate system (e.g., an application specific or standardized coordinate system). The transform coordinates 111 transforms the positions 101 into a format usable by the encoder 100. In an example implementation, the positions 101 may be denoted as X” ^ng. n = 1, ... , N. Internal coordinates XJ] ¹¹, n = 1, ... , N, may be obtained from original coordinates by the coordinate transformation X _n = (X° ^ng— T)/s, where T = (t _x, t _y, t _z). The parameters T and s are such that the point positions

X _n ^nt, n = 1, ... , N, lie in a bounding cube [0,2 ^d) for some non-negative integer parameter d. Point positions 101 in the internal coordinate system that have been compressed and decompressed are denoted X _n, n = 1, ... , N _0Ut , where N _out is the number of points in the decoded point cloud. N _out may not be the same as N. If a Trisoup geometry codec is used, s is specified by the triSoupIntToOrigScale parameter, while T is [0, 0, 0], and d are specified by the triSoupDepth parameter. Components of points (x _n, y _n, z _n) outside the bounding cube

[0,2 ^d) may be clipped to the range [0,2 ^d— 1] as desired. If an octree geometry codec is used, 1/s is specified by a positionQuantizationScale parameter, while T is determined by T = (minx° ^rig , miny° ^ng , min z° ^rig), and d is determined by d = Ceil Log2(Max(x[ ₁ ^nt, y] ₁ ^nt, zή ^h1, n = 1, ... , N _orig) + l)j, such that [0,2 ^d) ³ is the smallest bounding cube with side an integer power of two that contains the point positions in internal coordinates. Ceil x ) is the least integer greater than or equal to x. Log2(x) is the base-2 logarithm of x. Max( xl, ... , xN ) is the maximum of xl, ..., xN.

[0055] After transformation, the positions 101 are forwarded to the quantize points module 113. The encoder 100 operates on non-negative integer values. The quantize points module 113 performs rounding to create integer values and merges any resulting duplicate points. In an example implementation, the internal coordinates X/ ¹¹ can be converted to integer values X _nby a function X _n = Round (X/ ¹¹), where Round(-) is a function that rounds the components of a vector to the nearest integer. After such quantization, there may be multiple points with the same position, called duplicate points. The duplicate point removal process is optional. If enabled, this process removes points with the same quantized coordinates. Multiple points with the same quantized position and different attributes are merged in a single point. The attributes 103 associated with the single point are computed by the transfer attributes module 125 as described below. The process of position quantization, duplicate point removal, and assignment of attributes to the remaining points is called voxelization. Voxelization is the process of grouping points together into voxels. The set of voxels are the unit cubes [i— 0.5, i + 0.5) x [j— 0.5, j + 0.5) x [k— 0.5, k + 0.5) for integer values of i, j, and k between 0 and 2 ^d— 1. The locations of all the points within a voxel are quantized to the voxel centre, and the attributes 103 of all the points within the voxel are combined (e.g., averaged) and assigned to the voxel. A voxel is occupied when the voxel contains any point of the point cloud.

[0056] After quantization, the positions 101 are forwarded to the analyze octree module 115. The analyze octree module 115 is configured to encode the positions 101 of the points and/or voxels by employing an octree. An octree is a hierarchical structure used to recursively split nodes into sub-nodes. Specifically, an octree recursively splits a parent node into eight child nodes. The octree can contain any number of layers of nodes. When used to encode a sparse point cloud, the octree denotes when a node contains at least one point. The node is split into eight sub-nodes. Those sub-nodes that contain points are marked and further split into eight sub-nodes, etc. Nodes that contain no points are discarded. Such sub-division may continue until the nodes reach a size that is comparable to (e.g., larger than/equal to) a minimum threshold, such as a minimum scanning capability size (e.g., a maximum effective sampling capability) associated with a point scanner/LIDAR that generated the point cloud. This approach compresses the actual position of the points/voxels to a location in a leaf node of the octree. An example implementation of such as process can be formally described as follows. First, a cubical axis-aligned bounding box B is defined by the two extreme points (0,0,0) and (2 ^d, 2 ^d, 2 ^d). An octree structure is then built by recursively subdividing B. At each stage, a cube is subdivided into 8 sub-cubes. An 8-bit code, named an occupancy code, is then generated by associating a l-bit value with each sub-cube in order to indicate whether the sub-cube contains points (e.g., full and has a value of one) or not (e.g., empty and has a value of zero). Only full sub-cubes with a size greater than one (e.g., non-voxels) are further subdivided. Since points may be duplicated, multiple points may be mapped to the same sub cube of size one (e.g., the same voxel). In order to handle such a situation, the number of points for each sub-cube of dimension one is also arithmetically encoded. For example, the occupancy of the octree as well as the number of points in each of the occupied smallest nodes can be sent to the arithmetic encode module 117 for encoding.

[0057] After octree analysis, the resulting geometry data created from the positions 101 may be forwarded to the analyze surface approximation module 119. The analyze surface approximation module 119 is an optional module that can be employed to encode an approximation of the surfaces of an object represented by a point cloud. For example, the analyze surface approximation module 119 may be used for dense point clouds. In an example implementation, the analyze surface approximation module 119 may use a trisoup algorithm that represents the surface of an object as a mesh of triangles. Data related to the triangle mesh can then be forwarded to the arithmetic encode module 117 for encoding.

[0058] The arithmetic encode module 117 is configured to encode positions 101 related data, such as octree data, node occupancy data, and/or object surface data, into the geometry bitstream 105. For example, the arithmetic encode module 117 can encode such data by using a binary code. In an example implementation, the positions 101 data can be received as a number of symbols. The arithmetic encode module 117 can employ a look up table (LUT) with the most commonly used symbols and a cache of recently used symbols. If a symbol is in the LUT, the index of the location of the symbol in the LUT is encoded. If a symbol is in the cache, the index of the location of the symbol in the cache is encoded. Otherwise, an arithmetic value describing the symbol is encoded. Other coding mechanisms may also be used, such as context adaptive variable length coding (CAVLC), Context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique.

[0059] The process of encoding the attributes 103 of the points is now discussed. The attribute section includes a transform colors module 123, a transfer attributes module 125, a Region Adaptive Flierarchical Transform (RAFIT) module 127, a generate LOD module 129, a lifting module 131, a quantize coefficients module 133, and an arithmetic encode module 135. The transform colors module 123 is an optional module. The transform colors module 123 may be configured to convert RGB attributes 103 into a YCbCr form or vice versa. This is optional because quantization of color components may be agnostic to the color space of the components. This is because the components may be processed independently. In the event the transform colors module 123 is used, the transform colors module 123 ensures the attributes 103 are converted to the desired format for further processing.

[0060] The transfer attributes module 125 receives the attributes 103 of the points from the transform colors module 123 and/or as input to the encoder 100. Also, the positions 101 of the points are forwarded to the transfer attributes module 125. Further, the output from the analyze octree module 115 and/or the analyze surface approximation module 119 are forwarded to the reconstruct geometry module 121. The reconstruct geometry module 121 is configured to reconstruct positions 10 la of the points based on the results of the encoding processes used by the octree and/or surface approximation algorithms. Accordingly, the reconstruct geometry module 121 generates positions lOla data in manner similar to a decoder. The output of the reconstruct geometry module 121 is then forwarded to the transfer attributes module 125. This allows the transfer attributes module 125 to compare the positions lOla as encoded with the original positions 101 to determine the distortion created by the encoding process. The transfer attributes module 125 is configured to alter the attributes 103 of the points, as desired, to offset the distortion created by the encoding of the geometry bitstream 105 from the positions 101, where the distortion is the difference between the positions 101 and the positions 10 la. This process may also be called recoloring. In an example, the transfer attributes module 125 may operate as follows. Given the input point cloud positions 101, the attributes 103 and the reconstructed positions 10 la the objective of the attributes transfer procedure of the transfer attributes module 125 may be to determine the attribute 103 values that minimize the attribute 103 distortions. The input positions 101 and the reconstructed positions 10 la may be denoted as (Xi)i=o...N-i ^and (Xi)i=o... _Nrec— _l’ respectively. N and N _recare the number of points in the original point cloud and the number of points in the reconstructed point cloud, respectively. If duplicated points are merged into voxels N _rec < N. Otherwise N _rec = N. For each point X _j in the reconstructed point cloud, X ^* is the nearest neighbour in the original point cloud and a ^* is the attribute 103 value associated with X ^*. For each point X _j in the reconstructed point cloud, describes the set of points (Q) ⁺ (i) in the original point cloud that share X _j as their nearest neighbour in the reconstructed point cloud. H(i) is the number of elements in (Q) ⁺(i) . h) is one of the elements of (Q) ⁺(i) . If (Q) ⁺(i) is empty, then the attribute 103 value a ^* is associated with the point X _j. If (Q) ⁺ ( i) is not empty, then the attribute 103 value a _t associated with the point X _j is obtained by a _t a (h). As shown in

FIG. 1, the transfer attributes module 125 is coupled to the RAFIT module 127 and the generate LOD module 129 via a switch. Accordingly, the transfer attributes module 125 can output results to the RAFIT module 127 or the generate LOD module 129 (e.g., but may not output to both simultaneously in some examples).

[0061] The recolored attributes 103 and the reconstructed positions 10 la may be forwarded to the RAFIT module 127. The RAFIT module 127 provides an optional mechanism for encoding the attributes 103. When used, the RAHT module 127 spatially transforms attribute 103 colors to obtain transformed colors, denoted as (TY _n, TU _n, TV _n) , n = 1, ... , N _vox . The transformed colors can be obtained based on voxel colors (Y _n, U _n V _n), n = 1, ... , N _V0X, and based on a corresponding list of voxel locations (x _n, y _n, z _n), n— 1, , N _vox. The transformed colors can then be represented by a list of transform coefficients denoted as T _n, n = 1, ... , N. In an example implementation, the RAHT module 127 traverses a set of blocks that contain the points of the point cloud in 3D space. The RAHT module 127 then applies a transform to such blocks to transform attributes 103 of the points/voxels contained in each block. The transform may vary depending on the spatial position of the block. For example, the transform of a parent block may be the concatenation of the two sibling blocks, with the exception that a first direct current (DC) component of the transforms of the two sibling blocks may be replaced by their weighted sum and difference. Further, the transforms of the two sibling blocks may be copied from the first and last parts of the transform of the parent block, with the exception that the DC components of the transforms of the two sibling blocks are replaced by their weighted difference and sum.

[0062] The recolored attributes 103 and the reconstructed positions 10 la may also be forwarded to the generate LOD module 129. The generate LOD module 129 is another example attribute 103 encoding mechanism. The generate LOD module 129 is configured to organize the points into various LODs based on their positions 10 la. For example, the points may be organized into a first layer and one or more refinement layers. This allows the attributes 103 to be decoded and displayed in varying levels of detail as desired by the end user. In some systems, the generate LOD module 129 selects a set of random root points to encode at the first LOD. The remaining points are organized into refinement levels based on distance to the points of the first layer. In an example implementation, all the points are first marked as non-visited and the set of visited points, denoted as V, is set as empty. The generate LOD module 129 may then proceed iteratively. At each iteration 1, the refinement level R _j is generated as follows. The generate LOD module 129 iterates over all the points. If the current point has been visited, then the point is ignored. Otherwise, the minimum distance D of the current point to the set V is computed. If D is strictly lower than a threshold d _j, then the current point is ignored. If D is higher or equal than d _j, then the current point is marked as visited and added to both R _j and V. This process is repeated until all the points are traversed. The level of detail, LOD _j, is obtained by taking the union of the refinement levels R ₀, R ₁ ... , R _j . This process is repeated until all the LODs are generated or until all the vertices have been visited. Iterative use of a distance based search algorithm is immensely processor intensive, for example when multiple pictures are to be encoded at many LODs. This may be considered an NP hard problem. [0063] The present disclosure includes a modified generate LOD module 129 that significantly reduces coding time and increases attribute compression. The disclosed generate LOD module 129 applies a hierarchical tree, such as a binary tree, an octree, and/or a hybrid binary tree, to sort the points for attribute encoding. By using a hierarchical tree, such as a binary tree coding speed may be increased by a factor or four to a factor of five when compared to the iterative distance based search approach. This indicates that hierarchical tree attribute coding can reduce coding time to between twenty percent and twenty five percent of the time used to execute the distance based approach on the same hardware. Further, hierarchical tree attribute coding may increase compression by fifteen percent when compared to the distance based approach. When designed to use a hierarchical tree, the generate LOD module 129 assigns the points in the point cloud to a parent node based on positions 10 la. The parent node is then recursively sub-divided into child nodes to create multiple LODs. Flence, the points are divided into groups by position 10 la without employing recursive search algorithms.

[0064] The output of the generate LOD module 129 (e.g., the points sorted according to position) can then be forwarded to the lifting module 131. The lifting module 131 is configured to encode each LOD based on a previous LOD. For example, the attribute values 103 for the points in the first LOD are encoded. The attribute values 103 for the first refinement layer are encoded as the difference between the actual value and the value of a predictor from the first LOD. The attribute values 103 of further refinement layers are then iteratively encoded as the difference between the actual value and the value of a predictor from the previous refinement layer. Such differences may be referred to as residuals. For example, a residual is a difference between a predictor value and an actual value. The lifting module 131 may also apply a transform to the predictors and residuals in order to convert such values into a series of coefficients for further encoding.

[0065] The output of the RAFIT module 127 and/or the lifting module 131 may be forwarded to the quantize coefficients module 133. The quantize coefficients module 133 is configured to compress the predictors and residuals of the attribute data for further encoding. Many mechanisms can be employed to quantize the attribute data. For example, the predictors and residuals may be received at the quantize coefficients module 133 as various transform coefficients. The quantize coefficients module 133 may then multiply such coefficients by a weighting function to reduce the size of the coefficients for further encoding. Different mechanisms can employ different weighting functions, such as a step function, a rounding function, a normalization function, etc. [0066] The output of the quantize coefficients module 133 are quantized transformed coefficients representing the attributes 103 of the points in the point cloud. Such output is received by the arithmetic encoder 135, which may be substantially similar to the arithmetic encoder 117. However, the arithmetic encoder 135 is dedicated to encoding attributes 103 into the attribute bitstream 107. In some examples, the arithmetic encoder 135 and the arithmetic encoder 117 may be implemented in a common module.

[0067] It should be noted that the preceding encoder 100 includes many optional modules and can be implemented by selectively applying various function(s), operator(s), options, etc. As noted above, different selections result in benefits and/or detriments for different cases. The only requirement is that a corresponding decoder must apply a copy and/or an inverse of any optional models, function(s), operator(s), options, etc. that are selected at the encoder 100. This is because the encoded data is unreadable at the decoder if the decoder is unable to properly reverse the encoding to reconstruct the media sequence. Accordingly, the relevant modules of the encoder 100 can signal such choices used in encoding the geometry bitstreaml05 and/or the attribute bitstream 107 to the decoder. These choices can be signaled in the geometry bitstream 105, the attribute bitstream 107, and/or the combined bitstream, depending on the example.

[0068] FIG. 2 is a schematic diagram of a decoder 200 for decoding sparse point clouds, for example as encoded in a combined bitstream by an encoder 100. The decoder 200 includes various modules to reverse the encoding process from the encoder 100. The decoder 200 receives a geometry bitstream 205 and an attribute bitstream 207 as inputs. Such bitstreams are substantially similar to the geometry bitstream 105 and the attribute bitstream 107, respectively, and may be received as part of a combined bitstream (e.g., including relevant signaling syntax). The decoder 200 is configured to decode the geometry bitstream 205 and the attribute bitstream 207 to generate positions 201 and attributes 203 of points in a point cloud for display. The point cloud can then be reconstructed and displayed or otherwise employed based on the positions 201 and attributes 203 of the points. The positions 201 and attributes 203 should be approximately equivalent to positions 101 and attributes 103, respectively. Any differences in such values are perceived as distortion. While distortion is undesirable, distortion is a side effect of lossy compression. Accordingly, compression and distortion may be considered design tradeoffs. Beneficial encoder 100 and decoder 200 designs seek to balance compression and distortion along with other design constraints, such as hardware resource usage, etc. For example, when coding sparse point clouds for use by machines, such as autonomous vehicles, distortion may be less of a detractor, in which case compression can be emphasized. [0069] The decoder 200 includes a geometry section for decoding the positions 201 of the points. The geometry section includes an arithmetic decode module 217, a synthesize octree module 215, a synthesize surface approximation module 219, a reconstruct geometry module 221, and an inverse transform coordinates module 211. Such modules are configured to reverse the encoding processes performed by corresponding modules at the encoder.

[0070] The arithmetic decode module 217 is configured to receive the geometry bitstream 205 and convert a binary representation of the geometry bitstream 205 into quantized transformed coefficients representing the geometry. For example, the arithmetic decode module 217 is configured to perform an inverse function of the function provided by an arithmetic encode module 117. In one example, the arithmetic decode module 217 may employ a LUT with a list of most commonly used symbols and a cache of recently used symbols. The arithmetic decode module 217 can then read indices from the geometry bitstream 205 to obtain coefficients from the LUT and/or cache in order to reverse a similar process performed by the encoder. In the event that the value is not an index, the value can be converted directly into a coefficient. In other examples, the arithmetic decode module 217 may employ CAVLC, CABAC, SBAC, PIPE coding, or other entropy coding technique as desired.

[0071] The transformed coefficients representing the geometry are output from the arithmetic decode module 217 and forwarded to the synthesize octree module 215 and the synthesize surface approximation module 219. The synthesize octree module 215 is configured to recreate the octree from the transformed coefficients. For example, the synthesize octree module 215 may apply an inverse transform to recover the octree data, which can then be used to reconstruct the octree and determine the point positions based on the octree. In an example implementation, the decoding process starts by reading from the bitstream the dimensions of a bounding box (B) that contains the point cloud. The octree structure is recreated by subdividing B according to the occupancy codes from the geometry bitstream 205. Each time a sub-cube of dimension one is reached (e.g., a lowest level of the octree), the number of points for that sub-cube, denote as c, is arithmetically decoded from the geometry bitstream 205. A number of points c are then generated at the origin of the sub-cube. This results in a reconstruction of a number of points c that are positioned in a location that approximate the reconstructed point cloud.

[0072] The synthesize surface approximation module 219 is an optional module configured to reverse the encoding processes of the analyze approximation module 119. Accordingly, when used, the synthesize surface approximation module 219 may be configured to employ a trisoup algorithm to decode a triangular mesh in order to approximate the surfaces of the point cloud from the geometry bitstream 205. For example, the synthesize surface approximation module 219 may read various vertices from the geometry bitstream 205 and form triangles from the vertices by determining a centroid, mean-removed coordinates, and scaled variances for the triangles. Such information can then be used to reconstruct the triangular mesh to approximate the surface, for example by projecting the triangles onto corresponding planes based on the angles of the triangles.

[0073] The output of the synthesize octree module 215 and/or the synthesize surface approximation module 219 is received at the reconstruct geometry module 221, which is substantially similar to reconstruct geometry module 121 (e.g., reconstruct geometry module 121 is designed to predict the results generated by the reconstruct geometry module 221). For example, the reconstruct geometry module 221 assign internal coordinates to the points based on the cube of the octree in which such points reside. When the synthesize surface approximation module 219 is used, the triangular mesh can also be used to adjust the coordinates of such points into positions indicated by the triangular mesh. The output of the reconstruct geometry module 221 can then be output as positions 201, or optionally forwarded to an inverse transform coordinates module 211.

[0074] The inverse transform coordinates module 211 is an optional module that reverses the coordinate transform mechanism of the transform coordinates module 111. For example, the inverse transform coordinates module 211 may apply an inverse transform to convert the coordinates of the points to an external coordinate system. In an example, the decoded point positions 201 in the original coordinate system are obtained from decoded point positions 201 in the internal coordinate system by the coordinate transformation X° ^g = sX _n + T.

[0075] The decoder 200 also includes an attribute section for decoding the attributes 203 of the points. The attribute section includes an arithmetic decode module 235, an inverse quantize module 233, a RAHT module 227, a generate LOD module 229, an inverse lifting module 231, and an inverse transform colors module 223. Such modules are configured to reverse the encoding processes performed by corresponding modules at the encoder.

[0076] The arithmetic decode module 235 is similar to the arithmetic decode module 217, but is configured to decode both the geometry bitstream 205 and the attribute bitstream 207. In some examples, the arithmetic decode module 235 and the arithmetic decode module 217 are implemented in the same module. The arithmetic decode module 235 is configured to convert a binary representation of the geometry bitstream 205 and the attribute bitstream 207 into quantized transformed coefficients representing the geometry and the attributes of the points. [0077] The resulting data from the arithmetic decode module 235 is forwarded to the inverse quantize module 233, which is configured to remove some or all of the quantization from the data (e.g., depending on whether the quantization process is lossy or not). For example, the inverse quantize module 233 may apply an inverse weighting function to the coefficients. The inverse weighting function is selected to reverse the compression applied by the quantize coefficients module 133. This may result in uncompressed coefficients for predictors and residuals of the attribute data (e.g., and relevant data from the geometry bitstream 205). An inverse transform can also be applied to convert such data into a form useable by the other attribute modules. As shown in FIG. 2, inverse quantize module 233 is coupled to the RAFIT module 227 and the generate LOD module 229 via a switch. Accordingly, the inverse quantize module 233 can output results to the RAFIT module 227 or the generate LOD module 229 (e.g., but may not output to both simultaneously in some examples).

[0078] The RAFIT module 227 is an optional module that receives reconstructed positions 201 from the reconstruct geometry module 221 and attribute data from the inverse quantize module 233. The RAFIT module 227 is designed to reverse the operation of the RAFiT module 127. For example, the RAFIT module 227 obtains a list of transform coefficients denoted as T _n, n = 1, ... , N to detennine a list of transformed colors, denoted as (TY _n, TU _n, TV _n) . n = 1, ... , N _V0X. The RAHT module 227 can then use a list of voxel locations (x _n, y _n, z _n), n = 1, ... , N _V0X and voxel colors (Ϋ _h, U _n, V _n), n = 1, ... , N _V0X to apply an inverse spatial transform to obtain the attribute 203 colors.

[0079] The generate LOD module 229 receives reconstructed positions 201 from the reconstruct geometry module 221 and attribute data from the inverse quantize module 233. The generate LOD module 229 is substantially similar to the generate LOD module 129 of the encoder 100, and functions in substantially the same manner. Specifically, in the present disclosure the generate LOD module 229 employs the reconstructed positions 201 to include the points in a node and applies a hierarchical tree to recursively divide the node into sub-nodes in order to generate a first LOD and a plurality of refinement layers. The resulting LODs can then be forwarded to the inverse lifting module 231.

[0080] The inverse lifting module 231 performs the inverse operation of the lifting module 131. The inverse lifting module 231 receives the LODs from the generate LOD module 229. The inverse lifting module 231 can also determine the predictors and residuals for the attributes 203 from the attribute bitstream 207 (e.g., after arithmetic decoding, inverse quantization, etc. by other modules as discussed above). For example, the inverse lifting module 231 may apply an inverse transform to coefficients in order to recover the predictors and residuals. The inverse lifting module 231 may then recursively set the attribute values for the points of the various LODs based on the predictors and residuals. For example, the attribute values for points at the first LOD can be set as indicated in the attribute bitstream 207, attribute values for the points at the second LOD (e.g., first refinement layer) can be set by adding attribute values in a first set of predictors associated with the first LOD to a first set of residuals associated with the second LOD, the attribute values for the points at the third LOD (e.g., second refinement layer) can be set by adding attribute values of a second set of predictors associated with the second LOD to a second set of residuals associated with the third LOD, etc. This results in determining the attributes 203 for the points in the LODs.

[0081] The reconstructed attributes 203 may be forwarded from the inverse lifting module 231 and/or the RAHT module 227 to an optional inverse transform colors module 223. The inverse transform colors module 223 is configured to reverse the operations of the transform colors module 123. Accordingly, the inverse transform colors module 223 may transform the attributes 203 from YCbCr form to RGB form, or vice versa, as desired.

[0082] The point cloud can then be reconstructed by applying the attributes 203 to the points at positions 201. The resulting point cloud can also be included in a picture, also known as a frame. Multiple pictures of the point cloud can be reconstructed and related to create a media sequence. Such pictures may be related by different factors in different examples. For example, pictures of the point cloud may be related based on the relative position of a scanning device/LIDAR when such pictures were captured. The media sequence can be forwarded to a display for viewing by a user and/or stored in memory for review by a machine, for example for use in controlling and/or debugging actions of a machine that captured the point cloud.

[0083] FIG. 3 is a flowchart of an example method 300 of coding point geometry in a sparse point cloud. For example, method 300 may be employed by an encoder 100 to generate a geometry bitstream 105. Method 300 begins when a point cloud is received for encoding. At step 301 the points of the point cloud are assigned to a parent node in a first layer. The parent node and the first layer may be considered to be the current node and the current layer. The method 300 then proceeds to step 303.

[0084] At step 303, the method 300 determines whether the current node is occupied. If the node is not occupied, the method 300 proceeds to step 307. At step 307, the current node is omitted from the octree and the method 300 proceeds to step 309. If the node is occupied at step 303, the method 300 proceeds to step 305. At step 305, the method 300 indicates the current node is occupied in memory. The current node is split into eight child nodes which are included in a next layer. The method 300 then proceeds to step 309.

[0085] At step 309, the method 300 determines whether there are more nodes in the current layer. If there are more nodes in the current layer, then the method 300 proceeds to step 311. At step 311, the next node in the current layer is set as the current node. The method 300 then returns to step 303. If there are no more nodes in the current layer at step 309, the method 300 proceeds to step 313.

[0086] At step 313, the method 300 determines whether the lower threshold for the octree has been reached. The lower threshold may be associated with the sampling limit of the scanner that created the point cloud. The threshold may be set by default and/or set by a user and encoded in the bitstream to signal the decoder. If the threshold has not been reached, the method 300 proceeds to step 315 and moves to the next layer. At step 315, the method 300 makes the first child node in the next layer the current node. The next layer is also made the current node. The method 300 then returns to step 303. If the lower threshold is reached at step 313, then the current layer is the last layer of the octree. Accordingly, the method 300 proceeds to step 317. At step 317, the method 300 indicates the number of points in each of the occupied nodes in memory and ends. The occupied nodes and number of points in the last layer of occupied nodes can then be encoded in the geometry bitstream.

[0087] The method 300 can be modified to operate on a decoder 200 to decode the geometry bitstream 205 by reading from the geometry bitstream to determine whether a node is occupied instead of writing to the bitstream. Further, the number of points are assigned to the last layer of occupied nodes based on the data in bitstream instead of writing such data to the bitstream.

[0088] FIG. 4 is a flowchart of an example distance based method 400 of coding point attributes in a sparse point cloud. For example, method 400 may be employed to implement the generate LOD module 129 in the encoder 100 or the generate LOD module 229 in the decoder 200. Accordingly, the method 400 may be used to encode or decode attributes of a point cloud to/from an attribute bitstream. Specifically, the method 400 sorts points into varying LODs so that each LOD can be coded based on a lower or higher LOD, depending on implementation.

[0089] An attribute signal 403 is received at the generate LOD module. The attribute signal 403 includes the attributes of points in a point cloud and is substantially similar to attributes 103 as modified by intervening components. At block 401, the points and corresponding attributes are split into a group of higher level nodes (H(N)) and lower level nodes (L(N)). This split is performed by employing a distance threshold, which may be user defined. For example, a set of root points is randomly selected. A search algorithm is then employed from each of the root points. Points that are within the threshold distance of a root point are assigned to L(N) and root points and any other points that are not within the threshold of a root point are placed into FI(N). The attributes for points in FI(N) are coded directly.

[0090] At block 403, a predication is performed. Specifically, a set of predictors P(N) is selected for use in encoding the next LOD. The predictors may be selected as the closest point to each root point that is beyond the distance threshold. At step 407, the predictors are encoded as a difference between the attribute value of the predictor and the attribute value of the associated root point. The encoded attribute values for the root points and the predictors are output as D(N). At block 405, an update is determined. Specifically, the attribute values of the predictors are obtained from D(N) and prepared to update the attribute values of the next LOD. The attribute values of the predictors are stored in U(N). At block 409, U(N) is applied to L(N). This modifies the attribute values of the points assigned to L(N) so that such attribute values are represented as a function of the predictor instead of being represented as complete attribute values. This reduces the size of the representation of the attribute values at L(N). The values of L(N) are then sent to the block 401 to be further split into additional levels. This process is applied recursively until the attributes of the points are encoded into the desired number of LODs. As can be seen for the forgoing, method 400 compresses the attribute values of the points into varying LODs. However, method 400 performs this task at the cost of recursively calling a search algorithm for sets of points at each LOD. This process is extremely processor intensive, particularly when multiple pictures each including a point cloud are to be coded. From a complexity standpoint, this approach may be considered an NP hard problem, which is generally disfavored in the art.

[0091] FIG. 5 is a schematic diagram of an example hierarchical tree based method 500 of coding point attributes in a sparse point cloud. Specifically, the hierarchical tree shown in FIG. 5 is a binary tree. Method 500 may be employed to implement the generate LOD module 129 in the encoder 100 or the generate LOD module 229 in the decoder 200. Accordingly, the method 500 may be used to encode or decode attributes of a point cloud to/from an attribute bitstream.

[0092] The method 500 is applied to a picture containing a point cloud. The method 500 sorts the points into a plurality of LODs for encoding by a lifting algorithm. The method 500 initially assigns the points, denoted as P, to a first layer node 501 in a first LOD 511. The method 500 divides the first layer node 501 into second layer nodes 503. For example, the first layer node 501 may be divided into second layer nodes 503 by a plane that bisects the first layer node 501. The points P in the first layer node 501 are split into groups Pl and P2 based on the position of such points relative to the plane. The second layer nodes 503 make up a second LOD 512.

[0093] The method 500 continues to recursively divide the nodes until a final LOD 514 is reached. FIG. 5 depicts a tree of depth four with four LODs. Flowever, a tree can have j LODs where j is a default or user defined value (e.g., based on the maximum sampling capability of a scanner that created the point cloud). For example, the second layer nodes 503 are split into third layer nodes 505 in a third LOD 513. The points Pl and P2 of the second layer nodes 503 are further split based on their position relative to planes that splits the second layer nodes 503 to create the third layer nodes 505. In the example shown, points Pl are split into points Pla and Plb, and points P2 are split into points P2a and P2b.

[0094] In the example shown, the third layer nodes 505 are split into fourth layer nodes 507 in a final LOD 514. The points Pla, Plb, P2a, and P2b of the third layer nodes 505 are further split based on their position relative to planes that splits the third layer nodes 505 to create the fourth layer nodes 507. In the example shown, points Pla are split into points Plaa and Plab, points Plb are split into points Plba and Plbb, points P2a are split into points P2aa and P2ab, and points P2b are split into points P2ba and P2bb.

[0095] In the event a node contains no points, such a node can be omitted from the hierarchical tree. Accordingly, the hierarchical tree based method 500 sorts points into nodes based on position. Such nodes can then be encoded by the lifting algorithm. For example, a point from the points P can be selected from the first node 501 when the first node 501 is split. The selected point can act as a predictor for points Pl and P2 in the second LOD 512. Points from points Pl and P2 can also be selected to act as predictors for points Pla, Plb, P2a, and P2b in the third LOD 513. Further, points from points Pla, Plb, P2a, and P2b can be selected to act as predictors for points Plaa, Plab, Plba, Plbb, P2aa, P2ab, P2ba, and P2bb in the final LOD 514. The lifting algorithm can encode the attribute value of the predictor from the first LOD 511. The lifting algorithm can also encode the attribute values of the predictors from the second LOD 512 based on the predictor for the first LOD 511. For example, the attribute values of the predictors from the second LOD 512 are encoded as a residual representing the difference between the predictor for the first LOD 511 and the predictors from the second LOD 512. This process can then encode the remaining points in the same fashion. In this way the differences between attribute values for the predictors at each level and the attribute values for the predictors at the previous level are encoded as residuals. Finally, the attribute values of the points in the final LOD 514 can be encoded as residuals (e.g., difference s/changes in value) based on the attribute values of the predictors in the previous LOD (the third LOD 513 in the example shown). This scheme ensures that only the attribute values of the first LOD 511 are completely encoded and all remaining values are encoded as residuals. This approach significantly compresses the attribute bitstream because the residuals are generally smaller than complete values and can be encoded using fewer bits.

[0096] Method 500 is discussed in terms of encoding, but operates in a substantially similar manner at a decoder. For example, the nodes are split into LODs, which sorts points based on positions that are obtained from decoding the geometry bitstream. The difference is the inverse lifting algorithm is applied instead of the lifting algorithm. Specifically, the attribute value of the first LOD 511 is obtained from the attribute bitstream. The attribute values of the second LOD 512 are obtained by adding the residuals from the second LOD 512 to the attribute values of the predictor from the first LOD 511. Framed in more formal terms, the attribute values for each current LOD are determined based on residuals for the current LOD and attribute values from the preceding LOD (as determined from the attribute bitstream).

[0097] Method 500 sorts the points into layers for coding based on point positions, but not based on relative distance between points. Accordingly, method 500 may not employ a recursive search algorithm as is done in method 400. Omission of the recursive search algorithm significantly reduces processor resource usage during encoding and decoding. As a specific example, attribute coding using the binary tree of method 500 may increase coding speed by a factor or four to a factor of five when compared to the distance based approach of method 400. This indicates binary tree attribute coding according to method 500 can reduce coding time to between twenty percent and twenty five percent of the time used to execute the distance based approach of method 400 on the same hardware. Further, the binary tree attribute coding of method 500 may increase compression by fifteen percent when compared to the distance based approach of method 400. Accordingly, the hierarchical tree based attribute coding scheme of method 500 significantly improves the function of the corresponding codecs in terms of completion speed, processor resource usage, memory usage, and network resource usage when communicating the compressed data.

[0098] An example implementation of method 500 can be described formally as follows. In general, attribute data of the point cloud may be encoded as a binary tree to represent LODs. One of the implementations to build the binary tree for attribute LODs employs the following steps. All of the points from a point cloud data can be extracted and included in a node of a binary tree. The binary tree depth can be set as according to:

[0099] Syntax elements can also be encoded into the combined bitstream (e.g., in the attribute bitstream) to indicate that the binary tree is used to encode the attribute data. An example syntax element denoted use binary tree for LOD (useinTreeLoD) can be used for this purpose as described below:

The syntax element bitstream size in bytes indicates the size of the bitstream in bytes. The variable useinTreeLoD represents whether a binary tree is used to generate LOD.

[00100] At the decoder side the same method 500 for LOD generation based on a binary tree can be used to reconstruct the point cloud based on the geometry. The decoder can parse syntax elements to determine whether useinTreeLoD is set (e.g., equal to one). The decoder can then extract all of the points from a point cloud data and include the points in a node of a binary tree. The binary tree depth can be set and the centroid nodes computed starting from the root as in the encoder. Refinement layers are also extracted for the LODs as in a similar manner to the encoder. The LOD(j) can be reconstructed via union with prior LOD(j-l) and R(j). Refinements layer extraction and LOD reconstruction can be repeated while points are present in any attribute refinement layer.

[00101] FIG. 6 is a schematic diagram of an example mechanism 600 for splitting a node at a centroid point according to a hierarchical tree based method of coding point attributes, such as method 500. Accordingly, mechanism 600 can be employed in the generate LOD module 129 in the encoder 100 or the generate LOD module 229 in the decoder 200. As such, the mechanism 600 may be used to encode or decode attributes of a point cloud to/from an atribute bitstream.

[00102] Mechanism 600 shows a node 601. The node 601 is an example of nodes 501, 503, and/or 505. The node 601 can be split into two nodes. For example, mechanism 600 can review points in the node 601 and select a centroid 602. A centroid 602 is a point determined to be the closest to the center of the group of points contained in the node 601. The node 601 is bisected by a plane that traverses the centroid 602. The plane can be selected from an XY plane, a YZ plane, and an XZ plane. In FIG. 6, an XY plane includes X and Y components and no Z components. An XY plane divides the node 601 into a top node above a bottom node. A YZ plane includes Y and Z components and no X components. A YZ plane divides the node 601 into a left node horizontally adjacent to a right node. An XZ plane includes X and Z components and no Y components. An XZ pane divides the node 601 into a front node in front of a back node. In this example, the centroid 602 can act as a predictor for the two nodes created when the node 601 is divided. Further, the centroid 602 may be removed from the resulting nodes when the node 601 is divided. For example, the centroid 602 may be coded as part of a current LOD and may not remain when coding the next LOD.

[00103] FIG. 7 is a schematic diagram of another example hierarchical tree based method 700 of coding point attributes in a sparse point cloud. Specifically, the hierarchical tree shown in FIG. 7 is an octree. Method 700 may be employed to implement the generate LOD module 129 in the encoder 100 or the generate LOD module 229 in the decoder 200. Accordingly, the method 700 may be used to encode or decode attributes of a point cloud to/from an attribute bitstream. Method 700 is substantially similar to method 500, but splits each current node into eight child nodes.

[00104] In method 700, points are assigned to a first node 701 in a first LOD 711. The first layer node 701 is split into second layer nodes 703 in a second LOD 712. The second layer nodes 703 are split into third layer nodes 705 in a third LOD 713. The third layer nodes 705 are split into fourth layer nodes 707 in a final LOD 714. The nodes of method 700 are split in a manner that is similar to the nodes of method 500. However, each node is split into four child nodes instead of two child nodes. For example, the first node 701 can be split at a centroid as in mechanism 600. However, the first node 701 is split by an XY plane, a YZ plane, and an XZ plane (e.g., instead of by a single plane). This creates eight sub-nodes that become the second layer nodes 703. The same process can be used to split the second layer nodes 703 into the third layer nodes 705 and the third layer nodes 705 into the fourth layer nodes 707. The lifting algorithm can then be applied to method 700 in substantially the same manner as in method 500. For example, the centroid point selected to position the planes for splitting the current node can be selected as a predictor for child nodes. As another example, a predictor for each child node can be selected as a nearest point in a neighboring node.

[00105] FIG. 8 is a schematic diagram of another example hierarchical tree 800 based method of coding point attributes in a sparse point cloud. Specifically, the hierarchical tree shown in FIG. 8 is a hybrid binary tree. Method 800 may be employed to implement the generate LOD module 129 in the encoder 100 or the generate LOD module 229 in the decoder 200. Accordingly, the method 800 may be used to encode or decode attributes of a point cloud to/from an attribute bitstream. Method 800 is substantially similar to method 500, but creates fewer LODs. Specifically, centroid compute is skipped for every other split, which reduces the number of LODs.

[00106] In method 800, points are assigned to a first node 801. The first layer node 801 is split into second layer nodes 803, which forms a first LOD 811. The first layer node 801 is split by a plane that bisects the center of the first layer node 801 (e.g., an XY plane, a XZ plane, or a YZ plane). However, the centroid is not computed for the first layer node 801. The lack of centroid computation is denoted by the dashed lines of the first node 801. The second layer nodes 803 are split into third layer nodes 805 according to centroid computation. The third layer nodes 805 are split into fourth layer nodes 807 in a final LOD 812 without centroid computation. This approach causes a node to be split into four child nodes per LOD. This may reduce the number of centroid computations and/or reduce the number of predictors as compared to method 500. This may increase coding speed at the cost of coding efficiency as the predictors may be less accurate. Further, fewer LODs may be desirable for large point clouds, and hence the method 800 may be beneficial for some point clouds.

[00107] An example implementation of method 800 can be described formally as follows. All of the points from a point cloud data can be extracted and included in a node of a binary tree. The binary tree depth can be set as according to:

where d is the depth of a binary tree, round is a rounding function, log is a logarithm function, and N is a number of the points. The centroid for the nodes is computed starting from the root node, but centroid computation is skipped for alternate layers. For example, when d is even, centroid computation at alternate layers starting at the root node is skipped. Further, when d is odd, centroid computation at alternate layers starting at the first layer is skipped. Namely, j = d/2 if d is even, j = (d/2) + 1 otherwise. Refinement layers are then generated for the LODs according to:

where R are the refinement layers, j specifies a refinement layer R(j), I _j specifies a current layer of the binary tree, and LOD(j-l) specifies the preceding LOD. Such steps can be iteratively completed until all the points in the point cloud are traversed. The lifting algorithm can then be employed to encode the attribute data. As with methods 500 and 700, the centroids can be used as predictors for method 800. Also as with methods 500 and 700, predictors can be selected for a node from a nearest point in a selected neighboring node.

[00108] FIG. 9 is a schematic diagram of an example media coding device 900. The media coding device 900 is suitable for implementing the disclosed examples/embodiments as described herein. The media coding device 900 comprises downstream ports 920, upstream ports 950, and/or transceiver units (Tx/Rx) 910, including transmitters and/or receivers for communicating data upstream and/or downstream over a network. The media coding device 900 also includes a processor 930 including a logic unit and/or central processing unit (CPU) to process the data and a memory 932 for storing the data. The media coding device 900 may also comprise electrical, optical-to-electrical (OE) components, electrical-to-optical (EO) components, and/or wireless communication components coupled to the upstream ports 950 and/or downstream ports 920 for communication of data via electrical, optical, or wireless communication networks. The media coding device 900 may also include input and/or output (I/O) devices 960 for communicating data to and from a user. The I/O devices 960 may include output devices such as a display for displaying media data, speakers for outputting audio data, etc. The I/O devices 960 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.

[00109] The processor 930 is implemented by hardware and software. The processor 930 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field- programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 930 is in communication with the downstream ports 920, Tx/Rx 910, upstream ports 950, and memory 932. The processor 930 comprises a coding module 914. The coding module 914 implements the disclosed embodiments described above, such as methods 300, 500, 700, 800, 1000, 1100, 1200, and/or 1300 which may employ a mechanism 600. The coding module 914 may also implement any other method/mechanism described herein. Further, the coding module 914 may implement an encoder 100, and/or a decoder 200. For example, the coding module 914 can employ a hierarchical tree to split points in a point cloud into nodes based on point position. The coding module 914 can then apply a lifting algorithm or inverse lifting algorithm to encode or decode, respectively, attribute values for the points. Flence, coding module 914 allows the media coding device 900 to code point cloud attribute data using significantly fewer processing resources while providing increased coding efficiency when compared to other systems. As such, the coding module 914 improves the functionality of the media coding device 900 as well as addresses problems that are specific to the media coding arts. Further, the coding module 914 effects a transformation of the media coding device 900 to a different state. Alternatively, the coding module 914 can be implemented as instructions stored in the memory 932 and executed by the processor 930 (e.g., as a computer program product stored on a non-transitory medium).

[00110] The memory 932 comprises one or more memory types such as disks, tape drives, solid-state drives, read only memory (ROM), random access memory (RAM), flash memory, ternary content addressable memory (TCAM), static random-access memory (SRAM), etc. The memory 932 may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.

[00111] FIG. 10 is a flowchart of an example method 1000 of encoding point attributes in a sparse point cloud. Method 1000 can be employed to implement method 300, 500, 700, and/or 800, for example by employing mechanism 600. Accordingly, method 1000 may be employed to implement the generate LOD module 129 and/or the lifting module 131 in the encoder 100. As such, the method 1000 may be used to encode positions and attributes of points in a point cloud to a geometry bitstream and an attribute bitstream to create a combined bitstream. Method 1000 may also be applied by a coding module 914 of a media coding device 900.

[00112] Method 1000 may be initiated when an encoder receives input indicating that a media sequence containing a sparse point cloud should be encoded. At step 1001, one or more images of a media sequence can be stored in memory. At least one of the images comprises a sparse point cloud. The sparse point cloud contains a plurality of points.

[00113] At step 1003, an octree is applied to the points to encode a geometry bitstream describing positions of the points. For example, applying the octree can be accomplished by applying method 300 to the points as described above.

[00114] At step 1005, a hierarchical tree is applied to sort the points into nodes based on the position of the points. A lifting algorithm can also be employed to encode an attribute bitstream describing attribute values of the points. For example, the attributes of the points can be encoded as a series of predictors and residuals to compress the encoding of the attribute values. The hierarchical tree may be a binary tree, an octree, or a hybrid binary tree as discussed in methods 500, 700, and 800, respectively. Other hierarchical tree structures may also be used as desired. The hierarchical tree can be applied according to method 1100 as described below.

[00115] At step 1007, a variable, such as a flag, can be encoded in the attribute bitstream, the geometry bitstream, and/or the combined PCC bitstream to indicate the hierarchical tree is used to determine the LODs and attributes values. For example, a useinTreeLoD variable can be encoded in the attribute bitstream to indicate a binary tree is used to determine the LODs and attributes values.

[00116] At step 1009, a PCC bitstream containing the geometry bitstream and the attribute bitstream is transmitted to support reconstructing the media sequence. For example, a plurality of point cloud states can be decoded into a plurality of pictures, which can be arranged to create a media sequence for display or other computational analysis.

[00117] FIG. 11 is a flowchart of an example method 1100 of applying a hierarchical tree to support encoding point attributes in a sparse point cloud. Method 1100 can be employed to apply a hierarchical tree at step 1005 of method 1000. Accordingly, method 1100 can be employed when implementing method 300, 500, 700, and/or 800, for example by employing mechanism 600. Further, method 1100 may be employed to implement the generate LOD module 129 and/or the lifting module 131 in the encoder 100. As such, the method 1100 may be used to encode positions and attributes of points in a point cloud to a geometry bitstream and an attribute bitstream to create a combined bitstream. Method 1100 may also be applied by a coding module 914 of a media coding device 900.

[00118] Method 1100 is initiated when encoding attributes for points in a point cloud. At step 1101, the points of a point cloud are assigned to a parent node in a first layer of a plurality of LODs. A depth of the hierarchical tree with the parent node in the first layer acting as a root node can be determined according to: d = round

where d is the depth of a binary tree, round is a rounding function, log is a logarithm function, and N is a number of the points.

[00119] At step 1103, the parent node is recursively divided into child nodes in the plurality of LODs. The nodes can be recursively divided based on the positions of the points contained in the nodes. For example, recursively dividing the parent nodes into child nodes can include selecting a centroid for each current node. Each current node can be divided by one or more planes traversing the centroid for the current node.

[00120] At step 1105, an attribute value is encoded in the attribute bitstream for the parent node in the first layer of the LODs. Attribute values of child nodes are encoded as residuals based on a predictor from a preceding LOD. The residuals are also encoded in the attribute bitstream. For example, attribute values of each preceding LOD can be selected as attribute values at a corresponding centroid. As a specific example, the attribute values of the parent node and the child nodes can be encoded in refinement layers. The refinement layers are determined according to:

R(j) = I, - L0D(j - 1)

where R are the refinement layers, j specifies a refinement layer R(j), I _j specifies a current layer of the binary tree, and LOD(j-l) specifies the preceding LOD.

[00121] FIG. 12 is a flowchart of an example method 1200 of decoding point attributes in a sparse point cloud. Method 1200 can be employed to implement method 300, 500, 700, and/or 800, for example by employing mechanism 600. Accordingly, method 1200 may be employed to implement the generate LOD module 229 and/or the inverse lifting module 231 in the decoder 200. As such, the method 1200 may be used to decode positions and attributes of points in a point cloud from a geometry bitstream and an attribute bitstream to reconstruct a media sequence. For example, method 1200 may be used to decode an attribute bitstream created according to methods 1000 and/or method 1100. Method 1200 may also be applied by a coding module 914 of a media coding device 900.

[00122] Method 1200 may be initiated when a decoder received PCC data. At step 1201, a PCC bitstream is received. The PCC bitstream contains a geometry bitstream and an attribute bitstream. The geometry bitstream and the attribute bitstream collectively include an encoding of an image of a media sequence. The image includes a sparse point cloud containing a plurality of points.

[00123] At step 1203, an octree is applied to the geometry bitstream in order to decode positions of the points in the image. For example, applying the octree can be accomplished by applying method 300 to the points as described above.

[00124] At step 1205, a variable is obtained from the attribute bitstream, the geometry bitstream, and/or the PCC bitstream. The variable can be used to determine that a hierarchical tree is to be used to determine LODs and attributes values. For example, the variable may be a useinTreeLoD variable set to indicate that a binary tree should be used to determine the LODs and attributes values.

[00125] At step 1207, a hierarchical tree is applied based on the value of the variable determined at step 1205. The hierarchical tree is applied to sort the points in the point cloud into nodes based on the position of the points. An inverse lifting algorithm can also be employed to decode the attribute values of the points based on data in the attribute bitstream and the occupancy of the nodes of the hierarchical tree. This may result in decoding the attribute values of the points. For example, the inverse lifting algorithm can decode attribute values of the points that are encoded as a series of LOD based predictors and residuals. The hierarchical tree may be a binary tree, an octree, or a hybrid binary tree as discussed in methods 500, 700, and 800, respectively. Other hierarchical tree structures may also be used as desired. The hierarchical tree can be applied according to method 1300 as described below.

[00126] At step 1209, the sparse point cloud is reconstructed as part of a reconstructed image. One or more images may be reconstructed and placed in order as part of a reconstructed media sequence. Based on the forgoing, the sparse point cloud is reconstructed based on the positions of the points and the attribute values of the points.

[00127] FIG. 13 is a flowchart of an example method 1300 of applying a hierarchical tree to support decoding point attributes in a sparse point cloud. Method 1300 can be employed to apply a hierarchical tree at step 1207 of method 1200. Accordingly, method 1300 can be employed when implementing method 300, 500, 700, and/or 800, for example by employing mechanism 600. Further, method 1300 may be employed to implement the generate LOD module 229 and/or the inverse lifting module 231 in the decoder 200. As such, the method 1300 may be used to decode positions and attributes of points in a point cloud from a geometry bitstream and an attribute bitstream to reconstruct a media sequence. For example, method 1300 may be used to decode an attribute bitstream created according to methods 1000 and/or method 1100. Method 1300 may also be applied by a coding module 914 of a media coding device 900.

[00128] Method 1300 is initiated when decoding attributes for points in a point cloud. At step 1301, the points of a point cloud are assigned to a parent node in a first layer of a plurality of LODs. A depth of the hierarchical tree with the parent node in the first layer acting as a root node can be determined according to: d = round

where d is the depth of a binary tree, round is a rounding function, log is a logarithm function, and N is a number of the points.

[00129] At step 1303, the parent node is recursively divided into child nodes in the plurality of LODs. The nodes can be recursively divided based on the positions of the points contained in the nodes. For example, recursively dividing the parent nodes into child nodes can include selecting a centroid for each current node. Each current node can be divided by one or more planes traversing the centroid for the current node.

[00130] At step 1305, an inverse lifting algorithm is applied. For example, an attribute value for the parent node (e.g., the root node) is set as an attribute value for a first layer of the LODs. Attribute values for the child nodes are determined based on residuals for the child nodes and a predictor from a preceding LOD. In some examples, attribute values of each preceding LOD describe attribute values at a corresponding centroid determined at step 1303. As a specific example, the attribute values of the parent node and the child nodes may be stored in the attribute bitstream in refinement layers. The refinement layers are determined according to:

R(j) = I _j - L0D (j - 1)

where R are the refinement layers, j specifies a refinement layer R(j), I _j specifies a current layer of the binary tree, and LOD(j-l) specifies the preceding LOD.

[00131] FIG. 14 is a schematic diagram of an example system 1400 for coding point attributes in a sparse point cloud according to a hierarchical tree. System 1400 may be implemented by an encoder and a decoder, such as encoder 100 and decoder 200. System 1400 may also be implemented on a media coding device 900. Further system 1400 may be employed when implementing mechanism 600 and/or method 300, 500, 700, 800, 1000, 1100, 1200, and/or 1300.

[00132] The system 1400 includes a media encoder 1402. The media encoder 1402 comprises a storing module 1401 for storing an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points. The media encoder 1402 further comprises a geometry bitstream module 1403 for applying an octree to the points to encode a geometry bitstream describing positions of the points. The media encoder 1402 further comprises an attribute bitstream module 1405 for applying a hierarchical tree to the points to encode an attribute bitstream describing attribute values of the points. The media encoder 1402 further comprises a transmitting module 1407 for transmitting a PCC bitstream containing the geometry bitstream and the attribute bitstream to support reconstructing the media sequence. The media encoder 1402 may be further configured to perform any of the steps of method 1000 and/or 1100.

[00133] The system 1400 also includes a media decoder 1410. The media decoder 1410 comprises a receiving module 1411 for receiving a PCC bitstream containing a geometry bitstream and an attribute bitstream that include an encoding of an image of a media sequence, the image comprising a sparse point cloud containing a plurality of points. The media decoder 1410 further comprises a geometry bitstream module 1413 for applying an octree to the geometry bitstream to decode positions of the points in the image. The media decoder 1410 further comprises an attribute bitstream module 1415 for applying a hierarchical tree to the points based on the positions of the points to decode attribute values of the points. The media decoder 1410 further comprises a reconstruction module 1417 for reconstructing the sparse point cloud to reconstruct the image as part of the media sequence, wherein the sparse point cloud is reconstructed based on the positions of the points and the attribute values of the points. The media decoder 1410 may be further configured to perform any of the steps of method 1200 and/or 1300.

[00134] A first component is directly coupled to a second component when there are no intervening components, except for a line, a trace, or another medium between the first component and the second component. The first component is indirectly coupled to the second component when there are intervening components other than a line, a trace, or another medium between the first component and the second component. The term“coupled” and its variants include both directly coupled and indirectly coupled. The use of the term“about” means a range including ±10% of the subsequent number unless otherwise stated.

[00135] It should also be understood that the steps of the exemplary methods set forth herein are not necessarily required to be performed in the order described, and the order of the steps of such methods should be understood to be merely exemplary. Likewise, additional steps may be included in such methods, and certain steps may be omitted or combined, in methods consistent with various embodiments of the present disclosure.

[00136] While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented. [00137] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

Previous Patent: VISION BASED RECOGNITION OF GAMING CHIPS

Next Patent: METHOD AND SYSTEM FOR ASSESSING HUMAN MOVEMENTS