

Title:
HIERARCHICAL ARTICULATED NEURAL RADIANCE FIELD FOR FAST 3D RECONSTRUCTION AND RENDERING
Document Type and Number:
WIPO Patent Application WO/2023/165706
Kind Code:
A1
Abstract:
An object deformation apparatus (700) configured to: obtain an input volume (101) comprising the object (109); sample one or more point (110) in the input volume (101); deform the one or more input volume sampled point (110) to a canonical volume (102); sample one or more point (115) in a region (116) of the canonical volume (102) surrounding each deformed input volume sampled point (113); estimate the density of each deformed input volume sampled point (113) in dependence on the density of the one or more corresponding surrounding sampled point (115); and render an output volume (107) in dependence on the estimated density of each deformed input volume sampled point (113). The apparatus may enable a reduction in the number of samples to be deformed.

Inventors:
SZABO ATTILA (DE)
YUAN SHANXIN (DE)
BUSAM BENJAMIN (DE)
ZHOU YIREN (DE)
LEONARDIS ALES (DE)
Application Number:
PCT/EP2022/055569
Publication Date:
September 07, 2023
Filing Date:
March 04, 2022
Assignee:
HUAWEI TECH CO LTD (CN)
SZABO ATTILA (DE)
International Classes:
G06T13/40; G06T19/20
Other References:
PENG SIDA ET AL: "Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies", 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 10 October 2021 (2021-10-10), pages 14294 - 14303, XP034093773, DOI: 10.1109/ICCV48922.2021.01405
PARK KEUNHONG ET AL: "Nerfies: Deformable Neural Radiance Fields", 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 10 October 2021 (2021-10-10), pages 5845 - 5854, XP034093136, DOI: 10.1109/ICCV48922.2021.00581
MILDENHALL, BEN ET AL.: "European conference on computer vision", 2020, SPRINGER, article "Nerf: Representing scenes as neural radiance fields for view synthesis"
MULLER, THOMAS ET AL.: "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding", ARXIV PREPRINT ARXIV:2201.05989, 2022
PARK, KEUNHONG ET AL.: "Nerfies: Deformable neural radiance fields", PROCEEDINGS OF THE IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, 2021
PENG, SIDA ET AL.: "Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans", PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2021
PENG, SIDA ET AL.: "Animatable neural radiance fields for modeling dynamic human bodies", PROCEEDINGS OF THE IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, 2021
Attorney, Agent or Firm:
KREUZ, Georg M. (DE)
Claims:
CLAIMS

1. An object deformation apparatus (700), the apparatus (700) comprising one or more processors (701) and a memory (702) storing in non-transient form data defining program code executable by the one or more processors (701) to implement an object deformation model (100), the apparatus (700) being configured to: obtain an input volume (101) comprising the object (109); sample one or more point (110) in the input volume (101); deform the one or more input volume sampled point (110) to a canonical volume (102); sample one or more point (115) in a region (116) of the canonical volume (102) surrounding each deformed input volume sampled point (113); estimate the density of each deformed input volume sampled point (113) in dependence on the density of the one or more corresponding surrounding sampled point (115); and render an output volume (107) in dependence on the estimated density of each deformed input volume sampled point (113).

2. The object deformation apparatus (700) of claim 1, wherein the apparatus (700) is configured to sample one or more point (110) in the input volume (101) by means of casting rays (111) from the pixels.

3. The object deformation apparatus (700) of claim 1 or 2, wherein the apparatus (700) is configured to deform the one or more input volume sampled point (110) to the canonical volume (102) by converting each input volume sampled point (110) into a coordinate in the canonical volume (102) and deforming the coordinate in the canonical volume (102) to a required configuration.

4. The object deformation apparatus (700) of any preceding claim, wherein the apparatus (700) is configured to calculate an ellipsoid (116) surrounding each deformed input volume sampled point (113), and sample the one or more point (115) in a region (116) of the canonical volume (102) surrounding each deformed input volume sampled point (113) by projecting from the deformed input volume sampled point (113) onto the ellipsoid (116).

5. The object deformation apparatus (700) of any preceding claim, wherein the apparatus (700) is configured to estimate the density of each deformed input volume sampled point (113) as being the maximum density of the one or more corresponding surrounding sampled point (115).

6. The object deformation apparatus (700) of any preceding claim, wherein the apparatus (700) is configured to render the output volume (107) by aggregating the density of each deformed input volume sampled point (113) by means of trilinear interpolation.

7. The object deformation apparatus (700) of any preceding claim, wherein the input volume (101) and the output volume (107) comprise an observed volume further comprising the object (109).

8. The object deformation apparatus (700) of any preceding claim, wherein the apparatus (700) is configured to compare the estimated density of each deformed input volume sampled point (113) against a density threshold, and to render the output volume (107) in dependence on only the estimated density of each deformed input volume sampled point (113) that is designated to have an estimated density above the density threshold.

9. The object deformation apparatus (700) of any preceding claim, wherein the apparatus (700) is configured to repeat the steps of any preceding claim for one or more iterations, and wherein the apparatus (700) is configured to sample the input volume (101) with a different resolution in each iteration.

10. The object deformation apparatus (700) of claim 9, wherein the apparatus (700) is configured to increase the resolution of the sampling of one or more point (110) in the input volume (101) in each subsequent iteration.

11. The object deformation apparatus (700) of claim 10 when dependent on claim 8, wherein the apparatus (700) is configured to, in a subsequent iteration, only sample one or more point (110) in the input volume (101) which are in a region that, in the previous iteration, is designated to have an estimated density above the density threshold.

12. A method (500) for deforming an object, the method (500) comprising: obtaining an input volume comprising the object (501); sampling one or more point in the input volume (502); deforming the one or more input volume sampled point to a canonical volume (503); sampling one or more point in a region of the canonical volume surrounding each deformed input volume sampled point (504); estimating the density of each deformed input volume sampled point in dependence on the density of the one or more corresponding surrounding sampled point (505); and rendering an output volume in dependence on the estimated density of each deformed input volume sampled point (506).

13. An object deformation apparatus (700), the apparatus (700) comprising one or more processors (701) and a memory (702) storing in non-transient form data defining program code executable by the one or more processors (701) to implement an object deformation model (100), the apparatus (700) being configured to: obtain an input volume (101) comprising the object (109); obtain a deformation model of the object (109), the deformation model comprising one or more surface point (205); diffuse each deformation model surface point (205) to a point (206) on a grid (207); sample one or more point (110) in the input volume (101) in dependence on one or more point (206) on the grid (207); and deform each input volume sampled point (110) to a canonical volume (102).

14. The object deformation apparatus (700) of claim 13, wherein the deformation model is a blendshape model.

15. The object deformation apparatus (700) of claim 13 or 14, wherein the apparatus (700) is configured to diffuse each deformation model surface point (205) to the nearest point (206) on the grid (207) to the deformation model surface point (205).

16. The object deformation apparatus (700) of any of claims 13 to 15, wherein the apparatus (700) is configured to sample one or more point (110) in the input volume (101) in dependence on an interpolation between one or more point (206) on the grid (207).

17. The object deformation apparatus (700) of claim 16, wherein the apparatus (700) is configured to sample one or more point (110) in the input volume (101) in dependence on an interpolation between at least the two nearest points (206) on the grid (207).

18. The object deformation apparatus (700) of any of claims 13 to 17, wherein the apparatus (700) is configured to record a link (304) between each deformation model surface point (205) and a point (206) on the grid (207), and sample one or more point (110) in the input volume (101) in dependence on one or more link (304) between a deformation model surface point (205) and a point (206) on the grid (207).

19. The object deformation apparatus (700) of claim 18, wherein the apparatus (700) is configured to record a link (304) between each deformation model surface point (205) and the nearest point (206) on the grid (207) to the deformation model surface point (205), and sample one or more point (110) in the input volume (101) in dependence on the link (304) between one or more deformation model surface point (205) and the nearest point (206) on the grid (207).

20. A method (600) for deforming an object, the method (600) comprising: obtaining an input volume comprising the object (601); obtaining a deformation model of the object, the deformation model comprising one or more surface point (602); diffusing each deformation model surface point to a point on a grid (603); sampling one or more point in the input volume in dependence on one or more point on the grid (604); and deforming each input volume sampled point to a canonical volume (605).

Description:
HIERARCHICAL ARTICULATED NEURAL RADIANCE FIELD FOR FAST 3D RECONSTRUCTION AND RENDERING

FIELD OF THE INVENTION

This invention relates to object deformation, for example for rendering volumes with a deformed object.

BACKGROUND

Object deformation can enable 3D volumes to be modified, manipulated, and controlled for various applications.

An input volume may comprise a feature such as a body. The body may be a human or animal body. The body may have characteristics such as an arrangement and an appearance. The arrangement may be a pose, such as having an arm in the air. The appearance may be the clothing or the look of the face of the human body. The characteristic features may be extracted from the input image into a learned model.

It can be advantageous to deform the arrangement of the body while maintaining the overall appearance. For example, it may be advantageous to modify the pose of the human body such that both arms are in the air while also maintaining the overall appearance of the human body. If this pose is not already known from an input volume, then this pose may be known as a novel pose.

Volumetric rendering techniques, such as neural radiance fields (NeRF), may produce very high-quality renderings. The main disadvantage of many early NeRF-based methods is slow training and rendering speed. There are two main causes of the slow speed: (i) there may be a high number of samples in the volume that are used to calculate the pixel colours, and (ii) each sample takes a lot of time to calculate, as each sample requires a costly neural network (NN) evaluation at the sample location.

Fast NeRF methods may solve the speed issue both for training and testing. This may be because these methods use: (i) efficient data structures, such as octrees, that allow the skipping of empty space, thus reducing the number of samples, and (ii) directly learnable features stored in these data structures, combined with much smaller NNs by conditioning the NNs on them. Both accessing the features and evaluating the small networks may be much faster than evaluating the larger networks of early NeRF works. When considering dynamic or deformable scenes, speed may still be a problem. This may be because: (i) the deformation has to be calculated for many samples, as the deformation models may not be compatible with the fast data structures, and (ii) evaluating the deformation may be costly for each sample.

Mildenhall, Ben, et al. "Nerf: Representing scenes as neural radiance fields for view synthesis." European conference on computer vision. Springer, Cham, 2020 has considered synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. This model may struggle with rendering speed.

Sun, Cheng, Min Sun, and Hwann-Tzong Chen. "Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction." arXiv preprint arXiv:2111.11215 (2021) has considered a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. This model may struggle with rendering speed.

Muller, Thomas, et al. "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding." arXiv preprint arXiv:2201.05989 (2022) has considered reducing the cost of MLPs with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations. This model may struggle with human objects.

Park, Keunhong, et al. "Nerfies: Deformable neural radiance fields." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021 has considered a model that is capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones. This model may have difficulty dealing with large deformations and controllability.

Peng, Sida, et al. "Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021 has considered a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh. This model may produce low quality novel view synthesis and a lack of fine details.

NeRFRig: Neural Radiance Field Rig for Human 3D Shape and Appearance Modeling has considered a 3D human NeRF model capable of rigging an articulated human body. This model may struggle with rendering quality and fast training and inference.

Peng, Sida, et al. "Animatable neural radiance fields for modeling dynamic human bodies." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021 has considered neural blend weight fields to produce the deformation fields for human modelling. This model may not address the training or inference speed.

Currently there is a need for methods that simultaneously solve the following 3 problems:

(i) to train and render volumetric representations, such as neural radiance fields (NeRF), quickly (i.e., training in minutes and rendering at interactive frame rates at 1 Mpixel resolution);

(ii) to render in high quality; and

(iii) to render dynamic scenes and deformable objects (i.e., the ability to animate 3D human avatars).

It is desirable to develop an apparatus and method that overcomes the above problems.

SUMMARY

According to a first aspect there is provided an object deformation apparatus, the apparatus comprising one or more processors and a memory storing in non-transient form data defining program code executable by the one or more processors to implement an object deformation model, the apparatus being configured to: obtain an input volume comprising the object; sample one or more point in the input volume; deform the one or more input volume sampled point to a canonical volume; sample one or more point in a region of the canonical volume surrounding each deformed input volume sampled point; estimate the density of each deformed input volume sampled point in dependence on the density of the one or more corresponding surrounding sampled point; and render an output volume in dependence on the estimated density of each deformed input volume sampled point. By rendering the output volume in dependence on the estimated density of each deformed input volume sampled point, the number of sampled points used in the rendering may be reduced, which may reduce the computational cost.

In some implementations, the apparatus may be configured to sample one or more point in the input volume by means of casting rays from the pixels. By casting rays from the pixels, the points may be sampled at equidistant positions along each ray, aligned with the frustum.

In some implementations, the apparatus may be configured to deform the one or more input volume sampled point to the canonical volume by converting each input volume sampled point into a coordinate in the canonical volume and deforming the coordinate in the canonical volume to a required configuration. By deforming the coordinate in the canonical volume, this can enable the deformation to be carried out in the 2D canonical volume which may reduce the computational cost compared to the 3D observed volume.

In some implementations, the apparatus may be configured to calculate an ellipsoid surrounding each deformed input volume sampled point, and sample the one or more point in a region of the canonical volume surrounding each deformed input volume sampled point by projecting from the deformed input volume sampled point onto the ellipsoid. By sampling the points in the ellipsoid, this can enable further points to be used for the rendering, without the need to sample further points, which may save computational cost.

In some implementations, the apparatus may be configured to estimate the density of each deformed input volume sampled point as being the maximum density of the one or more corresponding surrounding sampled point. By estimating the density of the deformed input volume sampled point based on the maximum surrounding density, this can enable the system to be weighted towards a higher density rendering.

In some implementations, the apparatus may be configured to render the output volume by aggregating the density of each deformed input volume sampled point by means of trilinear interpolation. By aggregating the density of the different deformed input volume sampled points, this may enable the output volume to be built up.

In some implementations, the apparatus may be configured wherein the input volume and the output volume comprise an observed volume further comprising the object. In this way, the system may be applied to 3D object deformation.

In some implementations, the apparatus may be configured to compare the estimated density of each deformed input volume sampled point against a density threshold, and to render the output volume in dependence on only the estimated density of each deformed input volume sampled point that is designated to have an estimated density above the density threshold. By only rendering the output volume based on the deformed input volume sampled points with a density above a density threshold, this may enable the lower density deformed input volume sampled points, which are in empty space, to be disregarded, which may reduce the computational cost.

In some implementations, the apparatus may be configured to repeat the preceding steps for one or more iterations, and to sample the input volume with a different resolution in each iteration. In this way, a different number of deformed input volume sampled points may be generated, which may enable the rendering quality versus computational cost to be varied.

In some implementations, the apparatus may be configured to increase the resolution of the sampling of one or more point in the input volume in each subsequent iteration. In this way, the number of sampling points may be reduced in the early iterations, where the sampling may not be as targeted, and increased in the later iterations, where the sampling may be more targeted. This may provide a good rendering quality compared to the computational cost.

In some implementations, the apparatus may be configured to, in a subsequent iteration, only sample one or more point in the input volume which are in a region that, in the previous iteration, is designated to have an estimated density above the density threshold. In this way, the sampling may become more targeted as the iterations go on, which may reduce the number of sampling points, which may reduce the computational cost.

According to a second aspect there is provided a method for deforming an object, the method comprising: obtaining an input volume comprising the object; sampling one or more point in the input volume; deforming the one or more input volume sampled point to a canonical volume; sampling one or more point in a region of the canonical volume surrounding each deformed input volume sampled point; estimating the density of each deformed input volume sampled point in dependence on the density of the one or more corresponding surrounding sampled point; and rendering an output volume in dependence on the estimated density of each deformed input volume sampled point. By rendering the output volume in dependence on the estimated density of each deformed input volume, this may enable the number of sampled points used in the rendering to be reduced, which may reduce the computational cost.

According to a third aspect there is provided an object deformation apparatus, the apparatus comprising one or more processors and a memory storing in non-transient form data defining program code executable by the one or more processors to implement an object deformation model, the apparatus being configured to: obtain an input volume comprising the object; obtain a deformation model of the object, the deformation model comprising one or more surface point; diffuse each deformation model surface point to a point on a grid; sample one or more point in the input volume in dependence on one or more point on the grid; and deform each input volume sampled point to a canonical volume. By sampling one or more point in the input volume in dependence on one or more point on the grid, the cost of deforming a sampled point may be reduced, which may reduce the computational cost.

In some implementations, the apparatus may be configured wherein the deformation model is a blendshape model. In this way, the system may be applied to 3D observed volumes.

In some implementations, the apparatus may be configured to diffuse each deformation model surface point to the nearest point on the grid to the deformation model surface point. In this way, the grid estimation of the deformation model surface point may provide an accurate estimation.

In some implementations, the apparatus may be configured to sample one or more point in the input volume in dependence on an interpolation between one or more point on the grid. In this way, the sampling may be targeted based on the surrounding points on the grid.

In some implementations, the apparatus may be configured to sample one or more point in the input volume in dependence on an interpolation between at least the two nearest points on the grid. In this way, the targeted sampling may be more accurate.

In some implementations, the apparatus may be configured to record a link between each deformation model surface point and a point on the grid, and sample one or more point in the input volume in dependence on one or more link between a deformation model surface point and a point on the grid. In this way, the sampling may be targeted based on the surrounding points on the grid.

In some implementations, the apparatus may be configured to record a link between each deformation model surface point and the nearest point on the grid to the deformation model surface point, and sample one or more point in the input volume in dependence on the link between one or more deformation model surface point and the nearest point on the grid. In this way, the targeted sampling may be more accurate.

According to a fourth aspect there is provided a method for deforming an object, the method comprising: obtaining an input volume comprising the object; obtaining a deformation model of the object, the deformation model comprising one or more surface point; diffusing each deformation model surface point to a point on a grid; sampling one or more point in the input volume in dependence on one or more point on the grid; and deforming each input volume sampled point to a canonical volume. By sampling one or more point in the input volume in dependence on one or more point on the grid, the cost of deforming a sampled point may be reduced, which may reduce the computational cost.

BRIEF DESCRIPTION OF THE FIGURES

The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:

Figure 1 illustrates the stages an example input volume may undergo in an exemplary object deformation process.

Figure 2 illustrates the stages an example input volume may undergo in another exemplary object deformation process.

Figure 3 illustrates the stages an example input volume may undergo in another exemplary object deformation process.

Figure 4 schematically illustrates an exemplary structure of the network architecture used in the object deformation apparatus.

Figure 5 illustrates an example method for deforming an input volume.

Figure 6 illustrates another example method for deforming an input volume.

Figure 7 illustrates an example of an apparatus configured to perform the methods described herein.

DETAILED DESCRIPTION

The apparatuses and methods described herein concern using an object deformation model.

Embodiments of the present system may tackle one or more of the problems previously mentioned by rendering the output volume in dependence on the estimated density of each deformed input volume sampled point. In this way, the number of sampled points used in the rendering may be reduced, which may reduce the computational cost. Additionally, the present system may tackle one or more of the problems previously mentioned by sampling one or more point in the input volume in dependence on one or more point on the grid. In this way, the cost of deforming a sampled point may be reduced, which may reduce the computational cost. The present system may provide a deformation model that aims to be fast and is compatible with fast volumetric rendering methods. This may be provided by means of (i) a hierarchical sampling model, which may reduce the number of costly samples, and (ii) a hierarchical deformation model, which may be computed fast for each sample. The speedups from the two models may be multiplied, as they may be combined. In this way, the present system may achieve high quality and speed, and enable deformation and control for animatable human 3D avatars.

For object deformation, the forward model may be a STAR human 3D model as defined in Equation 1, where x is a 3D vertex point on the mesh of the blendshape model, y is its deformed point in the observed volume, and s denotes the shape and pose parameters.

y = STAR(x, s)   (1)

The blendshape model can be defined by Equation 2, where B(x) and a_j(x) are only defined on the vertex points. Thus, the blendshape may only deform the triangle mesh.

y = Σ_j a_j(x) P_j(s) (x + B(x)s)   (2)

The apparatus may invert the blendshape model as defined in Equation 3.

x = InvSTAR(STAR(x, s), s)   (3)

However, the inversion may be defined for all 3D points and not only for the deformed vertices. To do this, a diffusion may be applied as defined by Equation 4, where y is a point in the observed volume and x_t and y_t are the blendshape vertices in the canonical and observed volume respectively. The a_j blendshape parameter may be diffused similarly.

The inversion may then be defined by Equation (5), in which B(x) may be estimated by the diffused B(y). In the approach described with regards to Equations 1 to 5, the system, as in the prior art, may be required to compute the distances between all sampled point and vertex pairs. Instead, the present system may use a coarse-to-fine grid structure, where the diffusion can be done efficiently, as described herein with regards to Figures 2 and 3.
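As an illustration, a naive implementation of this diffusion might look as follows. This is a sketch only: Equation 4 is not reproduced above, so the Gaussian kernel and the names diffuse_blendshape_naive and sigma are assumptions rather than the patented formulation. It makes explicit the all-pairs distance cost that the grid structure avoids.

```python
import numpy as np

def diffuse_blendshape_naive(query_points, deformed_vertices, vertex_params, sigma=0.05):
    """Diffuse per-vertex blendshape parameters to arbitrary 3D points.

    Naive O(N*V) variant: every query point y is compared against every
    deformed vertex y_t, which is exactly the all-pairs distance cost
    that the coarse-to-fine grid structure is meant to avoid.
    """
    # Pairwise squared distances between query points (N,3) and vertices (V,3).
    d2 = np.sum((query_points[:, None, :] - deformed_vertices[None, :, :]) ** 2, axis=-1)
    # Assumed Gaussian kernel; the patent's Equation 4 is not shown.
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= np.clip(w.sum(axis=1, keepdims=True), 1e-12, None)
    # Weighted average of the per-vertex parameters (V,P) -> per-point (N,P).
    return w @ vertex_params
```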

Figure 1 illustrates the stages of a hierarchical sampling method of the present system. The hierarchical sampling method may aim to lower the computational cost by reducing the number of samples to be deformed.

Figure 1 illustrates the stages an example input volume may undergo in an exemplary object deformation process.

The object deformation apparatus may be configured to obtain an input volume 101. The apparatus may be configured to receive the input volume 101 from a separate apparatus, for example a separate computing apparatus. Alternatively, the apparatus may be configured to generate the input volume 101 itself.

The input volume 101 may comprise an object 109. The object 109 may comprise a body, such as a human or animal body. By having a body object 109 the object deformation apparatus may be used for VR/AR, gaming and avatars, and virtual try-on implementations, for example. In these implementations, aspects of the body, such as the arms or legs, may be deformed to present the viewer with an animated body representation.

The input volume 101 may comprise a 3D image. The input volume 101 may comprise an observed volume. The input, or observed, volume 101 may comprise a 3D space. The object 109 may be situated in the 3D space of the input volume 101.

The apparatus may be configured to sample a point 110 in the input volume 101. The apparatus may be configured to sample one or more point 110 in the input volume 101 by means of casting rays 111 from the pixels. The rays 111 may be cast from the pixels in the 3D space of the input, or observed, volume 101. The one or more point 110 may be sampled along the rays 111. For each pixel, a ray 111 may be cast. On the ray 111, equidistant points 110 may be selected between the nearest and furthest distance. The selection of the points 110 may depend on the bounding volume of the object 109. The points 110 may be considered as a grid that is aligned with the frustum. The apparatus may be configured to generate one or more input volume sampled point 110 by sampling the input volume 101.
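A minimal sketch of this sampling step, assuming per-pixel ray origins and unit directions are already available; the function and parameter names are illustrative:

```python
import numpy as np

def sample_points_along_rays(origins, directions, t_near, t_far, n_samples):
    """Cast one ray per pixel and select equidistant sample points on it.

    origins, directions: (R,3) per-ray origin and unit direction.
    t_near, t_far: nearest and furthest distance, e.g. derived from the
    bounding volume of the object.
    """
    # Equidistant depths between the nearest and furthest distance.
    t = np.linspace(t_near, t_far, n_samples)                          # (S,)
    # (R,1,3) + (1,S,1)*(R,1,3) -> (R,S,3); taken together, the samples
    # of all rays form a grid aligned with the camera frustum.
    points = origins[:, None, :] + t[None, :, None] * directions[:, None, :]
    return points, t
```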

The observed volume 101 may deform from frame to frame. For example, this may occur when a human is animated. In order to render the observed volume 101, the apparatus may deform the sampled points 110 to a canonical volume 102.

The apparatus may be configured to deform the one or more input volume sampled point 110 to the canonical volume 102. The apparatus may be configured to deform each input volume sampled point 110 to the canonical volume 102. The canonical volume 102 may comprise a 2D space with which a volumetric appearance model is associated. For example, the canonical volume 102 may comprise a multi-layer perceptron in NeRF. The deformation may be parametrised with a 4x4 matrix for each deformed input volume sampled point 113. The apparatus may be configured to generate one or more deformed input volume sampled point 113.

As illustrated in Figure 1, at step 103, the object 109 of the input volume 101 may be deformed to a deformed canonical volume object 112. Similarly, the one or more input volume sampled point 110 may be deformed to the one or more deformed input volume sampled point 113. The rays 111 of the input volume 101 may be deformed to deformed canonical volume rays 114.

The apparatus may be configured to deform the one or more input volume sampled point 110 to the canonical volume 102 by converting each input volume sampled point 110 into a coordinate in the canonical volume 102 and deforming the coordinate in the canonical volume 102 to a required configuration. The apparatus may be configured to convert each input volume sampled point 110 into a coordinate in the canonical volume 102. Subsequently, the apparatus may be configured to deform the coordinate in the canonical volume 102 to a required configuration. It will also be appreciated that other methods or orderings may be used to deform the one or more input volume sampled point 110 to the canonical volume 102, for example, by the methods described herein with regards to Figures 2 and 3.
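Where the deformation is parametrised with a 4x4 matrix per sampled point, as mentioned above, the mapping into the canonical volume may reduce to a batched homogeneous transform. A sketch (how the per-point matrices are produced is outside its scope):

```python
import numpy as np

def deform_to_canonical(points, transforms):
    """Map sampled points into the canonical volume.

    points:     (N,3) sampled locations in the input (observed) volume.
    transforms: (N,4,4) one homogeneous deformation matrix per point.
    """
    ones = np.ones((points.shape[0], 1))
    homog = np.concatenate([points, ones], axis=1)           # (N,4)
    # Per-point matrix-vector product in homogeneous coordinates.
    canonical = np.einsum('nij,nj->ni', transforms, homog)   # (N,4)
    return canonical[:, :3]
```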

As illustrated in step 104 of Figure 1, the apparatus may be configured to sample one or more point 115 in a region 116 of the canonical volume 102 surrounding each deformed input volume sampled point 113. As illustrated in Figure 1, the region 116 surrounding each deformed input volume sampled point 113 may comprise an ellipsoid 116. The apparatus may be configured to calculate the region 116 surrounding each deformed input volume sampled point 113, and sample the one or more point 115 in the region 116 of the canonical volume 102 surrounding each deformed input volume sampled point 113 by projecting from the deformed input volume sampled point 113 onto the outer surface of the region 116. In the particular case where the region 116 comprises an ellipsoid 116, the apparatus may be configured to calculate the ellipsoid 116 surrounding each deformed input volume sampled point 113, and sample the one or more point 115 in the region 116 of the canonical volume 102 surrounding each deformed input volume sampled point 113 by projecting from the deformed input volume sampled point 113 onto the ellipsoid 116. The projection may be carried out in random directions from the deformed input volume sampled point 113. The apparatus may be configured to calculate the ellipsoids 116 by transforming an ellipsoid from the observed, or input, volume 101. The ellipsoid in the observed, or input, volume 101 may be defined by tightly fitting in the grid that was defined by the sampling of the observed, or input, volume 101. The apparatus may be configured to generate the one or more surrounding sampled point 115.
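A sketch of this surrounding-point sampling, assuming the ellipsoid is described by its centre and three semi-axis vectors obtained by transforming the grid-fitting ellipsoid from the observed volume; the names are illustrative:

```python
import numpy as np

def sample_on_ellipsoid(center, semi_axes, n_dirs, rng=None):
    """Project from a deformed sample point onto its surrounding ellipsoid.

    center:    (3,) the deformed input volume sampled point 113.
    semi_axes: (3,3) rows are the ellipsoid's semi-axis vectors.
    """
    rng = rng or np.random.default_rng()
    # Random unit directions from the centre point.
    d = rng.normal(size=(n_dirs, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    # The linear image of the unit sphere under the semi-axis matrix is
    # the ellipsoid surface, so d @ semi_axes lands on the ellipsoid.
    return center[None, :] + d @ semi_axes
```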

The apparatus may be configured to evaluate the density values of the one or more deformed input volume sampled point 113 and the corresponding one or more surrounding sampled point 115. The apparatus may be configured to obtain the density of the one or more surrounding sampled point 115. The apparatus may be configured to calculate the density of the one or more surrounding sampled point 115 by extracting the information from the canonical volume 102. The apparatus may be configured to calculate the density of the one or more surrounding sampled point 115 by using a NeRF with a fast data structure.

As shown in step 105 of Figure 1, the apparatus may be configured to estimate the density of each deformed input volume sampled point 113. The apparatus may be configured to estimate the density of each deformed input volume sampled point 113 in dependence on the density of the one or more corresponding surrounding sampled point 115. In other words, the density of the one or more surrounding sampled point 115 in the particular region, or ellipsoid, 116 which surrounds the particular deformed input volume sampled point 113 may be used to estimate the density of the particular deformed input volume sampled point 113. This estimation may be applied to each of the one or more deformed input volume sampled point 113.

The apparatus may be configured to estimate the density of each deformed input volume sampled point 113 as being the maximum density of the one or more corresponding surrounding sampled point 115. In other words, the value of density associated with the deformed input volume sampled point 113 may be the maximum of the densities of the surrounding sampled point 115 in the region, or ellipsoid, 116. In other words, the maximum of the densities of the surrounding sampled point 115 in the region, or ellipsoid, 116 may be considered as the density of the corresponding samples 110 in the observed space 101.

As shown in step 105 of Figure 1, the deformed input volume sampled points 113 with a lower density may be designated with a white point 117. The deformed input volume sampled points 113 with a higher density may be designated with a black point 118. The deformed input volume sampled points 113 with a medium density may be designated with a grey point 119. The lower density 117 may be designated where the deformed input volume sampled point 113 does not fall in the region of the object 112. The higher density 118 may be designated where the deformed input volume sampled point 113 does fall in the region of the object 112. The medium density 119 may be designated where the deformed input volume sampled point 113 falls on the boundary of the region of the object 112.

The apparatus may be configured to compare the estimated density of each deformed input volume sampled point 113 against a density threshold. The apparatus may be configured to designate the density of the deformed input volume sampled point 113 as medium 119 or higher 118 if the density is above the density threshold. The apparatus may be configured to designate the density of the deformed input volume sampled point 113 as lower 117 if the density is below the density threshold. The density threshold may be predetermined. In other words, the density threshold may be determined by the user of the apparatus. The density threshold may be varied according to the type and/or requirements of the input volume 101. For example, if the input volume 101 has a lower overall density, then the density threshold may be lower. Similarly, if the input volume 101 has a higher overall density, then the density threshold may be higher.
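The density estimation and threshold designation described above reduce to a maximum over each point's surrounding samples followed by a comparison, for example:

```python
import numpy as np

def estimate_and_designate(surrounding_densities, density_threshold):
    """Estimate each deformed point's density as the maximum of its
    surrounding sampled points, then designate it against a threshold.

    surrounding_densities: (N,K) canonical-volume densities of the K
    surrounding samples of each of the N deformed points.
    """
    estimated = surrounding_densities.max(axis=1)
    # True for points falling in, or on the boundary of, the object;
    # the remaining points may be disregarded during rendering.
    designated = estimated > density_threshold
    return estimated, designated
```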

The apparatus may be configured to render an output volume 107. The apparatus may be configured to generate and output the output volume 107. The output volume 107 may comprise the object 109. Like the input volume 101, the output volume 107 may comprise a 3D image. The output volume 107 may comprise an observed volume. The output, or observed, volume 107 may comprise a 3D space. The object 109 may be situated in the 3D space of the output volume 107.

The output volume 107 may comprise the same appearance features as the input volume, but the object 109 may be in a different orientation. Similarly, the output volume sampled points 120 may be in a different orientation to the input volume sampled points 110. The output volume rays 121 may be in a different orientation to the input volume rays 111.

The apparatus may be configured to render the output volume 107 in dependence on the estimated density of each deformed input volume sampled point 113. In particular, the apparatus may be configured to render the output volume 107 by aggregating the density of each deformed input volume sampled point 113 by means of trilinear interpolation. In other words, the estimated density of each deformed input volume sampled point 113 may be built up, or combined together, to generate the output volume 107. The trilinear interpolation may upsample the densities and generate a grid of densities in the frustum. This may represent a 2x higher resolution grid of samples than the samples 110 in the observed volume 101.
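A sketch of the trilinear upsampling of the density grid, here using SciPy's linear map_coordinates as a stand-in for whatever interpolation routine an implementation would actually use:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def upsample_density_grid(grid, factor=2):
    """Trilinearly upsample an (X,Y,Z) density grid, e.g. to the 2x
    higher resolution grid of frustum samples mentioned above."""
    axes = [np.linspace(0, n - 1, n * factor) for n in grid.shape]
    xi, yi, zi = np.meshgrid(*axes, indexing='ij')
    # order=1 gives (tri)linear interpolation of the coarse densities.
    return map_coordinates(grid, np.stack([xi, yi, zi]), order=1)
```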

The apparatus may be configured to render the output volume 107 in dependence on only the estimated density of each deformed input volume sampled point 113 that is designated to have an estimated density above the density threshold. In other words, deformed input volume sampled points 113 that have a density below the density threshold may be disregarded in the rendering of the output volume 107. In this way, only the deformed input volume sampled points 113 which fall in, or on the boundary of, the object 112 are used for the rendering of the output volume 107. This may save computational cost as the background, or empty, points need not be rendered. This may be illustrated in step 106 of Figure 1, in which only the medium 119 and higher 118 densities, which fall in, or on the boundary of, the object 112 are used to render the output volume 107.

The apparatus may be configured to perform a hierarchical sampling process. In particular, the apparatus may be configured to perform a hierarchical sampling process in a coarse to fine fashion. As shown in step 108 of Figure 1, the apparatus may be configured to repeat steps 101 to 107. The apparatus may be configured to repeat the steps 101 to 107 with a different level of sampling resolution. The steps 101 to 107 may be repeated one or more times, to generate one or more iterations. In each iteration, the apparatus may be configured to increase the sampling resolution. In particular, the apparatus may be configured to increase the resolution of the sampling of one or more point 110 in the input volume 101 in each subsequent iteration. In this way, the apparatus may generate a higher resolution, better-quality output volume 107 in each iteration. As an example, the resolution may be increased by 2x, 4x, 8x or more at each iteration until a desired resolution level is reached.

The apparatus may be configured to, in a subsequent iteration, only sample one or more point 110 in the input volume 101 which are in a region that, in the previous iteration, is designated to have an estimated density above the density threshold. In other words, in each subsequent iteration, the steps 101 to 107 may only be carried out in the regions that, in the previous iteration, were designated as falling in, or on the boundary of, the object 112. In this way, the apparatus may be configured to disregard regions of the input volume 101 which do not comprise the object 109. This may enable the computational cost to be reduced, as background, or empty, regions of the input volume 101 are not put through the steps 101 to 107 when there is no need.
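Combining the last two paragraphs, a coarse-to-fine occupancy loop might be structured as follows. Here query_density stands in for the deform-and-estimate pipeline of steps 103 to 105 and, like the other names, is an assumption:

```python
import numpy as np

def hierarchical_occupancy(query_density, base_res=32, levels=3, thresh=0.01):
    """Each level doubles the sampling resolution and only revisits
    cells designated occupied at the previous, coarser level.

    query_density(points): callback returning the estimated canonical
    density for (M,3) points in the unit cube.
    """
    res = base_res
    mask = np.ones((res, res, res), dtype=bool)      # level 0: all cells
    for _ in range(levels):
        idx = np.argwhere(mask)                      # candidate cells
        centers = (idx + 0.5) / res                  # cell centres in [0,1)^3
        dens = query_density(centers)
        occupied = np.zeros_like(mask)
        occupied[tuple(idx[dens > thresh].T)] = True
        # A fine cell is a candidate iff its parent coarse cell was occupied.
        mask = np.repeat(np.repeat(np.repeat(occupied, 2, 0), 2, 1), 2, 2)
        res *= 2
    return mask, res
```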

The apparatus may be configured to render the output volume 107 by combining the density of each deformed input volume sampled point 113 of one or more of the iterations. The apparatus may be configured to render the output volume 107 by means of numerical integration over the different iterations. This may enable the pixel colour of each pixel in the output volume 107 to be determined.

Figures 2 and 3 illustrate the stages of a deformation model of the present system. The deformation model may aim to lower the computational cost of deforming a sample.

Figure 2 illustrates the stages an example input volume may undergo in another exemplary object deformation process. In particular, Figure 2 illustrates an alternative method of the deforming step 103 in Figure 1. In this way, the stages described herein with regards to Figure 2 may be used in combination with the steps described herein with regards to Figure 1.

The apparatus may be configured to obtain an input volume 101 comprising an object 109 as described herein with regards to Figure 1.

The apparatus may be configured to obtain a deformation model of the object 109. The apparatus may be configured to receive the deformation model from a separate apparatus, for example a separate computing apparatus. Alternatively, the apparatus may be configured to generate the deformation model itself. The deformation model may comprise a blendshape model. The blendshape model may comprise pose and shape characteristics. The blendshape model may be controlled by the pose and shape inputs. The blendshape model may comprise a mesh which comprises the blendshape, or deformation, model points 205.

The deformation model may comprise one or more surface point 205. As shown in step 201 of Figure 2, the surface points 205 may be distributed over the surface of the object 109. In this way, the surface points 205 may comprise a blendshape mesh according to the shape and pose inputs of the observed volume 101. As shown in step 202 of Figure 2, the apparatus may be configured to diffuse 208 each deformation model surface point 205 to a point 206 on a grid 207. The grid 207 may comprise a plurality of points 206 distributed evenly across the grid 207. The grid 207 may be in 3D. In other words, the parameters of the deformation, or blendshape, model may be diffused in 3D. This may aggregate the blendshape information to a gridpoint. In this way, the number of points to be sampled from the deformation model surface points 205 may be reduced. Put another way, not all of the points in the input volume 101 may need to be sampled. This may reduce the computational cost. As illustratively shown in step 202 of Figure 2, three deformation model surface points 205 may be reduced to a single point 206 on the grid 207. In particular, the apparatus may be configured to diffuse each deformation model surface point 205 to the nearest point 206 on the grid 207. The nearest point 206 may be the nearest neighbour vertex point. In this way, the estimation function of the grid may be more accurate.
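A sketch of this diffusion onto the grid, assuming the mesh vertices are normalised to the unit cube and parameters are averaged where several surface points share a nearest grid point:

```python
import numpy as np

def diffuse_to_grid(surface_points, surface_params, grid_res):
    """Diffuse per-vertex blendshape parameters to their nearest grid points.

    surface_points: (V,3) mesh vertices, assumed normalised to [0,1]^3.
    surface_params: (V,P) per-vertex blendshape parameters.
    """
    # Nearest grid point per surface point.
    idx = np.clip(np.round(surface_points * (grid_res - 1)).astype(int),
                  0, grid_res - 1)
    flat = np.ravel_multi_index(idx.T, (grid_res,) * 3)
    params = np.zeros((grid_res ** 3, surface_params.shape[1]))
    counts = np.zeros(grid_res ** 3)
    # Accumulate all surface points that land on the same grid point.
    np.add.at(params, flat, surface_params)
    np.add.at(counts, flat, 1.0)
    nonzero = counts > 0
    params[nonzero] /= counts[nonzero, None]
    return params.reshape((grid_res,) * 3 + (-1,)), nonzero.reshape((grid_res,) * 3)
```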

The apparatus may be configured to sample one or more point 110 in the input volume 101 as described herein with regards to Figure 1. In particular, the apparatus may be configured to sample one or more point 110 in the input volume 101 in dependence on one or more point 206 on the grid 207. This may enable the gridpoint information to be used to obtain the blendshape parameters. In this way, the sampling of the points 110 in the input volume 101 may be targeted based on points 110 which fall within the object 109. This may reduce the number of sampling points 110 by disregarding points 110 which do not fall within the object 109. This may in turn reduce the computational cost.

As shown in step 203 of Figure 2, the apparatus may be configured to sample 209 one or more point 110 in the input volume 101 in dependence on an interpolation between one or more points 206 on the grid 207. In other words, the apparatus may be configured to interpolate between points 206 which surround the point 110 in the input volume 101. This may allow the sampling of the input volume 101 to be based on a range, or average, of points 206 on the grid. In particular, the apparatus may be configured to sample one or more point 110 in the input volume 101 in dependence on an interpolation between at least the two nearest points 206 on the grid 207. The nearest point 206 may be the nearest neighbour vertex point. This may allow the sampling of the input volume 101 to be further targeted to points 206 that surround the sampling points 110 and fall within the object.
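Reading the diffused parameters back at input volume sample locations can then be a trilinear lookup between the surrounding grid points, for example:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def gather_params(param_grid, sample_points):
    """Interpolate diffused grid parameters at sample locations.

    param_grid:    (G,G,G,P) parameters diffused onto the grid.
    sample_points: (N,3) sample locations, assumed normalised to [0,1]^3.
    """
    g = param_grid.shape[0]
    coords = (sample_points * (g - 1)).T              # (3,N) in grid units
    # Trilinear interpolation of each parameter channel between the
    # surrounding grid points (order=1 is linear).
    return np.stack([map_coordinates(param_grid[..., p], coords, order=1)
                     for p in range(param_grid.shape[-1])], axis=-1)
```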

As shown in step 204 of Figure 2, the apparatus may be configured to deform each input volume sampled point 110 to a canonical volume 102 as described with regards to Figure 1. As the number of input volume sampled points 110 may have been reduced by the steps described herein with regards to Figure 2, the number of input volume sampled points 110 to be deformed may be reduced. This may reduce the computational cost.

The apparatus may be configured to render the output volume 107 as described herein with regards to Figure 1, based on the one or more deformed input volume sampled point 113.

Figure 3 illustrates the stages an example input volume may undergo in another exemplary object deformation process. In particular, Figure 3 illustrates an alternative method of the deforming step 103 in Figure 1. In this way, the stages described herein with regards to Figure 3 may be used in combination with the steps described herein with regards to Figure 1 and Figure 2.

The apparatus may be configured to obtain an input volume 101 comprising an object 109 as described herein with regards to Figure 1 and Figure 2.

The apparatus may be configured to obtain a deformation model of the object 109 as described herein with regards to Figure 2.

As shown in step 301 of Figure 3, the apparatus may be configured to record a link 304 between each deformation model surface point 205 and a point 206 on a grid 207. The grid 207 may comprise a plurality of points 206 distributed evenly across the grid 207. The grid 207 may be in 3D. As illustratively shown in step 301 of Figure 3, three deformation model surface points 205 may be linked to a single point 206 on the grid 207. In particular, the apparatus may be configured to record a link 304 between each deformation model surface point 205 and the nearest point 206 on the grid 207. The nearest point 206 may be the nearest neighbour vertex point. In this way, the linking function of the grid may be more accurate.

The apparatus may be configured to sample one or more point 110 in the input volume 101 as described herein with regards to Figure 1 and Figure 2.

In particular, as shown in step 302 of Figure 3, the apparatus may be configured to sample 209 one or more point 110 in the input volume 101 in dependence on one or more link 304 between a deformation model surface point 205 and a point 206 on the grid 207. In other words, the apparatus may be configured to use the link 304 to target the sampling of the input volume 101. In particular, the apparatus may be configured to sample one or more point 110 in the input volume 101 in dependence on the link 304 between one or more deformation model surface point 205 and the nearest point 206 on the grid 207. The nearest point 206 may be the nearest neighbour vertex point. This may allow the sampling of the input volume 101 to be further targeted to points 206 that surround the sampling points 110 and fall within the object.
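A sketch of the link recording, storing for each surface point the index of its nearest grid point, plus the inverted index that targeted sampling can follow; the representation chosen here is an assumption:

```python
import numpy as np

def record_links(surface_points, grid_res):
    """Record a link from each surface point to its nearest grid point.

    Returns links (V,) with a flat grid index per surface point, and an
    inverted index mapping each used grid point to its surface points.
    """
    idx = np.clip(np.round(surface_points * (grid_res - 1)).astype(int),
                  0, grid_res - 1)
    links = np.ravel_multi_index(idx.T, (grid_res,) * 3)
    inverse = {}
    for vertex, grid_point in enumerate(links):
        inverse.setdefault(int(grid_point), []).append(vertex)
    return links, inverse
```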

As shown in step 303 of Figure 3, the apparatus may be configured to deform each input volume sampled point 110 to a canonical volume 102 as described with regards to Figure 1 and Figure 2. As the number of input volume sampled points 110 may have been reduced by the steps described herein with regards to Figure 3, the number of input volume sampled points 110 to be deformed may be reduced. This may reduce the computational cost.

The apparatus may be configured to render the output volume 107 as described herein with regards to Figure 1, based on the one or more deformed input volume sampled point 113.

The steps described herein with regards to Figure 2 and Figure 3 may be carried out independently or in combination. For example, the sampling of the points 110 of the input volume 101 may be carried out in dependence on one or more of: (i) one or more point 206 on the grid 207, (ii) an interpolation between one or more point 206 on the grid 207, such as the nearest two points 206 on the grid 207, and (iii) one or more link 304 between a deformation model surface point 205 and a point 206 on the grid 207, such as the nearest point 206 on the grid 207. Each of these steps may further target the sampling of the points 110 of the input volume 101, which may reduce the computational cost of the deformation step.

The present system as described herein with regards to Figures 1 to 3 may provide the following advantages: (i) enable deformations and dynamic animations for fast and high quality volumetric rendering methods, (ii) speed up high quality deformable rendering methods, (iii) hierarchical sampling may be compatible with many different fast NeRF methods, as the present system may be agnostic to the specific fast data structure that they use, and (iv) the present system may have no restrictions on the type of volumetric representation, for example any NeRF may be used with any training losses.

Figure 4 schematically illustrates an exemplary structure of the network architecture used in the object deformation apparatus.

From a given camera viewpoint, rays may be cast corresponding to each pixel of the output image 401. For each ray, the points 402, 403 may be sampled along the ray in the deformed volume. The object deformation apparatus network architecture 400 may comprise an inverse deformation model InvNet 404, preferably a neural network. The InvNet 404 may be used to deform the 3D points 402, 403 back to the canonical volume. Equation 6 represents the process carried out by InvNet 404. The 3D points in the deformed volume are denoted by y and the blendshape parameters are denoted by p.

x = InvNet(y, p)   (6)

A sparse trained articulated human body regressor (STAR) model 408 may be used to further separate p into pose and shape arrangement parameters. Preferably, InvNet 404 is able to invert the blendshape model to the canonical domain. The representation may be used as a differentiable renderer for articulated (rigged) humans. This may provide fine-grained 3D reconstruction and pose estimation by backpropagating errors from 2D images to blendshape parameters. Equation 7 represents the process carried out by the STAR model 408.

y = STAR(x, p)   (7)

The points 405, 406 in the canonical volume 407 may be fed to a neural radiance field (NeRF) network 409, which estimates the density, s, and colour values, c, of the points 405, 406. Equation 8 represents the process carried out by the NeRF network. The ray direction is denoted by d. The colour of the point may depend on the ray direction.

s, c = NeRF(x, d)   (8)

The NeRF network may also take the ray directions 410 as input, because the colour values can be view dependent. A numerical integration method may be used to sum up the contribution of each 3D point to the pixel colour. Equations 9 and 10 represent the process carried out in the integration. The subscript denotes the index of the sampled 3D point's density and colour values along the ray r, and t denotes the distance between consecutive points. Because the x coordinates depend on the inversion of y using InvNet, the pixel colour C may depend on r and p.
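Equations 9 and 10 are not reproduced above. The sketch below uses the standard NeRF quadrature, which is consistent with the description (the subscript indexes samples along the ray, t denotes the distance between consecutive points) but is an assumption rather than the patent's exact formulation:

```python
import numpy as np

def integrate_ray(densities, colours, deltas):
    """Numerically integrate per-sample densities and colours into a pixel colour.

    densities: (S,)  s_i along the ray
    colours:   (S,3) c_i along the ray
    deltas:    (S,)  t_i, distance between consecutive points
    """
    alpha = 1.0 - np.exp(-densities * deltas)          # per-segment opacity
    # Transmittance T_i = prod_{j<i} (1 - alpha_j).
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]
    weights = trans * alpha
    return (weights[:, None] * colours).sum(axis=0)    # pixel colour C
```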

During inference the apparatus may use the trained InvNet and NeRF networks to render humans, or other bodies, with arbitrary poses p and from arbitrary viewpoints, by casting rays from those viewpoints and performing the rendering process on the rays.

Figure 5 summarises an example of a method 500 for deforming an object. At step 501, the method 500 comprises obtaining an input volume comprising the object. At step 502, the method 500 comprises sampling one or more point in the input volume. At step 503, the method 500 comprises deforming the one or more input volume sampled point to a canonical volume. At step 504, the method 500 comprises sampling one or more point in a region of the canonical volume surrounding each deformed input volume sampled point. At step 505, the method 500 comprises estimating the density of each deformed input volume sampled point in dependence on the density of the one or more corresponding surrounding sampled point. At step 506, the method 500 comprises rendering an output volume in dependence on the estimated density of each deformed input volume sampled point.

Figure 6 summarises an example of a method 600 for deforming an object. At step 601, the method 600 comprises obtaining an input volume comprising the object. At step 602, the method 600 comprises obtaining a deformation model of the object, the deformation model comprising one or more surface point. At step 603, the method 600 comprises diffusing each deformation model surface point to a point on a grid. At step 604, the method 600 comprises sampling one or more point in the input volume in dependence on one or more point on the grid. At step 605, the method 600 comprises deforming each input volume sampled point to a canonical volume.

An example of an apparatus 700 configured to implement the methods described herein is schematically illustrated in Figure 7. The apparatus 700 may be implemented on an electronic device, such as a laptop, tablet, smart phone or TV.

The apparatus 700 comprises a processor 701 configured to process the datasets in the manner described herein. For example, the processor 701 may be implemented as a computer program running on a programmable device such as a Central Processing Unit (CPU). The apparatus 700 comprises a memory 702 which is arranged to communicate with the processor 701. Memory 702 may be a non-volatile memory. The processor 701 may also comprise a cache (not shown in Figure 7), which may be used to temporarily store data from memory 702. The apparatus 700 may comprise more than one processor 701 and more than one memory 702. The memory 702 may store data that is executable by the processor 701. The processor 701 may be configured to operate in accordance with a computer program stored in non-transitory form on a machine-readable storage medium. The computer program may store instructions for causing the processor 701 to perform its methods in the manner described herein.

Specifically, the object deformation apparatus 700 may comprise one or more processors, such as processor 701, and a memory 702 storing in non-transient form data defining program code executable by the processor(s) to implement an object deformation model. The object deformation apparatus may obtain an input volume comprising the object. The object deformation apparatus may sample one or more point in the input volume. The object deformation apparatus may deform the one or more input volume sampled point to a canonical volume. The object deformation apparatus may sample one or more point in a region of the canonical volume surrounding each deformed input volume sampled point. The object deformation apparatus may estimate the density of each deformed input volume sampled point in dependence on the density of the one or more corresponding surrounding sampled point. The object deformation apparatus may render an output volume in dependence on the estimated density of each deformed input volume sampled point.

Specifically, the object deformation apparatus 700 may comprise one or more processors, such as processor 701, and a memory 702 storing in non-transient form data defining program code executable by the processor(s) to implement an object deformation model. The object deformation apparatus may obtain an input volume comprising the object. The object deformation apparatus may obtain a deformation model of the object, the deformation model comprising one or more surface point. The object deformation apparatus may diffuse each deformation model surface point to a point on a grid. The object deformation apparatus may sample one or more point in the input volume in dependence on one or more point on the grid. The object deformation apparatus may deform each input volume sampled point to a canonical volume.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description, it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.