

Title:
A DEVICE FOR PERFORMING A RECURSIVE RASTERIZATION
Document Type and Number:
WIPO Patent Application WO/2022/131949
Kind Code:
A1
Abstract:
The disclosure relates to rendering a digital scene. To this end, it provides a device for performing recursive rasterization, which may be used for rendering the scene. The device may select, for each first tile of a render target, first primitives from an acceleration structure. The selected first primitives are included in a first frustum associated with the first tile, and the device may provide the selected first primitives for rasterization. Then, it may receive first rasterized tiles, each obtained by rasterizing the first primitives selected for one first tile, and may obtain one or more second tiles by processing the first rasterized tiles. Then, the device may select, for each second tile, second primitives from the acceleration structure, wherein a set of second primitives is selected from each of one or more second frustums generated for said second tile, and may provide the selected second primitives for further rasterization.

Inventors:
GLUSHKOV NIKITA VADIMOVICH (CN)
Application Number:
PCT/RU2020/000688
Publication Date:
June 23, 2022
Filing Date:
December 14, 2020
Assignee:
HUAWEI TECH CO LTD (CN)
GLUSHKOV NIKITA VADIMOVICH (CN)
International Classes:
G06T11/40; G06T15/50
Foreign References:
US20110254852A1, 2011-10-20
US20180197268A1, 2018-07-12
US20200051332A1, 2020-02-13
Other References:
J. ARVO: "Tiled shadow maps", COMPUTER GRAPHICS INTERNATIONAL. PROCEEDINGS, 2004, US, pages 240 - 247, XP055642317, ISSN: 1530-1052, DOI: 10.1109/CGI.2004.1309216
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD. et al. (RU)
Claims:
CLAIMS

1. A device (100) for recursive rasterization, the device (100) being configured to: determine a render target (107) comprising a plurality of first tiles; obtain an acceleration structure (101); obtain one or more first frustums (103), wherein each first tile of the render target (107) is associated with one first frustum (103); select, for each first tile, first primitives (102) from the acceleration structure (101), wherein the selected first primitives (102) are included in the first frustum (103) associated with the first tile; write the selected first primitives (102) into a primitive storage (104) of a rendering pipeline; receive first rasterized tiles (106) from a rasterizer (105) of the rendering pipeline, wherein each first rasterized tile (106) has been obtained by rasterizing the first primitives (102) selected for one first tile of the render target (107); process the first rasterized tiles (106) using a filter shader (108) to obtain one or more second tiles (109); generate one or more second frustums (110) for each second tile (109); select, for each second tile (109), second primitives (111) from the acceleration structure (101), wherein a set of second primitives (111) is selected from each of the one or more second frustums (110) generated for said second tile (109); write the selected second primitives (111) into the primitive storage (104); and receive, from the rasterizer (105), one or more second rasterized tiles (112) for each second tile (109), wherein each second rasterized tile (112) has been obtained by rasterizing one set of second primitives (111) selected for said second tile (109).

2. The device (100) according to claim 1, further configured to: merge each second tile (109) with the second rasterized tiles (112) that correspond to said second tile (109); and output the merging result to the render target (107).

3. The device (100) according to claim 1, further configured to: process the second rasterized tiles (112) using a filter shader (108) to obtain one or more third tiles; generate one or more third frustums for each third tile; select, for each third tile, third primitives from the acceleration structure (101), wherein a set of third primitives is selected from each of the one or more third frustums generated for said third tile; write the selected third primitives into the primitive storage (104); and receive, from the rasterizer (105), one or more third rasterized tiles for each third tile, wherein each third rasterized tile has been obtained by rasterizing one set of third primitives selected for said third tile.

4. The device (100) according to claim 3, further configured to: merge each second tile (109) with the second rasterized tiles (112) that correspond to said second tile (109) and further with the third rasterized tiles that correspond to said second rasterized tiles (112) corresponding to said second tile (109); and output the merging result to the render target (107).

5. The device (100) according to one of the claims 1 to 4, further configured to: repeat, one or more times until a determined condition is achieved, the steps of processing, selecting, writing, and receiving; then merge all tiles (109) and rasterized tiles (106, 112) that correspond to another; and output the merging result to the render target (107).

6. The device (100) according to claim 5, wherein said steps are repeated in accordance with one or more rendering requirements of a control shader.

7. The device (100) according to any one of the preceding claims, wherein at least one of the first primitives (102), and the second primitives (111), and the third primitives comprise triangles.

8. The device (100) according to any one of the preceding claims, wherein at least one of the first primitives (102), and the second primitives (111), and the third primitives comprise one or more pairs of a triangle and a material.

9. The device (100) according to claim 8, wherein a material includes a description of one or more algorithms and parameters required for rasterizing.

10. The device (100) of any one of the preceding claims, wherein the device (100) is configured to: separately obtain an individual first frustum (103) for each first tile or group of first tiles included in the plurality of first tiles of the render target (107); and/or separately generate one or more individual second frustums (110) for each second tile (109).

11. The device (100) according to any one of the preceding claims, configured to: perform the steps of selecting the first primitives (102), writing the selected first primitives (102) into the primitive storage (104), and receiving the first rasterized tiles (106), in parallel for all first tiles or in parallel for groups of first tiles; and/or perform the steps of processing the first rasterized tiles (106), selecting the second primitives (111) for the second tiles (109), writing the selected second primitives (111) into the primitive storage (104), and receiving the second rasterized tiles (112), in parallel for all second tiles (109) or in parallel for groups of second tiles (109).

12. The device (100) according to any one of the preceding claims, configured to: perform the steps of selecting the first primitives (102), writing the selected first primitives (102) into the primitive storage (104), and receiving the first rasterized tiles (106), separately for each first tile; and/or perform the steps of processing the first rasterized tiles (106), selecting the second primitives (111) for the second tiles (109), writing the selected second primitives (111) into the primitive storage (104), and receiving the second rasterized tiles (112), separately for each second tile (109).

13. A system (10) for performing rasterization, the system (10) comprising: the device (100) according to any one of the claims 1 to 10; and the rendering pipeline, wherein the rendering pipeline includes the primitive storage (104), the rasterizer (105), and a memory (113) connected to the rasterizer (105).

14. The system (10) according to claim 13, wherein the rasterizer (105) is configured to: rasterize the first primitives (102) selected for the plurality of first tiles of the render target (107) to obtain the first rasterized tiles (106), wherein for each first tile the first primitives (102) selected for said first tile are rasterized to obtain one first rasterized tile (106); and rasterize the second primitives (111) selected for the second tiles (109) to obtain the second rasterized tiles (112), wherein for each second tile (109) each set of second primitives (111) selected for said second tile (109) is rasterized to obtain one second rasterized tile (112).

15. The system (10) according to claim 13 or 14, wherein the rasterizer (105) is configured to rasterize the first primitives (102) selected for the plurality of first tiles of the render target (107) tile per tile, and/or rasterize the sets of second primitives (111) selected for the second tiles (109) tile per tile.

16. The system (10) according to any one of the claims 13 to 15, wherein: the first tiles, the first rasterized tiles (106), the second tiles (109), and the second rasterized tiles (112) are stored in the same space of the memory (113).
17. A method (1000) for rasterization, the method comprising: determining (1001) a render target (107) comprising a plurality of first tiles; obtaining (1002) an acceleration structure (101); obtaining (1003) one or more first frustums (103), wherein each first tile of the render target (107) is associated with one first frustum (103); selecting (1004), for each first tile, first primitives (102) from the acceleration structure (101), wherein the selected first primitives (102) are included in the first frustum (103) associated with the first tile; writing (1005) the selected first primitives (102) into a primitive storage (104) of a rendering pipeline; receiving (1006) first rasterized tiles (106) from a rasterizer (105) of the rendering pipeline, wherein each first rasterized tile (106) has been obtained by rasterizing the first primitives (102) selected for one first tile of the render target (107); processing the first rasterized tiles (106) using a filter shader to obtain one or more second tiles (109); generating (1008) one or more second frustums (110) for each second tile (109); selecting (1009), for each second tile (109), second primitives (111) from the acceleration structure (101), wherein a set of second primitives (111) is selected from each of the one or more second frustums (110) generated for said second tile (109); writing (1010) the selected second primitives (111) into the primitive storage (104); and receiving (1011), from the rasterizer (105), one or more second rasterized tiles (112) for each second tile (109), wherein each second rasterized tile (112) has been obtained by rasterizing one set of second primitives (111) selected for said second tile (109).

18. A computer program comprising a program code for performing the method (1000) according to claim 17, when executed on a computer.


Description:
A DEVICE FOR PERFORMING A RECURSIVE RASTERIZATION

TECHNICAL FIELD

The present disclosure relates to the field of digital image processing, in particular, to the field of rendering digital scenes. To this end, the disclosure provides a device and a method for performing a recursive rasterization, which may be used for the rendering of a scene. The recursive rasterization may include generating different sets of frustums for selecting different sets of primitives for rasterization, in particular, selecting different sets of primitives associated with the same tile of a render target.

BACKGROUND

Rendering a scene means generating an image of the scene from some model related to the scene, by means of a computer program or algorithm. In order to trivially render a three-dimensional (3D) digital scene into a render target, it is enough to use one camera and one rasterization, so that all pixels are rasterized using a single shared view frustum. Rasterization may thereby refer to the technique of converting primitives, such as polygons, into a rasterized format. The rasterization of a plurality of such primitives may lead to the rendered scene.

In order to also show reflections from mirror surfaces, or refraction, or direct shadows from light sources, or some other kinds of special effects, some conventional rendering methods perform a secondary rasterization including the following steps: configuring an additional temporary render target, preparing one or more temporary cameras with appropriate properties for the desired effect, rendering the scene into the temporary render target based on the temporary cameras by using rasterization, and, during the rendering of a main frame, applying the temporary render target to appropriate parts of the scene.

However, a typical problem with these conventional rendering methods is that a modern graphics processing unit (GPU), which is configured to implement some rasterization algorithms, is not flexible enough for an efficient implementation of multi-bounce special rendering effects, such as reflections, refractions, sub-surface scattering or other techniques of global illumination. Thus, allocation and management of temporary render targets for each bounce of each effect have to be configured. Further, a complex order of secondary rasterizations for each effect has to be configured as well. Such a complex dependency graph increases the cost of development.

Moreover, it then has to be detected whether the special effect is really visible in the frame, and the parameters of the visibility have to be understood, in order to avoid useless operations. This may be non-trivial for complex scenes. For instance, if a surface with a special effect overlaps with another object in a frame, it is almost impossible to avoid unnecessary operations during the secondary rasterization(s). In addition, the GPU must save a lot of temporary data into a memory. Further, the significant bandwidth associated with such complex special effects is especially inefficient for mobile GPUs, because they have significant power consumption limits. Furthermore, allocation of the memory for the temporary framebuffer increases the power consumption due to retention of stored data in the GPU memory. Additionally, there is also a problem related to the fact that typical 3D products should work on different platforms using different GPU architectures, memory, and feature sets.

Overall, the complexity of control is very high, such that it is difficult to make complex effects, for example, dozens of reflections in one frame, reflections from non-planar or non-spherical surfaces, reflection-in-reflection, mixing of different kinds of effects in one frame, and so on. Each secondary rasterization is also expensive in terms of memory operations and power consumption. Moreover, the maximum number of secondary rasterizations is limited, so that it is often difficult to render a lot of multi-bounce special effects for a frame, due to GPU limitations.

Another problem arises from the fact that it is quite complex to render a reflection from a non-planar surface or a multifaceted object, because of the multiple secondary rasterizations that are required. The reflection from a non-planar surface could potentially be emulated using an approximation of the non-planar surface by a set of flat segments, which are then combined using a distortion map. However, in both cases, it is not efficient to rasterize so many sub-buffers due to the overhead of forming the rasterization commands. Another problem is that if a 3D scene contains a lot of multi-bounce special effects, then rendering of such a scene can technically become very complex. The management of the temporary memory, buffers and textures, the evaluation of the objects’ visibility, the 3D object’s clipping complexity, and the computations of the camera parameters can be difficult. Reflection and refractions could be emulated using a cube-map, a sphere-map, or an abstract texture, if details of the reflection are hidden due to a muddy surface. Unfortunately, such reflections or refractions are not geometrically correct, and provide the impression of reflection, not the real reflection appearance. Reflections and refractions on non-planar surfaces could be emulated using planar reflection or refraction and some distortion that emulates the curvature of a 3D surface. Unfortunately, this has a lot of limitations when being used for rendering 3D objects. For example, the problem of discontinuities on borders arises.

Moreover, it is also possible to use non-rasterization methods, such as signed distance function (SDF) based ray-marching. Unfortunately, these methods are very limited and complex to implement. Another conventional technique is known as ray-tracing. This technique is based on the emulation of photon movements in a scene to be rendered, and is widely used, for example, for movies. Unfortunately, this method is very slow, especially without dedicated hardware support. Moreover, this technique is not available for mobile devices, due to their limitations in power consumption and chip size.

Furthermore, there are also rendering methods based on so-called “voxels”, wherein a voxel represents a value on a regular grid in a three-dimensional space. Ray-tracing methods can be very efficient for scenes represented in the optimized “voxel” format. There are many games that use this technique. Unfortunately, this technique is limited, because the voxels have certain sizes, which affect the quality of surfaces. Moreover, problems also arise in representing animated objects, which are used for modern real-time products. There are also so-called “screen space” special effects such as, for example, a screen space local reflection and other varieties of the G-Buffer post-processing. Unfortunately, these methods have problems at the borders of the frame. Furthermore, they are not efficient for the mobile platforms due to the significant bandwidth.

In addition to the above, a conventional tile-based architecture (TBA) is commonly used for GPUs to render scenes. This architecture divides a render target into compact sub-render targets (so-called tiles). A typical size of such a tile can be, for example, 16x16 pixels. From a technical point of view, a GPU can work with a lot of such tiles separately, and even almost independently from each other. This architecture allows optimizing the rendering of a scene by decreasing the bandwidth. By using the TBA, the driver of a GPU can place some temporary buffers into a fast local cache of the GPU, and can thus avoid useless transfer into a memory. However, all known implementations of the TBA have a significant limitation, namely, a tile memory will immediately be copied to the render target after rasterization of a primitive list created by a tiler.
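As a rough illustration of how a tile-based architecture subdivides a render target, the following sketch enumerates tile rectangles of a fixed size. The function name `split_into_tiles` and the clamping of edge tiles are assumptions made for illustration only; real GPUs handle partial tiles in hardware-specific ways.

```python
from typing import Iterator, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def split_into_tiles(width: int, height: int, tile: int = 16) -> Iterator[Rect]:
    """Yield tile rectangles that cover a width x height render target.

    Hypothetical helper: tiles on the right and bottom edges are clamped
    so that they never extend past the render target.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


# A 40x20 target with 16x16 tiles needs 3 columns and 2 rows.
tiles = list(split_into_tiles(40, 20))
```

Each rectangle can then be processed separately, which is the property the tile-based approach relies on.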

In conclusion of the above, there is a need for improved techniques for rendering digital scenes.

SUMMARY

In view of the above-mentioned problems and disadvantages, an objective is, in particular, to provide a device and method, which are able to perform an improved rasterization suitable for rendering a scene. The above-mentioned disadvantages of the conventional rendering and rasterization methods, and particularly the TBA, should be overcome. It is another goal of this disclosure to provide a TBA-based solution for the rasterization. However, the solution should allow multiple (recursive) rasterizations, and the possibility of using different camera spaces for a tile or for a group of tiles.

These and other objectives are achieved by the embodiments provided in the enclosed independent claims. Advantageous implementations of the embodiments are further defined in the dependent claims.

A first aspect of this disclosure provides a device for recursive rasterization, the device being configured to determine a render target comprising a plurality of first tiles; obtain an acceleration structure; obtain one or more first frustums, wherein each first tile of the render target is associated with one first frustum; select, for each first tile, first primitives from the acceleration structure, wherein the selected first primitives are included in the first frustum associated with the first tile; write the selected first primitives into a primitive storage of a rendering pipeline; receive first rasterized tiles from a rasterizer of the rendering pipeline, wherein each first rasterized tile has been obtained by rasterizing the first primitives selected for one first tile of the render target; process the first rasterized tiles using a filter shader to obtain one or more second tiles; generate one or more second frustums for each second tile; select, for each second tile, second primitives from the acceleration structure, wherein a set of second primitives is selected from each of the one or more second frustums generated for said second tile; write the selected second primitives into the primitive storage; and receive, from the rasterizer, one or more second rasterized tiles for each second tile, wherein each second rasterized tile has been obtained by rasterizing one set of second primitives selected for said second tile. The device of the first aspect is able to support a recursive rasterization (which is performed by the rasterizer), i.e., is able to support a rasterization of at least the first primitives and the second primitives, which are selected from the acceleration structure and respectively included in the one or more first frustums and the one or more second frustums. The rasterizer may be a conventional rasterizer of a conventional rendering pipeline, which can advantageously be reused by the device. 
The recursive rasterization may be performed based on the first tiles of the render target and the second tiles generated after rasterization of the first tiles, and based on the corresponding first and second primitives selected by the device.
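The sequence of steps of the first aspect can be sketched as a small driver function. This is only a minimal sketch under stated assumptions: the callables `select`, `rasterize`, `filter_shader` and `make_frustums` are hypothetical stand-ins for the acceleration structure query, the rasterizer, the filter shader and the frustum generation, and the per-tile pairing of results is chosen for readability rather than taken from the disclosure.

```python
from typing import Callable, Dict, List, Tuple

Frustum = str               # placeholder type, for illustration only
Primitive = str             # placeholder type, for illustration only
Tile = Dict[str, object]    # placeholder type, for illustration only


def two_level_rasterization(
    first_frustums: List[Frustum],                   # one frustum per first tile
    select: Callable[[Frustum], List[Primitive]],    # query the acceleration structure
    rasterize: Callable[[List[Primitive]], Tile],    # stand-in for the rasterizer
    filter_shader: Callable[[Tile], List[Tile]],     # rasterized tile -> second tiles
    make_frustums: Callable[[Tile], List[Frustum]],  # second tile -> second frustums
) -> List[Tuple[Tile, List[Tile]]]:
    """Return each second tile together with its second rasterized tiles."""
    results = []
    for frustum in first_frustums:
        first_rasterized = rasterize(select(frustum))
        for second_tile in filter_shader(first_rasterized):
            # One set of second primitives, and one rasterized tile, per second frustum.
            second_rasterized = [rasterize(select(f)) for f in make_frustums(second_tile)]
            results.append((second_tile, second_rasterized))
    return results


# Toy scene: the main view sees a mirror, and the mirror's frustum sees a lamp.
scene = {"main": ["floor", "mirror"], "reflect:mirror": ["lamp"]}
out = two_level_rasterization(
    ["main"],
    select=lambda f: scene.get(f, []),
    rasterize=lambda prims: {"prims": prims},
    filter_shader=lambda tile: [tile],
    make_frustums=lambda tile: ["reflect:mirror"] if "mirror" in tile["prims"] else [],
)
```

In the toy run, the single first tile yields one second tile, for which one second frustum is generated and rasterized.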

The processing of the first rasterized tiles may comprise normalizing colors and/or calculating special effects. A frustum may also be referred to as view frustum or viewing frustum, and may be a convex hexahedron, or a portion of a pyramid that lies between two planes cutting it, and describes the borders of a projection.

The device may use, in a first step, the one or more first frustums for selecting the first primitives individually for the first tiles of the render target. Further, in a second step, the device may use the one or more second frustums for selecting the sets of second primitives for the second tiles, wherein the second tiles are associated with the first tiles. Accordingly, the device may be configured to recursively select different sets of primitives for each tile, particularly, from the newly generated frustums associated with that tile. Therefore, the device enables an improved rasterization in comparison to the conventional rasterization methods.

The device of the first aspect supports a TBA-based solution. Advantageously, after selecting the first primitives for rasterization or the sets of second primitives for rasterization, respectively, the pipeline of a conventional TBA, including the rasterizer, may be (re)used for performing the rasterization of the selected primitives. The limitations of the conventional TBA are overcome in this disclosure. That is, because of the recursively generated first and second frustums, multiple rasterizations using different camera spaces may be performed for a tile or a group of tiles. Further, results of a rasterization based on some primitives do not have to be immediately copied to the render target.

The device of the first aspect provides the additional advantage that the rendering of a complex digital scene, particularly a 3D digital scene, can be significantly accelerated, due to the use of the acceleration structure (instead of, for instance, a tiler as is often used by conventional solutions), and due to the fact that the primitives are selected from the acceleration structure. This improvement can also provide a significant simplification of the development of renderers that use rasterization, since the acceleration structure helps avoid a complicated management of digital scenes that are sent to a rasterizer, and is also beneficial for the organization of a command buffer for sending the scene to a rasterizer.

Notably, in this disclosure: A “scene” is a collection of 3D models, light sources and other kinds of objects in a world space, into which a camera may be placed, and is used to describe a scene for 3D rendering. Further, a “frustum” is a convex hexahedron, or a portion of a pyramid that lies between two planes cutting it, and describes the borders of a projection. Such a frustum may also be referred to as a view frustum or a viewing frustum, and may correspond to a region of space of the scene, e.g., that may appear on a display when the rendered scene is displayed. Further, “rasterization” may describe the task of taking an image defined in a vector graphics format (e.g., shapes) and converting it into a raster image (e.g., a series of pixels, dots or lines, which, when displayed together, create the image which was represented by the shapes). The rasterized image may, then, be displayed on a computer display, video display or printer, or stored in a bitmap file format. Therefore, as mentioned previously, “rasterization” may refer to the technique of converting primitives, such as polygons, into rasterized format. Moreover, an “acceleration structure” may be a subroutine that allows deciding as quickly as possible which objects from the scene a particular ray, frustum or other primitive is likely to intersect, and rejecting large groups of objects that the primitive is known for certain never to hit. Finally, a “primitive” may be a basic drawing shape, for instance, at least one of a polygon and a triangle.
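The rejection test that an acceleration structure relies on can be illustrated with a conservative frustum-versus-bounding-box check. The sketch below is an assumption-laden illustration, not the structure used by the disclosure: it represents a frustum as six inward-facing planes, uses axis-aligned bounding boxes, and builds a box-shaped (orthographic) frustum with the hypothetical helper `box_frustum`. A real acceleration structure would apply the same test hierarchically rather than to a flat list.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Plane:
    normal: Vec3  # points toward the inside of the frustum
    d: float      # a point p is inside when dot(normal, p) + d >= 0


@dataclass(frozen=True)
class AABB:
    lo: Vec3
    hi: Vec3


def aabb_may_intersect(planes: Sequence[Plane], box: AABB) -> bool:
    """Conservative test: False only if the box is certainly outside."""
    for pl in planes:
        # Corner of the box that lies farthest along the plane normal.
        corner = tuple(box.hi[i] if pl.normal[i] >= 0 else box.lo[i] for i in range(3))
        if sum(pl.normal[i] * corner[i] for i in range(3)) + pl.d < 0:
            return False  # the whole box is behind this plane
    return True


def box_frustum(lo: Vec3, hi: Vec3) -> List[Plane]:
    """Hypothetical helper: a box-shaped frustum as six inward-facing planes."""
    planes = []
    for axis in range(3):
        pos = [0.0, 0.0, 0.0]
        pos[axis] = 1.0
        planes.append(Plane(tuple(pos), -lo[axis]))  # coordinate >= lo
        neg = [0.0, 0.0, 0.0]
        neg[axis] = -1.0
        planes.append(Plane(tuple(neg), hi[axis]))   # coordinate <= hi
    return planes


frustum = box_frustum((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
overlapping = AABB((0.5, 0.5, 0.5), (2.0, 2.0, 2.0))
far_away = AABB((2.0, 2.0, 2.0), (3.0, 3.0, 3.0))
selected = [b for b in (overlapping, far_away) if aabb_may_intersect(frustum, b)]
```

Only the overlapping box survives the test, so only its primitives would be handed on for rasterization.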

As mentioned above, the device of the first aspect supports the TBA. Thereby, tiled rendering, particularly tiled rasterization, describes the process of subdividing a computer graphics image by a regular grid in optical space and rendering or rasterizing, respectively, each section of the grid, or tile, separately. The advantage is that the amount of memory and bandwidth is reduced compared to immediate mode rendering systems, which draw the entire frame at once. This has made tile rendering systems, particularly tile rasterization systems, common for low-power handheld devices. Tiled rendering is sometimes known as a "sort middle" architecture, because it performs the sorting of the geometry in the middle of the graphics pipeline instead of near the end.

A filter shader may be the shader stage that will process (rasterized) tiles of a render target after rasterization. A filter shader may be applied to each pixel of a rasterized tile, or to each group of pixels of the rasterized tile (for example, each group comprising 2x2 pixels, or 1x16 pixels, etc.). A different filter shader may be applied several times to one rasterized tile.

A filter shader may be implemented by a compute shader, which may be a program, for instance, executed on a GPU. A compute shader may provide high performance execution of general purpose computing tasks, utilizing large numbers of parallel processors on the GPU.
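Applying a filter shader to groups of pixels can be pictured with a CPU-side sketch. The names `apply_per_group` and `normalize_group` are hypothetical, the tile is a plain list of rows holding one scalar per pixel, and the normalization rule is an assumed example of the color normalization mentioned earlier; an actual filter shader would run on the GPU.

```python
from typing import Callable, List

Tile = List[List[float]]  # rows of scalar pixel values, for illustration only


def apply_per_group(tile: Tile, gw: int, gh: int,
                    shader: Callable[[List[float]], List[float]]) -> Tile:
    """Run `shader` once per gw x gh pixel group and write the results back."""
    h, w = len(tile), len(tile[0])
    out = [row[:] for row in tile]
    for y0 in range(0, h, gh):
        for x0 in range(0, w, gw):
            coords = [(y, x)
                      for y in range(y0, min(y0 + gh, h))
                      for x in range(x0, min(x0 + gw, w))]
            values = shader([tile[y][x] for y, x in coords])
            for (y, x), v in zip(coords, values):
                out[y][x] = v
    return out


def normalize_group(values: List[float]) -> List[float]:
    """Assumed example filter: scale a group so that its peak becomes 1.0."""
    peak = max(values)
    return [v / peak if peak > 0 else 0.0 for v in values]


tile = [[1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 0.0]]
filtered = apply_per_group(tile, 2, 2, normalize_group)
```

Passing a group size of 1x1 turns the same function into a per-pixel filter.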

In an implementation form of the first aspect, the device is further configured to: merge each second tile with the second rasterized tiles that correspond to said second tile; and output the merging result to the render target.

The merging allows for an improved rendering of a scene using the recursive rasterization, wherein effects like reflections from mirror surfaces, or refraction, or direct shadows from light sources, or other kinds of special effects can be shown in an improved manner.
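The disclosure does not fix a particular merge operator, so the following is only one possible sketch: it assumes that each second rasterized tile is blended over the second tile with a scalar weight, such as a reflectivity. The function `merge_tile` and the weight list are hypothetical.

```python
from typing import List

Tile = List[List[float]]  # rows of scalar pixel values, for illustration only


def merge_tile(base: Tile, layers: List[Tile], weights: List[float]) -> Tile:
    """Blend each layer over the base tile: out = (1 - k) * out + k * layer."""
    out = [row[:] for row in base]
    for layer, k in zip(layers, weights):
        for y, row in enumerate(layer):
            for x, v in enumerate(row):
                out[y][x] = (1.0 - k) * out[y][x] + k * v
    return out


# One second tile merged with one second rasterized tile at 50% weight.
merged = merge_tile([[1.0, 0.0]], [[[0.0, 1.0]]], [0.5])
```

The merged tile would then be written to the render target.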

In an implementation form of the first aspect, the device is further configured to: process the second rasterized tiles using a filter shader to obtain one or more third tiles; generate one or more third frustums for each third tile; select, for each third tile, third primitives from the acceleration structure, wherein a set of third primitives is selected from each of the one or more third frustums generated for said third tile; write the selected third primitives into the primitive storage; and receive, from the rasterizer, one or more third rasterized tiles for each third tile, wherein each third rasterized tile has been obtained by rasterizing one set of third primitives selected for said third tile.

In this way, another recursive rasterization step is implemented. That is, here the method performed by the device comprises a first rasterization step based on the first primitives, a second rasterization step (first recursive rasterization step) based on the second primitives, and a third rasterization step (second recursive rasterization step) based on the third primitives. Of course, further recursive rasterization steps performed in the same manner are possible.

In an implementation form of the first aspect, the device is further configured to: merge each second tile with the second rasterized tiles that correspond to said second tile and further with the third rasterized tiles that correspond to said second rasterized tiles corresponding to said second tile; and output the merging result to the render target.

In this way, the performance of rendering a scene can be significantly improved, in particular, with respect to non-trivial effects shown in an improved manner.

In an implementation form of the first aspect, the device is further configured to: repeat, one or more times until a determined condition is achieved, the steps of processing, selecting, writing, and receiving; then merge all tiles and rasterized tiles that correspond to one another; and output the merging result to the render target.

The determined condition may be achieved by a control shader, and may indicate that all desired effects are covered by the rendering using the recursive rasterization.

In an implementation form of the first aspect, said steps are repeated in accordance with one or more rendering requirements of a control shader.

A control shader in this disclosure may allow controlling features of the recursive rasterization. For instance, the control shader may manage at least one of: a maximal number of rasterization steps (e.g., sub-rasterizations for the recursive rasterization), a source of new primitives (for example, from the acceleration structure or from a previous primitive set), and one or more other parameters relevant for the rasterization.
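How a control shader might bound the recursion can be sketched with a small policy object. Everything here is assumed for illustration: `ControlPolicy`, its `max_steps` field, and the `spawn` callback that reports how many tiles still need another rasterization step are hypothetical names, not terms of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ControlPolicy:
    max_steps: int  # maximal number of rasterization steps

    def should_continue(self, step: int, pending_tiles: int) -> bool:
        # Stop at the step limit or when no tile requests further effects.
        return step < self.max_steps and pending_tiles > 0


def run_recursion(policy: ControlPolicy, initial_pending: int,
                  spawn: Callable[[int], int]) -> List[int]:
    """Return the number of tiles handled in each rasterization step."""
    step, pending, history = 0, initial_pending, []
    while policy.should_continue(step, pending):
        history.append(pending)
        pending = spawn(pending)  # tiles that still need another step
        step += 1
    return history


# Each step halves the number of tiles that still show an unresolved effect.
capped = run_recursion(ControlPolicy(max_steps=3), 8, lambda n: n // 2)
exhausted = run_recursion(ControlPolicy(max_steps=10), 8, lambda n: n // 2)
```

The first run stops because the step limit is reached, the second because no tiles remain to be processed.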

In an implementation form of the first aspect, at least one of the first primitives, and the second primitives, and the third primitives comprise triangles.

The triangles allow for an efficient rasterization and rendering of a digital scene into the render target with a very high quality. Generally, however, the primitives may comprise polygons.

In an implementation form of the first aspect, at least one of the first primitives, and the second primitives, and the third primitives comprise one or more pairs of a triangle and a material.

Thus, the scene may be rendered by the multi-frustum rasterization taking into account material effects of objects, for instance, different kinds of reflections from different materials.

In an implementation form of the first aspect, a material includes a description of one or more algorithms and parameters required for rasterizing.

This may lead to an improved rendering of a digital scene, including the material-related effects.

In an implementation form of the first aspect, the device is configured to: separately obtain an individual first frustum for each first tile or group of first tiles included in the plurality of first tiles of the render target; and/or separately generate one or more individual second frustums for each second tile.

This enables the configuration of specific camera spaces for different first or second tiles, or for different groups of first or second tiles.

In an implementation form of the first aspect, the device is configured to: perform the steps of selecting the first primitives, writing the selected first primitives into the primitive storage, and receiving the first rasterized tiles, in parallel for all first tiles or in parallel for groups of first tiles; and/or perform the steps of processing the first rasterized tiles, selecting the second primitives for the second tiles, writing the selected second primitives into the primitive storage, and receiving the second rasterized tiles, in parallel for all second tiles or in parallel for groups of second tiles.

Thus, a particularly efficient recursive rasterization can be performed by the device.

In an implementation form of the first aspect, the device is configured to perform the steps of selecting the first primitives, writing the selected first primitives into the primitive storage, and receiving the first rasterized tiles, separately for each first tile; and/or perform the steps of processing the first rasterized tiles, selecting the second primitives for the second tiles, writing the selected second primitives into the primitive storage, and receiving the second rasterized tiles, separately for each second tile. Thus, the recursive rasterization can be performed by the device with the highest level of performance.

A second aspect of this disclosure provides a system for performing rasterization, the system comprising: the device according to the first aspect or any of its implementation forms; and the rendering pipeline, wherein the rendering pipeline includes the primitive storage, the rasterizer, and a memory connected to the rasterizer.

In an implementation form of the second aspect, the rasterizer is configured to: rasterize the first primitives selected for the plurality of first tiles of the render target to obtain the first rasterized tiles, wherein for each first tile the first primitives selected for said first tile are rasterized to obtain one first rasterized tile; and rasterize the second primitives selected for the second tiles to obtain the second rasterized tiles, wherein for each second tile each set of second primitives selected for said second tile is rasterized to obtain one second rasterized tile.

In an implementation form of the second aspect, the rasterizer is configured to rasterize the first primitives selected for the plurality of first tiles of the render target tile per tile, and/or rasterize the sets of second primitives selected for the second tiles tile per tile.

In an implementation form of the second aspect, the first tiles, the first rasterized tiles, the second tiles, and the second rasterized tiles are stored in the same space of the memory.

This overcomes a drawback of the conventional TBA, and may lead to improved scene rendering including special effects shown in an improved manner.

The rasterizer may be further configured to receive a model-view-projection transformation for each tile before executing the rasterization.

A model-view transformation may be the concatenation of a model transformation and a view transformation. A view transformation may define the position (location and orientation) of the camera, while the model transformation may define the frame’s position of the primitives to be drawn. A projection transformation may define the characteristics of the camera, such as clip planes, field of view, or projection method. A transformation may be or comprise a matrix.

The rasterizer may be further configured to execute a fragment shader algorithm on each tile after executing the rasterization.
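
As an illustration of the model-view-projection concatenation described above, the following minimal sketch composes a model, a view, and a projection transformation into one 4x4 matrix and applies it to a single vertex. The helper names (`translation`, `perspective`, `model_view_projection`, `transform`), the row-major nested-list layout, and the OpenGL-style symmetric projection matrix are assumptions made for this example and are not prescribed by the disclosure.

```python
# Sketch only: 4x4 matrices as row-major nested lists, applied to column vectors.

def matmul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """Hypothetical model or view transformation: a pure translation."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(near, far, right, top):
    """Symmetric OpenGL-style perspective projection (assumed convention)."""
    return [[near / right, 0.0, 0.0, 0.0],
            [0.0, near / top, 0.0, 0.0],
            [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
            [0.0, 0.0, -1.0, 0.0]]

def model_view_projection(model, view, projection):
    """Concatenate so that the model transformation is applied to a vertex first."""
    return matmul(projection, matmul(view, model))

def transform(matrix, point):
    """Apply a 4x4 matrix to a 3D point; returns clip-space (x, y, z, w)."""
    v = [point[0], point[1], point[2], 1.0]
    return [sum(matrix[i][k] * v[k] for k in range(4)) for i in range(4)]

# Usage: one matrix per tile could be handed to the rasterizer.
mvp = model_view_projection(translation(1.0, 0.0, 0.0),   # model
                            translation(0.0, 0.0, -5.0),  # view
                            perspective(1.0, 10.0, 1.0, 1.0))
clip = transform(mvp, (0.0, 0.0, 0.0))
```

Dividing the first three clip-space components by the fourth yields normalized device coordinates; in this sketch a different `mvp` per tile would correspond to a different camera space per tile.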

A fragment shader may be the shader stage that processes a fragment generated by the rasterization into a set of colors and a single depth value. The fragment shader may be the OpenGL pipeline stage after a primitive is rasterized. For each sample of the pixels covered by a primitive, a "fragment" may be generated.

A third aspect of this disclosure provides a method for rasterization, the method comprising: determining a render target comprising a plurality of first tiles; obtaining an acceleration structure; obtaining one or more first frustums, wherein each first tile of the render target is associated with one first frustum; selecting, for each first tile, first primitives from the acceleration structure, wherein the selected first primitives are included in the first frustum associated with the first tile; writing the selected first primitives into a primitive storage of a rendering pipeline; receiving first rasterized tiles from a rasterizer of the rendering pipeline, wherein each first rasterized tile has been obtained by rasterizing the first primitives selected for one first tile of the render target; processing the first rasterized tiles using a filter shader to obtain one or more second tiles; generating one or more second frustums for each second tile; selecting, for each second tile, second primitives from the acceleration structure, wherein a set of second primitives is selected from each of the one or more second frustums generated for said second tile; writing the selected second primitives into the primitive storage; and receiving, from the rasterizer, one or more second rasterized tiles for each second tile, wherein each second rasterized tile has been obtained by rasterizing one set of second primitives selected for said second tile.

In an implementation form of the third aspect, the method further comprises: merging each second tile with the second rasterized tiles that correspond to said second tile; and outputting the merging result to the render target.

In an implementation form of the third aspect, the method further comprises: processing the second rasterized tiles using a filter shader to obtain one or more third tiles; generating one or more third frustums for each third tile; selecting, for each third tile, third primitives from the acceleration structure, wherein a set of third primitives is selected from each of the one or more third frustums generated for said third tile; writing the selected third primitives into the primitive storage; and receiving, from the rasterizer, one or more third rasterized tiles for each third tile, wherein each third rasterized tile has been obtained by rasterizing one set of third primitives selected for said third tile.

In an implementation form of the third aspect, the method further comprises: merging each second tile with the second rasterized tiles that correspond to said second tile and further with the third rasterized tiles that correspond to said second rasterized tiles corresponding to said second tile; and outputting the merging result to the render target.

In an implementation form of the third aspect, the method further comprises: repeating, one or more times until a determined condition is achieved, the steps of processing, selecting, writing, and receiving; then merging all tiles and rasterized tiles that correspond to one another; and outputting the merging result to the render target.

In an implementation form of the third aspect, said steps are repeated in accordance with one or more rendering requirements of a control shader.

In an implementation form of the third aspect, at least one of the first primitives, and the second primitives, and the third primitives comprise triangles.

In an implementation form of the third aspect, at least one of the first primitives, and the second primitives, and the third primitives comprise one or more pairs of a triangle and a material.

In an implementation form of the third aspect, a material includes a description of one or more algorithms and parameters required for rasterizing.

In an implementation form of the third aspect, the method further comprises: separately obtaining an individual first frustum for each first tile or group of first tiles included in the plurality of first tiles of the render target; and/or separately generating one or more individual second frustums for each second tile.

In an implementation form of the third aspect, the method comprises: performing the steps of selecting the first primitives, writing the selected first primitives into the primitive storage, and receiving the first rasterized tiles, in parallel for all first tiles or in parallel for groups of first tiles; and/or performing the steps of processing the first rasterized tiles, selecting the second primitives for the second tiles, writing the selected second primitives into the primitive storage, and receiving the second rasterized tiles, in parallel for all second tiles or in parallel for groups of second tiles.

In an implementation form of the third aspect, the method comprises performing the steps of selecting the first primitives, writing the selected first primitives into the primitive storage, and receiving the first rasterized tiles, separately for each first tile; and/or performing the steps of processing the first rasterized tiles, selecting the second primitives for the second tiles, writing the selected second primitives into the primitive storage, and receiving the second rasterized tiles, separately for each second tile.

A fourth aspect of this disclosure provides a computer program comprising a program code for performing the method according to the third aspect or any of its implementation forms, when executed on a computer.

A fifth aspect of the present disclosure provides a non-transitory storage medium storing executable program code which, when executed by a processor, causes the method according to the third aspect or any of its implementation forms to be performed.

BRIEF DESCRIPTION OF DRAWINGS

The above described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:

Fig. 1 shows a device for performing a recursive rasterization according to an embodiment, and shows a system according to an embodiment of the invention.

Fig. 2 shows a conventional pipeline for performing tile-based rasterization.

Fig. 3 shows a tiler of the conventional pipeline for performing tile-based rasterization.

Fig. 4 shows a pipeline for performing a recursive tile-based rasterization according to an embodiment of the invention.

Fig. 5 shows a pipeline for performing a recursive tile-based rasterization according to an embodiment of the invention.

Fig. 6 shows a flow diagram for a recursive tile-based rasterization according to an embodiment of the invention.

Fig. 7 illustrates reflection-in-reflection and the mixing of refraction with reflection, respectively, which can be rendered according to embodiments of the invention.

Fig. 8 illustrates a dynamic evaluation of the weight of a 3D object during recursive rasterization according to embodiments of the invention.

Fig. 9 shows an implementation of a system for recursive rasterization according to an embodiment of the invention.

Fig. 10 shows steps of a method for recursive rasterization according to an embodiment of the invention.

Fig. 11 shows further steps of the method of Fig. 10 for recursive rasterization according to an embodiment of the invention.

DETAILED DESCRIPTION OF EMBODIMENTS

Fig. 1 shows a schematic representation of a device 100 according to an embodiment of the invention. The device 100 is configured for recursive rasterization, in particular, the device 100 is configured to support a rasterizer 105 to perform the recursive rasterization. For instance, the recursive rasterization may be performed to render a digital (3D) scene.

Further, Fig. 1 also shows a render target 107, a primitive storage 104, the rasterizer 105, and an (optional) memory 113, wherein the primitive storage 104, the rasterizer 105 and the optional memory 113 are part of a rendering pipeline and may be connected to each other. The rendering pipeline and the device 100 form a system 10 according to an embodiment of the invention, which is configured to perform the recursive rasterization proposed in this disclosure.

The device 100 is configured to determine the render target 107, wherein the render target 107 comprises a plurality of first tiles. The device 100 may determine the division of the render target 107 into the plurality of first tiles. Further, the device 100 is configured to obtain an acceleration structure 101, for instance, an acceleration structure 101 related to a scene to be rendered by using the recursive rasterization. That is, the device 100 may take the acceleration structure 101 related to the scene as an input. Further, the device 100 is configured to obtain one or more first frustums 103, i.e., the device 100 may take the one or more first frustums 103 as an input. If the device 100 obtains two or more first frustums 103, these multiple first frustums 103 may be smaller sub-frustums obtained based on a larger frustum. Notably, each first tile of the render target 107 is associated with one first frustum 103, wherein it is possible that the device 100 determines this association of the first tiles and the first frustums 103.
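
One way to picture how smaller sub-frustums may be obtained from a larger frustum, with one sub-frustum per first tile, is sketched below. The representation of a frustum by its extents on the near plane, the regular tile grid, and the names `Frustum` and `split_into_sub_frustums` are assumptions of this example; the disclosure does not fix a particular representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frustum:
    """Hypothetical view-space frustum given by its near-plane window and depth range."""
    left: float
    right: float
    bottom: float
    top: float
    near: float
    far: float

def split_into_sub_frustums(frustum, tiles_x, tiles_y):
    """Return a dict mapping each tile index (tx, ty) to its own sub-frustum."""
    width = (frustum.right - frustum.left) / tiles_x
    height = (frustum.top - frustum.bottom) / tiles_y
    sub_frustums = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            sub_frustums[(tx, ty)] = Frustum(
                left=frustum.left + tx * width,
                right=frustum.left + (tx + 1) * width,
                bottom=frustum.bottom + ty * height,
                top=frustum.bottom + (ty + 1) * height,
                near=frustum.near,
                far=frustum.far,
            )
    return sub_frustums

# Usage: a render target divided into 2 x 2 first tiles.
camera = Frustum(left=-1.0, right=1.0, bottom=-1.0, top=1.0, near=1.0, far=10.0)
per_tile = split_into_sub_frustums(camera, tiles_x=2, tiles_y=2)
```

The resulting dictionary records the association between first tiles and first frustums; an implementation could equally assign one frustum to a group of tiles.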

Then, the device 100 is configured to select first primitives 102 from the acceleration structure 101, wherein the first primitives 102 are included in the one or more first frustums 103. Thereby, the device 100 is configured to select, for each first tile of the render target 107, first primitives 102 from the acceleration structure 101 included in the associated first frustum 103. These selected first primitives 102 may thus be associated with the first tile. The selected first primitives 102 may include polygons and/or triangles. All selected first primitives 102, i.e. for all the first tiles, are then written by the device 100 into the primitive storage 104 of the rendering pipeline. Thereby, a first primitive list format may be used by the device 100. For example, the tile-specific first primitives 102 may be selected from the acceleration structure 101 by the device 100 using the corresponding tile-specific one or more first frustums 103, and may then be stored in the primitive list in association with their related first tile. Thereby, a standard format of the current TBA may be used for the first primitive list in the primitive storage 104.
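
The per-tile selection described above can be sketched as follows. For brevity, a flat list of triangles stands in for the acceleration structure 101 (a real BVH or octree would prune whole subtrees instead of testing every triangle), and a standard conservative plane test is used as the inclusion criterion. The frustum representation, the function names, and the dictionary used as a primitive list are all assumptions of this example.

```python
from collections import namedtuple

# Hypothetical view-space frustum: camera at the origin looking down -z,
# described by its window on the near plane and its depth range.
Frustum = namedtuple("Frustum", "left right bottom top near far")

def plane_values(frustum, vertex):
    """Signed values for the six frustum planes; negative means outside that plane."""
    x, y, z = vertex
    return (
        -z - frustum.near,                     # near plane
        frustum.far + z,                       # far plane
        x * frustum.near + frustum.left * z,   # left plane
        -frustum.right * z - x * frustum.near, # right plane
        y * frustum.near + frustum.bottom * z, # bottom plane
        -frustum.top * z - y * frustum.near,   # top plane
    )

def may_intersect(frustum, triangle):
    """Conservative test: reject only if all vertices lie outside one plane."""
    per_vertex = [plane_values(frustum, v) for v in triangle]
    for plane in range(6):
        if all(values[plane] < 0 for values in per_vertex):
            return False
    return True

def build_primitive_list(acceleration_structure, tile_frustums):
    """Return {tile: [triangles]} with the primitives selected for each tile."""
    return {
        tile: [t for t in acceleration_structure if may_intersect(frustum, t)]
        for tile, frustum in tile_frustums.items()
    }

# Usage: two tiles covering the left and right halves of the view.
tri_left = ((-3.0, 0.0, -4.0), (-2.0, 0.0, -4.0), (-2.0, 1.0, -4.0))
tri_right = ((1.0, 0.0, -4.0), (2.0, 0.0, -4.0), (1.0, 1.0, -4.0))
tri_far = ((0.0, 0.0, -20.0), (1.0, 0.0, -20.0), (0.0, 1.0, -20.0))
tri_both = ((-1.0, 0.0, -4.0), (1.0, 0.0, -4.0), (0.0, 1.0, -4.0))
scene = [tri_left, tri_right, tri_far, tri_both]
frustums = {
    "tile_0": Frustum(-1.0, 0.0, -1.0, 1.0, 1.0, 10.0),
    "tile_1": Frustum(0.0, 1.0, -1.0, 1.0, 1.0, 10.0),
}
primitive_list = build_primitive_list(scene, frustums)
```

Because the test is conservative, a triangle may occasionally be listed for a tile in which it is not actually visible; this costs some rasterization work but never drops a visible primitive.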

The rasterizer 105 of the rendering pipeline may then rasterize the first primitives 102 selected for the plurality of first tiles of the render target 107, in order to obtain a plurality of first rasterized tiles 106. That is, the rasterizer 105 may access the primitive storage 104, and may obtain the first primitives 102 per (first) tile, for example, using the first primitive list. In particular, for each first tile of the plurality of first tiles, the first primitives 102 selected for said first tile are rasterized by the rasterizer 105, in order to obtain one first rasterized tile 106. The rasterizer 105 may be further connected to the memory 113, and may use the memory 113 to store or cache data during the rasterization. The rasterizer 105 may provide the first rasterized tiles 106 back to the device 100.

Accordingly, the device 100 is configured to receive the first rasterized tiles 106 from the rasterizer 105 of the rendering pipeline, wherein each first rasterized tile 106 has been obtained by rasterizing the first primitives 102 selected for one first tile of the render target 107.

The device 100 is further configured to process the first rasterized tiles 106 using a filter shader 108, in order to obtain one or more second tiles 109. That is, each processed first rasterized tile 106 may result in a second tile 109. However, the obtained second tiles 109 may also be fewer in number than the first rasterized tiles 106. That means, for some first rasterized tile(s) 106 the processing using the filter shader 108 may not yield (a) second tile(s) 109, for instance, because further rasterization may not be necessary for the tiles corresponding to these first rasterized tiles 106. For example, these tiles may not have any special effects. Notably, each processed first rasterized tile 106 may result in at most one second tile 109.
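
The "at most one second tile per first rasterized tile" behaviour can be illustrated with a small sketch. Here a rasterized tile is modelled as a list of pixel dictionaries carrying an `effect` tag, and a second tile is modelled as a mapping from pixel index to the effect that still has to be rendered; both data layouts and the function names are assumptions of this example, not the format used by the filter shader 108.

```python
def filter_shader(rasterized_tile):
    """Return a second tile (pixels needing another pass), or None if none is needed."""
    pending = {
        index: pixel["effect"]
        for index, pixel in enumerate(rasterized_tile)
        if pixel["effect"] is not None
    }
    return pending or None

def obtain_second_tiles(first_rasterized_tiles):
    """Apply the filter shader per tile; tiles without special effects yield nothing."""
    second_tiles = {}
    for tile_id, rasterized_tile in first_rasterized_tiles.items():
        second_tile = filter_shader(rasterized_tile)
        if second_tile is not None:
            second_tiles[tile_id] = second_tile
    return second_tiles

# Usage: tile 0 has no special effects, tile 1 contains one reflective pixel.
first_rasterized_tiles = {
    0: [{"color": 0.2, "effect": None}, {"color": 0.4, "effect": None}],
    1: [{"color": 0.7, "effect": None}, {"color": 0.9, "effect": "reflection"}],
}
second_tiles = obtain_second_tiles(first_rasterized_tiles)
```

In this usage only tile 1 produces a second tile, so only that tile would enter a further rasterization step.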

The device 100 is further configured to generate one or more second frustums 110 for each second tile 109. If the device 100 generates two or more second frustums 110 for a second tile 109, these two or more second frustums 110 may be smaller sub-frustums obtained based on a larger frustum. Notably, each second tile 109 may be associated with one second frustum 110 or with multiple second frustums 110, wherein the device 100 may be configured to determine this association of the second tiles 109 and the one or more second frustums 110.

Then, the device 100 is configured to select second primitives 111 from the acceleration structure 101, wherein the second primitives 111 are included in the one or more second frustums 110. Thereby, the device 100 is configured to select, for each second tile 109, one or more sets of second primitives 111 from the acceleration structure 101, wherein a set of second primitives 111 is selected from each of the one or more second frustums 110 generated for said second tile 109. These selected sets of second primitives 111 may thus be associated with the second tile 109. The selected second primitives 111 may include polygons and/or triangles. All selected second primitives 111, i.e. for each second tile 109, are then written by the device 100 into the primitive storage 104 of the rendering pipeline. Thereby, a second primitive list format may be used by the device 100. For example, the tile-specific second primitives 111 may be selected from the acceleration structure 101 by the device 100 using the corresponding tile-specific one or more second frustums 110, and may then be stored in the second primitive list in association with their related second tile 109. Thereby, a standard format of the current TBA may be used for the second primitive list in the primitive storage 104.

The rasterizer 105 of the rendering pipeline may then rasterize the second primitives 111 selected for the one or more second tiles 109, in order to obtain one or more second rasterized tiles 112. That is, the rasterizer 105 may access the primitive storage 104, and may obtain the sets of second primitives 111 generated for each second tile 109, for example, using the second primitive list. In particular, for each second tile 109, the one or more sets of second primitives 111 selected for said second tile 109 are rasterized by the rasterizer 105, in order to obtain the one or more second rasterized tiles 112. The rasterizer 105 may be further connected to the memory 113, and may again use the memory 113 to store or cache data during the rasterization. The rasterizer 105 may provide the one or more second rasterized tiles 112 per second tile 109 back to the device 100.

Accordingly, the device 100 is configured to receive the second rasterized tiles 112 from the rasterizer 105 of the rendering pipeline, wherein each second rasterized tile 112 has been obtained by rasterizing one set of second primitives 111 selected for said second tile 109.

The device 100 may then be further configured to merge at least each second tile 109 with the second rasterized tiles 112 that correspond to said second tile 109, and to output the merging result to the render target 107. In this way, the device 100 may render a (3D) digital scene - wherein the scene is related to the input acceleration structure 101 - into the render target 107.
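
A minimal sketch of such a merging step is given below. It assumes that tiles are lists of RGB tuples of equal length and that each second rasterized tile is blended into the running result with a single per-tile weight; the linear blend, the weights, and the name `merge_tile` are illustrative assumptions, since the disclosure leaves the mixing rule to the implementation.

```python
def merge_tile(base_tile, bounce_tiles, weights):
    """Blend each bounce tile into the base tile, pixel by pixel.

    base_tile: list of (r, g, b) tuples for the tile to be merged.
    bounce_tiles: list of tiles (same pixel count) from further rasterization steps.
    weights: one blend factor in [0, 1] per bounce tile (hypothetical parameter).
    """
    merged = list(base_tile)
    for bounce_tile, weight in zip(bounce_tiles, weights):
        merged = [
            tuple((1.0 - weight) * b + weight * s for b, s in zip(base_px, bounce_px))
            for base_px, bounce_px in zip(merged, bounce_tile)
        ]
    return merged

# Usage: a red base pixel mixed equally with a blue reflected pixel.
base = [(1.0, 0.0, 0.0)]
reflection = [(0.0, 0.0, 1.0)]
result = merge_tile(base, [reflection], [0.5])
```

The merged tile is what would be written to the render target 107; a per-pixel weight (for example derived from a material) could replace the per-tile weight without changing the structure of the loop.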

The device 100 may comprise a processor or processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the device 100 described herein. The processing circuitry may comprise hardware and/or the processing circuitry may be controlled by software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors. The device 100 may further comprise memory circuitry, which stores one or more instruction(s) that can be executed by the processor or by the processing circuitry, in particular under control of the software. For instance, the memory circuitry may comprise a non-transitory storage medium storing executable software code which, when executed by the processor or the processing circuitry, causes the various operations of the device 100 to be performed.

In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device 100 to perform, conduct or initiate the operations or methods described herein.

The device 100 may provide the advantage of a non-trivial rasterization, in particular, a recursive rasterization. Thereby, multiple first frustums 103 and/or multiple second frustums 110 may advantageously be selected for each first tile and/or for each second tile 109, respectively, and likewise in any further recursive rasterization step. Overall, this is beneficial for rendering non-planar reflections, refractions, complex shadow maps, and other special effects in the scene, which need a more complex camera space. Notably in this respect, the union of all first frustums 103 and/or all second frustums 110 used by the device 100 to select the first primitives 102 and the second primitives 111, respectively, may determine the complex camera space. Moreover, another advantage may be that separate camera subspaces may be configured by the device 100 per each first tile or per each second tile 109.

The device 100 may also provide the advantage that it is possible to extend the TBA and to render a scene from several directions. The results of each rasterization step (i.e., the first rasterization step rasterizing the first primitives 102, the first recursive rasterization step rasterizing the second primitives 111, and any further recursive rasterization step rasterizing third or further primitives) may be saved into a fast cache of e.g. a GPU, and the results may then be mixed together in the end by the device 100, in order to implement the special effect rendering of the scene. Moreover, it is possible to copy the obtained rasterized tiles 106, 112 into the render target 107. Notably, the architecture of the system 10 as shown in Fig. 1 can effectively be implemented in already available tile-based GPU hardware, for instance, using a modified driver. Moreover, the ability to initiate several rasterization steps associated with a respective tile of the render target 107 has further benefits mentioned in the following. For example, there is no need to allocate and manage temporary render targets. All intermediately created data (rasterization results before final merging) may be stored e.g. in a small stack inside a cache. This elimination of the temporary render targets not only greatly simplifies the development, but also significantly reduces the bandwidth.

The device 100 may initiate secondary rasterization steps (i.e. recursive rasterization) only for tiles in which special effects are present. That is, the device 100 may determine the second tiles 109 based on the first rasterized tiles 106, wherein only the second tiles 109 are again rasterized based on the selected second primitives 111 as described above. Each second tile 109 may correspond to one first tile, but the number of second tiles 109 may be less than the number of first tiles. That is, some first tiles that do not have special effects may not need to be rasterized again (and accordingly no second tile 109 is generated). Instead, for these first tiles, the first rasterized tile 106 may be directly output to the render target 107. However, if a first tile contains reflections or other special effects (as may be determined when processing the corresponding first rasterized tile 106 by the filter shader 108), a second tile 109 may be generated and the secondary rasterization may be initiated by the device 100 as described above.

Thus, the need to analyse the visibility of the entirety of objects of a scene before rendering the scene disappears. Moreover, it is possible to calculate the second frustums 110 (or third or further frustums) with an individual shape for each second tile 109 (or third or further tile), i.e., wherever a second bounce is required. This can be useful for providing special effects based on non-affine transformations, for example, in order to show fast refraction or reflection from a non-planar surface. Furthermore, it may be easy to implement deeply nested special effects, wherein the number of required recursive rasterization steps may be more than one, for example, for mirror-in-mirror, mirror-in-refraction, or other cases that require a chain of sub-renders. It is also possible to implement an effective order-independent transparency using a “depth peeling” algorithm, because computations can be applied in tiles that cover the transparent objects, and the number of iterations may be limited by the actually required number.
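
To make the remark on depth peeling concrete, the following sketch peels the fragments of a single pixel layer by layer and then composites the layers front to back. It is a simplified CPU illustration of the general depth-peeling idea, not the disclosed implementation: the fragment tuples `(depth, color, alpha)`, the scalar colors, and the function names are assumptions of this example.

```python
def peel_layers(fragments, max_layers=None):
    """Extract fragments of one pixel in front-to-back order, one layer per pass.

    Each pass keeps the nearest fragment strictly behind the previously peeled
    depth, so the loop stops after exactly as many passes as there are layers.
    Fragments sharing an identical depth are treated as one layer in this sketch.
    """
    layers = []
    last_depth = float("-inf")
    while max_layers is None or len(layers) < max_layers:
        behind = [f for f in fragments if f[0] > last_depth]
        if not behind:
            break  # nothing left to peel for this pixel
        nearest = min(behind, key=lambda f: f[0])
        layers.append(nearest)
        last_depth = nearest[0]
    return layers

def composite_front_to_back(layers, background):
    """Blend peeled layers over a background color using their alpha values."""
    color = 0.0
    transmittance = 1.0
    for _, layer_color, alpha in layers:
        color += transmittance * alpha * layer_color
        transmittance *= 1.0 - alpha
    return color + transmittance * background

# Usage: three unsorted fragments covering the same pixel.
fragments = [(3.0, 1.0, 0.5), (1.0, 0.0, 0.5), (2.0, 0.5, 1.0)]
layers = peel_layers(fragments)
pixel = composite_front_to_back(layers, background=1.0)
```

In a tile-based setting, each peeling pass would correspond to one further rasterization of the tile, and tiles without transparent objects would not need any additional pass.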

Fig. 2 shows a conventional rasterization pipeline for the purpose of comparison with embodiments of the invention described below. The conventional pipeline is particularly modified for implementing embodiments of the invention, e.g., as shown in Fig. 4. The pipeline in Fig. 2 includes a tiler (see also Fig. 3), which provides triangles (i.e., primitives) for a tile, wherein the triangles are then rasterized and stored for the tile.

As shown in Fig. 3, this tiler may be described as a black box, and may receive a scene as an input. Based on the scene, the tiler may then determine tiles, and may then select triangles for these tiles. For example, by using an API such as OpenGL or Vulkan, the scene can be pushed into a rendering command buffer of the tiler using “draw calls”, such as vkCmdDraw. After the whole scene is pushed into the rendering command buffer, a TBA-based driver may send the triangles into the tiler, which may distribute the triangles to the corresponding tiles (wherein each triangle is distributed to the tile(s) in which it is visible). After this tiling stage, the driver can begin the actual rasterization.

In Fig. 4, a modified pipeline for a recursive rasterization method, which may be employed by the system 10 (including device 100 and the rendering pipeline), is shown. The pipeline is exemplarily shown, for reasons of simplicity, to process one first tile of the render target 107. In block 401, the first tile is obtained. In block 402, the first primitives 102 (here, as an example, triangles) are selected from the acceleration structure 101 as described above. In block 403, these first primitives 102 associated with the first tile are rasterized by the rasterizer 105 as described above. The resulting first rasterized tile 106 is then processed at block 404 as described above, particularly, using the filter shader 108. Then it is determined, whether further tile rasterization is needed at block 405. If yes, then a second tile 109 is returned to the block 402, and the second primitives 111 may be selected at block 402 for the second tile 109 from the acceleration structure 101 as described above. If not, then the first rasterized tile 106 is stored. In the end, all (stored) rasterization results for the first tile may be merged and output to the render target 107.
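
The loop through blocks 402 to 405 for a single tile can be sketched as below. The stages are passed in as plain callables so that the control flow stands out; the function name `render_tile`, the callable signatures, the `max_passes` limit, and the simplification to one follow-up tile and one frustum per pass are assumptions of this example rather than features stated for Fig. 4.

```python
def render_tile(tile, select_primitives, rasterize, filter_shader, merge, max_passes=4):
    """Run the select / rasterize / filter cycle for one tile and merge the results.

    select_primitives(tile) -> primitives          (block 402)
    rasterize(tile, primitives) -> rasterized tile (block 403)
    filter_shader(rasterized) -> next tile or None (blocks 404 and 405)
    merge(results) -> final tile content for the render target
    """
    results = []      # small per-tile stack of intermediate rasterization results
    current = tile
    passes = 0
    while current is not None and passes < max_passes:
        primitives = select_primitives(current)
        rasterized = rasterize(current, primitives)
        results.append(rasterized)
        current = filter_shader(rasterized)  # None means no further pass is needed
        passes += 1
    return merge(results)

# Usage with stub stages: the tile is modelled as the number of nested
# reflections still visible in it, so each pass reduces the count by one.
def stub_select(tile):
    return ["triangle"] * (tile + 1)

def stub_rasterize(tile, primitives):
    return tile

def stub_filter(rasterized):
    return rasterized - 1 if rasterized > 0 else None

all_passes = render_tile(2, stub_select, stub_rasterize, stub_filter, merge=list)
limited = render_tile(2, stub_select, stub_rasterize, stub_filter, merge=list, max_passes=2)
```

The `max_passes` argument plays the role of a maximal number of rasterization steps; the intermediate results never leave the per-tile list until the final merge, which mirrors the idea of keeping them in a small stack inside a cache.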

The change from the architecture shown in Fig. 2 to the architecture shown in Fig. 4 may be implemented by a driver of a GPU, without expensive hardware modifications. In particular, there may be two modifications. First, the acceleration structure 101 may be integrated into the driver. The acceleration structure 101 may thereby fill the same data structures in the GPU memory as the tiler of Fig. 2. The acceleration structure 101 may accordingly be used instead of the tiler. Second, after the rasterization of the first primitives 102, one or more stages may be added for analysing all pixels in the first rasterized tile 106, and to decide about the next step. Namely, to either copy the first rasterized tile 106 directly to the render target 107 or to initiate a second rasterization cycle, i.e., to generate a second tile for this first rasterized tile 106. In this case, the acceleration structure 101 can be used again to select the second primitives 111 over the first primitives 102.

The rasterization cycle of the first tile can be implemented in accordance with requirements for the presentation of multi-bounce special effects, which may be visible in the first tile. There are different possibilities to implement this requirement management, for example, using an extension of one or more compute shaders, or using a fixed pipeline. There may be three stages of the management. First, the extraction of the primitives 102, 111 and their rasterization. Second, a filter-shader-based stage processing of the rasterized tile 106, 112, after the rasterization of the primitives 102, 111, which can particularly post-process all pixels in the rasterized tile 106, 112, and can determine the parameters of special effects. For example, it may determine whether (a) further rasterization(s) is required or not, or may detect parameters of a sub-camera (e.g., the one or more second frustums 110), or may mix the pixels of the different bounces together, in order to combine the special effects into the final picture of the rendered scene. Third, a scheduler may manage the tile’s rendering loop. It may manage the order of in-tile rasterizations, may repeat the rasterization of the current set of primitives 102, 111, and may finally copy the rasterized tiles 106, 112 to the render target 107.

Notably, in embodiments of the invention, the source of the first primitives 102 or the second (or further) primitives 111 is not important for the rasterizer 105 of the rendering pipeline. All that matters is that these primitives 102, 111 are correctly placed in the primitive storage 104 (which may be a GPU memory). Further, various algorithms may be used to implement the functionality of the acceleration structure 101, for instance, for rendering 3D scenes. These algorithms may include BVH, OctTree, or MeshGrid. These algorithms can be very different, but as far as embodiments of the invention are concerned, it is only important that they can return the first primitives 102 or the second (or further) primitives 111, which are potentially visible, for example, included in some first frustums 103 or second (or further) frustums 110, respectively. In accordance with this, the tiler shown in Fig. 2 and 3 can be replaced by the acceleration structure (AS) 101, as shown in Fig. 4. Notably, however, also a mixed system is possible, i.e., the device 100 may additionally include a tiler as shown in Fig. 2 or 3. In this case, the tiler may be used for an initial rasterization step, and the acceleration structure 101 may be used for further rasterizations to render reflections, refractions, sub-surface scattering, and so on.

Fig. 5 shows another view of the pipeline for performing the recursive rasterization method, which may be employed by the system 10 (including device 100 and the rendering pipeline) according to an embodiment of the invention, wherein the pipeline shown in Fig. 5 is based on the pipeline shown in Fig. 4. In particular, Fig. 5 shows that a scene 500, which is to be rendered, may be pushed into the acceleration structure 101 (how this may be done is described further below).
From the acceleration structure 101, the first primitives 102 (e.g., one or more triangles included in one or more triangle lists) may then be selected, wherein the first primitives 102 are included in one or more first frustums (or sub-cameras) as described above. The selected first primitives 102 may then be rasterized by the rasterizer 105 as described above. The first rasterized tiles 106 may then be post-processed using the filter shader 108, and the one or more second tiles 109 may be generated for the next cycle. Further, the one or more second frustums 110 may be generated for the second tiles 109 as described above. Next, the second primitives 111 may be selected from the acceleration structure 101. These second primitives 111 (e.g., one or more triangles included in one or more triangle lists) may then be rasterized by the rasterizer 105, and the resulting second rasterized tiles 112 may again be processed using the filter shader 108 to produce third tiles for the next cycle. In further cycles, fourth tiles, fifth tiles, and so on may be produced, whereby multiple recursive rasterization steps may be performed with respect to one tile.
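The cycle described above may be sketched as a small driver loop. This is only an illustrative sketch: the function name `recursive_rasterize`, the callables `select`, `rasterize` and `filter_shader`, and the `max_depth` limit are hypothetical stand-ins for the acceleration structure 101, the rasterizer 105 and the filter shader 108, and are not part of any real API.

```python
from typing import Any, Callable, Iterable, List, Tuple


def recursive_rasterize(
    tile: Any,
    frustum: Any,
    select: Callable[[Any], list],                       # stand-in for the AS 101
    rasterize: Callable[[Any, list], Any],               # stand-in for the rasterizer 105
    filter_shader: Callable[[Any], Iterable[Tuple[Any, Any]]],  # stand-in for shader 108
    max_depth: int,
) -> List[Any]:
    """Return every rasterized tile produced for one tile over all cycles."""
    results = []
    pending = [(tile, frustum, 0)]
    while pending:
        current_tile, current_frustum, depth = pending.pop()
        # Select the primitives included in the frustum, then rasterize them.
        primitives = select(current_frustum)
        rasterized = rasterize(current_tile, primitives)
        results.append(rasterized)
        # The filter shader yields (next tile, next frustum) pairs for the next cycle.
        if depth < max_depth:
            for next_tile, next_frustum in filter_shader(rasterized):
                pending.append((next_tile, next_frustum, depth + 1))
    return results
```

In this sketch the recursion depth is bounded explicitly, so the number of cycles per tile stays under the control of the caller.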

The cycles of selecting the primitives 102, 111 and subsequently rasterizing them make it possible to avoid the bandwidth associated with copying a tile memory into, and reading it back from, a slow memory of the render target 107 after rasterization. Further, management of one or more memories for temporary render targets can be avoided, because temporary rasterization results may not exceed the size of the tile in embodiments of the invention, and can thus be kept in a fast cache of the GPU only.

A more detailed procedure for performing the recursive rasterization method is shown in Fig. 6. For instance, the system 10 may be configured to perform this recursive rasterization method including the following steps:

1st step: begin processing of a tile in block 601;

2nd step: rasterize the first primitives 102 (e.g., primary triangles) selected for the tile in block 602;

3rd step: check if a reflection is found for the rasterized tile in block 603; if yes, then:
a) calculate a reflected camera (new second frustum 110) in block 607;
b) extract a first set of second primitives 111 (e.g., as triangle list) from the acceleration structure (AS) 101 as included in the second frustum 110 in block 608;
c) rasterize the first set of second primitives 111 selected for the tile in block 609; and
d) mix reflections to the primary buffer in block 610;
if no, then:

4th step: check if a refraction is found in block 604; if yes, then:
a) calculate a refracted camera (new second frustum 110) in block 611;
b) extract a second set of second primitives 111 from the AS 101 as included in the second frustum 110 in block 612;
c) rasterize the second set of second primitives 111 selected for the tile in block 613; and
d) mix refractions to the primary buffer in block 614;
if no, then:

5th step: check if there are any other effects in block 605; and

6th step: copy primary buffer to the render target 107 in block 606.
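The per-tile flow of Fig. 6 may be sketched as follows. The sketch rests on several assumptions: `process_tile` and the three callables per effect (`find_surface`, `make_frustum`, `mix`) are hypothetical names, and the sketch assumes that each effect check also runs after a previous effect has been mixed in, which the step list leaves open.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Buffer = List[float]
Effect = Tuple[
    Callable[[Buffer], Optional[Any]],   # find_surface: blocks 603 / 604 / 605
    Callable[[Any], Any],                # make_frustum: blocks 607 / 611
    Callable[[Buffer, Buffer], Buffer],  # mix: blocks 610 / 614
]


def process_tile(
    first_frustum: Any,
    effects: Sequence[Effect],
    select: Callable[[Any], Any],        # stand-in for the AS 101
    rasterize: Callable[[Any], Buffer],  # stand-in for the rasterizer 105
) -> Buffer:
    # Block 602: rasterize the primary triangles selected for this tile.
    primary = rasterize(select(first_frustum))
    for find_surface, make_frustum, mix in effects:
        surface = find_surface(primary)
        if surface is None:
            continue  # effect not found in this tile, check the next one
        # Blocks 607/611: derive the reflected or refracted camera.
        second_frustum = make_frustum(surface)
        # Blocks 608-609 / 612-613: extract and rasterize the second primitives.
        secondary = rasterize(select(second_frustum))
        # Blocks 610/614: mix the result into the primary buffer.
        primary = mix(primary, secondary)
    # Block 606: the caller copies the returned buffer to the render target.
    return primary
```

Because the primary and secondary buffers never exceed the size of one tile, all intermediate data in this sketch is tile-sized, which mirrors the memory argument made above.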

The device 100 and system 10 according to the above embodiments of the invention allow multifaceted reflective objects to be rasterized without the use of extra memory, because each reflective surface or local group of reflective surfaces may be processed in each tile separately, and temporary data may be stored in a GPU memory due to their size being compatible with the size of one tile. The recursive rasterization eliminates the dependency on the number of reflective facets, since each tile may cover only a small part of the whole picture of the scene, may be processed independently of the others, and may intersect a limited number of the facets.

Further, the device 100 and system 10 according to embodiments of the invention allow reflection-in-reflection (illustrated in Fig. 7) to be rasterized without the use of extra memory and complex memory management, for the same reasons. Namely, reflections may be calculated in each tile independently, so as to work with small data sets even for several recursive rasterizations. The same can be applied to refraction, or to the mixing of refraction with reflection (also illustrated in Fig. 7).

As shown in Fig. 8, it is also possible for the device 100 or system 10 to dynamically evaluate the weight of a 3D object, because the recursive rasterization steps can be used to compare the distance to the front and back surface of the 3D object.

It is also possible to implement soft shadows using a technique based on occlusion queries. For example, if a developer wants to estimate the degree of shadowing at a given point of the scene, a basic algorithm may be:

1. Build a second frustum 110, where the near end corresponds to the point, and the far end of the second frustum 110 covers some virtual 3D object that represents a non-point light source (for example, rectangular or spherical).

2. Start sub-rasterization using this second frustum 110 (i.e., rasterize second primitives 111 selected from the acceleration structure 101 based on the second frustum 110), in particular rasterizing the intersected second primitives 111 into a depth buffer only.

3. Count the number of pixels in the second tile 109, for which the second primitives 111 were selected, that are closer to the camera than the light source. This number, divided by the total number of pixels in the second tile 109, yields the required degree of shadowing.
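The third step of this algorithm reduces to a ratio over the depth buffer of the second tile. A minimal sketch, assuming the depth tile is a list of rows, that pixels not covered by any primitive hold a hypothetical clear value `FAR`, and that the light source lies at a single depth `light_depth` (all names are illustrative):

```python
from typing import List

FAR = float("inf")  # assumed clear value for pixels that no primitive covered


def shadow_degree(depth_tile: List[List[float]], light_depth: float) -> float:
    """Fraction of tile pixels whose depth is closer than the light source."""
    pixels = [d for row in depth_tile for d in row]
    occluded = sum(1 for d in pixels if d < light_depth)
    return occluded / len(pixels)
```

A result of 0.0 would correspond to a fully lit point and 1.0 to a fully shadowed one; intermediate values give the soft transition.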

Fig. 9 shows how the device 100 or system 10 according to embodiments of the invention can be integrated/implemented into a mobile GPU with a TBA. In particular, Fig. 9 shows an implementation using a memory unit (DDR 900), a GPU unit 901, and a recursiveness unit 902 (which includes a recursiveness processor 903 and a tile compute shader 904). The implementation may be based on an extension of the Vulkan or OpenGL API. There may be an extension that allows working with the acceleration structure 101 in Vulkan.

The acceleration structure 101 may be integrated into a driver. In an embodiment of the invention, this acceleration structure 101 can be based on a bounding volume hierarchy (BVH) algorithm, because it provides a good balance of speed and quality for typical 3D products. However, it is also possible to implement an acceleration structure 101 using another algorithm or a combination of algorithms. A trivial implementation of the acceleration structure 101 may include memory management and a BVH building system. These can be implemented using any suitable methods or algorithms. Access to this functionality may be made available to developers through the Vulkan API for the acceleration structure 101. A BVH traversing algorithm for narrow frustums 103, 110 may be implemented using one or more compute shaders. The input for a traversing function may be the frustum 103, 110 stored in the GPU memory. The output of that function may be the primitives 102, 111 and related data, which may be stored in the same memory (primitive storage 104) and in the same format as used by the tiler in Fig. 2 or 3. The primitives 102, 111 may be rasterized by the rasterizer 105, which may also execute a fragment shader algorithm on each tile after executing the rasterization.
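A BVH traversal against a frustum may be sketched on the CPU as follows. This is an illustrative sketch only, not the compute-shader implementation: the `BVHNode` layout, the representation of a frustum as a list of inward-facing planes, and the function names are all assumptions. The test is conservative, so it may return primitives that are only potentially visible.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # (normal, d); a point p is inside when dot(normal, p) + d >= 0


@dataclass
class BVHNode:
    lo: Vec3                                  # minimum corner of the bounding box
    hi: Vec3                                  # maximum corner of the bounding box
    children: List["BVHNode"] = field(default_factory=list)
    primitive_ids: List[int] = field(default_factory=list)


def aabb_outside_plane(lo: Vec3, hi: Vec3, plane: Plane) -> bool:
    normal, d = plane
    # Take the box corner farthest along the plane normal; if even that
    # corner is behind the plane, the whole box is outside.
    corner = tuple(h if n >= 0 else l for l, h, n in zip(lo, hi, normal))
    return sum(n * c for n, c in zip(normal, corner)) + d < 0


def traverse(root: BVHNode, frustum: List[Plane]) -> List[int]:
    """Collect ids of primitives whose enclosing boxes may intersect the frustum."""
    selected: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if any(aabb_outside_plane(node.lo, node.hi, p) for p in frustum):
            continue  # prune this subtree
        selected.extend(node.primitive_ids)
        stack.extend(node.children)
    return selected
```

An explicit stack is used instead of recursion because that form maps more directly onto a compute shader, where call recursion is typically unavailable.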

A new entity may further be created, namely, a dictionary of shaders of three different types. Each dictionary may be a set of records with a key and a value. One dictionary may be for the standard vertex and pixel shaders 907. The key may be the pair of the material and the bounce. The shader 907 may be the value. This dictionary may be used to establish a correspondence between the geometry in a BVH and the original materials for each bounce (rasterization step). A second dictionary may determine the access to the post-processing fragment shader 908 (executed by the rasterizer 105). The shader 908 may work after the primitives 102, 111 of a tile are rasterized by the rasterizer 105, and all pixels are ready. The shader 908 may modify the pixels, or extract some technical information, for example, the parameters of the reflective planes that are visible in the tile. After extraction, the shader 908 may calculate the reflective camera using the original camera and the reflective plane. A large number of other use-cases are also possible. It should be possible to configure several post-processing shaders 908 for the several steps of the tile’s processing, namely, immediately after rasterization, after each secondary bounce, and before the tile is stored to the frame buffer. A third shader, the shader 904, which may comprise the filter shader 108 described above, may manage the recursiveness of the tile’s rasterization, e.g., a maximum number of recursive rasterization steps, an order of the special effects rendering, a number of bounces, and so on.
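The calculation of a reflective camera from the original camera and a reflective plane may be sketched as a mirror transform. This is a minimal sketch under stated assumptions: the camera is reduced to a position and a forward direction, the plane is given as a unit normal and offset `d` with points on the plane satisfying dot(normal, p) + d = 0, and all function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def reflect_point(p: Vec3, normal: Vec3, d: float) -> Vec3:
    # Move the point by twice its signed distance to the plane.
    k = 2.0 * (dot(normal, p) + d)
    return tuple(c - k * n for c, n in zip(p, normal))


def reflect_direction(v: Vec3, normal: Vec3) -> Vec3:
    # Directions ignore the plane offset; only the normal component flips.
    k = 2.0 * dot(normal, v)
    return tuple(c - k * n for c, n in zip(v, normal))


def reflected_camera(position: Vec3, forward: Vec3, normal: Vec3, d: float):
    """Mirror a camera (position, forward) about the plane (normal, d)."""
    return reflect_point(position, normal, d), reflect_direction(forward, normal)
```

For example, a camera at height 2 looking straight down at the plane y = 0 maps to a camera at height -2 looking straight up. A full implementation would also have to account for the mirrored handedness of the resulting view.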

In order to implement the system of the tile’s cache management, the number, format and other parameters of the drawing layers (colour, depth, normal and other buffers) may be configured for each bounce.

Further, a function may be added to the API (as an extension) that may begin the recursive rasterization. This function may receive the acceleration structure 101, the shader dictionaries, the tile cache structure, and the main camera. The simplest camera can be represented as a simple 4x4 matrix, which is a standard approach in real-time 3D graphics.
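The inputs of such a function may be pictured as a single argument record. The sketch below is purely illustrative and does not correspond to any existing Vulkan or OpenGL extension: every type, field, and function name is hypothetical, and the choice of strings as shader handles and format labels is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Matrix4 = List[List[float]]
ShaderKey = Tuple[str, int]  # (material name, bounce index), as in the first dictionary


@dataclass
class LayerConfig:
    name: str  # e.g. "colour", "depth", "normal"
    fmt: str   # hypothetical format label


@dataclass
class RecursiveRasterizationArgs:
    acceleration_structure: object                   # opaque handle to the AS 101
    material_shaders: Dict[ShaderKey, str]           # (material, bounce) -> shader
    tile_cache_layers: Dict[int, List[LayerConfig]]  # bounce -> drawing layers
    main_camera: Matrix4                             # 4x4 camera matrix


def identity_camera() -> Matrix4:
    """Simplest possible camera: the 4x4 identity matrix."""
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def missing_shaders(
    args: RecursiveRasterizationArgs, materials: Iterable[str], max_bounce: int
) -> List[ShaderKey]:
    """List (material, bounce) pairs that have no shader configured."""
    return [
        (m, b)
        for m in materials
        for b in range(max_bounce + 1)
        if (m, b) not in args.material_shaders
    ]
```

A check such as `missing_shaders` could be run before starting the recursive rasterization, so that an unconfigured material and bounce combination is detected up front rather than during rendering.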

In order to push a scene into the acceleration structure 101, the standard API of an acceleration structure 101 may be used. Shaders of all types may be created for each material, each bounce, and each effect. They may be configured so as to rasterize the geometry on each bounce, post-process the pixels after rasterization, extract the temporary sub-cameras, and mix bounces. Finally, the function that starts the recursive rasterization may be called.

To this end, some new extensions for the Vulkan or OpenGL APIs may be used, but this implementation can be based on the standard compute shaders and other possibilities available to the driver of the GPU. There are no specific requirements for the hardware if it is implemented using the tile-based architecture. In the future, the hardware can additionally be optimized for such methods of rasterization, but first versions can be implemented after some modification of the drivers.

Figs. 10 and 11 show a schematic embodiment of a method 1000 for performing a recursive rasterization according to an embodiment. The method 1000 may be performed by the device 100.

The method 1000 comprises (see Fig. 10): a step 1001 of determining a render target 107 comprising a plurality of first tiles; a step 1002 of obtaining an acceleration structure 101; a step 1003 of obtaining one or more first frustums 103, wherein each first tile of the render target 107 is associated with one first frustum 103; a step 1004 of selecting, for each first tile, first primitives 102 from the acceleration structure 101, wherein the selected first primitives 102 are included in the first frustum 103 associated with the first tile; a step 1005 of writing the selected first primitives 102 into a primitive storage 104 of a rendering pipeline; and a step 1006 of receiving first rasterized tiles 106 from a rasterizer 105 of the rendering pipeline, wherein each first rasterized tile 106 has been obtained by rasterizing the first primitives 102 selected for one first tile of the render target 107.

Further, the method 1000 comprises (see Fig. 11): a step 1007 of processing the first rasterized tiles 106 using a filter shader 108 to obtain one or more second tiles 109; a step 1008 of generating one or more second frustums 110 for each second tile 109; a step 1009 of selecting, for each second tile 109, second primitives 111 from the acceleration structure 101, wherein a set of second primitives 111 is selected from each of the one or more second frustums 110 generated for said second tile 109; a step 1010 of writing the selected second primitives 111 into the primitive storage 104; and a step 1011 of receiving, from the rasterizer 105, one or more second rasterized tiles 112 for each second tile 109, wherein each second rasterized tile 112 has been obtained by rasterizing one set of second primitives 111 selected for said second tile 109.

The present invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from a study of the drawings, this disclosure and the independent claims. In the claims as well as in the description, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.