SOFT SHADOW OPTIMIZATION - HUAWEI TECH CO LTD

Title:

SOFT SHADOW OPTIMIZATION

Document Type and Number:

WIPO Patent Application WO/2021/073729

Abstract:

A graphics processing system for rendering an image having a plurality of pixels, at least some of the pixels representing a shadow of an object in said image, wherein the system is configured to, for each of at least some of the plurality of pixels: determine a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a shading state in an image space; determine whether the data points of each texel all indicate the same shading state; and if the data points of each texel do not all indicate the same shading state, render the respective pixel by applying a weighted shadow filtering algorithm to the data points of the respective texels. Soft shadows can thus be generated efficiently.

Inventors:

LIU BAOQUAN (DE)

Application Number:

PCT/EP2019/078018

Publication Date:

April 22, 2021

Filing Date:

October 15, 2019

Assignee:

HUAWEI TECH CO LTD (CN)
LIU BAOQUAN (DE)

International Classes:

G06T15/60

Foreign References:

US7817823B1	2010-10-19
US7106326B2	2006-09-12

Other References:

CERQUEIRA DE FARIAS MACEDO MARCIO ET AL: "Hard Shadow Anti-Aliasing for Spot Lights in a Game Engine", 2017 16TH BRAZILIAN SYMPOSIUM ON COMPUTER GAMES AND DIGITAL ENTERTAINMENT (SBGAMES), IEEE, 2 November 2017 (2017-11-02), pages 106 - 115, XP033369331, DOI: 10.1109/SBGAMES.2017.00020
LI HUA ET AL: "Accurate Shadow Generation Analysis in Computer Graphics", 2018 IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), IEEE, 28 June 2018 (2018-06-28), pages 1116 - 1120, XP033506976, DOI: 10.1109/HPCC/SMARTCITY/DSS.2018.00186
JOHN R ISIDORO: "Shadow Mapping: GPU-based Tips and Techniques", GAME DEVELOPERS CONFERENCE, 20 March 2006 (2006-03-20), XP055179674, Retrieved from the Internet [retrieved on 20150326]
CHRISTIAN SIGG ET AL: "GPU Gems - Chapter 20. Fast Third-Order Texture Filtering", 1 April 2005 (2005-04-01), XP055220422, Retrieved from the Internet [retrieved on 20151013]
CHRISTIAN SIGG, MARKUS HADWIGER: "GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation", 2005, ADDISON-WESLEY, article "Fast Third-Order Texture Filtering", pages: 313 - 329

Attorney, Agent or Firm:

KREUZ, Georg (DE)

Claims:

CLAIMS

1. A graphics processing system for rendering an image having a plurality of pixels, at least some of the pixels representing a shadow of an object in said image, wherein the system is configured to, for each of at least some of the plurality of pixels: determine a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a shading state in an image space; determine whether the data points of each texel all indicate the same shading state; and if the data points of each texel do not all indicate the same shading state, render the respective pixel by applying a weighted shadow filtering algorithm to the data points of the respective texels.

2. The system of claim 1, wherein the system is further configured to, if the data points of each texel all indicate the same shading state, not apply a weighted shadow filtering algorithm to the data points of the respective texels.

3. The system of claim 1 or 2, wherein the step of determining whether all of the data points indicate the same shading state comprises averaging the data points and determining whether the average is equal to a predetermined maximum or minimum value.

4. The system of any preceding claim, wherein the weighted shadowing filtering algorithm is a bilinear percentage closer filtering algorithm.

5. The system of any of claims 1 to 3, wherein the weighted shadowing algorithm comprises a weighting function having a continuous derivative.

6. The system of claim 5, wherein the weighted shadowing filtering algorithm is a bicubic shadow filtering algorithm.

7. The system of claim 6, wherein the weighted shadow filtering algorithm comprises a B-spline bicubic filter or a Catmull-Rom bicubic filter.

8. The system of any preceding claim, wherein the system is further configured to: combine a plurality of subsets of the data points to generate a plurality of combined data samples; determine whether all of the combined data samples indicate the same shading state; and if the combined data samples do not all indicate the same shading state, render the respective pixel by applying a weighted shadow filtering algorithm to the combined data samples.

9. A method for rendering an image having a plurality of pixels, at least some of the pixels representing a shadow of an object in said image, wherein the method comprises, for each of at least some of the plurality of pixels: determining a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a shading state in an image space; determining whether the data points of each texel all indicate the same shading state; and if the data points of each texel do not all indicate the same shading state, rendering the respective pixel by applying a weighted shadow filtering algorithm to the data points of the respective texels.

10. The method of claim 9, wherein the method further comprises, if the data points of each texel all indicate the same shading state, not applying a weighted shadow filtering algorithm to the data points of the respective texels.

11. The method of claim 9 or 10, wherein the step of determining whether all of the data points indicate the same shading state comprises averaging the data points and determining whether the average is equal to a predetermined maximum or minimum value.

12. The method of any of claims 9 to 11, wherein the weighted shadowing filtering algorithm is a bilinear percentage closer filtering algorithm.

13. The method of any of claims 9 to 11, wherein the weighted shadowing algorithm comprises a weighting function having a continuous derivative.

14. The method of claim 13, wherein the weighted shadowing filtering algorithm is a bicubic shadow filtering algorithm.

15. The method of any of claims 9 to 14, the method further comprising: combining a plurality of subsets of the data points to generate a plurality of combined data samples; determining whether all of the combined data samples indicate the same shading state; and if the combined data samples do not all indicate the same shading state, rendering the respective pixel by applying a weighted shadow filtering algorithm to the combined data samples.

16. A computer program which, when executed by a computer, causes the computer to perform the method of any of claims 9 to 15.

Description:

SOFT SHADOW OPTIMIZATION

FIELD OF THE INVENTION

This invention relates to optimization of the rendering of soft shadows, for example for game rendering in a graphics processing unit.

BACKGROUND

The rendering of shadows is very important for video games, because the presence of shadows can not only increase the realism of a rendered image, but can also indicate inter-object distances and positions to the viewer. However, rendering high quality soft shadows using high-order filtering algorithms with weighted calculations is very demanding, especially for the modern mobile graphics processing unit (GPU).

A simple shadow-map algorithm, known as hard shadow, does not use any filtering algorithms (i.e. there are no weighted calculations) and fetches one single nearest texture sample from a shadow-map texture to test whether the current rendered fragment is inside the shadow region of the image or not. However, this may result in a poor-quality image, as shown by the shadow indicated at 101 in Figure 1, which is not realistic.

Figure 2a shows the improved rendering result when 2 x 2 bilinear percentage closer filtering (PCF) is used to produce a soft shadow, indicated at 201. This shadow-map algorithm fetches four texture samples in a 2 x 2 neighbourhood area surrounding the target UV coordinate from a shadow-map texture, shown as (s, t) in Figure 2b. 2 x 2 bilinear weighted filtering is then performed to calculate a percentage of whether the current rendered fragment is inside the shadow or not, i.e., whether the surface point is closer to the lit area and, therefore, not in shadow. However, the resulting image is still unrealistic and features zigzag aliasing.

Figure 3a shows the rendering result when 3 x 3 bilinear PCF is used. Many modern games employ 3 x 3 bilinear PCF filtering, which fetches nine texture samples in a 3 x 3 neighbourhood area surrounding the target UV coordinate, shown as (s, t) in Figure 3b. A 3 x 3 bilinear weighted calculation is then performed on the nine samples in a fragment shader in order to obtain a smooth filtered shadow. The resulting rendered image quality is improved compared to 2 x 2 PCF filtering, as can be seen from the shadow area indicated at 301, but images may still exhibit bilinear diamond artefacts and aliasing.
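By way of illustration only, the hard shadow test described above can be sketched as follows. The function name `hard_shadow`, the nested-list shadow map and the depth values are hypothetical and chosen for the example; the convention that 1.0 means lit and 0.0 means shadowed follows the sample values used later in this description.

```python
def hard_shadow(shadow_map, u, v, ref_depth):
    """Return 1.0 (lit) or 0.0 (shadowed) using one nearest texel, no filtering."""
    height = len(shadow_map)
    width = len(shadow_map[0])
    # Nearest texel to the normalized coordinate (u, v), clamped to the map.
    x = min(width - 1, max(0, int(u * width)))
    y = min(height - 1, max(0, int(v * height)))
    # Lit when the fragment is no farther from the light than the stored depth.
    return 1.0 if ref_depth <= shadow_map[y][x] else 0.0


# Hypothetical 2 x 2 shadow map: the top row stores a near occluder (depth 0.3).
depth_map = [[0.3, 0.3], [1.0, 1.0]]
lit_values = [hard_shadow(depth_map, 0.25, v, 0.5) for v in (0.25, 0.75)]
```

Because the result can only be 0.0 or 1.0, neighbouring fragments switch abruptly between shadowed and lit, which is the source of the hard edge.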
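A minimal sketch of 2 x 2 bilinear PCF is given below, under the assumption that the four depth comparisons have already been made and stored as 1.0 (lit) or 0.0 (shadowed). The function `pcf_2x2`, the texel-unit coordinates and the sample grid are hypothetical.

```python
import math


def pcf_2x2(lit, s, t):
    """Bilinearly weight the four binary shadow-test results around (s, t).

    lit[y][x] holds 1.0 (lit) or 0.0 (shadowed). (s, t) is in texel units with
    texel centres at integer coordinates (an assumption of this sketch).
    """
    x0, y0 = math.floor(s), math.floor(t)
    fx, fy = s - x0, t - y0  # fractional offsets inside the 2 x 2 cell
    top = (1 - fx) * lit[y0][x0] + fx * lit[y0][x0 + 1]
    bottom = (1 - fx) * lit[y0 + 1][x0] + fx * lit[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


# Shadow edge running between the two columns of a 2 x 2 neighbourhood.
tests = [[0.0, 1.0], [0.0, 1.0]]
edge_value = pcf_2x2(tests, 0.25, 0.5)
```

The result is a fractional value between 0.0 and 1.0 near the edge, which is what softens the shadow boundary.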

Bilinear filtering algorithms are therefore simple and efficient, but do not produce very smooth image quality results for soft shadows. This is because although the bilinear function is continuous at the junctions between neighboring sets of filtered regions, the first derivative of the function is discontinuous.

The best quality shadow filtering for real time applications may be achieved by using high-order filtering. For example, a bicubic filtering function, which fetches sixteen texture samples in a 4 x 4 neighbourhood area and then performs a 4 x 4 bicubic weighted calculation on the sixteen samples in a fragment shader, may be used to result in a smooth filtered shadow. Figure 4 illustrates the sampling of a 4 x 4 grid of texels surrounding the target UV coordinate (shown as a cross) in bicubic filtering. Improved image quality may be obtained, as can be seen in the comparison of Figures 5a and 5b, where Figure 5a shows an image processed using a 3 x 3 bilinear PCF and Figure 5b shows an image processed using a 4 x 4 bicubic filtering algorithm. In Figure 5a, diamond artifacts are visible in the transitional area of the top image, whilst in Figure 5b, the image appears much smoother at the same area. The bicubic filtering function is differentiable, and its first-order derivative is a continuous function. The previously described bilinear filtering functions do not have continuous derivatives.

This bicubic filtering method is described in US 7,106,326 B2. A computational unit is configured to access data values from memory and perform filtering operations (such as linear, bilinear, trilinear, cubic or bicubic filtering) on the data values of a neighbourhood (Np x Np), where Np x Np is the size of the neighbourhood in texels. However, this method requires Np x Np data samples and a complex weighted calculation for each sample.

For a cubic function (i.e. a third order polynomial), not only is the function itself continuous, but the first derivative of the function is still continuous. For a fourth-order function, the second derivative is continuous. Therefore, high-order filtering algorithms, such as bicubic or higher cubic, produce good image qualities due to the continuity of their derivatives. However, many texture fetching instructions are required for data sampling, which may also consume a lot of data bandwidth. Furthermore, very complex high-order weighted calculations are required for filtering. These processes are very heavy burdens for a mobile GPU’s texture bandwidth and computing capability.

Equation (1) is the 2D bicubic convolution equation in matrix notation, where S(t) is the 1D third-order polynomial function to calculate the weighting coefficients, u and v are the fractional parts of the sampling coordinates in texture space along the x and y directions respectively, i and j are the integer parts and f(x, y) is the 2D function of values stored in the texture to be sampled.

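$$
f(i+u,\ j+v) = A\,B\,C \tag{1}
$$

$$
A = \begin{pmatrix} S(1+u) & S(u) & S(1-u) & S(2-u) \end{pmatrix},\quad
B = \begin{pmatrix}
f(i-1,j-1) & f(i-1,j) & f(i-1,j+1) & f(i-1,j+2)\\
f(i,j-1) & f(i,j) & f(i,j+1) & f(i,j+2)\\
f(i+1,j-1) & f(i+1,j) & f(i+1,j+1) & f(i+1,j+2)\\
f(i+2,j-1) & f(i+2,j) & f(i+2,j+1) & f(i+2,j+2)
\end{pmatrix},\quad
C = \begin{pmatrix} S(1+v)\\ S(v)\\ S(1-v)\\ S(2-v) \end{pmatrix}
$$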

Bicubic filtering requires sixteen data samples into a shadow-map texture, as shown by matrix B in Equation (1), which then need to be filtered by a third-order weighted filtering calculation along the x and y directions in a 4 x 4 area. This involves thirty-two multiplications for the shading of a single pixel, as shown in Equation (1), where two vectors A and C each hold four cubic (third-order) weighted coefficients, and the matrix B holds sixteen data samples (in a 4 x 4 area).

Furthermore, A and C also need to be dynamically calculated by a formula of third-order polynomials, which are evaluated using four points along an axis, which only operates in one dimension (either in the x or y direction). Equations (2) and (3) show formulae of two examples of the polynomials, which are cubic weighting functions, where d is the texel’s distance from the sampling point. d can be computed as the fractional offset from the texel to the original sampling location in the texture space. d is input into a cubic polynomial (such as the B-spline or the Catmull-Rom spline weighting function) to obtain the filter weights along the x and y directions.

Equation (2) shows a formula to calculate the B spline weight values by using a third-order polynomial:

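$$
S(d) = \frac{1}{6}
\begin{cases}
3|d|^3 - 6|d|^2 + 4, & 0 \le |d| < 1\\
-|d|^3 + 6|d|^2 - 12|d| + 8, & 1 \le |d| < 2\\
0, & \text{otherwise}
\end{cases} \tag{2}
$$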

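Equation (3) shows a formula to calculate Catmull-Rom spline weight values by using a third-order polynomial:

$$
S(d) = \frac{1}{2}
\begin{cases}
3|d|^3 - 5|d|^2 + 2, & 0 \le |d| < 1\\
-|d|^3 + 5|d|^2 - 8|d| + 4, & 1 \le |d| < 2\\
0, & \text{otherwise}
\end{cases} \tag{3}
$$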
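For illustration, the two cubic weighting functions and the matrix product of Equation (1) may be sketched as below. The polynomial coefficients are the standard textbook forms of the cubic B-spline and Catmull-Rom kernels, which this sketch assumes are the forms intended here. The function names and the layout of `block` (first index along x, second along y) are likewise assumptions.

```python
def bspline_weight(d):
    """Cubic B-spline weight for a texel at distance d from the sampling point."""
    d = abs(d)
    if d < 1.0:
        return (3.0 * d**3 - 6.0 * d**2 + 4.0) / 6.0
    if d < 2.0:
        return (2.0 - d) ** 3 / 6.0
    return 0.0


def catmull_rom_weight(d):
    """Catmull-Rom spline weight for a texel at distance d from the sampling point."""
    d = abs(d)
    if d < 1.0:
        return (3.0 * d**3 - 5.0 * d**2 + 2.0) / 2.0
    if d < 2.0:
        return (-(d**3) + 5.0 * d**2 - 8.0 * d + 4.0) / 2.0
    return 0.0


def axis_weights(frac, weight):
    """Four weights along one axis for texels at offsets -1, 0, +1, +2."""
    return [weight(1.0 + frac), weight(frac), weight(1.0 - frac), weight(2.0 - frac)]


def bicubic_filter(block, u, v, weight=bspline_weight):
    """Evaluate A * B * C for a 4 x 4 block, with block[m][n] at x offset m-1, y offset n-1."""
    a = axis_weights(u, weight)  # vector A, along x
    c = axis_weights(v, weight)  # vector C, along y
    return sum(a[m] * block[m][n] * c[n] for m in range(4) for n in range(4))


fully_lit = [[1.0] * 4 for _ in range(4)]
value = bicubic_filter(fully_lit, 0.3, 0.7)
```

Both sets of four weights sum to one, so a block in which all sixteen samples are equal filters to that same value whatever the fractional offsets u and v are.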

Therefore, high-order filtering algorithms for high quality soft shadow require many data samples in an N x N area to be fetched from the shadow-map texture, which could lead to many texture fetching instructions. This is a memory bandwidth burden for the GPU. They also require complex weighted calculations to be performed for each single output fragment shading, which may involve thirty-two multiplications for bicubic weighted filtering in a 4 x 4 area. Additionally, the cubic weight values (vectors A and C in Equation (1)) need to be dynamically calculated by using third-order polynomials. Mobile devices require real-time rendering performance, with a high frame-rate and low latency interaction. At the same time, they require low power consumption to extend battery life, and low heat dissipation for user comfort when the device is hand-held for long periods of time. Such requirements may not be achievable when a high-order filtering algorithm is used for the rendering of high quality soft shadows. It is desirable to develop a new method for rendering high quality soft shadows that overcomes these problems.

SUMMARY OF THE INVENTION

According to a first aspect there is provided a graphics processing system for rendering an image having a plurality of pixels, at least some of the pixels representing a shadow of an object in said image, wherein the system is configured to, for each of at least some of the plurality of pixels: determine a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a shading state in an image space; determine whether the data points of each texel all indicate the same shading state; and if the data points of each texel do not all indicate the same shading state, render the respective pixel by applying a weighted shadow filtering algorithm to the data points of the respective texels.

If the data points of each texel do not all indicate the same shading state, this may indicate that the pixel is located at the boundary region of a shadow. By determining which pixels are located at the boundary of the shadow and only performing weighted shadow filtering calculations for these pixels, the processing required may in some cases be reduced by approximately 95%.

The system is further configured to, if the data points of each texel all indicate the same shading state, not apply a weighted shadow filtering algorithm to the data points of the respective texels. In this case, the cubic weight values (vectors A and C in Equation (1)) do not need to be dynamically calculated either. This may reduce the processing required to render the image by up to 95%.

The step of determining whether all of the data points indicate the same shading state may comprise averaging the data points and determining whether the average is equal to a predetermined maximum or minimum value. This is a convenient way of determining whether the data points indicate the same shading state.

The weighted shadowing filtering algorithm may be a bilinear percentage closer filtering algorithm. The system is therefore compatible with filtering algorithms used in many modern video games.

The weighted shadowing algorithm may comprise a weighting function having a continuous derivative. This may produce a smooth image quality result for soft shadows.

The weighted shadowing filtering algorithm may be a bicubic shadow filtering algorithm. This may result in improved image quality when rendering soft shadows.

The weighted shadow filtering algorithm may comprise a B-spline bicubic filter or a Catmull-Rom bicubic filter. The system is therefore compatible with filtering algorithms used in many modern video games.

The system may be further configured to: combine a plurality of subsets of the data points to generate a plurality of combined data samples; determine whether all of the combined data samples indicate the same shading state; and if the combined data samples do not all indicate the same shading state, render the respective pixel by applying a weighted shadow filtering algorithm to the combined data samples. This may further reduce the processing required to render the image. Therefore, the system may enhance and accelerate traditional high-order shadow filtering, as well as filtering methods using a sample-combining technique.

According to a second aspect there is provided a method for rendering an image having a plurality of pixels, at least some of the pixels representing a shadow of an object in said image, wherein the method comprises, for each of at least some of the plurality of pixels: determining a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a shading state in an image space; determining whether the data points of each texel all indicate the same shading state; and if the data points of each texel do not all indicate the same shading state, rendering the respective pixel by applying a weighted shadow filtering algorithm to the data points of the respective texels.

If the data points of each texel do not all indicate the same shading state, this may indicate that the pixel is located at the boundary region of a shadow. By determining which pixels are located at the boundary of the shadow and only performing weighted shadow filtering calculations for these pixels, in some cases the method may reduce the processing required by approximately 95%.

The method may further comprise, if the data points of each texel all indicate the same shading state, not applying a weighted shadow filtering algorithm to the data points of the respective texels. This may reduce the processing required to render the image.

The weighted shadowing filtering algorithm may be a bilinear percentage closer filtering algorithm. The method is therefore compatible with filtering algorithms used in many modern video games.

The weighted shadowing algorithm may comprise a weighting function having a continuous derivative. This may produce a smooth image quality result for soft shadows.

The weighted shadowing filtering algorithm may be a bicubic shadow filtering algorithm. This may result in improved image quality when rendering soft shadows.

The method may further comprise: combining a plurality of subsets of the data points to generate a plurality of combined data samples; determining whether all of the combined data samples indicate the same shading state; and if the combined data samples do not all indicate the same shading state, rendering the respective pixel by applying a weighted shadow filtering algorithm to the combined data samples. This may further reduce the processing required to render the image. Therefore, the method may enhance and accelerate traditional high-order shadow filtering, as well as filtering methods using a sample-combining technique.

According to a third aspect there is provided a computer program which, when executed by a computer, causes the computer to perform the method described above. The computer program may be provided on a non-transitory computer readable storage medium.

BRIEF DESCRIPTION OF THE FIGURES

The present invention will now be described by way of example with reference to the accompanying drawings.

In the drawings:

Figure 1 illustrates the rendering result of a hard shadow algorithm, without any filtering.

Figure 2a shows the rendering result when 2 x 2 bilinear percentage closer filtering (PCF) is used to produce a soft shadow.

Figure 2b illustrates sampling a 2 x 2 grid of texels surrounding the target UV coordinate.

Figure 3a shows the rendering result when 3 x 3 bilinear percentage closer filtering (PCF) is used to produce a soft shadow.

Figure 3b illustrates sampling a 3 x 3 grid of texels surrounding the target UV coordinate.

Figure 4 illustrates sampling a 4 x 4 grid of texels surrounding the target UV coordinate in bicubic linear filtering.

Figure 5a shows an image processed using a 3 x 3 bilinear PCF.

Figure 5b shows an image processed using a 4 x 4 bicubic filtering algorithm.

Figure 6 shows areas of pixels which are located at the boundary of the shadow, within the shadow area and outside the shadow area.

Figure 7 illustrates how four textureGather instructions are used to obtain 4 x 4 data samples.

Figure 8 illustrates the sampling pattern when combining sixteen nearest-neighbour taps down to four linear taps in 2D in a sample-combining cubic bilinear filtering technique.

Figure 9 illustrates a method for rendering an image having a plurality of pixels, at least some of the pixels representing a shadow of an object in said image.

Figure 10 shows an example of a graphics processing system.

DETAILED DESCRIPTION OF THE INVENTION

The present invention proposes a solution for rendering high quality soft shadows. By using a pre-calculation to determine which pixels are located at the boundary of shadows in an image, the processing required may in some cases be reduced by approximately 95%.

Figure 6 shows an image where some of the pixels represent a shadow of an object. Before the complex weighted shadow filtering calculation is performed in the pixel shader, the pixels of the image are each classified as being completely outside of the shadow area, as indicated at 601, 602, 603 and 604, completely within the shadow, as indicated at 605, and located at the boundary of the shadow, as indicated at 606.

For pixels that are either completely inside or completely outside of the shadow, performing filtering for these pixels does not affect the image quality to the observer. Therefore, these pixels can be disregarded in the filtering calculations. The complex weighted shadow filtering calculation is only performed for the boundary pixels.

In order to determine whether a pixel is located at the shadow boundary, a plurality of texels are determined for each pixel about a UV location. Each texel comprises a sub-pixel data point indicating a shading state in an image space. Shadow sampling uses a single channel from a depth texture. One textureGather instruction for a single channel can obtain four data samples. As an example, illustrated in Figure 7, when using bicubic filtering, four textureGather instructions are needed to obtain 4 x 4 data samples, as shown by the four boxes in Figure 7.
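The way four gathers of 2 x 2 samples cover a 4 x 4 area can be sketched as follows. This is a simplified emulation only: `gather_2x2` and `gather_4x4` are hypothetical helpers, and the component order returned by a real textureGather instruction differs from the row-major order used here.

```python
def gather_2x2(lit, x, y):
    """Return the four samples of the 2 x 2 footprint whose top-left texel is (x, y)."""
    return [lit[y][x], lit[y][x + 1], lit[y + 1][x], lit[y + 1][x + 1]]


def gather_4x4(lit, x, y):
    """Collect sixteen samples with four 2 x 2 gathers, one per quadrant."""
    samples = []
    for dy in (0, 2):
        for dx in (0, 2):
            samples.extend(gather_2x2(lit, x + dx, y + dy))
    return samples


# Hypothetical 4 x 4 grid whose values identify each texel uniquely.
grid = [[float(4 * y + x) for x in range(4)] for y in range(4)]
samples = gather_4x4(grid, 0, 0)
```

Each texel of the 4 x 4 area is fetched exactly once, so sixteen samples are obtained with four instructions rather than sixteen.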

Without any filtering, a data sample’s value from a shadow-map texture is either 1.0 or 0.0. As a result, the rendered shadow in a final image shows strong aliasing. This is because if a data sample’s value is equal to 1.0, it means that the pixel is completely outside of the shadow, while if it is 0.0, it means that the pixel is completely in shadow.

Therefore, before filtering, in order to determine whether a pixel is located within or outside of the shadow area, it is determined whether the data points of each texel all indicate the same shading state.

If the data points of each texel all indicate the same shading state, this indicates that the respective pixel is located within or outside the shadow area and the complex filtering calculation does not need to be performed for that pixel. Therefore, the weighted shadow filtering algorithm is not applied to the data points of the respective texels.

If the data points of each texel do not all indicate the same shading state, this indicates that the respective pixel is located at the shadow boundary and the pixel is rendered by applying a weighted shadow filtering algorithm to the data points of the respective texels.

After filtering, a data sample’s value could be a floating point, which lies between 1.0 and 0.0, and thus it could be rendered as a soft shadow in the final image.

In one example, in order to determine whether the data points of a pixel all have the same shading state, the sum of all of the data samples in the N x N area to be filtered is computed. A pixel may be determined as being located completely within or completely outside a shadow if the sum is equal to a predetermined maximum or minimum value. For example, without any filtering a data sample’s value from a shadow-map texture can only be either 1.0 or 0.0, therefore for a 4 x 4 texel area, the predetermined maximum value may be 16.0 (sixteen samples, each equal to 1.0) and the predetermined minimum value may be 0.0. The system may alternatively or additionally compute the average of the values of the data samples in the N x N area to be filtered. A pixel may be determined as being located completely within or completely outside a shadow if the average is equal to a predetermined maximum or minimum value. For example, for a 4 x 4 texel area, the predetermined maximum average value may be 1.0 and the predetermined minimum average value may be 0.0.

Therefore, a map indicating which pixels of the image are to be filtered can be determined by calculating the sum and/or average of the values from a shadow-map texture of the texels within a pixel.
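As a non-limiting sketch, such a map may be computed as shown below for a synthetic shadow-test result. The helper names, the 32 x 32 resolution and the square shadow are hypothetical, and the windows are aligned to texels for simplicity.

```python
def is_uniform(samples):
    """True when every binary sample is equal, i.e. the average is exactly 0.0 or 1.0."""
    average = sum(samples) / len(samples)
    return average == 0.0 or average == 1.0


def boundary_map(lit, n=4):
    """Mark each n x n window position True when its samples are mixed."""
    height, width = len(lit), len(lit[0])
    result = []
    for y in range(height - n + 1):
        row = []
        for x in range(width - n + 1):
            window = [lit[y + j][x + i] for j in range(n) for i in range(n)]
            row.append(not is_uniform(window))
        result.append(row)
    return result


# Hypothetical 32 x 32 result: a 16 x 16 shadowed square on a lit background.
lit = [[0.0 if 8 <= x < 24 and 8 <= y < 24 else 1.0 for x in range(32)]
       for y in range(32)]
mask = boundary_map(lit)
boundary_count = sum(map(sum, mask))
boundary_fraction = boundary_count / (len(mask) * len(mask[0]))
```

Only the window positions flagged in `mask` would need the weighted filter. The fraction of such positions depends on the image resolution and on the shape of the shadow, so the value obtained for this small synthetic image is not representative of a real scene.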

In modern game engines, the shadow filtering is directly computed on the N x N data samples to get a final high quality filtered soft shadow value. The exemplary pseudocode below shows this computing pattern:

    vec4 leftBottom  = textureGather(shadowMap, offsetUV.xy, refZ);
    vec4 rightTop    = textureGather(shadowMap, offsetUV.zw, refZ);
    vec4 rightBottom = textureGather(shadowMap, offsetUV.zy, refZ);
    vec4 leftTop     = textureGather(shadowMap, offsetUV.xw, refZ);
    matrix4x4 B = matrix4x4(leftBottom, rightBottom, leftTop, rightTop);
    {
        compute the cubic weights A along x; // cubic polynomial on a 1by4 vector
        compute the cubic weights C along y; // cubic polynomial on a 4by1 vector
        perform the bicubic weighted filtering on the 4x4 matrix B using Equation (1);
        return the resulting shadow value;
    }

The sum and average value of the N x N data samples is calculated for each pixel, and then it is determined whether the average value is equal to 1.0 or 0.0. If the average is equal to 1.0 or 0.0, this means that the current pixel for shading is either completely inside or completely outside of the shadow. Therefore, for these pixels, complex filtering is not performed.

Otherwise, any average value lying between 0.0 and 1.0 means that the current pixel for shading is located at the boundary of the shadow and needs to be filtered by applying the complex weighting and filtering calculation.

The exemplary pseudocode below shows the computing pattern of the algorithm:

    vec4 leftBottom  = textureGather(shadowMap, offsetUV.xy, refZ);
    vec4 rightTop    = textureGather(shadowMap, offsetUV.zw, refZ);
    vec4 rightBottom = textureGather(shadowMap, offsetUV.zy, refZ);
    vec4 leftTop     = textureGather(shadowMap, offsetUV.xw, refZ);
    matrix4x4 B = matrix4x4(leftBottom, rightBottom, leftTop, rightTop);
    vec4 sum = leftBottom + rightTop + rightBottom + leftTop;
    float shadow = (sum.x + sum.y + sum.z + sum.w) / (N*N); // here N*N = 16
    if (shadow == 0 || shadow == 1)
    { do nothing; } // this corresponds to the green zone, accounting for 95% of pixels
    else // this corresponds to the red zone, accounting for only 5% of pixels
    { compute the cubic weights A and C; perform the bicubic weighted filtering on B using Equation (1); and update shadow; }
    return shadow;

The pseudocode illustrates that the algorithm can avoid performing the complex cubic weighted calculation for most outputting pixels in an image that are located either completely within or completely outside the shadow. That means that for these pixels, thirty-two multiplications for bicubic filtering in a 4 x 4 area may be avoided. Additionally, the calculation of the cubic weight values (vectors A and C), which involve third-order polynomials, may also be avoided.
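The early-out pattern can be sketched in runnable form as below. The function `shade_pixel` is a hypothetical name, and `counting_filter` is merely a stand-in that records how often the expensive path is taken. It does not implement the bicubic weighting of Equation (1).

```python
def shade_pixel(samples, weighted_filter):
    """Return the shadow value for one pixel, filtering only when samples are mixed."""
    shadow = sum(samples) / len(samples)  # pre-calculation: plain average
    if shadow == 0.0 or shadow == 1.0:
        return shadow  # uniform neighbourhood: skip the weighted filter
    return weighted_filter(samples)  # boundary pixel: run the expensive filter


calls = []


def counting_filter(samples):
    """Stand-in for the weighted filter; records each invocation."""
    calls.append(1)
    return sum(samples) / len(samples)  # placeholder weighting, not Equation (1)


inside = shade_pixel([0.0] * 16, counting_filter)
outside = shade_pixel([1.0] * 16, counting_filter)
edge = shade_pixel([0.0] * 8 + [1.0] * 8, counting_filter)
```

Of the three pixels shaded here, only the mixed one reaches the filter. The same wrapper could be placed around an N x N PCF filter, since the test depends only on the samples and not on the weighting that follows.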

In the above example, the determination of whether the samples in a pixel have the same shading state is carried out by determining the sum and/or average of the data points of each texel. The calculation may alternatively employ different methods. For example, the pre-calculation may use the logical AND of all of the sampled data points to obtain a similar intermediate result, from which it may be determined if the pixel is located at the shadow boundary.
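One possible reading of the logical alternative is sketched below. The logical AND of the samples detects the fully lit case. Pairing it with a logical OR to detect the fully shadowed case is an assumption of this sketch rather than something stated above, and the function name `classify` is hypothetical.

```python
def classify(samples):
    """Classify a neighbourhood of binary samples without any arithmetic sum."""
    lit_flags = [s == 1.0 for s in samples]
    if all(lit_flags):  # logical AND of all samples: every sample is lit
        return "lit"
    if not any(lit_flags):  # logical OR is false: every sample is shadowed
        return "shadowed"
    return "boundary"


states = [
    classify([1.0] * 16),
    classify([0.0] * 16),
    classify([1.0] * 15 + [0.0]),
]
```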

The above method may be extended to other methods of filtering, such as N x N PCF filtering. Similarly to the above-described bicubic filter (requiring sixteen data samples over a 4 x 4 texel area), the pre-calculation can also be used to enhance an N x N PCF filtering, such as 3 x 3 bilinear PCF filtering. The same pre-calculation may be performed in order to detect pixels located at the boundary of the shadow, before the N x N weighted filtering calculation is performed for only the data points from those pixels.

As described above, many filtering algorithms use repeated nearest-neighbor sampling of the input texture. To reduce the number of input samples, a sample-combining technique may perform cubic filtering building on linear texture fetches (rather than nearest neighbor ones), which reduces the number of texture accesses considerably, especially for 2D and 3D filtering. Such a sample-combining technique is described in Christian Sigg and Markus Hadwiger, "Fast Third-Order Texture Filtering", in GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Matt Pharr (ed.), Addison-Wesley, chapter 20, pp. 313-329, 2005. The pre-calculation method of the present invention may also be used to determine the boundary pixels when samples are combined, thus further reducing the processing required to render an image.

By adapting the UV coordinates, this bicubic shadow filtering with sample-combining technique can take fewer than sixteen data samples to filter a 4 x 4 texel area. For example, the technique may use four taps for a B-Spline bicubic filter and nine taps for a Catmull-Rom bicubic filter. This is based on the technique described in GPU Gems 2, chapter 20, Fast Third-Order Texture Filtering. This brings the best case for bicubic filtering down to four taps, instead of sixteen taps in the traditional method.

Figure 8 shows the sampling pattern when combining sixteen nearest-neighbour taps down to four linear taps in 2D in a sample-combining cubic bilinear filtering technique. In order to merge the sixteen taps down to four, this technique combines the first pair of 1D taps into one linear tap and the second pair into another linear tap (a linear 2:1 reduction becomes a 4:1 reduction in 2D, taking sixteen taps down to four). This is illustrated by Equations (4) and (5) below, where i is the integer part of x and α = x - i is the fractional part of x. Building on such a convex combination, a general linear combination a × f_i + b × f_{i+1} with general parameters a and b can be rewritten as a single linear tap, where f_i is the function value at the integer location i, and f_{i+1} is the function value at the next integer location i + 1.

The 1D formulae to combine a pair of nearest-neighbour taps into a single linear tap (Equation (4)), and to combine two pairs of nearest-neighbour taps into two linear taps (Equation (5)) are: w_0(x) × f_{i-1} + w_1(x) × f_i + w_2(x) × f_{i+1} + w_3(x) × f_{i+2} =
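For reference, the corresponding identities in the cited GPU Gems 2 chapter can be written as below. This is a reconstruction from that chapter rather than from the present text, so the exact notation used for Equations (4) and (5) here may differ. It assumes that f(·) denotes a linearly filtered fetch, that f_i is the nearest-neighbour value at integer location i, that α = x − i, and that a and b have the same sign so that the combined sample position stays between the two texels.

```latex
% Pair of nearest-neighbour taps -> one linear tap
a\, f_i + b\, f_{i+1} = (a + b)\, f\!\left(i + \frac{b}{a + b}\right)

% Two pairs of nearest-neighbour taps -> two linear taps
w_0(x)\, f_{i-1} + w_1(x)\, f_i + w_2(x)\, f_{i+1} + w_3(x)\, f_{i+2}
  = g_0(x)\, f\bigl(x - h_0(x)\bigr) + g_1(x)\, f\bigl(x + h_1(x)\bigr)

g_0(x) = w_0(x) + w_1(x), \qquad g_1(x) = w_2(x) + w_3(x)

h_0(x) = 1 - \frac{w_1(x)}{w_0(x) + w_1(x)} + \alpha, \qquad
h_1(x) = 1 + \frac{w_3(x)}{w_2(x) + w_3(x)} - \alpha
```

The offsets follow directly from the pair identity: the first pair sits at locations i − 1 and i, so its combined tap is at (i − 1) + w_1/(w_0 + w_1) = x − h_0(x), and the second pair sits at i + 1 and i + 2, so its combined tap is at (i + 1) + w_3/(w_2 + w_3) = x + h_1(x).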

The pseudocode below shows the computing pattern of this technique when using a B-Spline bicubic filter:

    compute cubic weights: w0, w1, w2, w3;
    compute new weights vector: g0, g1, h0 and h1;
    perform 4 tex sampling: tex1, tex2, tex3, tex4;
    {do the weighted calculation; and update shadowValue;}
    return shadowValue;

By using this sample-combining method, this technique can exploit bilinear filtering to implement a four-tap B-Spline bicubic filter, or a nine-tap Catmull-Rom bicubic filter, instead of requiring sixteen taps, as in the traditional bicubic filtering method.
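The tap-combining idea can be checked numerically in 1D. The following is a minimal Python sketch, not the implementation described here: it assumes the standard uniform cubic B-Spline weights, samples located at integer texel indices (no half-texel offset), and clamp-to-edge addressing. All function names are hypothetical, and `fetch_linear` stands in for the linear texture fetch a GPU would perform in hardware.

```python
import math

def bspline_weights(alpha):
    """Uniform cubic B-Spline weights for a fractional offset alpha in [0, 1)."""
    w0 = (1 - alpha) ** 3 / 6
    w1 = (3 * alpha**3 - 6 * alpha**2 + 4) / 6
    w2 = (-3 * alpha**3 + 3 * alpha**2 + 3 * alpha + 1) / 6
    w3 = alpha**3 / 6
    return w0, w1, w2, w3

def fetch_nearest(tex, i):
    """Nearest-neighbour fetch with clamp-to-edge addressing (assumed)."""
    return tex[min(max(i, 0), len(tex) - 1)]

def fetch_linear(tex, x):
    """Linear fetch: blends the two texels on either side of x."""
    i = math.floor(x)
    a, b = fetch_nearest(tex, i), fetch_nearest(tex, i + 1)
    return a + (x - i) * (b - a)

def cubic_four_taps(tex, x):
    """Traditional 1D cubic filter: four weighted nearest-neighbour taps."""
    i = math.floor(x)
    weights = bspline_weights(x - i)
    return sum(w * fetch_nearest(tex, i - 1 + k) for k, w in enumerate(weights))

def cubic_two_linear_taps(tex, x):
    """Same filter using two linear taps with combined weights g and offsets h."""
    alpha = x - math.floor(x)
    w0, w1, w2, w3 = bspline_weights(alpha)
    g0, g1 = w0 + w1, w2 + w3
    h0 = 1 - w1 / g0 + alpha
    h1 = 1 + w3 / g1 - alpha
    return g0 * fetch_linear(tex, x - h0) + g1 * fetch_linear(tex, x + h1)
```

Applying the same reduction along both axes gives the 4:1 reduction in 2D (sixteen taps down to four). The combination relies on the B-Spline weights all being non-negative, so each combined sample position lies between the two texels it replaces.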

The pre-calculation may be applied to the combined data points to avoid performing the cubic weighted calculation for most pixels on the screen.

The exemplary pseudocode below shows the computing pattern using a B-Spline bicubic filter:

    compute cubic weights: w0, w1, w2, w3;
    compute new weights vector: g0, g1, h0 and h1;
    perform 4 tex sampling: tex1, tex2, tex3, tex4;
    float shadowValue = (tex1 + tex2 + tex3 + tex4) * 0.25;
    if (shadowValue == 0 || shadowValue == 1)
        {do nothing;}
    else
        {do the weighted calculation; and update shadowValue;}
    return shadowValue;

From this pseudocode, it can be seen that many of the high-order weighted filtering calculations can be avoided by using the pre-calculation. Furthermore, owing to the sample-combining technique, this solution requires four data samples instead of the sixteen used in traditional bicubic filtering. This is particularly beneficial to the data bandwidth required.
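The pattern in the pseudocode can be sketched on the CPU as follows. This is an illustrative Python sketch under stated assumptions, not the shader itself: the shadow map is assumed to hold binary shading states (0.0 = shadowed, 1.0 = lit) as a list of rows, samples are assumed to sit at integer texel indices with clamp-to-edge addressing, and every function name is hypothetical. A sixteen-tap reference is included only so that the four-tap result can be compared against it.

```python
import math

def bspline_weights(alpha):
    """Uniform cubic B-Spline weights for a fractional offset alpha in [0, 1)."""
    return ((1 - alpha) ** 3 / 6,
            (3 * alpha**3 - 6 * alpha**2 + 4) / 6,
            (-3 * alpha**3 + 3 * alpha**2 + 3 * alpha + 1) / 6,
            alpha**3 / 6)

def linear_tap_params(alpha):
    """Combined weights g0, g1 and sample offsets h0, h1 along one axis."""
    w0, w1, w2, w3 = bspline_weights(alpha)
    g0, g1 = w0 + w1, w2 + w3
    return g0, g1, 1 - w1 / g0 + alpha, 1 + w3 / g1 - alpha

def texel(shadow, ix, iy):
    """Nearest-neighbour fetch with clamp-to-edge addressing (assumed)."""
    h, w = len(shadow), len(shadow[0])
    return shadow[min(max(iy, 0), h - 1)][min(max(ix, 0), w - 1)]

def bilinear(shadow, x, y):
    """Software stand-in for a hardware bilinear texture fetch."""
    ix, iy = math.floor(x), math.floor(y)
    tx, ty = x - ix, y - iy
    t00, t10 = texel(shadow, ix, iy), texel(shadow, ix + 1, iy)
    t01, t11 = texel(shadow, ix, iy + 1), texel(shadow, ix + 1, iy + 1)
    top = t00 + tx * (t10 - t00)
    bottom = t01 + tx * (t11 - t01)
    return top + ty * (bottom - top)

def shadow_value(shadow, x, y):
    """Four bilinear taps, with the pre-calculation as an early-out."""
    gx0, gx1, hx0, hx1 = linear_tap_params(x - math.floor(x))
    gy0, gy1, hy0, hy1 = linear_tap_params(y - math.floor(y))
    tex1 = bilinear(shadow, x - hx0, y - hy0)
    tex2 = bilinear(shadow, x + hx1, y - hy0)
    tex3 = bilinear(shadow, x - hx0, y + hy1)
    tex4 = bilinear(shadow, x + hx1, y + hy1)
    value = (tex1 + tex2 + tex3 + tex4) * 0.25
    if value == 0.0 or value == 1.0:
        return value  # uniform neighbourhood: skip the weighted calculation
    return gy0 * (gx0 * tex1 + gx1 * tex2) + gy1 * (gx0 * tex3 + gx1 * tex4)

def bicubic_16_taps(shadow, x, y):
    """Reference: traditional bicubic filter over the 4 x 4 texel area."""
    ix, iy = math.floor(x), math.floor(y)
    wx, wy = bspline_weights(x - ix), bspline_weights(y - iy)
    return sum(wy[j] * wx[i] * texel(shadow, ix - 1 + i, iy - 1 + j)
               for j in range(4) for i in range(4))
```

The early-out does not change the result in this sketch: a bilinear tap gives non-zero weight to a texel exactly when the cubic filter does, so four taps that are all 0 (or all 1) imply that every contributing texel in the 4 x 4 area has that same state, and the weighted result would equal it anyway.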

Similarly to the above B-Spline bicubic filter (with data samples combined to four taps), the pre-calculation can also be used to enhance a Catmull-Rom bicubic filter, for which data samples can be combined to nine taps, from the original sixteen taps. Here too, the pre-calculation can avoid performing high-order weighted filtering calculations for most of the pixels to be shaded.

Therefore, this solution requires fewer texture sampling instructions, whilst also avoiding many high-order filtering weighted calculations.

The approach described herein may therefore enhance and accelerate traditional high-order shadow filtering methods, as well as filtering methods using a sample-combining technique.

Figure 9 illustrates a method for rendering an image having a plurality of pixels, at least some of the pixels representing a shadow of an object in said image, wherein the method comprises the following steps for each of at least some of the plurality of pixels. At step 901, the method comprises determining a plurality of texels corresponding to the respective pixel, each texel comprising a sub-pixel data point indicating a shading state in an image space. At step 902, the method comprises determining whether the data points of each texel all indicate the same shading state. At step 903, the method comprises, if the data points of each texel do not all indicate the same shading state, rendering the respective pixel by applying a weighted shadow filtering algorithm to the data points of the respective texels.
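Steps 901 to 903 can be sketched in Python as below. This is a minimal illustration under simplifying assumptions, not the claimed implementation: each pixel is assumed to map to the texel at the same coordinates, shading states are binary (0 = shadowed, 1 = lit), the weighted filter is a fixed 3 x 3 kernel chosen purely for illustration, and all names (`gather_texels`, `render_pixel`, `render`, `KERNEL_3X3`) are hypothetical.

```python
# Illustrative 3 x 3 weights (sum to 1); any weighted shadow filter could be used.
KERNEL_3X3 = [k / 16 for k in (1, 2, 1, 2, 4, 2, 1, 2, 1)]

def gather_texels(shadow, px, py, n=3):
    """Step 901: collect the n x n texels around a pixel (clamp-to-edge assumed)."""
    h, w = len(shadow), len(shadow[0])
    r = n // 2
    return [shadow[min(max(py + dy, 0), h - 1)][min(max(px + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]

def render_pixel(texels, weights):
    """Steps 902 and 903 for one pixel; returns (value, whether it was filtered)."""
    first = texels[0]
    if all(t == first for t in texels):
        # Step 902: all data points agree, so the shared state is used directly.
        return float(first), False
    # Step 903: mixed states, so the weighted shadow filter is applied.
    return sum(w * t for w, t in zip(weights, texels)), True

def render(shadow, weights=KERNEL_3X3, n=3):
    """Render every pixel and count how many needed the weighted filter."""
    image, filtered_count = [], 0
    for py in range(len(shadow)):
        row = []
        for px in range(len(shadow[0])):
            value, filtered = render_pixel(gather_texels(shadow, px, py, n), weights)
            row.append(value)
            filtered_count += filtered
        image.append(row)
    return image, filtered_count
```

For an 8 x 8 map whose left half is shadowed and right half is lit, only the two columns of pixels adjacent to the shadow edge take the weighted path in this sketch; every other pixel is resolved by the uniformity check alone.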

Figure 10 is a schematic representation of a system 1000 configured to perform the methods described herein. The system 1000 may be implemented on a device, such as a laptop, tablet, smart phone, TV or any other device in which graphics data is to be processed.

The system 1000 comprises a graphics processor 1001 configured to process data. For example, the processor 1001 may be a GPU. Alternatively, the processor 1001 may be implemented as a computer program running on a programmable device such as a GPU or a Central Processing Unit (CPU). The system 1000 comprises a memory 1002 which is arranged to communicate with the graphics processor 1001. Memory 1002 may be a non-volatile memory. The graphics processor 1001 may also comprise a cache (not shown in Figure 10), which may be used to temporarily store data from memory 1002. The system may comprise more than one processor and more than one memory. The memory may store data that is executable by the processor. The processor may be configured to operate in accordance with a computer program stored in non-transitory form on a machine readable storage medium. The computer program may store instructions for causing the processor to perform its methods in the manner described herein.

The system and method described herein therefore employ a pre-calculation before applying a complex shadow filtering algorithm, such that the complex high-order weighted calculation is avoided in the pixel shader for pixels that are not located at the boundary of a shadow, without affecting the perceived rendering quality. In some implementations, this may avoid performing complex calculations for up to 95% of the pixels in an image. The complex cubic calculation is only performed for pixels which are near the shadow edges (which normally account for only around 5% of the pixels in an image), while for the other (approximately 95%) pixels, the computation of the eight cubic weights, and all of the weighted calculations along the x and y directions for the sixteen data samples, may also be avoided.

Therefore, if the texels of a respective pixel all indicate the same shading state, then the costly weighted shadow filtering operations can be avoided for the pixel, and additionally, the weight values (vectors A and C in Equation (1)) do not need to be dynamically calculated either. Fewer weighted calculations may result in longer mobile battery life. This may also result in reduced latency and improved frame-rate for complex and demanding game rendering. The pre-calculation can also be used to enhance methods using N x N PCF filtering, such as 3 x 3 bilinear PCF filtering, and may avoid performing the complex N x N weighted filtering calculation for most of the pixels to be shaded.

The solution described herein may therefore avoid the performance of high-order shadow filtering calculations for most of the pixels on the screen (approximately 95%) and at the same time may achieve a high quality of shadow filtering for pixels located at the boundaries of the shadows.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
