

Title:
PARALLELIZING OPENGL RENDERING PIPELINES
Document Type and Number:
WIPO Patent Application WO/2022/250708
Kind Code:
A1
Abstract:
Novel tools and techniques are provided for implementing parallelization of OpenGL rendering pipelines. In various embodiments, a computing system may analyze an image to identify split points therein, may identify any dependencies among split points, may determine a multi-threaded solution to perform parallel rendering of the image based on the identified split points and dependencies, and may push each group of split points to a corresponding thread. Each CPU core among many CPU cores may render one group of split points, each concurrent with rendering by other CPU cores. GPU synchronization mechanisms may be used to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU. The GPU may render data sent from each CPU core, in the order enforced by the GPU synchronization mechanisms, to generate the rendered image, which is subsequently output by the GPU.

Inventors:
SUN HONGYU (US)
LI CHEN (US)
LI CHENGENG (US)
CHANOT AURELIEN (US)
KARUPPANNAN VARUN (US)
Application Number:
PCT/US2021/035038
Publication Date:
December 01, 2022
Filing Date:
May 28, 2021
Assignee:
INNOPEAK TECH INC (US)
SUN HONGYU (US)
LI CHEN (US)
LI CHENGENG (US)
CHANOT AURELIEN (US)
KARUPPANNAN VARUN (US)
International Classes:
G06F15/80
Foreign References:
US20190102859A12019-04-04
US20180308195A12018-10-25
US20090256836A12009-10-15
Attorney, Agent or Firm:
BRATSCHUN, Thomas D. et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. A method to parallelize rendering pipelines for rendering an image, comprising: receiving, using a computing system on a user device, a request to render an image; in response to receiving the request to render the image, analyzing, using the computing system, the image to identify a plurality of split points in the image; identifying, using the computing system, any dependencies among two or more split points among the plurality of split points; and rendering, using the computing system, the image by using a multi-threaded solution determined based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points.

2. The method of claim 1, wherein the multi-threaded solution comprises two or more threads, wherein rendering the image by using the multi-threaded solution comprises: concurrently rendering, using first and second central processing unit ("CPU") cores on the user device, first and second groups of split points among the identified plurality of split points based on the determined multi-threaded solution; rendering, using a graphics processing unit ("GPU") on the user device, data sent from each of the first and second CPU cores to generate a rendered image; and outputting, using the GPU, the rendered image.

3. The method of claim 2, wherein concurrently rendering the first and second groups of split points comprises: pushing, using the computing system, the first group of split points to a first thread and the second group of split points to a second thread, based on the determined multi-threaded solution; rendering, using the first CPU core on the user device, the first group of split points in the first thread; and rendering, using the second CPU core on the user device, the second group of split points in the second thread, concurrent with the rendering of the first group of split points by the first CPU core.

4. The method of claim 1, wherein the multi-threaded solution comprises three or more threads, wherein rendering the image by using the multi-threaded solution comprises: pushing, using the computing system, each group of split points among a plurality of groups of split points to a corresponding thread among the three or more threads, based on the determined multi-threaded solution; rendering, using each CPU core among three or more CPU cores on the user device, one group of split points among the plurality of groups of split points, each concurrent with the rendering of the other groups of split points by other CPU cores among the three or more CPU cores; rendering, using the GPU, data sent from each of the three or more CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image; and outputting, using the GPU, the rendered image.

5. The method of claim 4, wherein the three or more threads comprise N threads, wherein the first thread among the three or more threads comprises an original context, wherein the method further comprises: creating, using the computing system, a shared context for each of the second through Nth threads, based on the original context.

6. The method of any of claims 2-5, further comprising: utilizing, using the computing system, one or more GPU synchronization mechanisms to enforce an order of rendering by the GPU by controlling timing with which each CPU core sends rendered data to the GPU; wherein rendering data sent from each of the CPU cores comprises rendering, using the GPU on the user device, data sent from each of the CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image.

7. The method of claim 6, wherein the one or more GPU synchronization mechanisms comprise one of one or more GPU fence commands or one or more delay commands.

8. The method of any of claims 1-7, wherein the computing system comprises at least one of a graphics engine, a graphics rendering engine, a game engine, a three-dimensional ("3D") game engine, a processor on the user device, or at least one CPU core on the user device.

9. The method of any of claims 1-8, wherein the user device comprises one of a portable gaming device, a smart phone, a tablet computer, a laptop computer, a desktop computer, or a server computer.

10. The method of any of claims 1-9, wherein the image comprises one of a two-dimensional ("2D") image, a three-dimensional ("3D") image, a video, a 2D frame of a video, a 3D frame of a video, a group of 2D images, or a group of 3D images.

11. The method of any of claims 1-10, wherein the plurality of split points comprises at least one of one or more frames, one or more render layers, one or more render passes, or one or more draw calls.

12. The method of claim 11, wherein: the one or more frames comprise frames in a video; the one or more render layers comprise one or more layers corresponding to different objects in the image; the one or more render passes comprise at least one of one or more shadow passes, one or more lighting passes, one or more main color passes, one or more secondary color passes, one or more post-processing passes, one or more highlight passes, one or more reflection passes, or one or more user interface ("UI") passes; and the one or more draw calls comprise one or more calls to a graphics application programming interface ("API") to draw one or more objects.

13. The method of any of claims 1-12, wherein the plurality of split points comprise one or more render passes, wherein identifying the plurality of split points in the image comprises detecting, using the computing system, occurrence of framebuffer resets, each framebuffer reset being indicative of a beginning of a new render pass.

14. The method of any of claims 1-13, wherein identifying any dependencies among two or more split points among the plurality of split points comprises tracking, using the computing system, all dependencies among the plurality of split points.

15. The method of claim 14, wherein tracking all dependencies among the plurality of split points comprises tracking, using the computing system, all dependencies among the plurality of split points by identifying and storing all resources used to render the image, and by determining and storing which resources are used by which split points.

16. The method of any of claims 1-15, wherein the rendering pipelines comprise Open Graphics Library ("OpenGL") cross-language, cross-platform API- based rendering pipelines.

17. A method to parallelize rendering pipelines for generating and outputting a rendered image, comprising: receiving, using a computing system on a user device, a request to render an image; in response to receiving the request to render the image, analyzing, using the computing system, the image to identify a plurality of split points in the image; identifying, using the computing system, any dependencies among two or more split points among the plurality of split points; determining, using the computing system, a multi-threaded solution to perform parallel rendering of the image based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points; pushing, using the computing system, a first group of split points among the identified plurality of split points to a first thread and a second group of split points among the identified plurality of split points to a second thread, based on the determined multi-threaded solution; rendering, using a first central processing unit ("CPU") core on the user device, the first group of split points in the first thread; rendering, using a second CPU core on the user device, the second group of split points in the second thread, concurrent with the rendering of the first group of split points by the first CPU core; utilizing, using the computing system, one or more graphics processing unit ("GPU") synchronization mechanisms to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU; rendering, using the GPU on the user device, data sent from each of the first and second CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate a rendered image; and outputting, using the GPU, the rendered image.

18. An apparatus that is used to parallelize rendering pipelines for rendering an image, comprising: at least one processor; and a non-transitory computer readable medium communicatively coupled to the at least one processor, the non-transitory computer readable medium having stored thereon computer software comprising a set of instructions that, when executed by the at least one processor, causes the apparatus to: receive a request to render an image; in response to receiving the request to render the image, analyze the image to identify a plurality of split points in the image; identify any dependencies among two or more split points among the plurality of split points; and render the image by using a multi-threaded solution determined based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points.

19. A system that is used to parallelize rendering pipelines for rendering an image, comprising: a computing system, comprising: at least one first processor; and a first non-transitory computer readable medium communicatively coupled to the at least one first processor, the first non-transitory computer readable medium having stored thereon computer software comprising a first set of instructions that, when executed by the at least one first processor, causes the computing system to: receive a request to render an image; in response to receiving the request to render the image, analyze the image to identify a plurality of split points in the image; identify any dependencies among two or more split points among the plurality of split points; and render the image by using a multi-threaded solution determined based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points.

Description:
PARALLELIZING OPENGL RENDERING PIPELINES

COPYRIGHT STATEMENT

[0001] A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD

[0002] The present disclosure relates, in general, to methods, systems, and apparatuses for implementing two-dimensional ("2D") and/or three-dimensional ("3D") rendering, and, more particularly, to methods, systems, and apparatuses for implementing parallelization of OpenGL rendering pipelines.

BACKGROUND

[0003] Online mobile games have evolved drastically over the past few years and have become a center of competition for smartphones, especially with the upgrade to 5G networks. The best and most optimized approach to developing mobile 3D games is to use appropriate cross-platform 3D mobile game engines. The core functions of game engines include 3D graphics rendering, a physics engine, artificial intelligence ("AI"), memory management, etc. Traditionally, game engines are optimized for mobile platforms but not necessarily for a specific phone or system on a chip ("SoC").

[0004] Most of the time, original equipment manufacturers ("OEMs") try to improve game performance by boosting the resources available, such as central processing unit ("CPU"), graphics processing unit ("GPU"), or random-access memory ("RAM") frequencies. The drawback of such a method is the extra power consumption that necessarily occurs.

[0005] In OpenGL, parallelism is not officially supported, but the specification allows having multiple contexts that can share some data. A programmer or a game engine will typically only parallelize small parts of the rendering, mostly to do background work such as uploading new textures or data.

[0006] In one solution to improve parallelism, conventional techniques decompose the image into different objects that can be rendered separately, which may allow for more flexibility by moving some work to other threads or contexts. Unfortunately, such a technique must be implemented at the engine level, as it needs to be aware of, or needs to easily access, game data in order to separate the objects in each frame or image to be rendered separately.

[0007] Hence, there is a need for more robust and scalable solutions for implementing 2D and/or 3D rendering, and, more particularly, for methods, systems, and apparatuses for implementing parallelization of OpenGL rendering pipelines.

SUMMARY

[0008] The techniques of this disclosure generally relate to tools and techniques for implementing 2D and/or 3D rendering, and, more particularly, to methods, systems, and apparatuses for implementing parallelization of OpenGL rendering pipelines.

[0009] In an aspect, a method may be provided to parallelize rendering pipelines for rendering an image. The method may comprise receiving, using a computing system on a user device, a request to render an image; in response to receiving the request to render the image, analyzing, using the computing system, the image to identify a plurality of split points in the image; identifying, using the computing system, any dependencies among two or more split points among the plurality of split points; and rendering, using the computing system, the image by using a multi-threaded solution determined based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points.

[0010] In some embodiments, the multi-threaded solution may comprise two or more threads, and rendering the image by using the multi-threaded solution may comprise: concurrently rendering, using first and second central processing unit ("CPU") cores on the user device, first and second groups of split points among the identified plurality of split points based on the determined multi-threaded solution; rendering, using a graphics processing unit ("GPU") on the user device, data sent from each of the first and second CPU cores to generate a rendered image; and outputting, using the GPU, the rendered image. In some instances, concurrently rendering the first and second groups of split points may comprise: pushing, using the computing system, the first group of split points to a first thread and the second group of split points to a second thread, based on the determined multi-threaded solution; rendering, using the first CPU core on the user device, the first group of split points in the first thread; and rendering, using the second CPU core on the user device, the second group of split points in the second thread, concurrent with the rendering of the first group of split points by the first CPU core.

[0011] Merely by way of example, in some cases, the multi-threaded solution may comprise three or more threads. In such cases, rendering the image by using the multi-threaded solution may comprise pushing, using the computing system, each group of split points among a plurality of groups of split points to a corresponding thread among the three or more threads, based on the determined multi-threaded solution; rendering, using each CPU core among three or more CPU cores on the user device, one group of split points among the plurality of groups of split points, each concurrent with the rendering of the other groups of split points by other CPU cores among the three or more CPU cores; rendering, using the GPU, data sent from each of the three or more CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image; and outputting, using the GPU, the rendered image. In some embodiments, the three or more threads may comprise N threads. In such cases, the first thread among the three or more threads may comprise an original context, and the method may further comprise creating, using the computing system, a shared context for each of the second through Nth threads, based on the original context.
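To make the shared-context arrangement concrete, the following is a minimal, hypothetical sketch (not part of the original application): a toy GLContext class stands in for a real OpenGL context, and create_shared_contexts mirrors the idea of creating, for each of the second through Nth threads, a context that shares resources with the original (as a real implementation might do via the share-context parameter when creating the context).

```python
# Illustrative sketch only: a toy stand-in for OpenGL shared contexts.
# In real OpenGL/EGL, sharing is requested when the context is created;
# here, shared contexts simply alias the original context's resource table.

class GLContext:
    """Hypothetical context: textures/buffers live in a resource table."""
    def __init__(self, shared_resources=None):
        # Shared contexts alias the original's resources; an unshared
        # (original) context gets its own empty table.
        self.resources = shared_resources if shared_resources is not None else {}

def create_shared_contexts(original, n_threads):
    """Return one context per thread: the original for the first thread,
    plus a shared context for each of the second through Nth threads."""
    return [original] + [GLContext(original.resources)
                         for _ in range(n_threads - 1)]
```

Because every context aliases the same resource table, a texture uploaded on one thread's context is visible to the others, which is the property the multi-threaded rendering relies on.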

[0012] According to some embodiments, the method may further comprise: utilizing, using the computing system, one or more GPU synchronization mechanisms to enforce an order of rendering by the GPU by controlling timing with which each CPU core sends rendered data to the GPU; wherein rendering data sent from each of the CPU cores comprises rendering, using the GPU on the user device, data sent from each of the CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image. In some cases, the one or more GPU synchronization mechanisms may comprise one of one or more GPU fence commands or one or more delay commands.

[0013] In some embodiments, the computing system may comprise at least one of a graphics engine, a graphics rendering engine, a game engine, a three-dimensional ("3D") game engine, a processor on the user device, or at least one CPU core on the user device, and/or the like. In some cases, the user device may comprise one of a portable gaming device, a smart phone, a tablet computer, a laptop computer, a desktop computer, or a server computer, and/or the like. In some instances, the image may comprise one of a two-dimensional ("2D") image, a three-dimensional ("3D") image, a video, a 2D frame of a video, a 3D frame of a video, a group of 2D images, or a group of 3D images, and/or the like.

[0014] According to some embodiments, the plurality of split points may comprise at least one of one or more frames, one or more render layers, one or more render passes, or one or more draw calls, and/or the like. In some cases, the one or more frames may comprise frames in a video. In some instances, the one or more render layers may comprise one or more layers corresponding to different objects in the image. In some cases, the one or more render passes may comprise at least one of one or more shadow passes, one or more lighting passes, one or more main color passes, one or more secondary color passes, one or more post-processing passes, one or more highlight passes, one or more reflection passes, or one or more user interface ("UI") passes. In some instances, the one or more draw calls may comprise one or more calls to a graphics application programming interface ("API") to draw one or more objects.

[0015] Merely by way of example, in some cases, where the plurality of split points may comprise one or more render passes, identifying the plurality of split points in the image may comprise detecting, using the computing system, occurrence of framebuffer resets, each framebuffer reset being indicative of a beginning of a new render pass.

[0016] In some embodiments, identifying any dependencies among two or more split points among the plurality of split points may comprise tracking, using the computing system, all dependencies among the plurality of split points. In some instances, tracking all dependencies among the plurality of split points may comprise tracking, using the computing system, all dependencies among the plurality of split points by identifying and storing all resources used to render the image, and by determining and storing which resources are used by which split points.

[0018] According to some embodiments, the rendering pipelines may comprise Open Graphics Library ("OpenGL") cross-language, cross-platform API-based rendering pipelines, or the like.

[0019] In another aspect, a method may be provided to parallelize rendering pipelines for generating and outputting a rendered image. The method may comprise receiving, using a computing system on a user device, a request to render an image; in response to receiving the request to render the image, analyzing, using the computing system, the image to identify a plurality of split points in the image; identifying, using the computing system, any dependencies among two or more split points among the plurality of split points; and determining, using the computing system, a multi-threaded solution to perform parallel rendering of the image based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points. The method may also comprise pushing, using the computing system, a first group of split points among the identified plurality of split points to a first thread and a second group of split points among the identified plurality of split points to a second thread, based on the determined multi-threaded solution; rendering, using a first central processing unit ("CPU") core on the user device, the first group of split points in the first thread; and rendering, using a second CPU core on the user device, the second group of split points in the second thread, concurrent with the rendering of the first group of split points by the first CPU core. The method may further comprise utilizing, using the computing system, one or more graphics processing unit ("GPU") synchronization mechanisms to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU; rendering, using the GPU on the user device, data sent from each of the first and second CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate a rendered image; and outputting, using the GPU, the rendered image.
[0020] In yet another aspect, an apparatus, which may be provided to parallelize rendering pipelines for rendering an image, may comprise at least one processor and a non-transitory computer readable medium communicatively coupled to the at least one processor. The non-transitory computer readable medium might have stored thereon computer software comprising a set of instructions that, when executed by the at least one processor, causes the apparatus to: receive a request to render an image; in response to receiving the request to render the image, analyze the image to identify a plurality of split points in the image; identify any dependencies among two or more split points among the plurality of split points; and render the image by using a multi-threaded solution determined based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points.

[0021] In still another aspect, a system, which may be provided to parallelize rendering pipelines for rendering an image, may comprise at least one first processor and a first non-transitory computer readable medium communicatively coupled to the at least one first processor. The first non-transitory computer readable medium might have stored thereon computer software comprising a first set of instructions that, when executed by the at least one first processor, causes the computing system to: receive a request to render an image; in response to receiving the request to render the image, analyze the image to identify a plurality of split points in the image; identify any dependencies among two or more split points among the plurality of split points; and render the image by using a multi-threaded solution determined based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points.

[0022] Various modifications and additions can be made to the embodiments discussed without departing from the scope of the invention. For example, while the embodiments described above refer to particular features, the scope of this invention also includes embodiments having different combination of features and embodiments that do not include all of the above-described features.

[0023] The details of one or more aspects of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques described in this disclosure will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] A further understanding of the nature and advantages of particular embodiments may be realized by reference to the remaining portions of the specification and the drawings, in which like reference numerals are used to refer to similar components. In some instances, a sub-label is associated with a reference numeral to denote one of multiple similar components. When reference is made to a reference numeral without specification to an existing sub-label, it is intended to refer to all such multiple similar components.

[0025] Fig. 1 is a schematic diagram illustrating a system for implementing parallelization of OpenGL rendering pipelines, in accordance with various embodiments.

[0026] Figs. 2A and 2B are schematic block flow diagrams illustrating various non-limiting examples of render pass-level parallelization of OpenGL rendering pipelines, in accordance with various embodiments.

[0027] Fig. 3A is a schematic block flow diagram illustrating a non-limiting example of the use of shared contexts during implementation of parallelization of OpenGL rendering pipelines, in accordance with various embodiments.

[0028] Figs. 3B and 3C are schematic block flow diagrams that together illustrate a non-limiting example of the use of synchronization mechanisms during implementation of parallelization of OpenGL rendering pipelines, in accordance with various embodiments.

[0029] Figs. 4A-4D are flow diagrams illustrating a method for implementing parallelization of OpenGL rendering pipelines, in accordance with various embodiments.

[0030] Fig. 5 is a block diagram illustrating an example of computer or system hardware architecture, in accordance with various embodiments.

[0031] Fig. 6 is a block diagram illustrating a networked system of computers, computing systems, or system hardware architecture, which can be used in accordance with various embodiments.

DETAILED DESCRIPTION

[0032] Overview

[0033] Various embodiments provide tools and techniques for implementing two- dimensional ("2D") and/or three-dimensional ("3D") rendering, and, more particularly, to methods, systems, and apparatuses for implementing parallelization of OpenGL rendering pipelines.

[0034] In various embodiments, a computing system may receive a request to render an image. In response to receiving the request to render the image, the computing system may analyze the image to identify a plurality of split points in the image. According to some embodiments, the image may include, but is not limited to, one of a two-dimensional ("2D") image, a three-dimensional ("3D") image, a video, a 2D frame of a video, a 3D frame of a video, a group of 2D images, or a group of 3D images, and/or the like. In some embodiments, the plurality of split points may include, but is not limited to, at least one of one or more frames, one or more render layers, one or more render passes, or one or more draw calls, and/or the like. In some cases, the one or more frames may include, without limitation, frames in a video, or the like. In some instances, the one or more render layers may include, but are not limited to, one or more layers corresponding to different objects in the image, or the like. In some cases, the one or more render passes may include, without limitation, at least one of one or more shadow passes, one or more lighting passes, one or more main color passes, one or more secondary color passes, one or more post-processing passes, one or more highlight passes, one or more reflection passes, or one or more user interface ("UI") passes, and/or the like. In some instances, the one or more draw calls may include, but are not limited to, one or more calls to a graphics application programming interface ("API") to draw one or more objects, or the like. In some embodiments, the plurality of split points may include one or more render passes, where identifying the plurality of split points in the image may comprise the computing system detecting occurrence of framebuffer resets, each framebuffer reset being indicative of a beginning of a new render pass.
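The render-pass split-point detection described above can be pictured with the following illustrative simulation (an assumed sketch, not the application's actual implementation): it scans a recorded stream of GL-style commands and starts a new pass whenever the framebuffer binding is reset.

```python
# Illustrative sketch: split an intercepted command stream into render
# passes, treating each framebuffer bind as the start of a new pass.
# Commands are recorded tuples; the names merely mimic OpenGL calls.

def split_into_passes(commands):
    """Group commands into render passes; a "glBindFramebuffer" command
    signals a framebuffer reset, i.e., the beginning of a new pass."""
    passes, current = [], []
    for cmd in commands:
        if cmd[0] == "glBindFramebuffer" and current:
            passes.append(current)   # close the previous pass
            current = []
        current.append(cmd)
    if current:
        passes.append(current)       # flush the final pass
    return passes
```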

[0035] The computing system may identify any dependencies among two or more split points among the plurality of split points. In some cases, identifying any dependencies among two or more split points among the plurality of split points may comprise the computing system tracking all dependencies among the plurality of split points. In some instances, tracking all dependencies among the plurality of split points may comprise the computing system tracking all dependencies among the plurality of split points by identifying and storing all resources used to render the image, and by determining and storing which resources are used by which split points, and/or the like.
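One hedged way to picture this resource-based dependency tracking (again, an assumed sketch rather than the application's implementation) is to record, per split point, which resources it writes and reads, and to mark a split point as dependent on the most recent earlier writer of any resource it reads.

```python
# Illustrative sketch: each split point is described by the sets of
# resources it writes and reads, listed in submission order. A split point
# depends on the latest earlier split point that wrote a resource it reads.

def find_dependencies(split_points):
    """split_points: list of (writes, reads) pairs of resource-name sets.
    Returns a dict mapping each index to the set of indices it depends on."""
    last_writer = {}   # resource name -> index of its most recent writer
    deps = {i: set() for i in range(len(split_points))}
    for i, (writes, reads) in enumerate(split_points):
        for res in reads:
            if res in last_writer:
                deps[i].add(last_writer[res])
        for res in writes:
            last_writer[res] = i
    return deps
```

For example, a lighting pass that reads a shadow map depends on the shadow pass that wrote it, while a UI pass touching neither resource is free to render in parallel with both.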

[0036] The computing system may determine a multi-threaded solution to perform parallel rendering of the image based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points. The computing system may push each group of split points among a plurality of groups of split points (i.e., two, three, or more groups of split points, etc.) to a corresponding thread among a plurality of threads (i.e., two, three, or more threads, etc.), based on the determined multi-threaded solution.
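The pushing of split-point groups onto parallel threads can be sketched as follows; this is a hypothetical Python simulation in which render_fn stands in for the per-split-point CPU-side rendering work.

```python
import threading

# Illustrative sketch: push each group of split points to its own thread
# and render all groups concurrently, collecting results keyed by group index.

def render_groups_in_parallel(groups, render_fn):
    results = {}
    lock = threading.Lock()

    def worker(idx, group):
        out = [render_fn(sp) for sp in group]   # CPU-side rendering
        with lock:
            results[idx] = out

    threads = [threading.Thread(target=worker, args=(i, g))
               for i, g in enumerate(groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```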

[0037] Each CPU core among the plurality of CPU cores may render one group of split points among the plurality of groups of split points, each concurrent with the rendering of the other groups of split points by other CPU cores among the plurality of CPU cores.

[0038] According to some embodiments, the computing system may utilize one or more GPU synchronization mechanisms to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU. In some embodiments, the one or more GPU synchronization mechanisms may include, without limitation, one of one or more GPU fence commands or one or more delay commands, and/or the like.
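The effect of such synchronization mechanisms can be illustrated with the following hypothetical sketch, in which a condition variable plays the role of a GPU fence or delay command: the workers prepare their data concurrently, but each one is held back until its turn to submit, so the GPU-side queue receives the data in the enforced order.

```python
import threading

# Illustrative sketch: CPU workers render concurrently, but a condition
# variable (standing in for a GPU fence/delay command) forces submission
# to the GPU queue in worker-index order.

def submit_in_order(num_workers, prepare_fn):
    gpu_queue = []          # data as received by the GPU
    turn = [0]              # index of the worker allowed to submit next
    cond = threading.Condition()

    def worker(idx):
        data = prepare_fn(idx)        # concurrent CPU-side rendering
        with cond:
            while turn[0] != idx:     # "fence": wait until it is our turn
                cond.wait()
            gpu_queue.append(data)    # send rendered data to the GPU
            turn[0] += 1
            cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gpu_queue
```

A real OpenGL implementation would instead use fence sync objects (e.g., glFenceSync/glWaitSync) to achieve the same ordering on the GPU command stream.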

[0039] The GPU may render data sent from each of the plurality of CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image. The GPU may subsequently output the rendered image. In this manner, parallelizing rendering pipelines may be achieved. In some embodiments, the rendering pipelines may include, without limitation, Open Graphics Library ("OpenGL") cross-language, cross-platform API-based rendering pipelines, or the like.

[0040] In the various aspects described herein, the multi-context rendering can split a frame into multiple pieces in order to parallelize the work. This allows the system to take advantage of the multiple CPU and GPU cores. Parallelizing the rendering of a frame allows one to find a better balance between power consumption and performance, particularly for image-processing-intensive processes (e.g., 3D game rendering, etc.) on mobile devices (e.g., portable gaming devices, smart phones, tablet computers, laptop computers, etc.).

[0041] The system can decompose the frame into several parts. Each part has dependencies on other parts or OpenGL states. The easiest and simplest decomposition and parallelization is frame by frame, with each frame being put on a different context. Most of the time, however, a frame can be split into several stages. For example, a frame can be split at the render-pass level (e.g., shadow pass, lighting pass, main color pass, post-processing pass, UI pass, etc.). As described in detail below, the system achieves this goal by detecting the split points (e.g., frame, pass, or other split point, etc.), tracking the dependencies between split points, pushing work onto different parallel threads, and using GPU synchronization mechanisms to enforce coherence.

[0042] Further, splitting a frame between multiple contexts can take advantage of multiple CPU cores and GPU(s), while also further parallelizing the rendering work. Moreover, the system can offload some rendering work outside of the current GPU as it uses a different context. This will enable the user device to have a better overall power consumption.

[0043] These and other aspects of the system and method for parallelization of OpenGL rendering pipelines are described in greater detail with respect to the figures.

[0044] The following detailed description illustrates a few embodiments in further detail to enable one of skill in the art to practice such embodiments. The described examples are provided for illustrative purposes and are not intended to limit the scope of the invention.

[0045] In the following description, for the purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the described embodiments. It will be apparent to one skilled in the art, however, that other embodiments of the present invention may be practiced without some of these details. In other instances, some structures and devices are shown in block diagram form. Several embodiments are described herein, and while various features are ascribed to different embodiments, it should be appreciated that the features described with respect to one embodiment may be incorporated with other embodiments as well. By the same token, however, no single feature or features of any described embodiment should be considered essential to every embodiment of the invention, as other embodiments of the invention may omit such features.

[0046] Unless otherwise indicated, all numbers used herein to express quantities, dimensions, and so forth should be understood as being modified in all instances by the term "about." In this application, the use of the singular includes the plural unless specifically stated otherwise, and use of the terms "and" and "or" means "and/or" unless otherwise indicated. Moreover, the use of the term "including," as well as other forms, such as "includes" and "included," should be considered non-exclusive. Also, terms such as "element" or "component" encompass both elements and components comprising one unit and elements and components that comprise more than one unit, unless specifically stated otherwise.

[0047] Various embodiments as described herein - while embodying (in some cases) software products, computer-performed methods, and/or computer systems - represent tangible, concrete improvements to existing technological areas, including, without limitation, 2D image processing or rendering technology, 3D image processing or rendering technology, 2D game rendering technology, 3D game rendering technology, and/or the like. In other aspects, some embodiments can improve the functioning of user equipment or systems themselves (e.g., 2D image processing or rendering systems, 3D image processing or rendering systems, 2D game rendering systems, 3D game rendering systems, etc.), for example, by analyzing, using a computing system, the image to identify a plurality of split points in the image; identifying, using the computing system, any dependencies among two or more split points among the plurality of split points; determining, using the computing system, a multi-threaded solution to perform parallel rendering of the image based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points; pushing, using the computing system, a first group of split points to a first thread and a second group of split points to a second thread, based on the determined multi-threaded solution; rendering, using a first central processing unit ("CPU") core on the user device, the first group of split points in the first thread; rendering, using a second CPU core on the user device, the second group of split points in the second thread, concurrent with the rendering of the first group of split points by the first CPU core; utilizing, using the computing system, one or more graphics processing unit ("GPU") synchronization mechanisms to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU; rendering, using the GPU on the user device, data sent from each of the
first and second CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate a rendered image; and outputting, using the GPU, the rendered image; and/or the like.

[0048] In particular, to the extent any abstract concepts are present in the various embodiments, those concepts can be implemented as described herein by devices, software, systems, and methods that involve novel functionality (e.g., steps or operations), such as, detecting the split point (e.g., frame, pass, or other split point, etc.), tracking the dependencies between split points, pushing work on different parallel threads, and using GPU synchronization mechanisms to enforce coherence, and/or the like, to name a few examples, that extend beyond mere conventional computer processing operations. These functionalities can produce tangible results outside of the implementing computer system, including, merely by way of example, optimized parallelized rendering of images, videos, game image frames, etc. that takes into account dependencies between split points, order of rendering due to dependencies by use of GPU synchronization mechanisms, and/or the like, thereby achieving improved overall power consumption and efficiency, at least some of which may be observed or measured by users, game/content developers, and/or user device manufacturers.

[0049] Some Embodiments

[0050] We now turn to the embodiments as illustrated by the drawings. Figs. 1-6 illustrate some of the features of the method, system, and apparatus for implementing two-dimensional ("2D") and/or three-dimensional ("3D") rendering, and, more particularly, to methods, systems, and apparatuses for implementing parallelization of OpenGL rendering pipelines, as referred to above. The methods, systems, and apparatuses illustrated by Figs. 1-6 refer to examples of different embodiments that include various components and steps, which can be considered alternatives or which can be used in conjunction with one another in the various embodiments. The description of the illustrated methods, systems, and apparatuses shown in Figs. 1-6 is provided for purposes of illustration and should not be considered to limit the scope of the different embodiments.

[0051] With reference to the figures, Fig. 1 is a schematic diagram illustrating a system 100 for implementing parallelization of OpenGL rendering pipelines, in accordance with various embodiments.

[0052] In the non-limiting embodiment of Fig. 1, system 100 may comprise user device 105, which may include, but is not limited to, one of a portable gaming device, a smart phone, a tablet computer, a laptop computer, a desktop computer, or a server computer, and/or the like. In some embodiments, user device 105 may include, without limitation, at least one of computing system(s) 110, data storage 130, communications system 135, display screen 140 (optional), or audio playback device 145 (optional), and/or the like.

[0053] In some instances, computing system(s) 110 may include, but is not limited to, a plurality of central processing unit ("CPU") cores 115a-115n (collectively, "CPU cores 115" or the like) and a graphics processing unit ("GPU") 120. In some cases, computing system(s) 110 may further include one or more other processors 125. One or more of the CPU cores 115 and/or one or more other processors 125 may be used to perform orchestration, management, render engine coordination, and/or operating system ("OS") processing functionalities. Other CPU cores 115 may be used to render or image-process split points of images, while the GPU 120 may be used to render images based on the rendered split points of images from the CPU cores 115. In some embodiments, computing system(s) 110 and/or at least one other processor 125 may each include, without limitation, at least one of a graphics engine, a graphics rendering engine, a game engine, or a three-dimensional ("3D") game engine, and/or the like.

[0054] The data storage 130 may include, but is not limited to, at least one of read-only memory ("ROM"), programmable read-only memory ("PROM"), erasable programmable read-only memory ("EPROM"), electrically erasable programmable read-only memory ("EEPROM"), flash memory, other non-volatile memory devices, random-access memory ("RAM"), static random-access memory ("SRAM"), dynamic random-access memory ("DRAM"), synchronous dynamic random-access memory ("SDRAM"), virtual memory, a RAM disk, or other volatile memory devices, non-volatile RAM devices, and/or the like.

[0055] The communications system 135 may include wireless communications devices capable of communicating using protocols including, but not limited to, at least one of Bluetooth™ communications protocol, WiFi communications protocol, or other 802.11 suite of communications protocols, ZigBee communications protocol, Z-wave communications protocol, or other 802.15.4 suite of communications protocols, cellular communications protocol (e.g., 3G, 4G, 4G LTE, 5G, etc.), or other suitable communications protocols, and/or the like.

[0056] Some user devices (e.g., a portable gaming device, a smart phone, a tablet computer, a laptop computer, etc.) may each include at least one integrated display screen 140 (in some cases, including a non-touchscreen display screen(s), while, in other cases, including a touchscreen display screen(s), and, in still other cases, including a combination of at least one non-touchscreen display screen and at least one touchscreen display screen) and at least one integrated audio playback device 145 (e.g., built-in speakers or the like). Some user devices (e.g., a desktop computer, or a server computer, or the like) may each include at least one external display screen or monitor (not shown) (which may be a non-touchscreen display device or a touchscreen display device, or the like) and at least one integrated audio playback device 145 (e.g., built-in speakers, etc.) and/or at least one external audio playback device (not shown; e.g., external or peripheral speakers, wired earphones, wired earbuds, wired headphones, wireless earphones, wireless earbuds, wireless headphones, or the like). Some user devices (e.g., some desktop computers, or some server computers, or the like) may have neither an integrated display screen nor an external display screen.

[0057] System 100 may further comprise one or more content sources 150 and corresponding database(s) 155 that communicatively couple with user device 105 via network(s) 160 (and via communications system 135) to provide image data 165 for the computing system(s) 110 to render or process, as described in detail below. The resultant rendered images 170 may be sent to content distribution system 175 and corresponding database(s) 180 (via network(s) 160 and via communications system 135) for storage and/or distribution to other devices (e.g., display devices 185a-185n). In some cases, user device 105 may directly send the rendered images 170 to one or more display devices 185a-185n (collectively, "display devices 185" or the like), which may each include, but are not limited to, at least one of a smart television (directly), a television (indirectly) via a set-top-box or other intermediary media player, a monitor or digital display panel (directly), a monitor or digital display panel (indirectly) via an externally connected user device (e.g., desktop computer, server computer, etc.), etc. The lightning bolt symbols are used to denote wireless communications between communications system 135 and network(s) 160 (in some cases, via network access points or the like (not shown)), between communications system 135 and at least one of the one or more display devices 185a-185n, and between network(s) 160 and at least one of the one or more display devices 185a-185n (in some cases, via network access points or the like (not shown)).

[0058] In operation, computing system(s) 110, at least one CPU core 115, GPU 120, and/or at least one other processor 125 of user device 105 (collectively, "computing system") may receive a request to render an image (in some cases, from one or more of at least one other CPU core 115, the GPU 120, at least one other processor 125, content source(s) 150, content distribution system 175, at least one display device 185, or some other requesting device (not shown), or the like). In response to receiving the request to render the image, the computing system may (receive and) analyze the image (or the image data 165, or the like) to identify a plurality of split points in the image (or image data 165).

[0059] According to some embodiments, the image (or image data 165) may include, but is not limited to, one of a two-dimensional ("2D") image, a three-dimensional ("3D") image, a video, a 2D frame of a video, a 3D frame of a video, a group of 2D images, or a group of 3D images, and/or the like. In some embodiments, the plurality of split points may include, but is not limited to, at least one of one or more frames, one or more render layers, one or more render passes, or one or more draw calls, and/or the like. In some cases, the one or more frames may include, without limitation, frames in a video, or the like. In some instances, the one or more render layers may include, but are not limited to, one or more layers corresponding to different objects in the image (or image data 165), or the like. In some cases, the one or more render passes may include, without limitation, at least one of one or more shadow passes, one or more lighting passes, one or more main color passes, one or more secondary color passes, one or more post-processing passes (e.g., one or more blur passes, one or more sharpening passes, one or more blur/sharpening passes, one or more filtering passes, etc.), one or more highlight passes, one or more reflection passes, or one or more user interface ("UI") passes, and/or the like. In some instances, the one or more draw calls may include, but are not limited to, one or more calls to a graphics application programming interface ("API") to draw one or more objects, or the like. In some embodiments, the plurality of split points may include one or more render passes, where identifying the plurality of split points in the image (or image data 165) may comprise the computing system detecting occurrence of framebuffer resets, each framebuffer reset being indicative of a beginning of a new render pass.

[0060] The computing system may identify any dependencies among two or more split points among the plurality of split points. In some cases, identifying any dependencies among two or more split points among the plurality of split points may comprise the computing system tracking all dependencies among the plurality of split points. In some instances, tracking all dependencies among the plurality of split points may comprise the computing system tracking all dependencies among the plurality of split points by identifying and storing all resources used to render the image (or image data 165), and by determining and storing which resources are used by which split points, and/or the like.

[0061] The computing system may determine a multi-threaded solution to perform parallel rendering of the image (or image data 165) based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points. The computing system may push (or otherwise send or transfer) each group of split points among a plurality of groups of split points (i.e., two, three, or more groups of split points, etc.) to a corresponding thread among a plurality of threads (i.e., two, three, or more threads, etc.), based on the determined multi-threaded solution. In some cases, the plurality of threads may comprise N threads, and the computing system may create a shared context for each of second through Nth threads among the plurality of threads, based on the original context. Figs. 2A and 2B depict non-limiting examples of multi-threaded or parallelized rendering pipeline implementation, in accordance with various embodiments, while Fig. 3A depicts a non-limiting example of multi-threaded or parallelized rendering pipeline implementation using shared contexts.

[0062] Each CPU core among the plurality of CPU cores 115 may render one group of split points among the plurality of groups of split points, each concurrent with the rendering of the other groups of split points by other CPU cores among the plurality of CPU cores 115. For instance, if the multi-threaded solution includes two concurrent threads, e.g., using a first CPU core 115a and a second CPU core 115b, respectively, then the first CPU core 115a may render a first group of split points in the first thread, while the second CPU core 115b may render a second group of split points in the second thread, concurrent with the rendering of the first group of split points by the first CPU core 115a. Alternatively, if the multi-threaded solution includes three concurrent threads, e.g., using a first CPU core 115a, a second CPU core 115b, and a third CPU core 115c, respectively, then the first CPU core 115a may render a first group of split points in the first thread, while the second CPU core 115b may render a second group of split points in the second thread, and the third CPU core 115c may render a third group of split points in the third thread, each concurrent with the rendering of the other of the first, second, or third groups of split points by the corresponding first, second, or third CPU core 115a, 115b, or 115c, respectively. And so on, for up to N cores (where N is any suitable integer number of cores greater than 1), which may correspond to the total number of cores contained within computing system 110 of user device 105.

[0063] In some cases, due to dependencies or other logical reasons or the like, some split points cannot properly be rendered or image-processed by the GPU before another one or more split points. In such instances, an order of rendering by the GPU should be enforced. According to some embodiments, such rendering order enforcement may be implemented by the computing system utilizing one or more GPU synchronization mechanisms to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU. In some embodiments, the one or more GPU synchronization mechanisms may include, without limitation, one of one or more GPU fence commands or one or more delay commands, and/or the like. Figs. 3B and 3C depict the use of GPU fence commands in enforcing an order of rendering of split points by the GPU.

[0064] The GPU may render data sent from each of the plurality of CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image(s) 170. The GPU may subsequently output the rendered image(s) 170 directly or indirectly (in some cases, via wireless communications links, as depicted by the lightning bolt symbols in Fig. 1, or the like) to at least one of content source(s) 150 (and/or corresponding database(s) 155), content distribution system(s) 175 (and/or corresponding database(s) 180), and/or at least one display device 185 among the one or more display devices 185a-185n, or the like. In this manner, parallelizing rendering pipelines may be achieved. In some embodiments, the rendering pipelines may include, without limitation, Open Graphics Library ("OpenGL") cross-language, cross-platform API-based rendering pipelines, or the like.

[0065] These and other functions of the system 100 (and its components) are described in greater detail below with respect to Figs. 2-4.

[0066] Figs. 2A and 2B (collectively, "Fig. 2") are schematic block flow diagrams illustrating various non-limiting examples 200 and 200' of render pass-level parallelization of OpenGL rendering pipelines, in accordance with various embodiments. Fig. 2A depicts a double-threaded implementation, while Fig. 2B depicts a triple-threaded implementation. Although two or three parallelized rendering pipelines are shown in Fig. 2 for generating and outputting a rendered image, the various embodiments are not so limited, and any number of threads may be used for parallelizing rendering pipelines for generating and outputting rendered images. In some cases, each thread may be processed using a corresponding CPU core among a plurality of CPU cores. In such cases, for N number of CPU cores available on a user device for image processing and/or rendering, the system may utilize up to N number of threads that provide for concurrent, parallelized rendering pipelines for generating and outputting rendered images.

[0067] In the non-limiting examples of Fig. 2, a computing system (such as the computing system of Fig. 1, or the like) may identify a plurality of split points in the image (or image data), in this case, a plurality of render passes, including, but not limited to, at least one of one or more shadow passes, one or more lighting passes, one or more main color passes, one or more secondary color passes, one or more post-processing passes (e.g., one or more blur passes, one or more sharpening passes, one or more blur/sharpening passes, one or more filtering passes, etc.), one or more highlight passes, one or more reflection passes, or one or more user interface ("UI") passes, and/or the like. The computing system may identify the plurality of render passes by detecting occurrence of framebuffer resets, each framebuffer reset being indicative of a beginning of a new render pass.

[0068] The computing system may then identify any dependencies among two or more render passes among the plurality of render passes. In some cases, identifying any dependencies among two or more render passes among the plurality of render passes may comprise the computing system tracking all dependencies among the plurality of render passes. In some instances, tracking all dependencies among the plurality of render passes may comprise the computing system tracking all dependencies among the plurality of render passes by identifying and storing all resources used to render the image, and by determining and storing which resources are used by which render passes, and/or the like.

[0069] The computing system may then determine a multi-threaded solution to perform parallel rendering of the image based at least in part on the identified plurality of render passes and the identified dependencies among the two or more render passes. The computing system may push (or otherwise send or transfer) each group of render passes among a plurality of groups of render passes (i.e., two, three, or more groups of render passes, etc.) to a corresponding thread among a plurality of threads (i.e., two, three, or more threads, etc.), based on the determined multi-threaded solution.

[0070] Turning to the non-limiting example 200 of Fig. 2A, the render passes that the computing system may identify may include, but are not limited to, a color pass 205, a shadow pass 210, a high-dynamic-range ("HDR") rendering pass 220, a UI pass 230, and a final pass 235. The computing system may determine that the HDR rendering pass 220 may be dependent on the color pass 205 and the shadow pass 210, and may also determine that the final pass 235 may be dependent on the HDR rendering pass 220 and the UI pass 230. In such a case, a single-threaded implementation involving a first thread 240a may be used in which a first CPU core (e.g., CPU core 115a of Fig. 1, or the like) may image-process or render the following render passes in sequence: color pass 205, shadow pass 210, HDR rendering pass 220, UI pass 230, and then final pass 235.

[0071] Rather than this single-threaded implementation, however, the computing system may determine that a multi-threaded solution involving two concurrent threads 240a and 240b may be used. In such a case, a first thread 240a may be used in which a first CPU core (e.g., CPU core 115a of Fig. 1, or the like) may image-process or render the following render passes in sequence: color pass 205, HDR rendering pass 220, and then final pass 235. Concurrent with the first thread 240a image-processing or rendering the above-mentioned render passes in sequence, a second thread 240b may be used in which a second CPU core (e.g., CPU core 115b of Fig. 1, or the like) may image-process or render the following render passes in sequence: shadow pass 210 and UI pass 230. Because HDR rendering pass 220 is dependent on both the color pass 205 and the shadow pass 210 (depicted in Fig. 2A by the HDR pass 220 being connected by arrows from color pass 205 and from shadow pass 210), both of these passes must be image-processed or rendered by the first and second CPU cores 115a and 115b in the first and second threads 240a and 240b, respectively, before the first CPU core 115a in the first thread 240a can begin image-processing or rendering the HDR rendering pass 220. Similarly, because final pass 235 is dependent on both the HDR rendering pass 220 and the UI pass 230 (depicted in Fig. 2A by the final pass 235 being connected by arrows from HDR rendering pass 220 and from UI pass 230), both of these passes must be image-processed or rendered by the first and second CPU cores 115a and 115b in the first and second threads 240a and 240b, respectively, before the first CPU core 115a in the first thread 240a can begin image-processing or rendering the final pass 235.
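
By way of non-limiting illustration only, the pass-dependency graph of Fig. 2A can be sketched by grouping passes into dependency levels; passes in the same level (e.g., the color, shadow, and UI passes) have no unmet prerequisites and may be rendered concurrently on separate CPU cores. The pass names and the level-computation function are illustrative only:

```python
# Hypothetical sketch: compute dependency levels for the Fig. 2A pass
# graph. Each level contains passes whose prerequisites all lie in
# earlier levels, so passes within one level can run concurrently.

def dependency_levels(deps):
    """`deps` maps pass -> set of prerequisite passes (must be acyclic)."""
    levels, done = [], set()
    while len(done) < len(deps):
        ready = {p for p, pre in deps.items() if p not in done and pre <= done}
        levels.append(sorted(ready))
        done |= ready
    return levels

# The Fig. 2A graph: HDR depends on color and shadow; final depends on
# HDR and UI.
fig_2a = {
    "color": set(), "shadow": set(), "ui": set(),
    "hdr": {"color", "shadow"},
    "final": {"hdr", "ui"},
}
```

For this graph, the first level contains the color, shadow, and UI passes, followed by the HDR pass, followed by the final pass, matching the ordering constraints described above.
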
[0072] In this manner, by parallelizing rendering pipelines (although not officially supported by OpenGL), efficiency can be improved, while reducing overall time for rendering images, especially with the availability of multiple CPU cores and GPUs in modern user devices (e.g., portable gaming devices, smart phones, tablet computers, laptop computers, etc.).

[0073] With reference to Fig. 2B, the render passes that the computing system may identify may include, but are not limited to, a color pass 205, a shadow pass 210, a lighting pass 215, an HDR rendering pass 220, a blur/sharpening pass 225, a UI pass 230, and a final pass 235. The computing system may determine that the HDR rendering pass 220 may be dependent on the color pass 205, the shadow pass 210, and the lighting pass 215, and may also determine that the final pass 235 may be dependent on the HDR rendering pass 220, the UI pass 230, and the blur/sharpening pass 225.

[0074] The computing system may determine that a multi-threaded solution involving three concurrent threads 240a, 240b, and 240c may be used. In such a case, a first thread 240a may be used in which a first CPU core (e.g., CPU core 115a of Fig. 1, or the like) may image-process or render the following render passes in sequence: color pass 205, HDR rendering pass 220, and then final pass 235. Concurrent with the first thread 240a image-processing or rendering the above-mentioned render passes in sequence, a second thread 240b may be used in which a second CPU core (e.g., CPU core 115b of Fig. 1, or the like) may image-process or render the following render passes in sequence: shadow pass 210 and UI pass 230. Concurrent with the first thread 240a and the second thread 240b image-processing or rendering the above-mentioned corresponding render passes in sequence, a third thread 240c may be used in which a third CPU core (e.g., CPU core 115c of Fig. 1, or the like) may image-process or render the following render passes in sequence: lighting pass 215 and blur/sharpening pass 225.

[0075] Because HDR rendering pass 220 is dependent on the color pass 205, the shadow pass 210, and the lighting pass 215 (depicted in Fig. 2B by the HDR pass 220 being connected by arrows from color pass 205, from shadow pass 210, and from lighting pass 215), all three of these passes must be image-processed or rendered by the first through third CPU cores 115a-115c in the first through third threads 240a-240c, respectively, before the first CPU core 115a in the first thread 240a can begin image-processing or rendering the HDR rendering pass 220. Similarly, because final pass 235 is dependent on the HDR rendering pass 220, the UI pass 230, and the blur/sharpening pass 225 (depicted in Fig. 2B by the final pass 235 being connected by arrows from HDR rendering pass 220, from UI pass 230, and from blur/sharpening pass 225), all three of these passes must be image-processed or rendered by the first through third CPU cores 115a-115c in the first through third threads 240a-240c, respectively, before the first CPU core 115a in the first thread 240a can begin image-processing or rendering the final pass 235.

[0076] In this manner, further efficiencies may be achieved. Although Fig. 2 depicts render passes as the split points in the image (or image data) and depicts particular render passes, the various embodiments are not so limited, and the split points that may be identified may include, but are not limited to, at least one of one or more frames, one or more render layers, one or more render passes, or one or more draw calls, and/or the like, as described above with respect to Fig. 1, and any suitable ones of these types of split points may be used, as necessary or as desired.

[0077] Figs. 3A-3C (collectively, "Fig. 3") depict various examples of the implementation of parallelization of OpenGL rendering pipelines. Fig. 3A is a schematic block flow diagram illustrating a non-limiting example 300 of the use of shared contexts during implementation of parallelization of OpenGL rendering pipelines, in accordance with various embodiments. Figs. 3B and 3C are schematic block flow diagrams 300' and 300" that together illustrate a non-limiting example of the use of synchronization mechanisms during implementation of parallelization of OpenGL rendering pipelines, in accordance with various embodiments.

[0078] Because render states are not shared between contexts in OpenGL, a context referring to, e.g., an object that stores or holds all of the states associated with an instance of OpenGL, and may be represented, e.g., by a framebuffer that rendering commands will draw to when not drawing to a framebuffer object. Each context has its own set of OpenGL objects, which are independent from those of other contexts.

A context's objects may be shared with other contexts, but such sharing must be made explicit, either as the context is created or before a newly created context creates any objects. Also, although OpenGL allows having multiple contexts, only one can be active on a given thread. In order to multi-thread, multiple contexts are needed. The contexts may be shared as described above, meaning that they will share most of the data and/or resources. The various embodiments take advantage of the shared resources between contexts in implementing parallelized rendering pipelines for generating and outputting rendered images.
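The sharing relationship described above can be modeled with a minimal Python sketch (a toy stand-in for real OpenGL share groups; the class and field names are hypothetical, not part of any OpenGL API): objects created in the share group are visible from every shared context, while container state remains per-context.

```python
class GLContext:
    """Toy model of an OpenGL context: a shared object pool plus
    per-context state (hypothetical names, for illustration only)."""

    def __init__(self, share_with=None):
        # Textures, buffers, and shaders live in the share group, so a
        # context created with share_with sees the same object pool.
        self.objects = share_with.objects if share_with else {}
        # Container/render state (e.g., the framebuffer binding) is
        # never shared between contexts.
        self.state = {"bound_framebuffer": 0}


ctx1 = GLContext()                      # "Context 1()" on the first thread
ctx1.objects["hdr_texture"] = object()  # resource created on thread 1
ctx1_1 = GLContext(share_with=ctx1)     # shared context for thread 2
ctx1_2 = GLContext(share_with=ctx1)     # shared context for thread 3
```

Here ctx1_1 and ctx1_2 can read hdr_texture without copying it, yet each keeps its own framebuffer binding, mirroring how shared contexts let multiple threads cooperate on one frame.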

[0079] Referring to Fig. 3A, for an application using one context, the computing system (such as the computing systems of Figs. 1 and 2, or the like) may detect that the frame can be split into three parts. In such a case, the computing system may create shared contexts for every thread used, and then offload or push work from the first thread 305a to each of the other threads (i.e., second thread 305b and third thread 305c). For instance, the computing system may create "Context 1()" 310 for the first thread 305a, and may thereafter create a shared context for each of the second through Nth threads (in this case, "Context 1.1()" 315 for the second thread 305b and "Context 1.2()" 320 for the third thread 305c), based on the original context (i.e., "Context 1()" 310). This is depicted in Fig. 3A by dash-lined arrows from "Context 1()" 310 to each shared context, i.e., "Context 1.1()" 315 and "Context 1.2()" 320.

[0080] As shown in Fig. 3A, "Part_1()" 325 may be split into "Part_1()" 325a being image-processed or rendered by the first CPU (e.g., CPU 115a of Fig. 1, or the like) in the first thread 305a and "Part_1()" 325b being image-processed or rendered by the second CPU (e.g., CPU 115b of Fig. 1, or the like) in the second thread 305b. Similarly, "Part_2()" 330 may be split into "Part_2()" 330a being image-processed or rendered by the first CPU in the first thread 305a and "Part_2()" 330b being image-processed or rendered by the third CPU (e.g., CPU 115c of Fig. 1, or the like) in the third thread 305c. Likewise, "Part_3()" 335 may be split into "Part_3()" 335a being image-processed or rendered by the first CPU in the first thread 305a and "Part_3()" 335b being image-processed or rendered by the second CPU in the second thread 305b.

The shared contexts 310-320 allow the application to be image-processed or rendered faster than conventional approaches, as the use of shared contexts in the manner described obviates any need to wait for all the previous parts to be completed before starting the next part.
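The split described above can be sketched as a small Python simulation (purely illustrative; the part names mirror Fig. 3A, and the "rendering" is a placeholder function): half of each part runs on the first thread while the other halves are offloaded to worker threads, so no part must wait for all previous parts to finish before the next begins.

```python
import threading

rendered = []                    # (part, thread) pairs, in any order
lock = threading.Lock()

def render(part, thread_name):
    # Placeholder for CPU-side image-processing of one sub-part.
    with lock:
        rendered.append((part, thread_name))

# Offloaded halves, as in Fig. 3A: Part_1b and Part_3b go to the
# second thread, Part_2b to the third thread.
workers = [
    threading.Thread(target=render, args=("Part_1b", "thread_2")),
    threading.Thread(target=render, args=("Part_2b", "thread_3")),
    threading.Thread(target=render, args=("Part_3b", "thread_2")),
]
for w in workers:
    w.start()

# The first thread renders its halves concurrently with the workers.
for part in ("Part_1a", "Part_2a", "Part_3a"):
    render(part, "thread_1")

for w in workers:
    w.join()
```

After joining, all six sub-parts have been processed, regardless of the interleaving the scheduler chose.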

[0081] Splitting a frame between multiple contexts can take advantage of multiple CPU cores and GPU(s), while also further parallelizing the rendering work. Furthermore, the computing system can offload some rendering work outside of the current GPU, as the offloaded work uses a different context. This enables the user device to achieve better overall power consumption.

[0082] Although a particular number of threads, a particular number of shared contexts, and a particular implementation of split parts is shown and described with respect to Fig. 3A, the various embodiments are not so limited, and any suitable number of threads (and a corresponding number of shared contexts) may be used (subject to the total number of CPU cores available on the user device, as described above), and any suitable implementation of split parts (whether the parts are frames, render layers, render passes, draw calls, or the like) may be used consistent with the various embodiments described herein.

[0083] With reference to Figs. 3B and 3C, as split points (e.g., frames, render layers, render passes, draw calls, or the like) may be spread over several threads, a synchronization mechanism may be used to ensure the correct order if there are any dependencies between the passes. Conventional OpenGL does not include such synchronization mechanisms, as parallelism is not currently supported. In conventional OpenGL, contexts are supposed to work independently without any dependencies, and the order of execution on a GPU will follow the order of execution on the CPU cores. In the case that there are dependencies among split points, in conventional OpenGL, CPU synchronization would be needed to ensure that the order is correct, which would defeat the purpose of having multiple threads. Unfortunately, the conventional GPU fence commands are meant to synchronize between GPU and CPU, which means that the CPU will wait until the GPU reaches the fence, which is not ideal for the parallelization of rendering pipelines embodiments described herein. Also, synchronization between threads consumes CPU usage.

[0084] To avoid synchronization on the CPU-side, the computing system may use a multi-context GPU synchronization mechanism (which is different from the conventional CPU synchronization mechanisms or the conventional GPU fence commands, or the like). The multi-context GPU synchronization mechanism, in accordance with the various embodiments, allows work to be sent to the GPU from multiple contexts without CPU thread synchronization. In such a case, the order will be enforced by the new GPU fence command. Figs. 3B and 3C are illustrative.

[0085] As shown in Fig. 3B, "Draw_1()" call 345a is image-processed or rendered by a first CPU core (e.g., CPU core 115a of Fig. 1, or the like) in a first thread 305a, while "Draw_2()" call 350a is image-processed or rendered by a second CPU core (e.g., CPU core 115b of Fig. 1, or the like) in a second thread 305b. Once each of "Draw_1()" call 345a and "Draw_2()" call 350a has been image-processed or rendered as "Draw_1()" call 345b and "Draw_2()" call 350b by their respective CPU cores in corresponding threads 305a and 305b, "Draw_1()" call 345b and "Draw_2()" call 350b are sent to GPU 340 for final image-processing or rendering. However, if a particular order is required due to dependencies or the like, errors may occur.

[0086] With reference to Fig. 3C, a "Wait Fence 1" command 355 and a "Signal Fence 1" command 360 are used on "Draw_1()" call 345a by using the "Wait Fence 1" command 355 before or in conjunction with image-processing or rendering of "Draw_1()" call 345a, which causes release or sending of image-processed or rendered "Draw_1()" call 345b to be held or paused until a release signal (i.e., "Signal Fence 1" command 360) has been issued. In the example of Fig. 3C, such a release signal is issued after image-processing or rendering of "Draw_2()" call 350a, which causes release or sending of image-processed or rendered "Draw_2()" call 350b to the GPU 340 before release or sending of image-processed or rendered "Draw_1()" call 345b (depicted in Fig. 3C by "Draw_1()" call 345b being shown below "Draw_2()" call 350b in the GPU 340). That is, the "Fence 1" command(s) is used to ensure that the "Draw_1()" call 345b is executed after "Draw_2()" call 350b.
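The fence behavior above can be approximated in a short Python sketch. One hedge is important: the mechanism described in the embodiments holds work on the GPU side without blocking CPU threads, whereas this toy model uses a CPU-side threading.Event purely to make the enforced ordering visible; the names gpu_queue, fence_1, and the draw functions are illustrative, not any real API.

```python
import threading

gpu_queue = []                # order in which the toy "GPU" receives work
queue_lock = threading.Lock()
fence_1 = threading.Event()   # stand-in for the "Fence 1" object

def thread_1_draw():
    # CPU-side work for Draw_1 could proceed here, but submission of
    # the rendered result is gated on "Wait Fence 1".
    fence_1.wait()
    with queue_lock:
        gpu_queue.append("Draw_1")

def thread_2_draw():
    # Draw_2 is submitted first; "Signal Fence 1" then releases Draw_1.
    with queue_lock:
        gpu_queue.append("Draw_2")
    fence_1.set()

t1 = threading.Thread(target=thread_1_draw)
t2 = threading.Thread(target=thread_2_draw)
t1.start(); t2.start()
t1.join(); t2.join()
```

Regardless of which thread the scheduler runs first, the queue always ends up as ["Draw_2", "Draw_1"], which is exactly the ordering Fig. 3C enforces.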

[0087] Although a particular number of threads, a particular number of shared contexts, and a particular implementation of draw calls is shown and described with respect to Figs. 3B and 3C, the various embodiments are not so limited, and any suitable number of threads (and a corresponding number of shared contexts) may be used (subject to the total number of CPU cores available on the user device, as described above), and any suitable implementation of draw calls may be used consistent with the various embodiments described herein.

[0088] Figs. 4A-4D (collectively, "Fig. 4") are flow diagrams illustrating a method 400 for implementing parallelization of OpenGL rendering pipelines, in accordance with various embodiments. Method 400 of Fig. 4A continues onto Fig. 4B following the circular marker denoted, "A," and returns to Fig. 4A following the circular marker denoted, "B."

[0089] While the techniques and procedures are depicted and/or described in a particular order for purposes of illustration, it should be appreciated that some procedures may be reordered and/or omitted within the scope of various embodiments. Moreover, while the method 400 illustrated by Fig. 4 can be implemented by or with (and, in some cases, are described below with respect to) the systems, examples, or embodiments 100, 200, 200', 300, and 300' of Figs. 1, 2A, 2B, 3A, and 3B-3C, respectively (or components thereof), such methods may also be implemented using any suitable hardware (or software) implementation. Similarly, while each of the systems, examples, or embodiments 100, 200, 200', 300, and 300' of Figs. 1, 2A, 2B, 3A, and 3B-3C, respectively (or components thereof), can operate according to the method 400 illustrated by Fig. 4 (e.g., by executing instructions embodied on a computer readable medium), the systems, examples, or embodiments 100, 200, 200', 300, and 300' of Figs. 1, 2A, 2B, 3A, and 3B-3C can each also operate according to other modes of operation and/or perform other suitable procedures.

[0090] In the non-limiting embodiment of Fig. 4A, method 400, at block 405, may comprise receiving, using a computing system on a user device, a request to render an image. In some embodiments, the computing system may include, without limitation, at least one of a graphics engine, a graphics rendering engine, a game engine, a three-dimensional ("3D") game engine, a processor on the user device, or at least one CPU core on the user device, and/or the like. In some instances, the user device may include, but is not limited to, one of a portable gaming device, a smart phone, a tablet computer, a laptop computer, a desktop computer, or a server computer, and/or the like. In some cases, the image may include, without limitation, one of a two-dimensional ("2D") image, a three-dimensional ("3D") image, a video, a 2D frame of a video, a 3D frame of a video, a group of 2D images, or a group of 3D images, and/or the like.

[0091] At block 410, method 400 may comprise, in response to receiving the request to render the image, analyzing, using the computing system, the image to identify a plurality of split points (i.e., two, three, or more split points, etc.) in the image. According to some embodiments, the plurality of split points may include, but is not limited to, at least one of one or more frames, one or more render layers, one or more render passes, or one or more draw calls, and/or the like. In some cases, the one or more frames may include, without limitation, frames in a video, or the like. In some instances, the one or more render layers may include, but are not limited to, one or more layers corresponding to different objects in the image, or the like. In some cases, the one or more render passes may include, without limitation, at least one of one or more shadow passes, one or more lighting passes, one or more main color passes, one or more secondary color passes, one or more post-processing passes (e.g., one or more blur passes, one or more sharpening passes, one or more blur/sharpening passes, one or more filtering passes, etc.), one or more highlight passes, one or more reflection passes, or one or more user interface ("UI") passes, and/or the like. In some instances, the one or more draw calls may include, but are not limited to, one or more calls to a graphics application programming interface ("API") to draw one or more objects, or the like.

[0092] Method 400 may further comprise, at block 415, identifying, using the computing system, any dependencies among two or more split points among the plurality of split points. Method 400 may further comprise determining, using the computing system, a multi-threaded solution to perform parallel rendering of the image based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points (block 420).
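One way to picture blocks 415-420 is a small Python sketch that groups split points into dependency levels, where every point within a level can render in parallel. The function and the sample pass names (taken loosely from Fig. 2) are hypothetical; the embodiments do not prescribe this particular algorithm.

```python
def plan_groups(split_points, deps):
    """Group split points into levels: each level contains only points
    whose dependencies were all satisfied by earlier levels, so the
    points within one level can be rendered concurrently."""
    levels, placed = [], set()
    while len(placed) < len(split_points):
        level = [p for p in split_points
                 if p not in placed
                 and all(d in placed for d in deps.get(p, []))]
        if not level:
            raise ValueError("cyclic dependency among split points")
        levels.append(level)
        placed.update(level)
    return levels

# Illustrative passes echoing Fig. 2: the final pass depends on the
# HDR, UI, and blur/sharpening passes.
deps = {"final": ["hdr", "ui", "blur_sharpen"]}
groups = plan_groups(["hdr", "ui", "blur_sharpen", "final"], deps)
```

Here the first level, ["hdr", "ui", "blur_sharpen"], can be pushed to three threads at once, while "final" forms the next level, matching the ordering constraint discussed for Fig. 2B.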

[0093] Method 400, at block 425, may comprise pushing, using the computing system, each group of split points among a plurality of groups of split points (i.e., two, three, or more groups of split points, etc.) to a corresponding thread among a plurality of threads (i.e., two, three, or more threads, etc.), based on the determined multi-threaded solution. Method 400 either may continue onto the process at block 430 or may continue onto the process at block 450 in Fig. 4B following the circular marker denoted, "A." At block 450 in Fig. 4B (following the circular marker denoted, "A"), method 400 may comprise creating, using the computing system, a shared context for each of second through Nth threads among the plurality of threads (which may comprise a first through Nth thread), based on the original context. Method 400 may then return to the process at block 430 in Fig. 4A following the circular marker denoted, "B."

[0094] At block 430 in Fig. 4A (either continuing directly from the process at block 425 or following the circular marker denoted, "B," from the process at block 450 in Fig. 4B), method 400 may comprise rendering, using each CPU core among a plurality of CPU cores (i.e., two, three, or more CPU cores, etc.) on the user device, one group of split points among the plurality of groups of split points, each concurrent with the rendering of the other groups of split points by other CPU cores among the plurality of CPU cores.

[0095] Method 400 may further comprise, at block 435, utilizing, using the computing system, one or more graphics processing unit ("GPU") synchronization mechanisms to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU. In some embodiments, the one or more GPU synchronization mechanisms may include, without limitation, one of one or more GPU fence commands or one or more delay commands, and/or the like.

[0096] Method 400, at block 440, may comprise rendering, using the GPU on the user device, data sent from each of the plurality of CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image. Method 400 may further comprise outputting, using the GPU, the rendered image (block 445). In this manner, parallelizing rendering pipelines may be achieved. In some embodiments, the rendering pipelines may include, without limitation, Open Graphics Library ("OpenGL") cross-language, cross-platform API-based rendering pipelines, or the like. In some embodiments, method 400 may return to the process at block 405 and the processes at blocks 405-445 may be repeated, as necessary or as desired, until all images have been image-processed or rendered.

[0097] With reference to the non-limiting example of Fig. 4C, the plurality of split points may include one or more render passes. In such cases, identifying the plurality of split points in the image (at block 410) may comprise detecting, using the computing system, occurrence of framebuffer resets, each framebuffer reset being indicative of a beginning of a new render pass (block 455).

[0098] Referring to the non-limiting example of Fig. 4D, identifying any dependencies among two or more split points among the plurality of split points (at block 415) may comprise tracking, using the computing system, all dependencies among the plurality of split points (block 460). According to some embodiments, tracking all dependencies among the plurality of split points (at block 460) may comprise tracking, using the computing system, all dependencies among the plurality of split points by identifying and storing all resources used to render the image, and by determining and storing which resources are used by which split points (block 465).
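Blocks 455-465 can be sketched in Python as follows (a simplified model: the command tuples and per-pass read/write sets are hypothetical stand-ins for an intercepted OpenGL command stream, not a real API): a new render pass starts at each framebuffer reset, and a pass depends on any earlier pass that wrote a resource it reads.

```python
def split_into_passes(commands):
    """Split a recorded command stream into render passes, starting a
    new pass at each framebuffer reset (block 455)."""
    passes, current = [], []
    for cmd in commands:
        if cmd[0] == "bind_framebuffer" and current:
            passes.append(current)
            current = []
        current.append(cmd)
    if current:
        passes.append(current)
    return passes

def track_dependencies(pass_resources):
    """pass_resources[i] = (reads, writes) sets for pass i; pass i
    depends on every earlier pass whose writes overlap its reads
    (blocks 460-465)."""
    return {i: [j for j in range(i) if pass_resources[j][1] & reads]
            for i, (reads, writes) in enumerate(pass_resources)}

commands = [
    ("bind_framebuffer", "shadow_fb"), ("draw", "scene"),
    ("bind_framebuffer", "main_fb"), ("draw", "scene"),
    ("bind_framebuffer", "default_fb"), ("draw", "fullscreen_quad"),
]
passes = split_into_passes(commands)

resources = [
    (set(), {"shadow_map"}),          # shadow pass writes the shadow map
    ({"shadow_map"}, {"color_tex"}),  # main pass reads the shadow map
    ({"color_tex"}, {"backbuffer"}),  # final pass reads the color texture
]
deps = track_dependencies(resources)
```

The recovered structure (three passes; the main pass depends on the shadow pass, the final pass on the main pass) is precisely the information a multi-threaded solution needs in order to schedule the passes safely.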

[0099] Examples of System and Hardware Implementation

[0100] Fig. 5 is a block diagram illustrating an example of computer or system hardware architecture, in accordance with various embodiments. Fig. 5 provides a schematic illustration of one embodiment of a computer system 500 of the service provider system hardware that can perform the methods provided by various other embodiments, as described herein, and/or can perform the functions of computer or hardware system (i.e., user device 105, computing system 110, display screen 140, audio playback device 145, content source(s) 150, content distribution system 175, and display devices 185a-185n, etc.), as described above. It should be noted that Fig. 5 is meant only to provide a generalized illustration of various components, of which one or more (or none) of each may be utilized as appropriate. Fig. 5, therefore, broadly illustrates how individual system elements may be implemented in a relatively separated or relatively more integrated manner.

[0101] The computer or hardware system 500 - which might represent an embodiment of the computer or hardware system (i.e., user device 105, computing system 110, display screen 140, audio playback device 145, content source(s) 150, content distribution system 175, and display devices 185a-185n, etc.), described above with respect to Figs. 1-4 - is shown comprising hardware elements that can be electrically coupled via a bus 505 (or may otherwise be in communication, as appropriate). The hardware elements may include one or more processors 510, including, without limitation, one or more general-purpose processors and/or one or more special-purpose processors (such as microprocessors, digital signal processing chips, graphics acceleration processors, and/or the like); one or more input devices 515, which can include, without limitation, a mouse, a keyboard, and/or the like; and one or more output devices 520, which can include, without limitation, a display device, a printer, and/or the like.

[0102] The computer or hardware system 500 may further include (and/or be in communication with) one or more storage devices 525, which can comprise, without limitation, local and/or network accessible storage, and/or can include, without limitation, a disk drive, a drive array, an optical storage device, or a solid-state storage device such as a random access memory ("RAM") and/or a read-only memory ("ROM"), which can be programmable, flash-updateable, and/or the like. Such storage devices may be configured to implement any appropriate data stores, including, without limitation, various file systems, database structures, and/or the like.

[0103] The computer or hardware system 500 might also include a communications subsystem 530, which can include, without limitation, a modem, a network card (wireless or wired), an infra-red communication device, a wireless communication device and/or chipset (such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a WWAN device, cellular communication facilities, etc.), and/or the like. The communications subsystem 530 may permit data to be exchanged with a network (such as the network described below, to name one example), with other computer or hardware systems, and/or with any other devices described herein. In many embodiments, the computer or hardware system 500 will further comprise a working memory 535, which can include a RAM or ROM device, as described above.

[0104] The computer or hardware system 500 also may comprise software elements, shown as being currently located within the working memory 535, including an operating system 540, device drivers, executable libraries, and/or other code, such as one or more application programs 545, which may comprise computer programs provided by various embodiments (including, without limitation, hypervisors, VMs, and the like), and/or may be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein. Merely by way of example, one or more procedures described with respect to the method(s) discussed above might be implemented as code and/or instructions executable by a computer (and/or a processor within a computer); in an aspect, then, such code and/or instructions can be used to configure and/or adapt a general purpose computer (or other device) to perform one or more operations in accordance with the described methods.

[0105] A set of these instructions and/or code might be encoded and/or stored on a non-transitory computer readable storage medium, such as the storage device(s) 525 described above. In some cases, the storage medium might be incorporated within a computer system, such as the system 500. In other embodiments, the storage medium might be separate from a computer system (i.e., a removable medium, such as a compact disc, etc.), and/or provided in an installation package, such that the storage medium can be used to program, configure, and/or adapt a general purpose computer with the instructions/code stored thereon. These instructions might take the form of executable code, which is executable by the computer or hardware system 500 and/or might take the form of source and/or installable code, which, upon compilation and/or installation on the computer or hardware system 500 (e.g., using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc.) then takes the form of executable code.

[0106] It will be apparent to those skilled in the art that substantial variations may be made in accordance with particular requirements. For example, customized hardware (such as programmable logic controllers, field-programmable gate arrays, application-specific integrated circuits, and/or the like) might also be used, and/or particular elements might be implemented in hardware, software (including portable software, such as applets, etc.), or both. Further, connection to other computing devices such as network input/output devices may be employed.

[0107] As mentioned above, in one aspect, some embodiments may employ a computer or hardware system (such as the computer or hardware system 500) to perform methods in accordance with various embodiments of the invention.

According to a set of embodiments, some or all of the procedures of such methods are performed by the computer or hardware system 500 in response to processor 510 executing one or more sequences of one or more instructions (which might be incorporated into the operating system 540 and/or other code, such as an application program 545) contained in the working memory 535. Such instructions may be read into the working memory 535 from another computer readable medium, such as one or more of the storage device(s) 525. Merely by way of example, execution of the sequences of instructions contained in the working memory 535 might cause the processor(s) 510 to perform one or more procedures of the methods described herein.

[0108] The terms "machine readable medium" and "computer readable medium," as used herein, refer to any medium that participates in providing data that causes a machine to operate in some fashion. In an embodiment implemented using the computer or hardware system 500, various computer readable media might be involved in providing instructions/code to processor(s) 510 for execution and/or might be used to store and/or carry such instructions/code (e.g., as signals). In many implementations, a computer readable medium is a non-transitory, physical, and/or tangible storage medium. In some embodiments, a computer readable medium may take many forms, including, but not limited to, non-volatile media, volatile media, or the like. Non-volatile media includes, for example, optical and/or magnetic disks, such as the storage device(s) 525. Volatile media includes, without limitation, dynamic memory, such as the working memory 535.
In some alternative embodiments, a computer readable medium may take the form of transmission media, which includes, without limitation, coaxial cables, copper wire, and fiber optics, including the wires that comprise the bus 505, as well as the various components of the communication subsystem 530 (and/or the media by which the communications subsystem 530 provides communication with other devices). In an alternative set of embodiments, transmission media can also take the form of waves (including without limitation radio, acoustic, and/or light waves, such as those generated during radio wave and infra-red data communications).

[0109] Common forms of physical and/or tangible computer readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read instructions and/or code.

[0110] Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the processor(s) 510 for execution. Merely by way of example, the instructions may initially be carried on a magnetic disk and/or optical disc of a remote computer. A remote computer might load the instructions into its dynamic memory and send the instructions as signals over a transmission medium to be received and/or executed by the computer or hardware system 500. These signals, which might be in the form of electromagnetic signals, acoustic signals, optical signals, and/or the like, are all examples of carrier waves on which instructions can be encoded, in accordance with various embodiments of the invention.

[0111] The communications subsystem 530 (and/or components thereof) generally will receive the signals, and the bus 505 then might carry the signals (and/or the data, instructions, etc. carried by the signals) to the working memory 535, from which the processor(s) 510 retrieves and executes the instructions. The instructions received by the working memory 535 may optionally be stored on a storage device 525 either before or after execution by the processor(s) 510.

[0112] As noted above, a set of embodiments comprises methods and systems for implementing two-dimensional ("2D") and/or three-dimensional ("3D") rendering, and, more particularly, methods, systems, and apparatuses for implementing parallelization of OpenGL rendering pipelines. Fig. 6 illustrates a schematic diagram of a system 600 that can be used in accordance with one set of embodiments. The system 600 can include one or more user computers, user devices, or customer devices 605. A user computer, user device, or customer device 605 can be a general purpose personal computer (including, merely by way of example, desktop computers, tablet computers, laptop computers, handheld computers, and the like, running any appropriate operating system, several of which are available from vendors such as Apple, Microsoft Corp., and the like), cloud computing devices, a server(s), and/or a workstation computer(s) running any of a variety of commercially-available UNIX™ or UNIX-like operating systems. A user computer, user device, or customer device 605 can also have any of a variety of applications, including one or more applications configured to perform methods provided by various embodiments (as described above, for example), as well as one or more office applications, database client and/or server applications, and/or web browser applications. Alternatively, a user computer, user device, or customer device 605 can be any other electronic device, such as a thin-client computer, Internet-enabled mobile telephone, and/or personal digital assistant, capable of communicating via a network (e.g., the network(s) 610 described below) and/or of displaying and navigating web pages or other types of electronic documents. Although the system 600 is shown with two user computers, user devices, or customer devices 605, any number of user computers, user devices, or customer devices can be supported.

[0113] Some embodiments operate in a networked environment, which can include a network(s) 610. The network(s) 610 can be any type of network familiar to those skilled in the art that can support data communications using any of a variety of commercially-available (and/or free or proprietary) protocols, including, without limitation, TCP/IP, SNA™, IPX™, AppleTalk™, and the like. Merely by way of example, the network(s) 610 (similar to network(s) 160 of Fig. 1, or the like) can each include a local area network ("LAN"), including, without limitation, a fiber network, an Ethernet network, a Token-Ring™ network, and/or the like; a wide-area network ("WAN"); a wireless wide area network ("WWAN"); a virtual network, such as a virtual private network ("VPN"); the Internet; an intranet; an extranet; a public switched telephone network ("PSTN"); an infra-red network; a wireless network, including, without limitation, a network operating under any of the IEEE 802.11 suite of protocols, the Bluetooth™ protocol known in the art, and/or any other wireless protocol; and/or any combination of these and/or other networks. In a particular embodiment, the network might include an access network of the service provider (e.g., an Internet service provider ("ISP")). In another embodiment, the network might include a core network of the service provider, and/or the Internet.

[0114] Embodiments can also include one or more server computers 615. Each of the server computers 615 may be configured with an operating system, including, without limitation, any of those discussed above, as well as any commercially (or freely) available server operating systems. Each of the servers 615 may also be running one or more applications, which can be configured to provide services to one or more clients 605 and/or other servers 615.

[0115] Merely by way of example, one of the servers 615 might be a data server, a web server, a cloud computing device(s), or the like, as described above. The data server might include (or be in communication with) a web server, which can be used, merely by way of example, to process requests for web pages or other electronic documents from user computers 605. The web server can also run a variety of server applications, including HTTP servers, FTP servers, CGI servers, database servers, Java servers, and the like. In some embodiments of the invention, the web server may be configured to serve web pages that can be operated within a web browser on one or more of the user computers 605 to perform methods of the invention.

[0116] The server computers 615, in some embodiments, might include one or more application servers, which can be configured with one or more applications accessible by a client running on one or more of the client computers 605 and/or other servers 615. Merely by way of example, the server(s) 615 can be one or more general purpose computers capable of executing programs or scripts in response to the user computers 605 and/or other servers 615, including, without limitation, web applications (which might, in some cases, be configured to perform methods provided by various embodiments). Merely by way of example, a web application can be implemented as one or more scripts or programs written in any suitable programming language, such as Java™, C, C#™ or C++, and/or any scripting language, such as Perl, Python, or TCL, as well as combinations of any programming and/or scripting languages. The application server(s) can also include database servers, including, without limitation, those commercially available from Oracle™, Microsoft™, Sybase™, IBM™, and the like, which can process requests from clients (including, depending on the configuration, dedicated database clients, API clients, web browsers, etc.) running on a user computer, user device, or customer device 605 and/or another server 615. In some embodiments, an application server can perform one or more of the processes for implementing 2D and/or 3D rendering, and, more particularly, for implementing parallelization of OpenGL rendering pipelines, as described in detail above. Data provided by an application server may be formatted as one or more web pages (comprising HTML, JavaScript, etc., for example) and/or may be forwarded to a user computer 605 via a web server (as described above, for example). Similarly, a web server might receive web page requests and/or input data from a user computer 605 and/or forward the web page requests and/or input data to an application server. In some cases, a web server may be integrated with an application server.

[0117] In accordance with further embodiments, one or more servers 615 can function as a file server and/or can include one or more of the files (e.g., application code, data files, etc.) necessary to implement various disclosed methods, incorporated by an application running on a user computer 605 and/or another server 615. Alternatively, as those skilled in the art will appreciate, a file server can include all necessary files, allowing such an application to be invoked remotely by a user computer, user device, or customer device 605 and/or server 615.

[0118] It should be noted that the functions described with respect to various servers herein (e.g., application server, database server, web server, file server, etc.) can be performed by a single server and/or a plurality of specialized servers, depending on implementation-specific needs and parameters.

[0119] In some embodiments, the system can include one or more databases 620a- 620n (collectively, "databases 620"). The location of each of the databases 620 is discretionary: merely by way of example, a database 620a might reside on a storage medium local to (and/or resident in) a server 615a (and/or a user computer, user device, or customer device 605). Alternatively, a database 620n can be remote from any or all of the computers 605, 615, so long as it can be in communication (e.g., via the network 610) with one or more of these. In a particular set of embodiments, a database 620 can reside in a storage-area network ("SAN") familiar to those skilled in the art. (Likewise, any necessary files for performing the functions attributed to the computers 605, 615 can be stored locally on the respective computer and/or remotely, as appropriate.) In one set of embodiments, the database 620 can be a relational database, such as an Oracle database, that is adapted to store, update, and retrieve data in response to SQL-formatted commands. The database might be controlled and/or maintained by a database server, as described above, for example.

[0120] According to some embodiments, system 600 might further comprise computing system(s) 625 (similar to computing system 110 of Fig. 1, or the like) that is disposed in user device 605. The computing system(s) 625 may comprise a plurality of central processing unit ("CPU") cores 630a-630n (collectively, "CPU cores 630" or the like; similar to CPU cores 115a-115n of Fig. 1, or the like), a graphics processing unit ("GPU") 635 (similar to GPU 120 of Fig. 1, or the like), and/or other processor(s) 640 (similar to other processor(s) 125 of Fig. 1, or the like). The user device 605 may further comprise data storage device 645 (similar to data storage device 130 of Fig. 1, or the like), communications system 650 (similar to communications system 135 of Fig. 1, or the like), display screen 655 (optional; similar to display screen 140 of Fig. 1, or the like), and audio playback device 660 (optional; similar to audio playback device 145 of Fig. 1, or the like). System 600 may further comprise one or more content sources 665 and corresponding database(s) 670 (similar to content source(s) 150 and corresponding database(s) 155 of Fig. 1, or the like) and, in some cases, one or more content distribution systems 675 and corresponding database(s) 680 (similar to content distribution system(s) 175 and corresponding database(s) 180 of Fig. 1, or the like).

[0121] In operation, computing system(s) 625, at least one CPU core 630, GPU 635, and/or at least one other processor 640 of user device 605 (collectively, "computing system") may receive a request to render an image. In response to receiving the request to render the image, the computing system may (receive and) analyze the image to identify a plurality of split points in the image.

[0122] According to some embodiments, the image may include, but is not limited to, one of a two-dimensional ("2D") image, a three-dimensional ("3D") image, a video, a 2D frame of a video, a 3D frame of a video, a group of 2D images, or a group of 3D images, and/or the like. In some embodiments, the plurality of split points may include, but is not limited to, at least one of one or more frames, one or more render layers, one or more render passes, or one or more draw calls, and/or the like. In some cases, the one or more frames may include, without limitation, frames in a video, or the like. In some instances, the one or more render layers may include, but are not limited to, one or more layers corresponding to different objects in the image, or the like. In some cases, the one or more render passes may include, without limitation, at least one of one or more shadow passes, one or more lighting passes, one or more main color passes, one or more secondary color passes, one or more post processing passes, one or more highlight passes, one or more reflection passes, or one or more user interface ("UI") passes, and/or the like. In some instances, the one or more draw calls may include, but are not limited to, one or more calls to a graphics application programming interface ("API") to draw one or more objects, or the like. In some embodiments, the plurality of split points may include one or more render passes, where identifying the plurality of split points in the image may comprise the computing system detecting occurrence of framebuffer resets, each framebuffer reset being indicative of a beginning of a new render pass.

[0123] The computing system may identify any dependencies among two or more split points among the plurality of split points. In some cases, identifying any dependencies among two or more split points among the plurality of split points may comprise the computing system tracking all dependencies among the plurality of split points. In some instances, tracking all dependencies among the plurality of split points may comprise the computing system tracking all dependencies among the plurality of split points by identifying and storing all resources used to render the image, and by determining and storing which resources are used by which split points, and/or the like.

[0124] The computing system may determine a multi-threaded solution to perform parallel rendering of the image based at least in part on the identified plurality of split points and the identified dependencies among the two or more split points. The computing system may push (or otherwise send or transfer) each group of split points among a plurality of groups of split points (i.e., two, three, or more groups of split points, etc.) to a corresponding thread among a plurality of threads (i.e., two, three, or more threads, etc.), based on the determined multi-threaded solution.

[0125] Each CPU core among the plurality of CPU cores 630 may render one group of split points among the plurality of groups of split points, each concurrent with the rendering of the other groups of split points by other CPU cores among the plurality of CPU cores 630.
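A minimal C++ sketch of the per-core parallel rendering described in paragraphs [0124] and [0125] follows. Here each worker thread stands in for one CPU core recording the rendering work of one group of split points; the integer "cost" values and all names are illustrative assumptions, not any embodiment's actual workload representation.

```cpp
#include <thread>
#include <vector>

// Each worker thread processes the rendering work of one group of split
// points on its own CPU core, concurrently with the other workers. The
// per-group result (here, a summed cost) stands in for CPU-side command
// recording; each thread writes only its own slot, so no locking is needed.
std::vector<int> renderGroupsInParallel(
        const std::vector<std::vector<int>>& groups) {
    std::vector<int> workDone(groups.size(), 0);
    std::vector<std::thread> workers;
    for (size_t g = 0; g < groups.size(); ++g) {
        workers.emplace_back([&, g]() {
            int total = 0;
            for (int cost : groups[g]) total += cost;
            workDone[g] = total;
        });
    }
    for (auto& w : workers) w.join();
    return workDone;
}
```

In practice the number of groups would be matched to the available CPU cores 630, and the dependency information from paragraph [0123] would constrain which split points may share a group.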

[0126] According to some embodiments, the computing system may utilize one or more GPU synchronization mechanisms to enforce an order of rendering by a GPU by controlling timing with which each CPU core sends rendered data to the GPU. In some embodiments, the one or more GPU synchronization mechanisms may include, without limitation, one of one or more GPU fence commands or one or more delay commands, and/or the like.

[0127] The GPU may render data sent from each of the plurality of CPU cores, in the order enforced by the one or more GPU synchronization mechanisms, to generate the rendered image. The GPU may subsequently output the rendered image. In this manner, parallelization of rendering pipelines may be achieved. In some embodiments, the rendering pipelines may include, without limitation, Open Graphics Library ("OpenGL") cross-language, cross-platform API-based rendering pipelines, or the like.
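The ordered submission described in paragraphs [0126] and [0127] can be sketched in plain C++ as follows. A condition variable and turn counter stand in for the GPU fence commands (in real OpenGL, e.g., `glFenceSync`/`glWaitSync`): worker threads may finish recording in any order, but their submissions are released to the "GPU" strictly in split-point order. All names here are illustrative assumptions for the sketch.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Workers are started in reverse order to show that completion order does
// not matter: the turn counter (standing in for GPU fence commands) releases
// submissions strictly in split-point order, so the GPU receives 0, 1, 2, ...
std::vector<int> submitInOrder(int numWorkers) {
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;              // next worker allowed to submit
    std::vector<int> gpuQueue; // order in which the "GPU" receives work
    std::vector<std::thread> workers;
    for (int id = numWorkers - 1; id >= 0; --id) {
        workers.emplace_back([&, id]() {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return turn == id; }); // "fence": wait our turn
            gpuQueue.push_back(id);                    // submit rendered data
            ++turn;
            cv.notify_all();
        });
    }
    for (auto& w : workers) w.join();
    return gpuQueue;
}
```

The CPU-side recording itself remains fully parallel; only the final hand-off to the GPU is serialized, which is what allows the GPU to composite the passes in the correct order.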

[0128] These and other functions of the system 600 (and its components) are described in greater detail above with respect to Figs. 1-4.

[0129] While particular features and aspects have been described with respect to some embodiments, one skilled in the art will recognize that numerous modifications are possible. For example, the methods and processes described herein may be implemented using hardware components, software components, and/or any combination thereof. Further, while various methods and processes described herein may be described with respect to particular structural and/or functional components for ease of description, methods provided by various embodiments are not limited to any particular structural and/or functional architecture but instead can be implemented on any suitable hardware, firmware and/or software configuration. Similarly, while particular functionality is ascribed to particular system components, unless the context dictates otherwise, this functionality need not be limited to such and can be distributed among various other system components in accordance with the several embodiments.

[0130] Moreover, while the procedures of the methods and processes described herein are described in a particular order for ease of description, unless the context dictates otherwise, various procedures may be reordered, added, and/or omitted in accordance with various embodiments. Moreover, the procedures described with respect to one method or process may be incorporated within other described methods or processes; likewise, system components described according to a particular structural architecture and/or with respect to one system may be organized in alternative structural architectures and/or incorporated within other described systems.
Hence, while various embodiments are described with — or without — particular features for ease of description and to illustrate some aspects of those embodiments, the various components and/or features described herein with respect to a particular embodiment can be substituted, added and/or subtracted from among other described embodiments, unless the context dictates otherwise. Consequently, although several embodiments are described above, it will be appreciated that the invention is intended to cover all modifications and equivalents within the scope of the following claims.