Title:
IGCC: A REMOTE VIRTUALIZATION ARCHITECTURE FOR DISTRIBUTED GPU RESOURCES OVER THE INTERNET
Document Type and Number:
WIPO Patent Application WO/2023/219554
Kind Code:
A1
Abstract:
A remote GPU virtualization architecture over the Internet is disclosed, which enables a front-end client to be deployed with or without any GPU processor by remotely leveraging back-end distributed GPU servers equipped with physical GPU processors. This effectively addresses major HPC facilities' limitations, such as high acquisition and maintenance costs, high power consumption during idle, and a poor performance-to-power ratio, without any source code changes. Abstractions of wrapper stub-library APIs are the main communicators for the distribution of GPU resources between the front-end client and the back-end. Upon authentication and configuration from the front-end relay server, the request parameters are transmitted via the Internet to the distributed back-end GPU servers. When the wrapper API on the back-end GPU servers intercepts the requests, they are executed and memory pointers are mapped with direct access to the GPU driver. The output is finally transmitted to the client over the network.

Inventors:
ZHU HONGBING (SG)
Application Number:
PCT/SG2022/050303
Publication Date:
November 16, 2023
Filing Date:
May 10, 2022
Assignee:
ZHU HONGBING (SG)
International Classes:
G06F15/16; G06F9/455; H04L67/10
Foreign References:
US20210058470A12021-02-25
US20220050699A12022-02-17
US20220138192A12022-05-05
Other References:
KEGEL PHILIPP; STEUWER MICHEL; GORLATCH SERGEI: "dOpenCL: Towards uniform programming of distributed heterogeneous multi-/many-core systems", JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Elsevier, Amsterdam, NL, vol. 73, no. 12, 14 August 2013, pages 1639-1648, XP028738002, ISSN: 0743-7315, DOI: 10.1016/j.jpdc.2013.07.021
LACCETTI GIULIANO; MONTELLA RAFFAELE; PALMIERI CARLO; PELLICCIA VALENTINA: "The High Performance Internet of Things: Using GVirtuS to Share High-End GPUs with ARM Based Cluster Computing Nodes", Springer International Publishing, vol. 69, 6 May 2014, pages 734-744, XP047646002, DOI: 10.1007/978-3-642-55224-3_69
Claims:
CLAIMS

1. A method of remote virtualization of GPU resources over the Internet, comprising:- a front-end client, with or without GPU processors, that remotely accesses all the GPU resources via the Internet from back-end distributed GPU servers as if they were local GPU arrays, for redistribution; back-end distributed GPU server arrays equipped with physical GPU processors, ready to be leveraged by incoming requests concurrently; a front-end stub driver API wrapper to provide abstraction methods to the GPU-accelerated application and handle communication; a back-end stub driver API wrapper to receive front-end requests, process GPU routines, and handle communication; and a front-end relay server to authenticate the front-end client and back-end server and provide configuration files in JSON format.

2. The method according to claim 1, wherein the common communication protocol between the front-end and back-end is TCP/IP, with a gigabit or faster Internet connection as the mandatory medium.

3. The method according to claim 1, wherein the stub driver API wrappers for the front-end and back-end are implemented as collections of C++ classes.

4. The method according to claim 1, wherein the stub driver API wrapper is transparent on the front-end, requiring no modifications to application source code by developers.

5. The front-end client remote virtualization method of any one of the preceding claims, comprising the steps of:- a kernel module wraps the driver API and provides abstraction methods to the GPU-accelerated application; authentication is performed with the front-end relay server to obtain configuration files in JSON format; the stub module intercepts and collects request parameters upon receiving a callback; the handler fills the input buffer before establishing a connection to the back-end GPU servers; the remote procedure output data is obtained and handled; and the exit code returned by the back-end is received.

6. The back-end distributed GPU server remote virtualization method of any one of the preceding claims, comprising the steps of:- common libraries are pre-loaded to serve multiple clients; a request handler waits for and serves incoming connections and application requests from the front-end by spawning a thread for each process; handled parameters undergo a parameter check before being executed; GPU routines and functions are accurately mapped and handled appropriately to produce output; and the output and exit code are forwarded to the front-end upon completion.

7. The method of any one of the preceding claims, wherein the front-end and back-end servers support all combinations of x86_64 and ARM architectures with the flexibility of a heterogeneous architecture.

8. The method of any one of the preceding claims, wherein the back-end distributed GPU servers are capable of executing proper job migration and load balancing to maximize efficiency.

9. The method of any one of the preceding claims, wherein the GPU-accelerated applications comprise all categories of accelerated applications, such as simulation, rendering, imaging, and calculation applications, including PyTorch, TensorFlow, CUDASW++, Caffe, Autodesk 3ds Max, and MpiBLAST.

Description:
iGCC: A REMOTE VIRTUALIZATION ARCHITECTURE FOR DISTRIBUTED GPU RESOURCES OVER THE INTERNET

FIELD OF THE INVENTION

[0001] This invention realizes an overall architecture for remote virtualization of GPU resources for GPU-accelerated applications over the Internet. Clients on the front-end, with or without a physical GPU, can leverage and distribute the physical GPU resources of remote back-end distributed GPU servers over the Internet, with proper authentication and configuration from a front-end relay server.

BACKGROUND

[0002] The rising demand for GPU-accelerated computing has been rapidly reshaping computationally intensive industries such as biotechnology, finance, physics, and imaging, owing to its significant capacity for parallel execution, which is effective for workloads with long execution times and large, iterative computations. The present acceleration method takes a shared-nothing approach, where clusters compute only with their physically attached GPUs. However, such an approach presents drawbacks such as high acquisition and maintenance costs, high power consumption during idle, and a low utilization rate.

[0003] These limitations call for a novel remote virtualization architecture in which the front-end client is not bound to a physical GPU and can leverage theoretically unlimited GPU resources from GPU acceleration servers over the Internet via TCP/IP, provided hardware resources such as PCIe bandwidth are not the bottleneck. The proposed remote virtualization solution maximizes overall GPU utilization by sharing GPU processors within the back-end acceleration server clusters concurrently, thus effectively minimizing the limitations above.

SUMMARY OF INVENTION

[0004] A remote GPU virtualization architecture over the Internet is presented; the framework is composed of a server-client architecture written in C++. The front-end client is deployed with or without any GPU processor, whereas the back-end distributed GPU server has physically connected GPU processors and permissions to GPU drivers and runtime. The architecture supports the major accelerator library, namely Nvidia CUDA, and all combinations of x86_64 and ARM platforms, even for VMs. Upon proper authentication and configuration from the front-end relay server, the front-end client is configured accordingly and connected to the back-end. The wrapper stub-library API then dynamically intercepts the client's GPU-accelerated application requests and forwards the input parameters to the remote servers with physical GPUs over the network. Error checking is done on the distributed back-end GPU server side when a remote instance is received and intercepted, before the request is executed and memory pointers are mapped with direct access to the GPU drivers. Finally, the associated procedure call results are forwarded back to the front-end client over the network and captured by the front-end API.
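
To make the interception step concrete, the following minimal sketch (in C++, the language the framework is stated to use) shows how a front-end wrapper with the same shape as a driver API call such as cuMemAlloc could collect the request parameters and forward them to a back-end server over TCP. The wire format, the Request/Reply layout, and the names connect_backend and remote_cuMemAlloc are illustrative assumptions; the disclosure does not specify them, and error handling is omitted for brevity.

```cpp
// Hypothetical front-end stub: intercept a driver-API-shaped call and forward
// its parameters to the back-end instead of touching local GPU hardware.
#include <cstddef>
#include <cstdint>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

using CUdeviceptr = unsigned long long;          // stand-in for the CUDA driver type
enum class Op : uint32_t { MemAlloc = 1 };       // illustrative opcode

struct Request { Op op; uint64_t bytes; };       // host byte order for brevity
struct Reply   { int32_t status; uint64_t remote_ptr; };

// Open one TCP connection to a back-end GPU server; in a real deployment the
// address would come from the relay server's JSON configuration.
int connect_backend(const char* ip, uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Wrapper with the same shape as cuMemAlloc(): fill the input buffer, ship it
// to the back-end, then map the returned remote pointer for later calls.
int remote_cuMemAlloc(int fd, CUdeviceptr* dptr, size_t bytesize) {
    Request req{Op::MemAlloc, static_cast<uint64_t>(bytesize)};
    ::send(fd, &req, sizeof req, 0);
    Reply rep{};
    ::recv(fd, &rep, sizeof rep, MSG_WAITALL);
    *dptr = rep.remote_ptr;   // opaque handle that is only valid on the back-end GPU
    return rep.status;        // exit code produced by the back-end
}
```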

[0005] Throughout the process, the authenticated front-end client accesses all the GPU resources remotely as if they were local GPU arrays and distributes those resources to its clients, even for non-MPI applications. The framework allows the client to use a theoretically unlimited number of GPUs from the remote GPU server clusters, and the transparency of the architecture requires no modifications to source code. The increased GPU resources per application allow the framework to overcome the overall performance difference compared to a GPU client with direct access to the GPU processors.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] The accompanying drawings, in which reference numerals differentiate elements across different views, together with the written description of the specification, serve by way of example only to further present the advantages and mechanisms of the present embodiments.

[0007] Figure 1 depicts an illustration of the top-level architecture of remote virtualization of GPU processors between the front-end end users and the back-end GPU servers over the Internet, with authentication by a configuration relay server.

[0008] Figure 2 depicts an illustration of the communication flow of GPU-accelerated application execution between the front-end client and the back-end remotely over the Internet, while exposing abstractions of common mechanisms to developers so that no source code changes are required.

[0009] Figure 3 depicts an illustration of the threading mechanism on the distributed back-end GPU servers, where each server can listen to multiple clients' endpoints and spawn threads for each request over various communication protocols such as TCP/IP and VMSocket.

[0010] Figure 4 depicts an illustration of the available combinations of processor architectures achievable with the heterogeneous architecture, showing the flexibility of both the remote front-end and the back-end.

[0011] Figure 5 depicts an illustration of the load-balancing migration mechanism to maximize shared resource utilization between multiple front-end application requests over time and to minimize unevenness.

[0012] The accompanying figures are presented for clarity and better understanding, are readily apparent to one of ordinary skill in the art, and have not necessarily been drawn to scale.

DETAILED DESCRIPTION

[0013] The detailed description is purely illustrative and is not intended to limit the invention or its application. Furthermore, there is no intention to be bound by any theory presented in the preceding background of the invention or in the following comprehensive disclosure.

[0014] In various embodiments, a remote GPU virtualization over the Internet architecture is presented, with the methods and structure involved between the front-end client, the back-end distributed GPU servers, and a front-end relay server. The front-end client need not be equipped with any GPU processors, but instead leverages all GPU resources from the distributed back-end GPU servers as if they were local GPUs to run GPU-accelerated applications. Conventional HPC solutions mainly follow a shared-nothing approach and, as discussed, have several limitations, chiefly inefficiency in GPU resource utilization. Each HPC facility leverages one or more accelerators within a cluster, which seems appealing in terms of raw performance, but such a degree of parallelism is unlikely to keep all the GPU processors fully utilized. GPU processors consume considerable power even when idle, which is highly energy inefficient, not to mention the high acquisition, maintenance, administration, and space costs across the clusters. To address the aforementioned limitations, virtualization of GPU resources can be achieved by proper scheduling and dispatching, to share them concurrently amongst several applications. A back-end server node can also be put into a low-power mode when acceleration is not required.

[0015] The remote virtualization architecture can theoretically serve unlimited GPU processors across the back-end GPU acceleration servers over the Internet via TCP/IP, which effectively improves the performance-to-power ratio by increasing the total number of GPU processors available to the front-end client without hardware limitations such as PCIe bandwidth. The transparency of the framework requires no modifications to application source code by developers, as the front-end client node accesses the GPU resources as if the arrays were locally connected, even for non-MPI applications. Increasing the number of GPUs raises concurrent operations per second in parallel, resulting in overall cluster throughput greater than or equal to that of native acceleration. All the above operations require authentication and proper configuration to be obtained from the front-end relay server.

[0016] Figure 1 depicts the remote architecture 100 of remote virtualization of GPU processors between the front-end client node 101, the back-end GPU server clusters 107, and the front-end relay server 102. The overall front-end and back-end architecture and wrapper API 103 are constructed as a collection of C++ classes, and the common communicator 105 operates over the Internet. To prevent the network from becoming a throughput bottleneck, a gigabit or faster connection between the clusters is mandatory. Process 104 depicts the abstraction of GPU resources, where the front-end client node accesses all GPU processors from the remote distributed back-end GPU servers via the Internet as if they were local GPUs.

[0017] The front-end client node 101 is configured with a kernel module that wraps the driver APIs and provides abstraction methods to the GPU-accelerated application or request 201, as illustrated in Figure 2. With proper authentication and configuration from the front-end relay server, the front-end client is initialized accordingly and connected to the back-end distributed GPU servers. The client then waits to receive callback requests from the driver; the stub module instance 202 intercepts and collects the request parameters in process 203, then fills the input buffer before the handler 204 establishes a connection to the back-end GPU servers. Finally, the remote procedure output data is obtained and handled in process 210 before the exit code returned by the back-end is received. Only one instance of the front-end module remains alive throughout the life cycle of an application, to achieve thread-safe operation.
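
The single-instance requirement above maps naturally onto a C++ function-local static, which the language guarantees is initialized exactly once even under concurrent first use. The sketch below is an assumption about how that module could be structured; FrontEndModule and its members are illustrative names, not taken from the disclosure.

```cpp
// One front-end module instance per application lifetime, with a mutex to
// keep concurrently intercepted calls thread-safe on the shared connection.
#include <cstddef>
#include <mutex>

class FrontEndModule {
public:
    static FrontEndModule& instance() {
        static FrontEndModule inst;   // created once, destroyed at process exit
        return inst;
    }
    // Serialize intercepted requests onto the single back-end connection.
    void forward(const void* request, std::size_t len) {
        std::lock_guard<std::mutex> lock(mu_);
        // ... fill the input buffer and send it to the back-end here ...
        (void)request; (void)len;     // placeholder: transport shown earlier
    }
private:
    FrontEndModule() = default;       // authenticate and connect in a real build
    FrontEndModule(const FrontEndModule&) = delete;
    std::mutex mu_;
};
```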

[0018] The back-end distributed GPU server 107 has direct access to physical GPU processors and permissions to GPU drivers and runtime. The back-end request handler 206 is implemented to wait for and serve incoming connections and application requests from the front-end client node 101. Handled parameters undergo a parameter check in process 207 before being executed in process 208. All routines and functions are accurately mapped and handled appropriately to produce output 209, which is forwarded back to the front-end client 101. As depicted in Figure 3, the back-end node 301 is capable of serving multiple clients with pre-loaded libraries by spawning threads such as threads 303, 304, 306, and 307 for each process request, such as processes 302 and 305, and stays alive to keep listening to the front-end.
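
A minimal sketch of this accept-and-spawn behavior follows, again in C++ with POSIX sockets: one detached thread serves each incoming request while the listener stays alive. The port number and the serve_client body are illustrative assumptions, not the disclosed implementation.

```cpp
// Hypothetical back-end request handler: wait for connections and spawn a
// thread per request, mirroring threads 303-307 in Figure 3.
#include <netinet/in.h>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

void serve_client(int client_fd) {
    // 1) read the request, 2) run the parameter check, 3) execute the mapped
    // GPU routine, 4) forward the output and exit code to the front-end.
    ::close(client_fd);
}

int main() {
    int listener = ::socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(4000);               // illustrative port
    ::bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    ::listen(listener, SOMAXCONN);
    while (true) {                                    // stay alive, keep listening
        int client = ::accept(listener, nullptr, nullptr);
        if (client < 0) continue;
        std::thread(serve_client, client).detach();   // one thread per request
    }
}
```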

[0019] The back-end GPU servers or front-end clients might be deployed remotely, where their Internet-facing network conditions can vary. In some cases, they are NATed by the ISP to conserve the IPv4 address range, leaving them unreachable from outside. In this architecture, the front-end relay server acts as a rendezvous server, listening on an unblocked, static IP address. A library leveraging the Universal Plug and Play (UPnP) protocol is implemented for the front-end clients and back-end GPU servers, and compatible routers are necessary in this case. The main goal is to abstract the communication behind NATed nodes: obtain the external IP via the DHCP server and liaise with the router on dynamic port mapping and firewall rules before forwarding to the front-end relay server. The front-end relay server then compiles a custom configuration file in JSON format containing the IPs of the front-end clients, the distributed back-end GPU servers, and other configurations, ready to be obtained by both authenticated parties.
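
The disclosure states only that the relay server emits a JSON configuration file containing the IPs of both parties and other settings; the field names below are illustrative assumptions about what such a file might look like.

```json
{
  "client": { "id": "frontend-01", "external_ip": "203.0.113.10" },
  "backends": [
    { "ip": "198.51.100.21", "port": 4000, "gpus": 4 },
    { "ip": "198.51.100.22", "port": 4000, "gpus": 8 }
  ],
  "protocol": "tcp",
  "auth_token": "<issued by the relay server>"
}
```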

[0020] Figure 4 depicts the flexibility of the heterogeneous architecture of the remote GPU virtualization framework 400, which supports all combinations of x86_64 and ARM processor architectures, whether the front-end node is a physical node 401 or a virtual machine 402. This enables the flexibility of running applications on low-powered ARM-based systems while leveraging the GPU acceleration capability of the back-end distributed GPU server 404 over the Internet medium 403, and vice versa.
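
Mixing x86_64 and ARM endpoints means the wire format cannot assume a shared byte order. One conventional approach, sketched below as an assumption rather than the disclosed design, is to exchange fixed-width integers in network byte order so both architectures decode identically.

```cpp
// Portable 64-bit encode/decode via two 32-bit big-endian words, safe across
// x86_64 and ARM endpoints regardless of native endianness.
#include <cstdint>
#include <cstring>
#include <arpa/inet.h>   // htonl / ntohl

inline void put_u64(uint8_t* out, uint64_t v) {
    uint32_t hi = htonl(static_cast<uint32_t>(v >> 32));
    uint32_t lo = htonl(static_cast<uint32_t>(v & 0xffffffffu));
    std::memcpy(out, &hi, 4);
    std::memcpy(out + 4, &lo, 4);
}

inline uint64_t get_u64(const uint8_t* in) {
    uint32_t hi, lo;
    std::memcpy(&hi, in, 4);
    std::memcpy(&lo, in + 4, 4);
    return (static_cast<uint64_t>(ntohl(hi)) << 32) | ntohl(lo);
}
```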

[0021] Figure 5 depicts the possibility that, at a given point in time, one back-end GPU server 504 carries a significant workload while another server 505 sits idle. With proper job migration, load balancing is possible to minimize execution time. Under the scheduling and energy policies, GPU jobs are distributed to different destination back-end GPU servers 509 to optimize execution. In the case of GPU memory exhaustion, a GPU job can be migrated to another GPU; if and only if memory is exhausted across all the back-end GPU servers, the job is held in the buffer and the accelerated application stalls temporarily until resources are freed.
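
The scheduling and energy policies themselves are not detailed in the disclosure, so the following is only a hedged sketch of the dispatch decision described above: route each job to the back-end with the most free GPU memory, and hold the job in a buffer when memory is exhausted everywhere, stalling the application until resources are freed.

```cpp
// Illustrative least-loaded dispatcher: pick the back-end with the most free
// GPU memory, or queue the job when every server is exhausted.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

struct Backend { std::string ip; uint64_t free_gpu_mem; };
struct Job     { uint64_t mem_needed; };

struct Scheduler {
    std::vector<Backend> backends;
    std::deque<Job> held;   // jobs stalled until resources are freed

    // Returns the chosen back-end index, or -1 if the job must be held.
    int dispatch(const Job& job) {
        int best = -1;
        for (std::size_t i = 0; i < backends.size(); ++i) {
            if (backends[i].free_gpu_mem < job.mem_needed) continue;
            if (best < 0 || backends[i].free_gpu_mem > backends[best].free_gpu_mem)
                best = static_cast<int>(i);
        }
        if (best < 0) { held.push_back(job); return -1; }  // exhausted everywhere
        backends[best].free_gpu_mem -= job.mem_needed;     // reserve the memory
        return best;
    }
};
```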

[0022] The specific embodiments shown and discussed in depth will be recognized by those skilled in the art as permitting multiple modifications and adaptations without departing from the spirit and scope of the appended claims. Modifications and adaptations that fall within the scope of the appended claims are intended to be embraced.
