Title:
SYSTEM FOR PROCESSING MULTIPLE QUERIES USING GPU
Document Type and Number:
WIPO Patent Application WO/2016/053083
Kind Code:
A1
Abstract:
A system for processing multiple queries comprising a preprocessor (126) for processing input data into a preprocessed file structure (130) suitable for being processed by parallel instruction sets; a query analyser (136) for processing a plurality of input queries (602) into parallel instruction sets (340) which are grouped and rearranged into an optimized order; and a GPU query execution engine (138) for distributing the preprocessed input data across multiple GPU cores and executing the parallel instruction sets thereon.

Inventors:
YONG KEH KOK (MY)
MAT NOR FAZLI BIN (MY)
CHUA MENG WEI (MY)
KARUPPIAH ETTIKAN KANDASAMY A L (MY)
Application Number:
PCT/MY2015/050114
Publication Date:
April 07, 2016
Filing Date:
October 01, 2015
Assignee:
MIMOS BERHAD (MY)
International Classes:
G06F17/30
Foreign References:
US20080027920A12008-01-31
JP2008165622A2008-07-17
Other References:
KAMIMURA, Junpei: "Design and Evaluation of a GPU Accelerated Column Store Database", IPSJ SIG Technical Report, 15 August 2011 (2011-08-15), pages 1-7, ISSN: 1884-0930
Attorney, Agent or Firm:
HEMINGWAY, Christopher Paul et al. (Unit 6 Level 20, Tower A, Menara UOA Bangsa, 5 Jalan Bangsar Utama 1 Taman Bangsar, MY)
Claims:
CLAIMS

1. A system for processing multiple queries comprising:

a preprocessor (126) for preprocessing input data into a preprocessed file structure (130) suitable for being processed by parallel instruction sets;

a query analyser (136) for processing a plurality of input queries (602) into parallel instruction sets (340);

a GPU query execution engine (138) for distributing the preprocessed input data across multiple GPU cores and executing the parallel instruction sets thereon; characterised in that the query analyser (136) groups or rearranges the parallel instruction sets into an optimized order.

2. A system according to claim 1 wherein the preprocessor (126) partitions the input and extracts a table schema therefrom.

3. A system according to claim 1 wherein the preprocessor (126) obtains the configuration of the GPU cores and creates a meta analysis profile which details the optimum data partitions for the configuration.

4. A system according to claim 3 wherein the GPU query execution engine (138) performs load balancing by distributing the preprocessed input data depending on the GPU cores configuration.

5. A system according to claim 1 wherein the input queries (602) are SQL queries generated from MDX queries.

6. A system according to claim 1 wherein the query analyser (136) decomposes the queries into instruction sets and performs analysis to determine the shared input regions and result sets.

7. A system according to claim 1 wherein the query analyser (136) reorders the instruction sets by analysing the dependency thereof and consolidating the independent instruction sets with shared input regions.

8. A system according to claim 1 wherein the query analyser (136) tags the instruction sets with data usage by analysing the queries' utilisation of the output data region.

9. A system according to claim 1 wherein the query analyser (136) tags the parallel instruction sets with GPU memory requirements.

10. A system according to claim 1 wherein the GPU query execution engine (138) checks each piece of GPU hardware in which multiple GPU cores are situated to determine if multiple instruction sets can be executed simultaneously thereon.

Description:
SYSTEM FOR PROCESSING MULTIPLE QUERIES USING GPU

Field of Invention

The invention relates to a system for processing multiple queries in parallel using a Graphics Processing Unit (GPU).

Background

When dealing with large datasets, complex queries can take a long time to process. For example, MultiDimensional Expressions (MDX) is a specialised query language which can be used to query Online Analytical Processing (OLAP) databases, but processing is slow in conventional systems.

It is possible to convert an MDX query into multiple SQL queries so that these can be processed in parallel using a GPU, but existing systems do not do this efficiently: lacking optimization, they effectively process the raw data by brute force.

An aim of the invention is to provide an improved system for processing multiple queries in parallel using a GPU.

Summary of Invention

In an aspect of the invention, there is provided a system for processing multiple queries comprising: a preprocessor for processing input data into a preprocessed file structure suitable for being processed by parallel instruction sets; a query analyser for processing a plurality of input queries into parallel instruction sets; a GPU query execution engine for distributing the preprocessed input data across multiple GPU cores and executing the parallel instruction sets thereon; characterised in that the query analyser groups or rearranges the parallel instruction sets into an optimized order.

Thus, as the data is preprocessed for the parallel instruction sets, and the parallel instruction sets are grouped into an optimized order, processing of the parallel instructions on the data is faster than in conventional systems.

In one embodiment the preprocessor extracts a table schema from the input data. Typically the preprocessor partitions the data.

In one embodiment the preprocessor obtains the configuration of the GPU cores. Typically the preprocessor creates a meta analysis profile which details the optimum data partitions for the configuration.

In one embodiment the input queries are SQL queries generated from MDX queries. Typically the query analyser decomposes the SQL queries into instruction sets and performs analysis to determine the shared input regions and result sets.

In one embodiment the query analyser reorders the instruction sets by analysing the dependency thereof and consolidating the independent instruction sets with shared input regions. In one embodiment the query analyser tags the instruction sets with data usage by analysing the queries' utilisation of the output data region.

In one embodiment the query analyser tags the parallel instruction sets with GPU memory requirements.

In one embodiment the GPU query execution engine performs load balancing by distributing the preprocessed input data depending on the GPU cores configuration.

In one embodiment the GPU query execution engine checks each piece of GPU hardware in which multiple GPU cores are situated to determine if multiple instruction sets can be executed simultaneously thereon.

Brief Description of Drawings

It will be convenient to further describe the present invention with respect to the accompanying drawings that illustrate possible arrangements of the invention. Other arrangements of the invention are possible, and consequently the particularity of the accompanying drawings is not to be understood as superseding the generality of the preceding description of the invention.

Figure 1 is a schematic overview of an embodiment of the invention.

Figure 2 is a schematic diagram of the preprocessing stage.

Figure 3 is a schematic diagram of the query analysis stage.

Figure 4 is a schematic diagram of the query execution stage.

Figure 5 illustrates an example of input data being converted to preprocessed data.

Figure 6 illustrates an example of SQL queries being converted to parallel instruction sets.

Detailed Description

With regard to Figure 1, there is illustrated a schematic overview 100 of an embodiment of the invention.

User queries 102 are submitted in the form of MDX requests 104 to an OLAP server 106. The OLAP server 106 may convert the MDX requests to SQL queries 108 for querying 112 the data warehouse 114 via a database connector 110. However, according to the invention, the SQL queries can be scheduled 116 for execution on a GPU using a scheduler 118 on a daemon server 122.

The daemon server includes a preprocessor 126 for preprocessing 124 input data into a preprocessed file structure suitable for being processed by parallel instruction sets. The preprocessed input data is written 128 onto disk in a GPU file system format 130 such that it can be read 132 therefrom when required. In addition, the daemon server includes a query analyser 136 for processing 134 a plurality of input queries into parallel instruction sets, and a GPU query execution engine 138 for distributing the preprocessed input data across multiple GPU cores and executing the parallel instruction sets thereon, returning 140 the results to the daemon server 122.

With respect to Figure 2, the steps performed by the preprocessor are illustrated in more detail. The preprocessor listens 202 for instructions from the daemon server and, on receiving them, determines 204 if any GPU resources are available. Assuming GPU resources are available, data is extracted 206 from the data warehouse via the database connector, which provides data access between the database management system and the GPU parallel database system, and then a crumble analysis is conducted 208 to optimise parallelization of the input data. In the crumble analysis 210, a table schema is extracted 212 from the data and the total of the column sizes of each table is computed 214. The configuration of the GPU devices is then obtained 216 and a meta analysis profile is created 218 which outlines the data partitioning and formulates optimum data-independent partitions against the GPU capabilities. Large input data is thus partitioned into segments of efficient data communication size for GPU parallelization based on the underlying hardware architecture.
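By way of a non-limiting illustration, the crumble analysis of steps 212 to 218 might be sketched in CUDA C++ as below. The ColumnInfo type, the function name and the quarter-of-device-memory budget are illustrative assumptions rather than details disclosed in the patent.

    #include <cuda_runtime.h>
    #include <string>
    #include <vector>

    struct ColumnInfo { std::string name; size_t rowBytes; };  // hypothetical schema entry

    // Derive a partition size so that each data segment fits comfortably in
    // device memory: total the column sizes (214), read the GPU configuration
    // (216), and size the partitions accordingly (218).
    size_t planPartitionBytes(const std::vector<ColumnInfo>& schema, size_t totalRows) {
        size_t rowBytes = 0;
        for (const ColumnInfo& c : schema) rowBytes += c.rowBytes;
        if (rowBytes == 0 || totalRows == 0) return 0;

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        size_t budget = prop.totalGlobalMem / 4;  // assumed headroom factor
        size_t rowsPerPartition = budget / rowBytes;
        if (rowsPerPartition > totalRows) rowsPerPartition = totalRows;
        return rowsPerPartition * rowBytes;
    }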

A parallel execution plan is then created 220 comprising the steps 222 of: getting 224 the GPU device configuration, receiving 228 the meta analysis profile of the input data, and determining 226 the optimum multi-stream configuration by analysing the total stream usage and size of input data.
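One plausible reading of step 226, again sketched in CUDA C++: choose the stream count from the device's copy-engine count and cap it at the number of data partitions. The heuristic values are assumptions, not prescribed by the patent.

    #include <cuda_runtime.h>

    // Determine a multi-stream configuration (steps 224-226) from the GPU
    // device configuration and the amount of partitioned input data.
    int chooseStreamCount(size_t inputBytes, size_t partitionBytes) {
        if (inputBytes == 0 || partitionBytes == 0) return 1;
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // Two copy engines let uploads and downloads overlap; heuristic choice.
        int streams = (prop.asyncEngineCount > 1) ? 4 : 2;
        size_t partitions = (inputBytes + partitionBytes - 1) / partitionBytes;
        if ((size_t)streams > partitions) streams = (int)partitions;
        return streams;
    }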

The data is then organised 230 into parallel streams in the GPU transport manager 232, which governs the asynchronous parallel transfer of data streams between CPU and GPU via PCI Express. Multiple streams are created 234 for segmented data compression. The data is split 236 into blocks and assigned to GPU threads. The parallel data is compressed 238 into streams and the data from the multiple streams is synchronized 240. The subsets of output from the streams are then united 242.
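The multi-stream pipeline of steps 230 to 242 resembles the standard CUDA pattern sketched below: pinned host memory, per-stream asynchronous copies, a kernel launched in each stream, then synchronization. The compressBlock kernel merely copies bytes as a stand-in, since the patent does not disclose a particular compression algorithm.

    #include <cuda_runtime.h>
    #include <cstring>

    __global__ void compressBlock(const unsigned char* in, unsigned char* out, size_t n) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];  // placeholder for real parallel compression (238)
    }

    void transportAndCompress(const unsigned char* host, size_t bytes, int nStreams) {
        size_t chunk = (bytes + nStreams - 1) / nStreams;  // split into blocks (236)
        cudaStream_t streams[8];                           // assumes nStreams <= 8
        unsigned char *dIn, *dOut, *pinned;
        cudaMalloc((void**)&dIn, bytes);
        cudaMalloc((void**)&dOut, bytes);
        cudaMallocHost((void**)&pinned, bytes);  // pinned memory for async PCI Express copies
        memcpy(pinned, host, bytes);

        int created = 0;
        for (int s = 0; s < nStreams; ++s) {               // multiple streams (234)
            size_t off = (size_t)s * chunk;
            if (off >= bytes) break;
            size_t n = (off + chunk > bytes) ? bytes - off : chunk;
            cudaStreamCreate(&streams[s]);
            ++created;
            cudaMemcpyAsync(dIn + off, pinned + off, n, cudaMemcpyHostToDevice, streams[s]);
            compressBlock<<<(unsigned)((n + 255) / 256), 256, 0, streams[s]>>>(
                dIn + off, dOut + off, n);
            cudaMemcpyAsync(pinned + off, dOut + off, n, cudaMemcpyDeviceToHost, streams[s]);
        }
        for (int s = 0; s < created; ++s) {                // synchronize the streams (240)
            cudaStreamSynchronize(streams[s]);
            cudaStreamDestroy(streams[s]);
        }
        // pinned now holds the united subsets of output (242); freeing omitted for brevity
    }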

The preprocessed data, now in an optimised GPU file system format, is then written 244 to disk, and once the plan has completed 246, the daemon server is notified 248.

With respect to Figure 3, the steps performed by the query analyser are illustrated in more detail.

Multiple SQL queries, converted from MDX queries, are received 302 from an OLAP server. A breakdown analysis is then performed 304, in which the multiple queries are decomposed 306 into instruction sets, and the shared input regions and result sets are detected 308 and then written 310 into a shared input structure buffer as shared input data region indexes 312. The resulting query instruction sets are then reordered 314 by analysing 316 their dependencies and assembling 318 the independent instruction sets with shared input regions, to provide reordered multiple instruction sets 320. Parallel boundaries are analysed by examining the output data region 322, more specifically by analysing the boundaries of parallelization 324, generating memory boundary instruction sets 326 and computation boundary instruction sets 328, and analysing the process data usage 330 to provide multiple query instruction sets with data usage information 332.
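The reordering of steps 314 to 318 can be pictured with the minimal C++ sketch below: each instruction set carries its dependencies and its input-region indexes, and independent sets that share a region are consolidated so that one region transfer can serve several queries. The InstrSet layout is an assumed representation.

    #include <map>
    #include <set>
    #include <vector>

    struct InstrSet {
        int id;
        std::set<int> inputRegions;  // indexes into the shared input structure buffer (312)
        std::set<int> dependsOn;     // instruction sets that must execute first
    };

    // Group independent instruction sets by shared input region (steps 316-318).
    std::map<int, std::vector<const InstrSet*>> consolidate(const std::vector<InstrSet>& sets) {
        std::map<int, std::vector<const InstrSet*>> byRegion;
        for (const InstrSet& s : sets)
            if (s.dependsOn.empty())             // no unmet dependency: independent
                for (int r : s.inputRegions)
                    byRegion[r].push_back(&s);   // consolidate on the shared region
        return byRegion;
    }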

Parallel instruction sets for GPU execution are then synthesized 334 by analysing 336 the order of the result sets (reordered multiple instruction sets 320 and multiple query instruction sets with data usage information 332) and composing 338 instructions from the combinations, to provide optimized multiple query instruction sets 340. These are then sent 342 to the parallel database engine.
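Tagging an instruction set with its GPU memory requirement (see also claim 9) might then amount to something like the following, reusing InstrSet from the previous sketch; the fifty percent output allowance is purely an illustrative assumption.

    #include <map>

    // Estimate the device memory an instruction set needs: the bytes of its
    // shared input regions plus an assumed allowance for its output data region.
    size_t tagMemoryRequirement(const InstrSet& s, const std::map<int, size_t>& regionBytes) {
        size_t need = 0;
        for (int r : s.inputRegions) need += regionBytes.at(r);
        return need + need / 2;  // +50% for output: not a figure from the patent
    }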

With respect to Figure 4, the steps performed by the GPU query execution engine are illustrated in more detail. The process starts 402 by receiving instruction sets 404 and generating 406 parallel APIs based on the instruction sets to outline tasks.

More specifically, the GPU memory usage is defined 408 by global, constant and shared memory. The GPU parallel thread utilization for blocks is then configured 410. If the GPU supports dynamic parallelism 412, a pre-implemented dynamic query operator and dynamic kernel APIs are used 414; if not, a pre-implemented standard query operator and generic kernel APIs are used 418. A list of APIs for GPU parallel execution is generated 416 to form a task.
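The capability check of step 412 can be performed with the CUDA runtime API as sketched below; dynamic parallelism requires compute capability 3.5 or higher. The two launch helpers are hypothetical stand-ins for the pre-implemented operators of steps 414 and 418.

    #include <cuda_runtime.h>

    void launchDynamicQueryKernel()  { /* hypothetical dynamic query operator (414) */ }
    void launchStandardQueryKernel() { /* hypothetical standard query operator (418) */ }

    bool supportsDynamicParallelism(int device) {
        int major = 0, minor = 0;
        cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device);
        cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, device);
        return major > 3 || (major == 3 && minor >= 5);  // CDP needs CC >= 3.5
    }

    void selectKernelPath(int device) {
        if (supportsDynamicParallelism(device))
            launchDynamicQueryKernel();   // dynamic kernel API path (412 -> 414)
        else
            launchStandardQueryKernel();  // generic kernel API path (412 -> 418)
    }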

The tasks are inserted 420 into a worker queue, a fragment of input data is read 422 from the GPU file system, and the task is then executed 424 on the GPU by analysing 426 the streaming load balancing, performing 428 load balancing, then decompressing and computing the streams 430 before merging and cohering 432 the streaming data.
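Reduced to a sketch, the worker-queue loop of steps 420 to 432 drains tasks and places each on a device in round-robin fashion as a crude stand-in for the streaming load balancing; the Task fields are assumed.

    #include <cuda_runtime.h>
    #include <queue>

    struct Task { int id; size_t fragmentOffset; size_t fragmentBytes; };  // hypothetical

    void runTasks(std::queue<Task>& queue) {
        int devices = 0;
        cudaGetDeviceCount(&devices);
        if (devices == 0) return;
        while (!queue.empty()) {               // until there are no more tasks (436)
            Task t = queue.front(); queue.pop();
            cudaSetDevice(t.id % devices);     // crude load balancing across GPUs (426-428)
            // read the fragment at t.fragmentOffset from the GPU file system (422),
            // then decompress and compute the streams (430) and merge the outputs (432)
        }
    }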

The outputs of the result sets from the GPU are united 434, and if there are no more tasks 436, the daemon server is notified 438 that the processing has completed and the results are returned thereto.

With regard to Figure 5, an example of input data 502 being converted to preprocessed data 510 is illustrated, wherein the different columns of strings 504, integers 506 and floating point numbers 508 are separated into different chunks 510.
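The column-wise chunking of Figure 5 amounts to a structure-of-arrays layout, sketched below as a host-side staging structure; the field names are illustrative.

    #include <string>
    #include <vector>

    // Preprocessed data 510: each column type lives in its own contiguous chunk,
    // so GPU threads can later read uniform, coalesced data instead of mixed rows.
    struct PreprocessedChunks {
        std::vector<std::string> strings;   // string column chunk (504)
        std::vector<int>         integers;  // integer column chunk (506)
        std::vector<float>       floats;    // floating point column chunk (508)
    };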

With respect to Figure 6 an example of SQL queries being converted to parallel instruction sets is illustrated.

A group 602 of SQL queries (#1, #2, #3) is transformed 304 into instruction sets 604. The instruction sets are then optimized 314 as described previously to create reordered instruction sets 320, which are then tagged with memory usages for parallel processing.

It will be appreciated by persons skilled in the art that the present invention may also include further additional modifications made to the system which do not affect the overall functioning of the system.