

Title:
DATA MANAGEMENT DEVICE AND METHOD FOR CONTROLLING A NODE IN A DATABASE CLUSTER
Document Type and Number:
WIPO Patent Application WO/2024/008286
Kind Code:
A1
Abstract:
A method of data management in a database cluster (416) includes, (i) in a rewriter (402A) of a node (410A) designated as a master node (410B), in response to receiving a data definition language, DDL, command from a client device (408A) or another node (410C-N): (a) allocating a sequential command ID to the DDL command, (b) replicating the DDL command to each other node in the database cluster (416), and (c) sending the DDL command to the database engine (404B); (ii) in the rewriter (402A) of a node (410A) not designated as a master node (410B): (a) in response to receiving a DDL command from a client device (408A), transmitting the DDL command to the node which is designated as the master node (410B), and (b) in response to receiving a DDL command from the master node (410B), sending the DDL command to the database engine (404A).

Inventors:
ZILPA LIOR (DE)
Application Number:
PCT/EP2022/068765
Publication Date:
January 11, 2024
Filing Date:
July 06, 2022
Assignee:
HUAWEI TECH CO LTD (CN)
ZILPA LIOR (DE)
International Classes:
G06F16/27
Foreign References:
EP4002148A1 (2022-05-25)
Other References:
NASIBULLIN A R ET AL: "Replication in Distributed Systems: Models, Methods, and Protocols", PROGRAMMING AND COMPUTER SOFTWARE, CONSULTANTS BUREAU, NEW YORK, NY, US, vol. 46, no. 5, 1 September 2020 (2020-09-01), pages 341 - 350, XP037260632, ISSN: 0361-7688, [retrieved on 20201001], DOI: 10.1134/S0361768820050060
ANONYMOUS: "Replication (computing) - Wikipedia", 14 June 2022 (2022-06-14), XP093027771, Retrieved from the Internet [retrieved on 20230228]
Attorney, Agent or Firm:
KREUZ, Georg M. (DE)
Claims:
CLAIMS

1. A data management device (400) for controlling a node in a database cluster (416) which comprises a plurality of nodes (410A-N), comprising: a rewriter (402A-N); and a database engine (404A-N); wherein if the node (410A-N) is designated as a master node, in response to receiving a data definition language, DDL, command from a client device (408A-N) or another node, the rewriter (402A-N) is configured to: allocate a sequential command ID to the DDL command, replicate the DDL command to each other node in the database cluster (416), and send the DDL command to the database engine (404A-N); wherein if the node (410A-N) is not designated as a master node, the rewriter (402A-N) is configured to: in response to receiving a DDL command from the client device (408A-N), transmit the DDL command to the node (410B) which is designated as the master node, and in response to receiving a DDL command from the master node (410B), send the DDL command to the database engine (404A-N); and wherein the database engine (404A-N) is configured to execute the DDL command received from the rewriter (402A-N) based on the order of the sequential command ID and update a metadata record of the node (410A-N).

2. The data management device (400) of claim 1, further comprising a storage engine, SE, driver (406A-N) connected with a storage engine, SE, cluster (414) of the database cluster (416), wherein the database engine (404A-N) is configured to execute the DDL command using a SE driver (406A-N); wherein if the node (410A-N) is not designated as the master node, the SE driver (406A-N) is configured to send an API request to the SE driver (406A-N) of the master node (410B); wherein if the node (410A-N) is designated as the master node, the SE driver (406A-N) is configured to send the DDL command to the SE cluster (414) for execution after the API request is received from each other node.

3. The data management device (400) of claim 2, wherein the DDL command includes multiple steps and the storage engine driver (406A-N) is configured to send an API request for each step.

4. The data management device (400) of claim 2 or claim 3, wherein the SE driver (406A-N) is further configured to: if the node (410A-N) is designated as the master node, provide the execution result to the DB engine (404A-N) to update the metadata and distribute an execution result to the SE driver (406A-N) of each other node; and if the node (410A-N) is not designated as the master node, in response to receiving the execution result, provide the execution result to the DB engine (404A-N) to update the metadata.

5. The data management device (400) of any preceding claim, wherein the database engine (404A-N) is further configured to notify the rewriter (402A-N) when execution of the DDL command is complete; wherein if the node (410A-N) is not designated as the master node, the rewriter (402A-N) is configured to send an acknowledgment to the rewriter (402B) of the master node (410B); and wherein if the node (410A-N) is designated as the master node, the rewriter (402A-N) is configured to notify the rewriter (402A-N) of the node (410A-N) which received the DDL command from the client device (408A-N) after the acknowledgment is received from each other node.

6. A database cluster (416) comprising the data management device (400) of any preceding claim.

7. A method of data management in a database cluster (416) which comprises a plurality of nodes (410A-N), comprising: in a rewriter (402A-N) of a node (410A-N) designated as a master node, in response to receiving a data definition language, DDL, command from a client device (408A-N) or another node: allocating a sequential command ID to the DDL command, replicating the DDL command to each other node in the database cluster (416), and sending the DDL command to the database engine (404A-N); in the rewriter (402A-N) of a node (410A-N) not designated as a master node: in response to receiving a DDL command from a client device (408A-N), transmitting the DDL command to the node (410B) which is designated as the master node, and in response to receiving a DDL command from the master node (410B), sending the DDL command to the database engine (404A-N); and executing, by the database engine (404A-N), the DDL command received from the rewriter (402A-N) based on the order of the sequential command ID and updating a metadata record of the node (410A-N).

8. The method of claim 7, wherein executing the DDL command includes using a storage engine, SE, driver (406A-N) connected with a storage engine, SE, cluster (414) of the database cluster (416); and wherein the method (500) further comprises: if the node (410A-N) is not designated as the master node, sending, by the SE driver (406A-N), an API request to the SE driver (406A-N) of the master node (410B); and in the node (410A-N) designated as the master node, sending, by the SE driver (406A-N), the DDL command to the SE cluster (414) for execution after the API request is received from each other node.

9. The method of claim 8, wherein the DDL command includes multiple steps and sending the API request includes sending an API request for each step.

10. The method of claim 8 or claim 9, further comprising: in the node (410A-N) designated as the master node, providing the execution result to the DB engine (404A-N) and distributing an execution result to the SE driver (406A-N) of each other node; and in the node (410A-N) not designated as the master node, in response to receiving the execution result, providing the execution result to the DB engine (404A-N) to update the metadata.

11. The method of any one of claims 7 to 10, further comprising notifying, by the database engine (404A-N), the rewriter (402A-N) when execution of the DDL command is complete; in each node (410A-N) not designated as the master node, sending an acknowledgment to the rewriter (402B) of the master node (410B); and in the node (410A-N) designated as the master node, notifying the rewriter (402A-N) of the node (410A-N) which received the DDL command from the client device (408A-N) after the acknowledgment is received from each other node.

12. A computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 7 to 11.

Description:
DATA MANAGEMENT DEVICE AND METHOD FOR CONTROLLING A NODE IN A DATABASE CLUSTER

TECHNICAL FIELD

The disclosure relates to a data management device for controlling a node in a database cluster, and more particularly to a method of data management in the database cluster.

BACKGROUND

Structured Query Language (SQL) is the language used by databases for storing, manipulating, and retrieving user data. SQL includes four groups of commands: Data Definition Language (DDL), Data Query Language (DQL), Data Manipulation Language (DML), and Data Control Language (DCL). DDL includes the commands CREATE, DROP, ALTER, TRUNCATE, COMMENT, and RENAME. DQL includes the command SELECT. DML includes the commands INSERT, UPDATE, and DELETE. DCL includes the commands GRANT and REVOKE. While the user data is organized in tables, metadata (MD) is required for managing the access and definition of the user data; every access to user data also requires access to the MD. The MD holds information on existing tables, transactions, checkpoints, indexing, etc., and is manipulated by DDL commands. At a high level, the MD needs to be synchronized across a database cluster.
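
For illustration only (this classification is not part of the claimed subject matter), the grouping above can be expressed as a small lookup on a statement's leading keyword:

```python
# Illustrative only: map a SQL statement to one of the four command
# groups named above by inspecting its leading keyword.
SQL_GROUPS = {
    "DDL": {"CREATE", "DROP", "ALTER", "TRUNCATE", "COMMENT", "RENAME"},
    "DQL": {"SELECT"},
    "DML": {"INSERT", "UPDATE", "DELETE"},
    "DCL": {"GRANT", "REVOKE"},
}

def classify(statement: str) -> str:
    keyword = statement.lstrip().split(None, 1)[0].upper()
    for group, keywords in SQL_GROUPS.items():
        if keyword in keywords:
            return group
    return "UNKNOWN"

assert classify("CREATE TABLE t (id INT PRIMARY KEY)") == "DDL"  # manipulates MD
assert classify("insert into t values (1)") == "DML"             # manipulates user data
```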

Database clusters fall into three groups: (i) databases clustered by design, such as Oracle RAC, (ii) cloud databases, such as Amazon Aurora and Microsoft Socrates, and (iii) databases originally designed as a single node, for which add-on solutions turn them into a cluster database. For the first two groups, MD management and access are part of the design and embedded in the execution flow. Databases such as MySQL were designed as a single monolithic process where the MD is maintained internally and locally in random access memory (RAM). The persistence process for the MD is also done internally and is not exposed. When making databases like these into a cluster, the problem is how to coordinate and synchronize MD between nodes while maintaining concurrency, coherency, and persistency among the nodes.

An existing approach known as NDB is a synchronous multi-master replication plug-in for MySQL using InnoDB. The plug-in is built on three layers: first, a certification layer that prepares the write-sets and checks that they can be applied; second, a replication layer that manages the replication protocol and provides the total ordering capability; and third, a group communication framework that implements a plugin architecture allowing other systems to connect via a common back-end schema. When a transaction is started, only database resources local to that node are involved. At commit, the transaction's operations are broadcast as part of a write-set. As all nodes have the same state, the write-set will either succeed on all nodes or fail on all nodes. When aborts occur at the cluster level, the cluster reports a deadlock error. When the client issues a COMMIT command, but before the actual commit occurs, all changes made to the database by the transaction, together with the primary keys of the changed rows, are collected into a write-set. The database then sends this write-set to all of the other nodes. Every node, including the initiating node, runs a deterministic certification test on the write-set using the primary keys. The outcome of the certification test determines whether the node can apply the write-set.
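
A minimal sketch of the deterministic certification test described above (this models the general scheme of certification-based replication; all names and data structures here are hypothetical and not taken from the disclosure):

```python
# Hypothetical sketch of certification-based replication: a write-set
# passes certification only if its primary keys do not overlap those of
# a write-set the initiating node had not yet seen at commit time.
from dataclasses import dataclass

@dataclass
class WriteSet:
    seqno: int          # position in the total order set by the replication layer
    last_seen: int      # last seqno the initiating node had applied at commit
    primary_keys: set   # primary keys of the rows changed by the transaction

def certify(ws: WriteSet, applied: list) -> bool:
    """Deterministic: every node reaches the same verdict from the same input."""
    for other in applied:
        # Conflict window: write-sets ordered after what the initiator saw.
        if other.seqno > ws.last_seen and other.primary_keys & ws.primary_keys:
            return False   # conflict -> aborted on all nodes (deadlock error)
    return True
```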

Each node has its own full copy of the data, so the entire dataset is available on all nodes. Each node holds the whole database by applying a quorum-based protocol, and each node has its own local storage and buffers. The replication is certification-based, and a write-set consists of a list of operations rather than data blocks. MD synchronization depends on the replication and commit mechanisms, so the synchronous multi-master replication plug-in cannot be used with a shared storage engine; it requires a full copy of the database in each node. Under high load, the commit phase becomes a bottleneck, and commits are very slow. Long transactions are more likely to be rolled back or to cause dramatic performance degradation. Certification-based replication can be supported only by a transactional storage engine because it requires a rollback mechanism; thus, only InnoDB is supported. Using the plug-in also limits many DB features: tables must have a primary key, the query cache is not supported, only row-level locking is supported, etc.

Therefore, there arises a need to address the aforementioned technical drawbacks in known techniques or technologies for managing data in a database cluster.

SUMMARY

It is an object of the disclosure to provide a data management device and a method for controlling a node in a database cluster while avoiding one or more disadvantages of prior art approaches.

This object is achieved by the features of the independent claims. Further, implementation forms are apparent from the dependent claims, the description, and the figures.

The disclosure provides a data management device and a method of data management in a database cluster which includes a plurality of nodes.

According to a first aspect, there is provided a data management device for controlling a node in a database cluster. The database cluster includes one or more nodes. The data management device includes a rewriter, and a database engine. If the node is designated as a master node, in response to receiving a data definition language, DDL, command from a client device or another node, the rewriter is configured to (i) allocate a sequential command ID to the DDL command, (ii) replicate the DDL command to each other node in the database cluster, and (iii) send the DDL command to the database engine. If the node is not designated as a master node, the rewriter is configured to (i) in response to receiving a DDL command from a client device, transmit the DDL command to the node which is designated as the master node, and (ii) in response to receiving a DDL command from the master node, send the DDL command to the database engine. The database engine is configured to execute the DDL command received from the rewriter based on the order of the sequential command ID and update a metadata record of the node.
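
A minimal, hypothetical sketch of the rewriter behaviour described by the first aspect (the `Rewriter` class, the transport primitive `send_to`, and the `db_engine.submit` interface are assumptions introduced here for illustration, not part of the disclosure):

```python
# Hypothetical sketch of the rewriter dispatch described above. The
# transport (send_to) and engine interface (db_engine.submit) are
# assumed; the disclosure does not specify them.
import itertools

class Rewriter:
    def __init__(self, node, is_master, master_id, other_node_ids):
        self.node = node
        self.is_master = is_master
        self.master_id = master_id
        self.other_node_ids = other_node_ids   # every other node in the cluster
        self._next_id = itertools.count(1)     # master-only sequential command IDs

    def on_ddl(self, ddl, source):
        if self.is_master:
            cmd_id = next(self._next_id)             # (i) allocate a sequential ID
            for peer in self.other_node_ids:         # (ii) replicate to each other node
                self.node.send_to(peer, (cmd_id, ddl))
            self.node.db_engine.submit(cmd_id, ddl)  # (iii) hand to the local DB engine
        elif source == "client":
            # Non-master receiving from a client: forward to the master.
            self.node.send_to(self.master_id, ddl)
        else:
            # Non-master receiving from the master: the command already
            # carries its sequential ID, so hand it to the local engine.
            cmd_id, cmd = ddl
            self.node.db_engine.submit(cmd_id, cmd)
```

The DB engine can then execute commands strictly in sequential-ID order on every node, which is what keeps the metadata records identical across the cluster.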

The data management device is unique as it requires no changes to existing DB architectures. The data management device performs the operations without introducing additional performance degradation for DML queries. The data management device keeps a local metadata registry copy. The data management device can be embedded with zero code changes for DBs that expose plugin APIs and is agnostic to the underlying storage engine. The data management device provides a generic solution for turning a single DB server into a cluster as far as MD synchronization is concerned. Each node holds the same MD, thereby accelerating recovery, because recovery can be done from a remote node's memory. The data management device enables protection against multiple failures, as many fully synced copies exist. Beyond MD synchronization, the data management device maintains serialization together with the other mechanisms, SQL statement concurrency control, and error handling, without additional changes. Due to the implementation of a shared storage engine in the data management device, any standalone database can become a cluster.

Optionally, the data management device further includes a storage engine, SE, driver connected with a SE cluster of the database cluster. The database engine is configured to execute the DDL command using the SE driver. If the node is not designated as the master node, the SE driver is configured to send an API request to the SE driver of the master node. If the node is designated as the master node, the SE driver is configured to send the DDL command to the SE cluster for execution after the API request is received from each other node.
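
A hypothetical sketch of this SE-driver synchronization (the threading model, the transport, and the `se_cluster.execute` handle are assumptions; in a deployed system the `api_request` call would arrive over the network):

```python
# Hypothetical sketch: non-master SE drivers signal the master's SE
# driver once per DDL step; the master forwards the step to the shared
# SE cluster only after every other node has signalled readiness.
import threading

class SEDriver:
    def __init__(self, is_master, other_node_count, se_cluster=None, master_driver=None):
        self.is_master = is_master
        self.other_node_count = other_node_count
        self.se_cluster = se_cluster        # master only: handle to the SE cluster
        self.master_driver = master_driver  # non-master only: the master's driver
        self._requests = {}                 # (cmd_id, step_no) -> requests received
        self._lock = threading.Lock()

    def execute_step(self, cmd_id, step_no, step):
        if not self.is_master:
            # Non-master: one API request per step (see claim 3).
            self.master_driver.api_request(cmd_id, step_no, step)

    def api_request(self, cmd_id, step_no, step):
        # Master: count requests; execute once each other node has asked.
        with self._lock:
            key = (cmd_id, step_no)
            self._requests[key] = self._requests.get(key, 0) + 1
            if self._requests[key] < self.other_node_count:
                return
        result = self.se_cluster.execute(step)
        self.distribute_result(cmd_id, result)

    def distribute_result(self, cmd_id, result):
        # Provide the result to the local DB engine and to the SE driver
        # of each other node so every node updates its metadata (claim 4).
        ...
```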

Optionally, the DDL command includes multiple steps and the storage engine driver is configured to send an API request for each step.

Optionally, the SE driver is further configured to (i) if the node is designated as the master node, provide the execution result to the DB engine to update the metadata and distribute an execution result to the SE driver of each other node, and (ii) if the node is not designated as the master node, in response to receiving the execution result, provide the execution result to the DB engine to update the metadata.

Optionally, the database engine is further configured to notify the rewriter when execution of the DDL command is complete. (i) If the node is not designated as the master node, the rewriter is configured to send an acknowledgment to the rewriter of the master node, and (ii) if the node is designated as the master node, the rewriter is configured to notify the rewriter of the node which received the DDL command from the client device after the acknowledgment is received from each other node.
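
A compact, hypothetical sketch of this completion handshake (the `notify` callback and node identifiers are illustrative assumptions):

```python
# Hypothetical sketch: non-master rewriters acknowledge the master once
# their engine finishes the DDL; the master notifies the node that
# originally received the command only after all acknowledgments arrive.
class AckTracker:
    def __init__(self, other_node_count):
        self.other_node_count = other_node_count
        self._acks = {}   # cmd_id -> set of node ids that acknowledged

    def on_ack(self, cmd_id, from_node, origin_node, notify):
        acks = self._acks.setdefault(cmd_id, set())
        acks.add(from_node)
        if len(acks) == self.other_node_count:
            del self._acks[cmd_id]
            notify(origin_node, cmd_id)   # the DDL is complete cluster-wide
```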

According to a second aspect, there is provided a database cluster including the data management device.

According to a third aspect, there is provided a method of data management in a database cluster. The database cluster includes one or more nodes. The method includes, in a rewriter of a node designated as a master node, in response to receiving a data definition language, DDL, command from a client device or another node: allocating a sequential command ID to the DDL command, replicating the DDL command to each other node in the database cluster, and sending the DDL command to the database engine. The method includes, in the rewriter of a node not designated as a master node: in response to receiving a DDL command from a client device, transmitting the DDL command to the node which is designated as the master node, and in response to receiving a DDL command from the master node, sending the DDL command to the database engine. The method includes executing, by the database engine, the DDL command received from the rewriter based on the order of the sequential command ID and updating a metadata record of the node.

Optionally, executing the DDL command includes using a storage engine, SE, driver connected with a SE cluster of the database cluster, and the method further includes: if the node is not designated as the master node, sending, by the SE driver, an API request to the SE driver of the master node; and in the node designated as the master node, sending, by the SE driver, the DDL command to the SE cluster for execution after the API request is received from each other node.

Optionally, the DDL command includes multiple steps, and sending the API request includes sending an API request for each step.

Optionally, the method further includes in the node designated as the master node, providing the execution result to the DB engine and distributing an execution result to the SE driver of each other node, and in the node not designated as the master node, in response to receiving the execution result, providing the execution result to the DB engine to update the metadata.

Optionally, the method further includes notifying, by the database engine, the rewriter when execution of the DDL command is complete, in each node not designated as the master node, sending an acknowledgment to the rewriter of the master node, and in the node designated as the master node, notifying the rewriter of the node which received the DDL command from the client device after the acknowledgment is received from each other node.

According to a fourth aspect, there is provided a computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform the above method.

Therefore, in contradistinction to the prior art, the data management device and the method accelerate data recovery by keeping an updated local metadata registry copy. The data management device and the method are a generic solution that can be applied to any database. The data management device and the method preserve serialization together with the other mechanisms, SQL statement concurrency control, and error handling, without additional changes.

These and other aspects of the disclosure will be apparent from the implementation(s) described below.

BRIEF DESCRIPTION OF DRAWINGS

Implementations of the disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 illustrates an exemplary in-memory database in a shared-nothing architecture in accordance with the prior art;

FIG. 2 illustrates an exemplary NDB data dictionary (DD) synchronization in accordance with the prior art;

FIG. 3 illustrates an exemplary PostgreSQL architecture in accordance with the prior art;

FIG. 4 is a block diagram of a data management device for controlling a node in a database cluster which includes one or more nodes in accordance with an implementation of the disclosure;

FIGS. 5A and 5B are flow diagrams that illustrate a method of data management in a database cluster in accordance with an implementation of the disclosure;

FIG. 5C is a flow diagram that illustrates a method of executing a DDL command in accordance with an implementation of the disclosure;

FIG. 5D is a flow diagram that illustrates a method of providing an execution result to a database (DB) engine in accordance with an implementation of the disclosure;

FIG. 5E is a flow diagram that illustrates a method of notifying the rewriter when execution of the DDL command is complete in accordance with an implementation of the disclosure; and

FIG. 6 is an illustration of an exemplary data management device, a distributed data storage, or a computer system in which the various architectures and functionalities of the various previous implementations may be implemented.

DETAILED DESCRIPTION OF THE DRAWINGS

Implementations of the disclosure provide a data management device and a method of data management in a database cluster which includes a plurality of nodes.

To make solutions of the disclosure more comprehensible for a person skilled in the art, the following implementations of the disclosure are described with reference to the accompanying drawings.

Terms such as "a first", "a second", "a third", and "a fourth" (if any) in the summary, claims, and foregoing accompanying drawings of the disclosure are used to distinguish between similar objects and are not necessarily used to describe a specific sequence or order. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the implementations of the disclosure described herein are, for example, capable of being implemented in sequences other than the sequences illustrated or described herein. Furthermore, the terms "include" and "have" and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, a method, a system, a product, or a device that includes a series of steps or units, is not necessarily limited to expressly listed steps or units but may include other steps or units that are not expressly listed or that are inherent to such process, method, product, or device.

FIG. 1 illustrates an exemplary in-memory database in a shared-nothing architecture 100 in accordance with the prior art. The shared-nothing architecture 100 includes a cluster with components such as clients 102A-N, MySQL nodes 104A-N, an NDB management server 106, data nodes 108A-N, and a storage engine 110. The clients 102A-N may be management clients or SQL clients. The NDB management server 106 configures the cluster. The MySQL node 104A configures the MySQL server in the network database (NDB). The NDB storage engine is de facto divided into two parts. The data node 108A stores the data. The storage engine 110 is a handler for tables that stores data, retrieves data, and finds data through an index. The default storage engine 110 used by MySQL is called InnoDB. The MD objects are serialized as in a table structure, and InnoDB is the storage engine 110 used for the persistence of the tables.

FIG. 2 illustrates an exemplary NDB data dictionary (DD) synchronization 200 in accordance with the prior art. Each MySQL instance may be a regular MySQL server. The exemplary NDB DD synchronization 200 includes a MySQL server DD MD that keeps using the DD client as if it were a standalone server. The DD may be an MD repository. Each server may have its own copy of the DD and a local InnoDB for storing the DD. The NDB directly accesses the DD and keeps the DD updated. The NDB maintains its own DD; the NDB DD may hold not the objects themselves but the MD of the DD objects. The NDB DD object holds the MySQL DD object and a schema version coordination of the MD. The NDB DD object is viewed as pointing to the source of truth, while each MySQL server’s DD is equivalent to a cached copy.

The NDB DD synchronization 200 is not generic, and its code is intrusive, as many application programming interface (API) calls are required in the query flow. Extra schema coordination is needed to maintain it. During the NDB DD synchronization 200, four different mechanisms and background tasks are used to keep the DD in sync. In a couple of the mechanisms, a global shared lock (GSL) is taken, which leads to performance degradation. The GSL may be applied to any change to the DD object. Extra overhead is required to deal with out-of-sync issues and local schema update failures.

FIG. 3 illustrates an exemplary PostgreSQL architecture 300 in accordance with the prior art. The exemplary PostgreSQL architecture 300 includes a global transaction manager (GTM) 304, coordinators 306A-N, and data nodes 308A-N. Every table is either partitioned or replicated across multiple nodes of the PostgreSQL architecture 300. The GTM 304 issues transaction IDs and snapshots as part of its multi-version concurrency control (MVCC). The coordinators 306A-N manage the user sessions and interact with the GTM 304 and the data nodes 308A-N. The coordinators 306A-N parse and plan queries and send a serialized global plan down to each of the components involved in a statement. In the simplest case, if a single data node is involved, the coordinator 306A-N simply proxies incoming statements to the data node. The data node 308A stores the actual data. The PostgreSQL servers accept that transaction management information, as well as other global values, is obtained from the GTM 304. The extension for the PostgreSQL architecture 300 distributes data and queries across nodes and is based on the coordinators 306A-N, which are the authoritative source of metadata for the cluster. Distributed tables can only be created, altered, or dropped via the coordinator 306A-N. The extension propagates metadata for the distributed tables using PostgreSQL’s built-in two-phase commit (2PC) mechanism and distributed locks. The PostgreSQL architecture 300 thus depends on an external component to hold and manage the MD, introducing additional message exchanges on every query, which causes significant performance losses on both DDL and DML queries. While performance degradation on DDL queries is acceptable in a cluster configuration, the performance of DML commands should be affected as minimally as possible (optimally, not at all).

FIG. 4 is a block diagram of a data management device 400 for controlling a node (e.g. 410A) in a database cluster 416 which includes one or more nodes 410A-N in accordance with an implementation of the disclosure. Each node (e.g. 410A) in the data management device 400 includes a rewriter 402A and a database engine 404A. When the node 410A is designated as a master node, in response to receiving a data definition language, DDL, command from a client device 408A or another node (e.g. 410B-N), the rewriter 402A is configured to allocate a sequential command ID to the DDL command. The rewriter 402A is configured to replicate the DDL command to each other node in the database cluster 416. The rewriter 402A is configured to send the DDL command to the database engine 404A. When the node 410A is not designated as a master node, in response to receiving a DDL command from the client device 408A, the rewriter 402A is configured to transmit the DDL command to the node 410B which is designated as the master node. In response to receiving a DDL command from the master node 410B, the rewriter 402A is configured to send the DDL command to the database engine 404A. The database engine 404A is configured to execute the DDL command received from the rewriter 402A based on the order of the sequential command ID and update a metadata record of the node 410A.
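
Executing based on the order of the sequential command ID implies that commands arriving out of order must be held back. A hypothetical sketch of such an in-order executor follows (the `engine` interface is an assumption introduced for illustration):

```python
# Hypothetical sketch: buffer DDL commands that arrive out of order and
# release them to the engine strictly in sequential-command-ID order,
# so every node applies the same DDL sequence to its metadata.
class OrderedExecutor:
    def __init__(self, engine):
        self.engine = engine
        self.next_id = 1      # next sequential command ID expected
        self.buffer = {}      # cmd_id -> command, held until its turn

    def submit(self, cmd_id, command):
        self.buffer[cmd_id] = command
        while self.next_id in self.buffer:
            cmd = self.buffer.pop(self.next_id)
            self.engine.execute(cmd)           # run the DDL
            self.engine.update_metadata(cmd)   # keep the local MD record in sync
            self.next_id += 1
```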

The data management device 400 is unique as it requires no changes to existing DB architectures. The data management device 400 performs the operations without introducing additional performance degradation for DML queries. The data management device 400 can be embedded with zero code changes for DBs that expose plugin APIs and is agnostic to the underlying storage engine. The data management device 400 provides a generic solution for turning a single DB server into a cluster as far as MD synchronization is concerned. Each node holds the same MD, thereby accelerating recovery, because recovery can be done from a remote node's memory. The data management device 400 enables protection against multiple failures, as many fully synced copies exist. Beyond MD synchronization, the data management device 400 preserves serialization together with the other mechanisms, SQL statement concurrency control, and error handling, without additional changes. Due to the implementation of a shared storage engine in the data management device 400, any standalone database can become a cluster.

Optionally, each node in the data management device 400 further includes a storage engine, SE, driver 406A-N connected with a storage engine, SE, cluster 414 of the database cluster 416. The database engine 404A is configured to execute the DDL command using the SE driver 406A. If the node 410A is not designated as the master node, the SE driver 406A is configured to send an API request to the SE driver 406B of the master node (e.g. a node 410B). If the node 410B is designated as the master node, the SE driver 406B is configured to send the DDL command to the SE cluster 414 for execution after the API request is received from each other node.

Optionally, the DDL command includes multiple steps, and the storage engine driver 406A-N of each node is configured to send an API request for each step.

Optionally, the SE driver 406A is further configured to: (i) if the node 410A is designated as the master node, provide the execution result to the DB engine 404A to update the metadata and distribute an execution result to the SE driver of each other node, and (ii) if the node 410A is not designated as the master node, in response to receiving the execution result, provide the execution result to the DB engine 404A to update the metadata.

Optionally, the database engine 404A is further configured to notify the rewriter 402A when execution of the DDL command is complete. (i) If the node 410A is not designated as the master node, the rewriter 402A is configured to send an acknowledgment to the rewriter 402B of the master node 410B, and (ii) if the node 410B is designated as the master node, the rewriter 402B is configured to notify the rewriter 402A of the node 410A which received the DDL command from the client device 408A after the acknowledgment is received from each other node.

Optionally, the data management device 400 supports a plugin API to create new server modules for each node (e.g. 410A-N) that can be loaded and unloaded dynamically at runtime without restarting a server. The API may be generic and does not specify what a module can do. The modules supported by the API may be the rewriter 402A-N and the storage engine. The rewriter 402A-N may be used as a filter driver before the server executes a statement. The rewriter 402A-N may take both pre-parse and post-parse queries. For pre-parse queries, the rewriter 402A-N may rewrite the SQL statement string before the server processes it and may return a different string. For post-parse queries, the rewriter 402A-N may rewrite the SQL statement based on parse trees and return a different tree.
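
The pre-parse hook's contract is simply: take a statement string, optionally return a replacement. The following Python sketch models only that contract (the actual plugin interface of a server such as MySQL is a C API, and the marker comment shown is a hypothetical example, not a defined hint):

```python
# Conceptual model of a pre-parse rewrite hook: receive the SQL string
# before the server parses it and optionally return a different string.
DDL_KEYWORDS = {"CREATE", "DROP", "ALTER", "TRUNCATE", "RENAME", "COMMENT"}

def pre_parse_rewrite(query: str) -> str:
    first = query.lstrip().split(None, 1)[0].upper()
    if first in DDL_KEYWORDS:
        # Hypothetical policy: tag DDL statements so a downstream cluster
        # component can intercept and replicate them.
        return "/* CLUSTER_DDL */ " + query
    return query

print(pre_parse_rewrite("ALTER TABLE t ADD COLUMN c INT"))
```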

Optionally, a data definition language, DDL, command is sent from the client device 408A associated with the node 410A. In response to receiving the DDL command by the rewriter 402A of the node 410A, the DDL command is forwarded to the master node 410B. A sequential command ID is then allocated to the DDL command by the master node 410B. The DDL command is replicated to each other node in the database cluster 416.

Optionally, in response to receiving a DDL command from the client device 408B of the master node 410B, the DDL command is transmitted to the master node 410B. The DDL command is executed by the database engine 404B. The SE driver 406B interacts with the database engine 404B for executing the DDL command. The API request is sent to the SE driver 406B of the master node 410B. The DDL command is sent to the SE cluster 414 for execution after the API request is received from each other node by the SE driver 406B of the master node 410B. The SE cluster 414 executes the DDL command. The execution result is provided to the DB engine 404B, and the execution result is distributed to the SE driver 406A-N of each other node.

Optionally, the node (e.g. 410A) manages multiple DDL/DML commands concurrently on the same server. The concurrent management of the multiple DDL/DML commands is used for DDL command synchronization. When the node 410A receives the DDL command, the node 410A replicates the DDL command to the other nodes while assuring mutual exclusion between DDL and DML commands. Thus, each node manages the DDL/DML execution, as sketched below.
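
A hypothetical sketch of such mutual exclusion between DDL and DML commands (the disclosure does not prescribe a mechanism; this models the common readers-writer pattern, where DML commands run concurrently and a DDL command runs alone):

```python
# Hypothetical sketch: DML commands may run concurrently with each
# other, while a DDL command excludes all DML (and other DDL), like a
# writer in a readers-writer lock.
import threading

class DdlDmlGate:
    def __init__(self):
        self._cond = threading.Condition()
        self._active_dml = 0
        self._ddl_running = False

    def begin_dml(self):
        with self._cond:
            while self._ddl_running:
                self._cond.wait()
            self._active_dml += 1

    def end_dml(self):
        with self._cond:
            self._active_dml -= 1
            self._cond.notify_all()

    def begin_ddl(self):
        with self._cond:
            while self._ddl_running or self._active_dml:
                self._cond.wait()
            self._ddl_running = True

    def end_ddl(self):
        with self._cond:
            self._ddl_running = False
            self._cond.notify_all()
```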

Optionally, the flow of a DDL command is divided into steps, each step communicating with the SE driver 406B of the master node 410B, and the SE driver 406B can fail the DDL command and cause a rollback. Each step may be blocked by a running API request of the SE driver 406B. After executing the API request, the SE driver 406B of the master node 410B may send an answer to all other SE drivers of the other nodes. Optionally, when the SE driver 406B fails to send an answer to all other SE drivers, a rollback is performed, as on a standalone server. Optionally, the SEs are server modules that act as handlers for different table types, and the SEs are used for storing data, retrieving data, and finding data through an index.

Optionally, the MySQL server communicates with the SEs through a defined API. The APIs of the underlying storage engine are wrapped and are used to implement a protocol for SE execution synchronization.

Optionally, MD objects are stored and managed internally by a system schema, DD tables, and system tables. The DD tables store database object metadata, such as information on existing tables and their structure, and the system tables are used for other operational purposes, such as users and privileges.

The DD may be stored in transactional tables powered by the InnoDB storage engine (SE). The transactional tables that store the DD may be created in a single InnoDB tablespace named, for example, mysql.ibd (together with non-data-dictionary system tables) and protected by the same commit, rollback, and crash recovery capabilities as InnoDB tables. The MD objects may be managed in the same way by InnoDB.

FIGS. 5A and 5B are flow diagrams that illustrate a method of data management in a database cluster in accordance with an implementation of the disclosure. The database cluster includes one or more nodes. At step 502, the method includes, in a rewriter of a node designated as a master node, in response to receiving a data definition language, DDL, command from a client device or another node, allocating a sequential command ID to the DDL command. At step 504, the method includes replicating the DDL command to each other node in the database cluster. At step 506, the method includes sending the DDL command to the database engine. At step 508, the method includes, in the rewriter of a node not designated as a master node, in response to receiving a DDL command from a client device, transmitting the DDL command to the node which is designated as the master node. At step 510, the method includes, in response to receiving a DDL command from the master node, sending the DDL command to the database engine. At step 512, the method includes executing, by the database engine, the DDL command received from the rewriter based on the order of the sequential command ID and updating a metadata record of the node.

The method uses a plugin architecture, so it applies to any database whose architecture supports plugins; it is thereby a generic solution. The implementation of the plugin architecture is straightforward and can be embedded with zero code changes. The method keeps a local metadata registry copy, so recovery can be greatly accelerated because it can be done from a remote node's memory. As many copies are fully synced, protection is provided against multiple failures. Beyond MD synchronization, the method preserves serialization together with the other mechanisms, SQL statement concurrency control, and error handling, without additional changes. Due to the implementation of a shared storage engine in the data management device, any standalone database can become a cluster.

FIG. 5C is a flow diagram that illustrates a method of executing a DDL command in accordance with an implementation of the disclosure. At step 514, the method includes executing the DDL command using a storage engine, SE, driver connected with a storage engine, SE, cluster of the database cluster. At step 516, the method further includes, if the node is not designated as the master node, sending, by the SE driver, an API request to the SE driver of the master node. At step 518, the method includes, if the node is designated as the master node, sending, by the SE driver, the DDL command to the SE cluster for execution after the API request is received from each other node.

Optionally, the DDL command includes multiple steps, and sending the API request includes sending an API request for each step.

FIG. 5D is a flow diagram that illustrates a method of providing an execution result to a database (DB) engine in accordance with an implementation of the disclosure. At step 520, the method further includes, if the node is designated as the master node, providing the execution result to the DB engine and distributing an execution result to the SE driver of each other node. At step 522, the method includes, if the node is not designated as the master node, in response to receiving the execution result, providing the execution result to the DB engine to update the metadata.

FIG. 5E is a flow diagram that illustrates a method of notifying the rewriter when execution of the DDL command is complete in accordance with an implementation of the disclosure. At step 524, the method further includes, in each node not designated as the master node, sending an acknowledgment to the rewriter of the master node. At step 526, the method includes, in the node designated as the master node, notifying the rewriter of the node which received the DDL command from the client device after the acknowledgment is received from each other node.

In an implementation, a computer-readable medium comprises instructions which, when executed by a processor, cause the processor to perform the method.

FIG. 6 is an illustration of an exemplary data management device, a distributed data storage, or a computer system 600 in which the various architectures and functionalities of the various previous implementations may be implemented. As shown, the computer system 600 includes at least one processor 604 that is connected to a bus 602, wherein the bus 602 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI-Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol(s). The computer system 600 also includes a memory 606.

Control logic (software) and data are stored in the memory 606, which may take the form of random-access memory (RAM). In the disclosure, a single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be noted that the term single semiconductor platform may also refer to multi-chip modules with increased connectivity which simulate on-chip operation and make substantial improvements over utilizing a conventional central processing unit (CPU) and bus implementation. Of course, the various modules may also be situated separately or in various combinations of semiconductor platforms per the desires of the user.

The computer system 600 may also include a secondary storage 610. The secondary storage 610 includes, for example, a hard disk drive and a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, a digital versatile disk (DVD) drive, a recording device, or universal serial bus (USB) flash memory. The removable storage drive reads from and/or writes to a removable storage unit in a well-known manner.

Computer programs, or computer control logic algorithms, may be stored in at least one of the memory 606 and the secondary storage 610. Such computer programs, when executed, enable the computer system 600 to perform various functions as described in the foregoing. The memory 606, the secondary storage 610, and any other storage are possible examples of computer-readable media.

In an implementation, the architectures and functionalities depicted in the various previous figures may be implemented in the context of the processor 604, a graphics processor coupled to a communication interface 612, an integrated circuit (not shown) that is capable of at least a portion of the capabilities of both the processor 604 and a graphics processor, a chipset (namely, a group of integrated circuits designed to work and be sold as a unit for performing related functions), and so forth.

Furthermore, the architectures and functionalities depicted in the various previously described figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, or an application-specific system. For example, the computer system 600 may take the form of a desktop computer, a laptop computer, a server, a workstation, a game console, or an embedded system.

Furthermore, the computer system 600 may take the form of various other devices including, but not limited to, a personal digital assistant (PDA) device, a mobile phone device, a smartphone, a television, and so forth. Additionally, although not shown, the computer system 600 may be coupled to a network (for example, a telecommunications network, a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, a peer-to-peer network, a cable network, or the like) for communication purposes through an I/O interface 608.

It should be understood that the arrangement of components illustrated in the figures described is exemplary and that other arrangements may be possible. It should also be understood that the various system components (and means) defined by the claims, described below, and illustrated in the various block diagrams represent components in some systems configured according to the subject matter disclosed herein. For example, one or more of these system components (and means) may be realized, in whole or in part, by at least some of the components illustrated in the arrangements of the described figures.

In addition, while at least one of these components is implemented at least partially as an electronic hardware component, and therefore constitutes a machine, the other components may be implemented in software that, when included in an execution environment, constitutes a machine, in hardware, or in a combination of software and hardware.

Although the disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims.