

Title:
METHOD AND APPARATUS FOR PROCESSING DATA IN STORAGE SYSTEMS
Document Type and Number:
WIPO Patent Application WO/2016/042090
Kind Code:
A1
Abstract:
A class of product-matrix regenerating codes for fault-tolerance protection in a distributed storage system is provided, designed for enhanced network bandwidth and computational performance. The codes are based on the Cauchy matrix and their equivalent systematic codes are sparse, allowing throughputs compatible with current practical deployments of erasure codes while offering a significant improvement in network bandwidth (network-related repair cost). Methods and apparatuses for encoding, regenerating and decoding data utilizing the codes are also provided.

Inventors:
LE SCOUARNEC NICOLAS (FR)
Application Number:
PCT/EP2015/071353
Publication Date:
March 24, 2016
Filing Date:
September 17, 2015
Assignee:
THOMSON LICENSING (FR)
International Classes:
H03M13/13; G06F11/08; G06F11/10; H03M13/37
Other References:
RASHMI K V ET AL: "Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction", IEEE TRANSACTIONS ON INFORMATION THEORY, IEEE PRESS, USA, vol. 57, no. 8, 1 August 2011 (2011-08-01), pages 5227 - 5239, XP002669405, ISSN: 0018-9448, DOI: 10.1109/TIT.2011.2159049
Attorney, Agent or Firm:
HUCHET, Anne et al. (Issy-Les-Moulineaux, FR)
Claims:
CLAIMS

1. A method of encoding data comprising:

accessing (710) a message matrix M of dimension d × α comprising a data message, said data message comprising kα symbols to be distributed over k nodes in a distributed storage system comprising a plurality of storage nodes;

accessing (720) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, said encoding matrix having an equivalent systematic linear generator matrix that is sparse; and encoding (730) the data message according to said message matrix M and said encoding matrix Ψ, wherein the encoded data C is C = ΨM and is distributed over n nodes, and ΨM is a matrix product.

2. The method of claim 1, wherein said encoding matrix Ψ comprises:

Ψ = [Φ ΛΦ], wherein Ψ is an n × d matrix, Φ is an n × α matrix and Λ is an n × n diagonal matrix,

a_i = g^(i+α), b_j = g^j,

and

g is a generator element of a finite field.

3. A method of encoding data comprising:

accessing (750) a data message X comprising kα symbols to be distributed over k nodes in a distributed storage system comprising a plurality of storage nodes; accessing (760) a systematic linear generator matrix G', wherein said generator matrix is sparse and is equivalent to an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix; and

encoding (770) the data message according to said systematic linear generator matrix G', wherein the encoded data Y is Y = G'X and is distributed over n nodes, and G'X is a matrix product.

4. A method of encoding data comprising:

accessing (810) a data message X comprising kα symbols to be distributed over k nodes in a distributed storage system comprising a plurality of storage nodes; accessing (820) an index matrix L of dimension 2α × α;

accessing (830) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, said encoding matrix having an equivalent systematic linear generator matrix that is sparse;

transforming (840) said encoding matrix Ψ into a linear generator matrix G; and

encoding (860) the data message according to said linear generator matrix G, wherein the encoded data Y is Y = GX and is distributed over n nodes, and GX is a matrix product.

5. The method of claim 4, wherein said linear generator matrix G comprises:

G((i,j), j') = Ψ(i,l), if there exists l such that j' = L(l,j)

G((i,j), j') = 0, otherwise

wherein L is an index matrix of dimension 2α × α.

6. The method of claim 4 or 5 further comprising:

transforming (850) said generator matrix G into an equivalent systematic matrix prior to encoding said data message.

7. A method of regenerating data comprising:

accessing (910) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, said encoding matrix having an equivalent systematic linear generator matrix that is sparse;

receiving (920) at least d data values from at least d nodes in a distributed storage system comprising a plurality of storage nodes; and

regenerating (930) a failed storage node in said distributed storage system based on said data values and said encoding matrix Ψ.

8. The method of claim 7, wherein the data is encoded using an equivalent linear systematic generator matrix derived from said encoding matrix Ψ.

9. The method of claim 7 or 8, wherein said regenerating further comprises:

grouping (932) said data values into a data vector (VR1);

determining (934) an intermediate matrix (VR2) from said data vector; and determining (936) the data stored in the failed storage node from said intermediate matrix, wherein said data comprises α symbols.

10. A method of decoding data comprising:

receiving (1010) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, said encoding matrix having an equivalent systematic linear generator matrix that is sparse;

receiving (1020) at least k data vectors from at least k nodes; and decoding (1030) the data of the n storage nodes in a distributed storage system based on said data vectors and said encoding matrix Ψ.

11. The method of claim 10 further comprising:

post-processing (1040) the decoded data to recover an original data, said post-processing comprising applying the inverse of a systematic transformation, wherein said original data was pre-processed by said systematic transformation prior to encoding said original data.

12. The method of claim 10 or 11, wherein said decoding further comprises:

grouping (1032) said data vectors into a data matrix (VD1); determining (1034) an intermediate matrix (VD2) from said data matrix; determining (1036) the symbols of data sub-matrices (P and Q) from said intermediate matrix; and

determining (1038) a message matrix M of the distributed storage system from the data sub-matrices and said encoding matrix Ψ.

13. An apparatus for encoding data comprising a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with said processor, said processor being configured to perform the method according to any of the claims 1 to 2.

14. An apparatus for encoding data comprising a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with said processor, said processor being configured to perform the method of claim 3.

15. An apparatus for encoding data comprising a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with said processor, said processor being configured to perform the method according to any of the claims 4 to 6.

16. An apparatus for regenerating data comprising a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with said processor, said processor being configured to perform the method of any of the claims 7 to 9.

17. An apparatus for decoding data comprising a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with said processor, said processor being configured to perform the method according to any of the claims 10 to 12.

18. A computer readable memory comprising a product matrix minimum storage regenerating code for providing fault-tolerance in a distributed storage system, said code comprising an encoding matrix Ψ based on the Cauchy matrix, said encoding matrix having an equivalent systematic linear generator matrix that is sparse.

19. A computer readable memory comprising a code for providing fault-tolerance in a distributed storage system, said code comprising a linear generator matrix G characterized by:

G((i,j), j') = Ψ(i,l), if there exists l such that j' = L(l,j)

G((i,j), j') = 0, otherwise

wherein L is an index matrix of dimension 2α × α and Ψ is an encoding matrix of a product matrix minimum storage regenerating code based on the Cauchy matrix, said encoding matrix having an equivalent systematic linear generator matrix that is sparse.

20. A computer readable memory comprising a code for providing fault-tolerance in a distributed storage system, said code comprising a systematic linear generator matrix G', wherein G' is a systematic transformation of a linear generator matrix G characterized by:

G((i,j), j') = Ψ(i,l), if there exists l such that j' = L(l,j)

G((i,j), j') = 0, otherwise

wherein L is an index matrix of dimension 2α × α and Ψ is an encoding matrix of a product matrix minimum storage regenerating code based on the Cauchy matrix, wherein said systematic linear generator matrix is sparse.

Description:
METHOD AND APPARATUS FOR PROCESSING DATA IN STORAGE SYSTEMS

TECHNICAL FIELD

[0001] The present principles relate to fault tolerance in storage systems and in particular to product-matrix regenerating codes.

BACKGROUND

[0002] The increasing need for storage in data centers, driven by numerous cloud-based storage services, has led to the development of distributed storage systems with a better tradeoff between reliability and storage. The purpose of distributed storage systems is to store data reliably over long periods of time using a distributed collection of storage nodes, typically built from commodity components for cost reasons. These individual components are generally unreliable, so the system must cope with frequent component failures. In addition, software glitches, machine reboots and maintenance operations also render machines unavailable from time to time. Applications include storage in large data centers and peer-to-peer storage systems which use nodes across the Internet for distributed file storage. In wireless sensor networks, obtaining reliable storage over unreliable motes might be desirable for robust data recovery, especially in catastrophic scenarios.

[0003] In all these scenarios, in order to ensure that the data remains reliable and available even in the presence of frequent machine unavailability, the introduction of redundancy is required. The simplest form of redundancy is replication, which is adopted in many practical storage systems, with data being replicated across multiple machines, typically across multiple racks as well. For instance, the Google File System and the Hadoop Distributed File System (HDFS) store three copies of all data by default. Although disk storage seems cheap for small amounts of data, the massive scales of operation of today's data-centers make storing multiple copies an expensive option. As a result, several large-scale distributed storage systems now employ erasure codes that provide significantly higher storage efficiency, with the most popular choice being the Reed-Solomon (RS) code.

[0004] As a generalization of replication, erasure correcting codes offer better storage efficiency. For instance, one can divide a file of size B (representing the data input or message) into k pieces (called fragments), each of size B/k, encode them into n encoded fragments (of the same size) using an (n, k) maximum distance separable (MDS) code (e.g., a Reed-Solomon code), and store them at n nodes. The original file can then be recovered from any set of k coded fragments. This performance is optimal in terms of the redundancy-reliability tradeoff because k pieces, each of size B/k, provide the minimum data for recovering the file, which is of size B. It is therefore common to use erasure codes instead of replication. For certain cases, erasure coding can achieve orders of magnitude higher reliability for the same redundancy factor compared to replication.
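The encode-then-recover round trip described above can be sketched in a few lines. This is a minimal illustration only, assuming a Vandermonde generator over the prime field GF(257) for readability; deployed systems use Reed-Solomon codes over GF(2^8), and all names here are hypothetical.

```python
# Minimal sketch of an (n, k) MDS erasure code over the prime field GF(P),
# built from a Vandermonde generator matrix. Illustrative only: deployed
# systems use Reed-Solomon over GF(2^8); a prime field keeps the
# arithmetic readable.

P = 257  # a small prime; each fragment symbol lives in GF(257)

def encode(data, n):
    """Encode k data symbols into n fragments: y_i = sum_j data[j] * i^j."""
    k = len(data)
    return [sum(data[j] * pow(i, j, P) for j in range(k)) % P
            for i in range(1, n + 1)]

def decode(fragments, k):
    """Recover the k data symbols from any k (index, value) pairs by
    solving the Vandermonde system with Gauss-Jordan elimination mod P."""
    rows = [[pow(i, j, P) for j in range(k)] + [y % P] for i, y in fragments[:k]]
    for col in range(k):
        piv = next(r for r in range(col, k) if rows[r][col])  # pivot search
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], P - 2, P)                   # Fermat inverse
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[j][k] for j in range(k)]
```

Any k of the n fragments suffice, which is exactly the optimal redundancy-reliability tradeoff the paragraph describes.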

[0005] However, in distributed storage systems, redundancy must be continually refreshed as nodes fail or leave the system, which involves large data transfers across the network. While erasure codes allow decreasing storage costs, they generally come with higher costs in terms of network, I/O (Input/Output) or CPU (Central Processing Unit) requirements and thus hardware cost and power consumption. With Reed-Solomon (RS) codes, B is measured in terms of symbols over a finite field Fq of size q, and data is stored across n nodes in the network in such a way that the entire message or file can be recovered by a data collector by connecting to any k nodes, a process commonly referred to as data-reconstruction or decoding. Several distributed storage systems such as RAID-6, OceanStore and Total Recall employ such an erasure coding technique. This implies that, in general, if an object of size B is divided into k initial fragments, the repair bandwidth with this strategy is B to generate a fragment of size B/k. In contrast, if replication is used instead, a new replica may simply be copied from any other existing node, incurring no bandwidth overhead at the expense of higher storage. It was commonly believed that this k-factor overhead in repair bandwidth was an unavoidable disadvantage that came with the benefits of erasure coding.

[0006] In order to alleviate the various drawbacks resulting from decreased storage, optimized codes have been designed, particularly:

• Locally repairable codes minimize I/O related costs, or access efficiency;

• XOR-based and some matrix-based erasure codes minimize CPU or computational related costs, also measured in terms of throughput; and

• Regenerating codes minimize network-related costs, that is, the amount of network access and data transfer needed. In fact, regenerating codes offer the best tradeoff between network and storage costs.

[0007] Regenerating codes were introduced by A.G. Dimakis et al. in their pioneering paper entitled "Network Coding for Distributed Storage Systems", published in IEEE Transactions on Information Theory, vol. 56, September 2010, pp. 4539-4551. The principle behind regenerating codes is that downloading the entire B units of data (from k nodes) in order to recover the data stored in a single node, which stores only a fraction (B/k) of the entire message, is wasteful. Conventional RS codes treat each fragment (of size B/k) stored in a node as a single symbol belonging to the finite field Fq. It can be shown that when individual nodes are restricted to perform only linear operations over Fq, the total amount of data downloaded to repair a failed node can be no smaller than B, the size of the entire file or message. In contrast, regenerating codes are codes over a vector alphabet and hence treat each fragment as comprising α symbols over the finite field Fq. Linear operations over Fq, in this case, permit the transfer of a fraction of the data stored at a particular node. Apart from this new parameter α, two other parameters d and β are associated with regenerating codes. Under Dimakis' definition of regenerating codes, a failed node is permitted to connect to an arbitrary set of d of the remaining nodes while downloading β ≤ α symbols from each node. This process is termed regeneration and the total amount d × β of data downloaded for repair purposes is the repair bandwidth. Further, the set of d nodes aiding in the repair are termed helper nodes. Typically, with a regenerating code, the average repair bandwidth is small compared to the size of the file B.

[0008] Regenerating codes achieve the optimal tradeoff between storage and bandwidth with two main variants: MSR (Minimum Storage Regenerating) codes that favor storage, and MBR (Minimum Bandwidth Regenerating) codes that favor bandwidth. MSR codes encode kΔ blocks to nα blocks, wherein each device stores α blocks and α = Δ = d − k + 1. During repair, d blocks are transferred over the network. For d = 2k − 2, α = d/2. MBR codes are similar but have α = d and Δ = (2d − k + 1)/2, thus resulting in higher storage costs but lower network bandwidth during repairs.
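The MSR and MBR relations above can be captured in a small parameter helper. The formulas are taken directly from this paragraph; the function and key names are illustrative.

```python
# Parameter helper for the two extreme points described above.
# MSR: alpha = Delta = d - k + 1 (storage-optimal);
# MBR: alpha = d, Delta = (2d - k + 1) / 2 (bandwidth-optimal).
# A file of k*Delta blocks is spread over n nodes storing alpha blocks
# each; a repair downloads one block from each of d helper nodes.

def msr_params(n, k, d):
    alpha = d - k + 1
    return {"alpha": alpha, "file": k * alpha,       # Delta = alpha here
            "stored": n * alpha, "repair": d}

def mbr_params(n, k, d):
    delta2 = k * (2 * d - k + 1)                     # 2 * k * Delta, always even
    return {"alpha": d, "file": delta2 // 2,
            "stored": n * d, "repair": d}
```

For d = 2k − 2 the MSR point gives α = d/2, matching the text, and a repair moves only d blocks instead of the whole k·α-block file.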

[0009] Product-matrix regenerating codes were proposed by K.V. Rashmi et al. in their seminal paper entitled "Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction", published in IEEE Transactions on Information Theory, vol. 57, 2011, pp. 5227-5239. Product-matrix regenerating codes rely on specific encoding and decoding algorithms and are not defined using a generator matrix as usual linear codes are. The kΔ original blocks of X are arranged (with repetitions and zeros) into a message matrix M of dimension d × α in a way specific to MSR or MBR. The code is defined by its encoding matrix Ψ of dimension n × d.
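The product-matrix encoding C = ΨM can be sketched concretely, following the MSR construction of Rashmi et al. cited above (d = 2k − 2, β = 1), in which M stacks two symmetric α × α matrices. The field GF(257) and the evaluation points x_i = 1..n are illustrative choices, not part of this disclosure.

```python
# Sketch of product-matrix MSR encoding C = Psi * M over GF(P), per the
# construction of Rashmi et al. (d = 2k - 2, beta = 1): M stacks two
# symmetric alpha x alpha matrices S1 and S2 holding the k*Delta = k*(k-1)
# data symbols, and row i of Psi is (1, x_i, ..., x_i^(d-1)), i.e.
# Psi = [Phi  Lambda*Phi] with Vandermonde Phi and lambda_i = x_i^alpha.

P = 257

def message_matrix(data, k):
    """Arrange k*(k-1) symbols into the d x alpha matrix M = [[S1],[S2]]."""
    alpha = k - 1
    pairs = [(a, b) for a in range(alpha) for b in range(a, alpha)]
    def sym(chunk):                        # fill one symmetric alpha x alpha block
        S = [[0] * alpha for _ in range(alpha)]
        for (a, b), v in zip(pairs, chunk):
            S[a][b] = S[b][a] = v
        return S
    h = len(pairs)                         # alpha*(alpha+1)/2 symbols per block
    return sym(data[:h]) + sym(data[h:2 * h])

def encode(data, n, k):
    """Node i (with x_i = i) stores row i of C = Psi * M: alpha symbols."""
    M = message_matrix(data, k)
    d, alpha = 2 * k - 2, k - 1
    return [[sum(pow(x, l, P) * M[l][j] for l in range(d)) % P
             for j in range(alpha)]
            for x in range(1, n + 1)]
```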

[0010] Regenerating codes have been mainly studied with respect to either theoretical aspects (i.e., existence) or cost in bandwidth. Besides these studies, a few studies have looked at system aspects of regenerating codes, including encoding/decoding throughput (i.e., CPU cost). The throughputs reported are 50 KBps (k = 32, d = 63), 0.6 MBps (k = 16, d = 30) and even 100 MBps (k = 2, d = 3) in various publications, while RS codes encode at rates of about 1640 MBps. So far, reported speeds, except for very low values of k, are incompatible with practical deployments.

[0011] It is therefore of interest to propose regenerating codes which permit higher throughput, compatible with current practical implementations. The present principles provide product-matrix regenerating codes designed for enhanced computational or throughput performance.

SUMMARY

[0012] The present principles provide a class of product-matrix (PM) regenerating codes named sparse-PM regenerating codes for fault-tolerance protection in a distributed storage system, designed for enhanced network bandwidth and computational performance. The codes are based on the Cauchy matrix and their equivalent systematic codes are sparse.

[0013] According to one embodiment of the present principles, a method of encoding data is provided including: accessing (710) a message matrix M of dimension d × α including a data message, wherein the data message includes kα symbols to be distributed over k nodes in a distributed storage system including a plurality of storage nodes; accessing (720) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; and encoding (730) the data message according to the message matrix M and the encoding matrix Ψ, wherein the encoded data C is C = ΨM and is distributed over n nodes, and ΨM is a matrix product.

[0014] According to one embodiment of the present principles, the encoding matrix Ψ includes: Ψ = [Φ ΛΦ], wherein Ψ is an n × d matrix, Φ is an n × α matrix and Λ is an n × n diagonal matrix,

a_i = g^(i+α),

b_j = g^j, and

g is a generator element of a finite field.
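The sequences a_i = g^(i+α) and b_j = g^j can be realized concretely over a prime field. How Φ is filled from them is not spelled out in this summary, so the standard Cauchy form Φ[i][j] = 1/(a_i + b_j) is assumed here purely for illustration; the prime P = 257 is likewise an illustrative choice.

```python
# Sketch of the Cauchy-style sequences of the encoding matrix over GF(P):
# a_i = g^(i+alpha), b_j = g^j with g a generator of the multiplicative
# group. The standard Cauchy fill Phi[i][j] = 1/(a_i + b_j) is an
# assumption made for this sketch.

P = 257  # 2^8 + 1: the group order 256 has 2 as its only prime factor

def generator():
    """Smallest g with g^((P-1)/2) != 1; for P = 257 that makes g primitive."""
    return next(g for g in range(2, P) if pow(g, (P - 1) // 2, P) != 1)

def cauchy_phi(n, alpha):
    g = generator()
    a = [pow(g, i + alpha, P) for i in range(1, n + 1)]
    b = [pow(g, j, P) for j in range(1, alpha + 1)]
    # 1/(a_i + b_j) via Fermat inversion; a_i + b_j != 0 for these small sizes
    return [[pow((ai + bj) % P, P - 2, P) for bj in b] for ai in a]
```

Every α × α submatrix of such a Cauchy matrix is invertible, which is the property the construction relies on.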

[0015] According to one embodiment of the present principles, a method of encoding data is provided including: accessing (750) a data message X including kα symbols to be distributed over k nodes in a distributed storage system including a plurality of storage nodes; accessing (760) a systematic linear generator matrix G', wherein the generator matrix is sparse and is equivalent to an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix; and encoding (770) the data message according to the systematic linear generator matrix G' of the regenerating code, wherein the encoded data Y is Y = G'X and is distributed over n nodes, and G'X is a matrix product.

[0016] According to one embodiment of the present principles, a method of encoding data is provided including: accessing (810) a data message X including kα symbols to be distributed over k nodes in a distributed storage system including a plurality of storage nodes; accessing (820) an index matrix L of dimension 2α × α; accessing (830) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; transforming (840) the encoding matrix Ψ into a linear generator matrix G; and encoding (860) the data message according to the linear generator matrix G of the regenerating code, wherein the encoded data Y is Y = GX and is distributed over n nodes, and GX is a matrix product.

[0017] According to one embodiment of the present principles, the linear generator matrix G includes:

G((i,j), j') = Ψ(i,l), if there exists l such that j' = L(l,j), and 0 otherwise,

wherein L is an index matrix of dimension 2α × α.
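Step 840 can be sketched as follows: output symbol (i, j) is row i of Ψ times column j of M, and the index matrix L records which data symbol X[j'] sits at position (l, j) of M, which yields the rule above. The field and layout here are illustrative assumptions.

```python
# Sketch of step 840: unfolding the product-matrix code into an ordinary
# linear generator matrix G, per the rule G[(i,j), j'] = Psi[i][l] when
# some l satisfies j' = L[l][j] (and 0 otherwise).

P = 257

def unfold(Psi, L, num_symbols):
    """Build G (n*alpha rows, num_symbols columns) from Psi (n x d) and
    the index matrix L (d x alpha, entries are data-symbol indices)."""
    d, alpha = len(L), len(L[0])
    G = []
    for i in range(len(Psi)):            # node index
        for j in range(alpha):           # symbol within the node
            row = [0] * num_symbols
            for l in range(d):
                row[L[l][j]] = Psi[i][l] % P
            G.append(row)
    return G
```

This assumes each data index appears at most once per column of L (true for the MSR layout), so the "exists l" in the rule picks a unique l; then G·X equals the flattening of Ψ·M exactly.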

[0018] According to one embodiment of the present principles, the method further includes: transforming (850) the generator matrix G into an equivalent systematic matrix prior to encoding the data message.
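Step 850's systematic transformation admits a compact sketch: right-multiplying G by the inverse of its top square block yields an equivalent G' whose first rows form an identity, so the first stored symbols are the data itself. A minimal illustration over GF(257); the field and helper names are assumptions, not part of the disclosure.

```python
# Sketch of step 850: turning a generator matrix G into an equivalent
# systematic G' over GF(P) by right-multiplying with the inverse of its
# top square block (assumed invertible).

P = 257

def inverse_mod_p(A):
    """Gauss-Jordan matrix inverse over GF(P)."""
    m = len(A)
    aug = [row[:] + [int(r == c) for c in range(m)] for r, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] % P)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def systematize(G):
    """G' = G * inv(top block): the first m rows of G' become the identity."""
    m = len(G[0])
    T = inverse_mod_p([row[:] for row in G[:m]])
    return [[sum(G[r][l] * T[l][c] for l in range(m)) % P for c in range(m)]
            for r in range(len(G))]
```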

[0019] According to one embodiment of the present principles, a method of regenerating data is provided including: accessing (910) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; receiving (920) at least d data values from at least d nodes in a distributed storage system including a plurality of storage nodes; and regenerating (930) a failed storage node in the distributed storage system based on the data values and the encoding matrix Ψ.

[0020] According to an embodiment of the present principles, the data is encoded using an equivalent linear systematic generator matrix derived from the encoding matrix Ψ.

[0021] According to an embodiment of the present principles, regenerating further includes: grouping (932) the data values into a data vector (VR1); determining (934) an intermediate matrix (VR2) from the data vector; and determining (936) the data stored in the failed storage node from the intermediate matrix, wherein the data includes α symbols.
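Steps 932-936 can be sketched for the product-matrix MSR construction of Rashmi et al. cited in the description (d = 2k − 2, Vandermonde rows ψ_i = (1, x_i, ..., x_i^(d−1))). Each helper i sends the single symbol ⟨c_i, φ_f⟩; stacking d of them gives VR1, solving a d × d system yields VR2 = [S1·φ_f ; S2·φ_f], and the failed node's α symbols follow from the symmetry of S1 and S2. The field and parameters are illustrative.

```python
# Sketch of regeneration (steps 932-936) for a PM-MSR code at d = 2*alpha
# over GF(P). Helpers are (x_i, stored_row c_i); the failed node has
# evaluation point x_f and lambda_f = x_f^alpha.

P = 257

def solve_mod_p(A, b):
    """Solve A z = b over GF(P) by Gauss-Jordan elimination."""
    m = len(A)
    aug = [A[r][:] + [b[r]] for r in range(m)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] % P)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b2) % P for a, b2 in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]

def regenerate(helpers, x_f, alpha):
    """helpers: d = 2*alpha pairs (x_i, c_i). Returns the failed node's data."""
    phi_f = [pow(x_f, j, P) for j in range(alpha)]
    VR1 = [sum(c * t % P for c, t in zip(row, phi_f)) % P
           for _, row in helpers]                        # step 932
    Psi_h = [[pow(x, l, P) for l in range(2 * alpha)] for x, _ in helpers]
    VR2 = solve_mod_p(Psi_h, VR1)                        # step 934
    lam_f = pow(x_f, alpha, P)                           # lambda_f = x_f^alpha
    return [(VR2[j] + lam_f * VR2[alpha + j]) % P        # step 936
            for j in range(alpha)]
```

Each helper contributes only one symbol, so the repair downloads d symbols rather than the whole file.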

[0022] According to one embodiment of the present principles, a method of decoding data is provided including: receiving (1010) an encoding matrix Ψ for an (n, k, d, α) product matrix minimum storage regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; receiving (1020) at least k data vectors from at least k nodes; and decoding (1030) the data of the n storage nodes in the distributed storage system based on the data vectors and the encoding matrix Ψ.

[0023] According to one embodiment of the present principles, the method of decoding data further includes: post-processing (1040) the decoded data resulting from the decoding to recover an original data, the post-processing including applying the inverse of a systematic transformation, wherein the original data was pre-processed by the systematic transformation prior to encoding the original data.

[0024] According to one embodiment of the present principles, decoding data further includes: grouping (1032) the data vectors into a data matrix (VD1); determining (1034) an intermediate matrix (VD2) from the data matrix; determining (1036) the symbols of data sub-matrices (P and Q) from the intermediate matrix; and determining (1038) a message matrix M of the distributed storage system from the data sub-matrices and the encoding matrix Ψ.
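Data reconstruction from k nodes can be illustrated the brute-force way: the kΔ free symbols of the symmetric matrices S1 and S2 are the unknowns of a kα × kα linear system assembled from the k collected vectors, then solved over the field. The specialised product-matrix decoder of Rashmi et al. is much cheaper; this generic inversion only illustrates that k nodes suffice, under the illustrative Vandermonde-Ψ, GF(257) assumptions used throughout these sketches.

```python
# Brute-force sketch of decoding (claims 10-12) for a PM-MSR code at
# d = 2k - 2 over GF(P): solve for the free symbols of S1 and S2 from
# the alpha symbols stored by each of k nodes.

P = 257

def decode(nodes, k):
    """nodes: k pairs (x_i, stored alpha-vector). Returns the free symbols
    [upper triangle of S1] + [upper triangle of S2]."""
    alpha = k - 1
    pairs = [(a, b) for a in range(alpha) for b in range(a, alpha)]
    nsym = 2 * len(pairs)
    A, rhs = [], []
    for x, c in nodes:
        psi = [pow(x, l, P) for l in range(2 * alpha)]
        for j in range(alpha):          # one equation per stored symbol
            row = [0] * nsym
            for l in range(alpha):      # S1 part: M[l][j] = S1[l][j]
                row[pairs.index((min(l, j), max(l, j)))] += psi[l]
            for l in range(alpha):      # S2 part: M[alpha+l][j] = S2[l][j]
                s = pairs.index((min(l, j), max(l, j)))
                row[len(pairs) + s] += psi[alpha + l]
            A.append([v % P for v in row])
            rhs.append(c[j])
    m = nsym                            # Gauss-Jordan solve of A z = rhs
    aug = [A[r][:] + [rhs[r]] for r in range(m)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] % P)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]
```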

[0025] According to one embodiment of the present principles, an apparatus (1100) for encoding data is provided including a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with the processor, the processor being configured to perform any of the methods of encoding data.

[0026] According to one embodiment of the present principles, an apparatus (1100) for regenerating data is provided including a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with the processor, the processor being configured to perform any of the methods of regenerating data.

[0027] According to one embodiment of the present principles, an apparatus (1100) for decoding data is provided including a processor (1110) that communicates with at least one input/output interface (1120); and at least one memory (1130, 1140) in signal communication with the processor, the processor being configured to perform any of the methods of decoding data.

[0028] According to one embodiment of the present principles, a computer readable memory is provided including a product matrix minimum storage regenerating code for providing fault-tolerance in a distributed storage system, the code including an encoding matrix Ψ based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse.

[0029] According to one embodiment of the present principles, a computer readable memory is provided including a code for providing fault-tolerance in a distributed storage system, the code including a linear generator matrix G characterized by:

G((i,j), j') = Ψ(i,l), if there exists l such that j' = L(l,j), and 0 otherwise,

wherein L is an index matrix of dimension 2α × α and Ψ is an encoding matrix of a product matrix minimum storage regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse.

[0030] According to one embodiment of the present principles, a computer readable memory is provided, including a code for providing fault-tolerance in a distributed storage system, the code including a systematic generator matrix G', wherein G' is a systematic transformation of a linear generator matrix G characterized by:

G((i,j), j') = Ψ(i,l), if there exists l such that j' = L(l,j), and 0 otherwise,

wherein L is an index matrix of dimension 2α × α and Ψ is an encoding matrix of a product matrix minimum storage regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse.

[0031] Additional features and advantages of the present principles will be made apparent from the following detailed description of illustrative embodiments which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

[0032] The present principles may be better understood in accordance with the following exemplary figures briefly described below:

Figure 1 illustrates the trade-off curve of regenerating codes;

Figure 2 (A, B, C, D) illustrates the evaluation of encoding performance for various implementations of MSR Product-Matrix codes and RS codes according to the present principles: A - Throughput for non-systematic codes; B - Throughput for systematic codes; C - Initialization for non-systematic codes and D - Initialization for systematic codes;

Figure 3 (A, B) illustrates the evaluation of the impact of optimizations for encoding systematic product-matrix codes according to the present principles: A - Specific and linearized Vanilla codes and B - Specific and linearized Sparse-PM codes;

Figure 4 (A, B) illustrates the evaluation of the impact of block size on encoding performance for various implementations of MSR Product-Matrix codes and RS codes according to the present principles: A - Non-systematic codes and B - Systematic codes;

Figure 5 (A, B) illustrates the evaluation of regenerating performance for various implementations of MSR Product-Matrix codes and RS codes according to the present principles: A - Throughput for non-systematic codes; and B - Initialization for non-systematic codes;

Figure 6 (A, B, C, D) illustrates the evaluation of decoding performance for various implementations of MSR Product-Matrix codes and RS codes according to the present principles: A - Throughput for non-systematic codes; B - Throughput for systematic codes; C - Initialization for non-systematic codes and D - Initialization for systematic codes;

Figures 7A and 7B each illustrate a flowchart of a method of encoding data for fault-tolerance in a distributed storage system according to the present principles;

Figure 8 illustrates a flowchart of a method of encoding data for fault-tolerance in a distributed storage system according to the present principles;

Figure 9 illustrates a flowchart of a method of regenerating or repairing data for fault-tolerance in a distributed storage system according to the present principles;

Figure 10 illustrates a flowchart of a method of decoding data for fault-tolerance in a distributed storage system according to the present principles; and

Figure 11 illustrates a block diagram of a computing environment within which the methods of the present principles may be implemented and executed.

DETAILED DISCUSSION OF THE EMBODIMENTS

[0033] The present principles relate to fault tolerance (data encoding, regeneration and correction) in storage systems and in particular to product-matrix regenerating codes. Other than the inventive concept, several elements discussed herein are well known and will not be described in detail. For example, other than the inventive concept, familiarity with error correcting codes, erasure codes, Reed-Solomon codes, regenerating codes and product-matrix codes is assumed and these are not described herein in detail. It should also be noted that the inventive concept may be implemented using conventional programming techniques, which, as such, will not be described herein. Throughout this specification, the terms original data, input data, input message, message data, data input and data message are used interchangeably.

[0034] To provide high storage capacity, large-scale distributed storage systems have been widely deployed in enterprises such as Google File System, Amazon Dynamo and Microsoft Azure. In such systems, data is striped across multiple nodes that offer local storage space. Nodes are interconnected over a networked environment, in the form of either clustered or wide-area settings, with wired, wireless connections or a combination of both. Ensuring data availability in distributed storage systems is critical, given that node failures are prevalent.

[0035] Traditional distributed storage systems support failures of individual devices by the use of replication or erasure correcting codes. An erasure correcting code encodes k blocks of original data (column vector X) into n blocks of encoded data (column vector Y), so that the k original blocks can be recovered from any k encoded blocks. The code can be defined by its generator matrix G of dimension n × k. The encoding operation is then mathematically expressed as Y = GX; and the decoding operation as X = G̃⁻¹Ỹ, where G̃ (respectively Ỹ) comprises the rows of G (respectively the entries of Y) that correspond to the remaining encoded blocks. Each device or node in the distributed storage system stores one block of data.

[0036] A key feature of distributed storage systems is their ability to self-repair (i.e., to recreate lost encoded blocks whenever some device fails). Erasure correcting codes support such repair by decoding k blocks and encoding them again. This implies the transfer of k blocks to recreate a single lost block, leading to a high network-related repair cost. In fact, while erasure correcting codes offer better storage efficiency than replication for similar fault tolerance, they incur higher CPU consumption, higher network consumption and higher disk I/O. To address these issues, codes specific to storage systems have been designed. Their main feature is the ability to repair a single lost disk efficiently.

[0037] Regenerating codes have been designed as codes offering an optimal tradeoff between storage and network bandwidth (i.e., network-related repair cost). While the original work relied on randomized code constructions, it has been followed by numerous studies on how to build such codes, including product-matrix regenerating codes, which are applicable to a wide set of parameters.

[0038] Regenerating codes achieve the optimal tradeoff between storage and bandwidth with two main variants: MSR codes which favor storage, and MBR codes which favor bandwidth. Figure 1 shows the trade-off curve of storage versus repair bandwidth with the two extreme cases. MSR codes encode kα blocks to nα blocks, wherein each device stores α blocks and α = Δ = d − k + 1. During repair, d blocks are transferred over the network. MBR codes are similar but have α = d and Δ = (2d − k + 1)/2, thus resulting in higher storage costs but lower network bandwidth during repairs.

[0039] Regenerating codes can be implemented using linear codes. Similarly to regular erasure correcting codes, the code can be defined by its generator matrix G of dimension nα × kΔ. The encoding operation is then expressed as Y = GX (where X is a column vector of kΔ blocks and Y is a column vector of nα blocks); and the decoding operation as X = G̃⁻¹Ỹ, where G̃ (respectively Ỹ) consists of the rows of G (respectively Y) that correspond to the remaining encoded blocks. The factor α in the dimensions comes from the fact that each device stores α blocks instead of 1 block for erasure correcting codes. This allows devices using regenerating codes to repair α blocks by downloading 1 block from each of d devices, thus saving bandwidth when compared to erasure correcting codes that repair 1 block by downloading 1 block from each of k devices. As a consequence, the encoding and decoding complexity comes with an additional factor αΔ. Given that in a typical setting α ∝ k and Δ ∝ k, the encoding and decoding complexity, which is Ω(k²) for erasure correcting codes, becomes Ω(k⁴) for regenerating codes.

[0040] Product-matrix regenerating codes rely on specific encoding and decoding algorithms and are not defined using a generator matrix G as usual linear codes. The kΔ original blocks of X are arranged (with repetitions and zeros) into a message matrix M of dimension d × α in a way specific to MSR or MBR. The code is defined by its encoding matrix Ψ of dimension n × d, whose i-th row ψ_iᵗ is named the encoding vector of node i, where the superscript t represents the transpose operation. The encoding vector is used to encode the message into the form in which it is stored within the i-th node. The encoding operation is thus defined as C = ΨΜ, such that C is an n × α code matrix containing the nα encoded blocks, whose i-th row c_iᵗ contains the α symbols stored by the i-th node, that is, c_iᵗ = ψ_iᵗM.
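As a toy illustration of C = ΨM and of the per-node rows c_iᵗ = ψ_iᵗM, the sketch below uses GF(257) as a stand-in field and placeholder matrices (the real M and Ψ follow the MSR- or MBR-specific constructions described later).

```python
P = 257  # stand-in prime field (practical codes use GF(2^8))
n, d, alpha = 5, 4, 2

# Placeholder d x alpha message matrix M; the real arrangement of the
# message symbols into M is specific to MSR or MBR.
M = [[1, 2], [2, 3], [4, 5], [5, 6]]

g = 3  # a generator of GF(257)*
# Placeholder n x d encoding matrix Psi (Vandermonde-style rows).
Psi = [[pow(g, i * j, P) for j in range(d)] for i in range(n)]

# C = Psi M: node i stores the row c_i^t = psi_i^t M (alpha symbols).
C = [[sum(Psi[i][l] * M[l][j] for l in range(d)) % P
      for j in range(alpha)] for i in range(n)]
assert len(C) == n and all(len(row) == alpha for row in C)
```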

[0041] The common structure of the code matrices leads to common architectures for both data-reconstruction and exact-regeneration. Data-reconstruction or decoding amounts to recovering the message matrix M from the kα symbols obtained from an arbitrary set of k storage nodes. By denoting the set of k nodes to which the data collector connects as {i₁, i₂, …, i_k}, the j-th node in this set passes on the message vector ψ_{i_j}ᵗM to the data collector. The data collector thus obtains the product matrix Ψ_DC M, where Ψ_DC is the sub-matrix of Ψ consisting of the k rows {i₁, i₂, …, i_k}. It then uses the properties of the matrices Ψ and M to recover the message. The precise procedure for recovering M is a function of the particular construction of the type of product-matrix regenerating code used (MSR or MBR) and is described in the seminal paper by Rashmi et al.

[0042] As noted above, each node in the network is associated to a distinct d × 1 encoding vector ψ_i. In the regeneration process, one needs to call upon a related vector μ_f of length α that contains a subset of the components of ψ_f. To regenerate a failed node f, the node replacing the failed node connects to an arbitrary subset {h₁, h₂, …, h_d} of d storage nodes, which we will refer to as the d helper nodes. Each helper node passes on the inner product of the α symbols stored in it with μ_f to the replacement node: the helper node h_j passes ψ_{h_j}ᵗ M μ_f. The replacement node thus obtains the product matrix Ψ_repair M μ_f, where Ψ_repair is the sub-matrix of Ψ consisting of the d rows {h₁, h₂, …, h_d}. From this, it turns out that one can recover the desired symbols. Here again, the precise procedure is dependent on the particular construction.

Systematic Codes

[0043] A key feature of codes for storage, be they erasure correcting codes or regenerating codes, is their ability to be transformed into systematic codes. A systematic erasure correcting code (respectively, systematic regenerating code) is a code such that the k (respectively, kΔ) first blocks of Y are equal to the original input (or message) data X. Systematic codes have two main advantages:

• Accessing data does not require decoding provided that none of the k first devices has failed, thus enhancing the performance of the common case. Even if the system is designed to tolerate failures, most of the time, the device storing the accessed data is likely to be available; and

• Encoding tends to be less costly as we only need to compute the n − k (respectively, nα − kΔ) last rows of Y, the k (respectively, kΔ) first being equal to X by construction.

[0044] The construction of systematic codes is based on the fact that encoding and decoding operations are commutative. Hence, one can pre-process the original data X by applying the decoding to the original data X to obtain precoded data Z, and then encode Z to obtain Y such that Y_[1…kΔ] = X. For linear codes, the encoding is thus defined as Y = GG̃⁻¹X, wherein G̃ corresponds to the k (respectively, kΔ) first rows of G and G̃⁻¹ is hereby referred to as a systematic transformation. Thus, the resulting systematic code is the result of applying a systematic transformation to the original generator matrix G and has a generator matrix G' = GG̃⁻¹, such that G' = [I G''ᵗ]ᵗ where I is the identity matrix of order k (respectively, kΔ). By construction, G' inherits appropriate properties (e.g., all k × k (respectively, kΔ × kΔ) sub-matrices are full-rank) from G, ensuring that it is an appropriate encoding matrix. An alternative could be to choose G'' directly such that G' = [I G''ᵗ]ᵗ has the needed properties.
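A minimal sketch of the systematic transformation G' = GG̃⁻¹, again over the stand-in prime field GF(257) and with a hypothetical Vandermonde G (not the patent's matrices):

```python
P = 257  # toy prime field; deployments use GF(2^8)

def inv(a):
    return pow(a, P - 2, P)

def mat_inv(A):
    """Gauss-Jordan inversion of a square matrix over GF(P)."""
    m = len(A)
    M = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if M[r][c])
        M[c], M[p] = M[p], M[c]
        iv = inv(M[c][c])
        M[c] = [v * iv % P for v in M[c]]
        for r in range(m):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(v - f * w) % P for v, w in zip(M[r], M[c])]
    return [row[m:] for row in M]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P
             for col in zip(*B)] for row in A]

n, k = 5, 3
G = [[pow(i, j, P) for j in range(k)] for i in range(1, n + 1)]
G_top = G[:k]                        # G~: the k first rows of G
G_sys = mat_mul(G, mat_inv(G_top))   # G' = G G~^-1 = [I G''^t]^t

I = [[int(i == j) for j in range(k)] for i in range(k)]
assert G_sys[:k] == I  # first k rows are the identity: the code is systematic
```

The assertion verifies the defining property: encoding with G' leaves the first k blocks equal to X.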

[0045] A skilled artisan will appreciate that decoding the systematic codes described above may be accomplished by applying the inverse of the systematic generator matrix G'. Equivalently, decoding may be accomplished by obtaining the inverse of the generator matrix G to recover Z, followed by a post-processing step of applying the inverse systematic transformation G̃ to Z in order to recover the original input or message data X.

[0046] While building G' directly is possible for some codes, this is not easy for all codes. For MBR product-matrix codes, since the matrix Ψ is less constrained, Rashmi et al. give a direct construction for a systematic encoding matrix Ψ' = [I Ψ''ᵗ]ᵗ, thus leading to an efficient implementation. However, for MSR product-matrix regenerating codes, the matrix Ψ (from which G can be derived) must satisfy several constraints to allow efficient repair. Hence, Rashmi et al. build systematic codes by successively decoding and encoding, that is, the message matrix M must be transformed in a similar way as in the linear systematic codes described above. The message data is first pre-decoded such that, when encoding it, the end result will be a systematic code. However, such an indirect construction has a significant impact on the computing cost, and optimization beyond this construction is highly desirable.

[0047] Regenerating codes have been mainly studied for their saving in bandwidth, leaving aside computational performance. The prior art has implemented and measured the performance of various regenerating codes, such as randomized codes achieving throughputs around 50 KBps using a pure Java-based implementation (k = 32, d = 63) and 100 MBps for very small codes (k = 2, d = 3). Implementations of deterministic codes have also been reported which achieve relatively higher speeds, such as 0.6 MBps (k = 16, d = 30) using a pure Java-based implementation. These reported encoding throughputs are a factor that is likely to limit the deployment of regenerating codes, since saving network bandwidth requires a significant increase in processing costs.

[0048] The present principles provide techniques and optimizations for implementing fast product-matrix regenerating codes, therefore focusing on computational costs.

Linearization of Product-Matrix Codes

[0049] Regarding the implementation of erasure correcting codes, highly optimized libraries have been recently released that provide fast arithmetic on Galois Fields, namely GF-Complete, and fast linear codes, namely Jerasure 2.0. These two libraries are of interest for the implementation of product-matrix regenerating codes. Being deterministic codes, small finite fields (e.g., GF(2⁸)) are sufficient for product-matrix codes in typical settings (e.g., k = 8); as a consequence, the libraries can leverage the split-table technique relying on SIMD (Single Instruction Multiple Data) instructions for efficient multiplications in the Galois Field.

[0050] When implementing product matrix regenerating codes, two alternatives are possible, namely:

• Applying the algorithms described by Rashmi et al.; or

• Transforming them to linear codes and using generic algorithms for linear codes such as the ones implemented in Jerasure.

[0051] A linear code (n, k) is generally defined by Y = GX, where G is the generator matrix of dimension n × k, X is the input or data message of dimension k × 1 and Y is the encoded output of dimension n × 1. In order to obtain the linearized version of the product matrix code, one needs to create an nα × kΔ generator matrix G for a given code. According to the present principles, linearization may be accomplished as described below.

[0052] First, construct an index matrix L according to the definition of the message matrix M, so that M_{i,j} = X_{L_{i,j}}, since the message matrix M is a transformation of the data message X. Alternatively, a skilled artisan may generate L as the concatenation of two α × α symmetric matrices containing an arrangement of integers from 1 to kα. For example, for an MSR code with k = 3, α = Δ = 2, the index matrix L of dimension 2α × α may be:

L = | 1 2 |
    | 2 3 |
    | 4 5 |
    | 5 6 |        (1)

[0053] Permutations of this case are also possible with similar results. For example, if 4 and 2, as well as 3 and 5, are exchanged, the index matrix will be:

L = | 1 4 |
    | 4 5 |
    | 2 3 |
    | 3 6 |        (2)

[0054] Encoding by the product-matrix regenerating code is defined as C = ΨΜ. For a linear regenerating code defined as Y = GX, the element C_{i,j} and the corresponding element Y_{αi+j} are computed using:

C_{i,j} = Σ_{l=1}^{d} Ψ_{i,l} M_{l,j}    and    Y_{αi+j} = Σ_{l'=1}^{kα} G_{αi+j,l'} X_{l'}        (3)

[0055] For all i, j, we have C_{i,j} = Y_{αi+j}. Applying a change of variable l' = L_{l,j}, and noticing that:

(i) by construction of M and L, no row nor column of L has duplicate values; and

(ii) the equality can hold only if G_{αi+j,l'} = 0 for any l' not present on the left-hand side,

this results in:

Σ_{l=1}^{d} Ψ_{i,l} X_{L_{l,j}} = Σ_{l=1}^{d} G_{αi+j,L_{l,j}} X_{L_{l,j}}        (4)

[0056] This implies that the generator matrix G as described below is equivalent to a product matrix code (MSR or MBR) defined by Ψ and L:

G_{αi+j,l'} = Ψ_{i,l} if there exists l such that l' = L_{l,j}; 0 otherwise        (5)

where 1 ≤ i ≤ n, 1 ≤ l ≤ d (= 2α), 1 ≤ j ≤ α, 1 ≤ αi + j ≤ nα, 1 ≤ l' ≤ kα.
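The mapping of equation (5) can be checked with a small sketch (GF(257) as a stand-in field, with generator g = 3; Ψ is a hypothetical Vandermonde matrix), verifying that Y = GX reproduces the rows of C = ΨM:

```python
P, g = 257, 3  # stand-in prime field and one of its generators
n, k, d, alpha = 5, 3, 4, 2

L = [[1, 2], [2, 3], [4, 5], [5, 6]]  # index matrix of equation (1)
X = [7, 11, 13, 17, 19, 23]           # k*alpha = 6 message symbols
# M is a transformation of X: M[l][j] = X[L[l][j]] (1-based indices in L)
M = [[X[L[l][j] - 1] for j in range(alpha)] for l in range(d)]

Psi = [[pow(g, i * j, P) for j in range(d)] for i in range(n)]

# Equation (5): G[alpha*i+j][l'] = Psi[i][l] where l' = L[l][j], else 0.
G = [[0] * (k * alpha) for _ in range(n * alpha)]
for i in range(n):
    for j in range(alpha):
        for l in range(d):
            G[alpha * i + j][L[l][j] - 1] = Psi[i][l]

# Product-matrix encoding C = Psi M versus linearized encoding Y = G X.
C = [[sum(Psi[i][l] * M[l][j] for l in range(d)) % P
      for j in range(alpha)] for i in range(n)]
Y = [sum(G[r][c] * X[c] for c in range(k * alpha)) % P
     for r in range(n * alpha)]
assert Y == [C[i][j] for i in range(n) for j in range(alpha)]
```

The assertion holds because no column of L has duplicate values, so each Ψ entry lands in a distinct column of G.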

[0057] As it will be later explained, there are additional benefits to linearizing product matrix codes, since it also leads to significant improvement when using systematic codes. In addition, linearization allows for: (i) re-using libraries designed for regular erasure codes; and (ii) simpler zero-copy/lazy implementation (i.e., decoding only lost blocks) as it is a single step transformation.

Sparse-PM Codes

For a general PM MSR regenerating code with d = 2k − 2, at the MSR point, α = d − k + 1 = k − 1 and hence d = 2α and B = kα = α(α + 1). This particular setting is the most interesting since one generally wants the lowest possible n (the lowest amount of redundancy, which is still high when n = 2k − 1 and d = 2k − 2), and MSR codes are particularly interesting since they have properties close to traditional error correcting codes except for the lower repair bandwidth. Rashmi et al. describe the following constraints on the encoding matrix Ψ for (n, k, d, α) MSR codes with d = 2k − 2:

a) Ψ = [Φ ΛΦ], where Φ is an n × α matrix and Λ is an n × n diagonal matrix;

b) any d rows of Ψ are linearly independent;

c) any α rows of Φ are linearly independent; and

d) all values of Λ are distinct.

[0058] The suggested construction is to take Ψ as a Vandermonde matrix (i.e., Ψ_{i,j} = g^{(i−1)(j−1)}), which satisfies all needed properties. For example, if n = 5, k = 3, d = 4, α = 2 and the finite field used has a generator element g, this gives the following matrices:

Ψ = | 1  1   1    1   |
    | 1  g   g²   g³  |
    | 1  g²  g⁴   g⁶  |
    | 1  g³  g⁶   g⁹  |
    | 1  g⁴  g⁸   g¹² |        (6)

where Φ consists of the first α columns of Ψ and Λ_{i,i} = g^{α(i−1)}.

[0059] Interestingly, such a construction based on a dense encoding matrix Ψ leads to a sparse generator matrix G, as shown in Table 1 (e.g., G contains 75% of zeros for k = 8). Sparsity is important as multiplications by zero can be skipped, resulting in lower computational costs. However, when the code is turned into systematic form, sparsity is lost and the lower part of the resulting generator G'' contains 0% of zeroes.

[0060] In order to reduce the computational cost of product-matrix MSR codes, the European patent application Serial No. 14306444.2 proposed an alternative construction which satisfies the aforementioned conditions but gives slightly sparser Ψ and G and a much sparser systematic generator G''. It takes Φ as a Vandermonde matrix (i.e., Φ_{i,j} = g^{(i−1)(j−1)}) but sets Φ_{i,j} = 0 for all i ≠ j, i ≤ α. Λ is left unchanged with Λ_{i,i} = g^{α(i−1)}. For n = 5, k = 3, d = 4, α = 2 and a finite field having a generator element g, this gives the following matrices:

Φ = | 1  0  |
    | 0  g  |
    | 1  g² |  ,   Λ = diag(1, g², g⁴, g⁶, g⁸),   Ψ = [Φ ΛΦ]        (7)
    | 1  g³ |
    | 1  g⁴ |

[0061] One with skill in the art may observe (or computationally check) that the above conditions a)-d) on Φ, Ψ and Λ are still satisfied, so that the resulting codes, hereby called Sparse-PM regenerating codes, are valid product-matrix MSR codes. The advantage of such codes can be better observed when one generates their linear and/or systematic equivalents, as previously described. Table 1 below shows the percentage of zeroes of the corresponding linear generator matrix (G for non-systematic and G'' for systematic) of the original PM (or vanilla) and the sparse-PM regenerating codes. The percentage of zeroes in a matrix represents the sparsity of the matrix, which is inversely proportional to the computational complexity. Therefore, the more zeroes, the faster the computation.

[0062] As shown in Table 1, the sparse-PM non-systematic codes are slightly sparser than the non-systematic vanilla (i.e., original) codes (e.g., 85% of zeroes for k = 8). However, the sparse-PM systematic codes are significantly sparser than systematic vanilla codes (e.g., 75% instead of 0% for k = 8), thus leading to faster computation. Hence, a systematic linear generator matrix equivalent to the sparse-PM encoding matrix based on the Vandermonde matrix is a sparse matrix, particularly when compared to the original (vanilla) PM regenerating codes. This is important since most codes deployed are systematic. It gives a strong indication that the sparse-PM regenerating code based on the Vandermonde matrix is faster than the original PM regenerating codes.

[0063] For a general PM MSR regenerating code satisfying the properties above, there are many possible ways of designing the message matrix M. An exemplary message matrix M is defined as:

M = | S₁ |
    | S₂ |        (8)

where the entries in the upper-triangular part of each of the two α × α matrices S₁ and S₂ are filled up by α(α + 1)/2 distinct message symbols. Thus, all the B = α(α + 1) message symbols constituting the data message are contained in the two matrices S₁ and S₂. The entries in the strictly lower-triangular portion of the two matrices are chosen so as to make the matrices symmetric. Hence, the message matrix includes the data message (more than once) and can be seen as a transformation of the data message.
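The construction of M from the B = α(α + 1) message symbols can be sketched as follows (the symbol values and the helper name symmetric_from are illustrative):

```python
alpha = 2
X = [1, 2, 3, 4, 5, 6]  # B = alpha*(alpha+1) = 6 message symbols

def symmetric_from(symbols):
    """Fill an alpha x alpha symmetric matrix: the upper triangle takes
    alpha*(alpha+1)/2 symbols, the lower triangle mirrors it."""
    S = [[0] * alpha for _ in range(alpha)]
    it = iter(symbols)
    for i in range(alpha):
        for j in range(i, alpha):
            S[i][j] = S[j][i] = next(it)
    return S

half = alpha * (alpha + 1) // 2
S1 = symmetric_from(X[:half])
S2 = symmetric_from(X[half:])
M = S1 + S2  # M = [S1 S2]^t: a d x alpha matrix with d = 2*alpha
assert M == [[1, 2], [2, 3], [4, 5], [5, 6]]
```

Note that this M matches the index matrix L of equation (1): entry M_{i,j} is the L_{i,j}-th message symbol.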

Table 1

[0064] According to the present principles, the general concept of sparsity may also be applied to codes based on matrices other than the Vandermonde matrix. A second embodiment of Sparse-PM codes relies on Cauchy matrices, which require smaller finite fields and guarantee that Φ is invertible even on a finite field. All values of Λ are distinct and any matrix formed from d rows is invertible: it is computationally easy to check that all such matrices are invertible. Let us define a_i = g^{i+α} and b_j = g^{j−1}. Then, Λ_{i,i} = (a_i − b_1)/(a_i − b_{1+α}) and Λ_{i,j} = 0 for all i ≠ j. In addition, Φ is defined as an identity matrix concatenated to a specific Cauchy matrix, that is:

Φ = | I_α |
    | C   |  ,   with C_{i,j} = 1/(a_i − b_j)        (9)

[0065] For the case of k = 3, n = 5, d = 4, α = 2, the construction above results in the corresponding Cauchy matrix based Sparse-PM matrices Φ, Λ and Ψ = [Φ ΛΦ] (10).

[0066] By construction, any α rows of Φ are invertible; and for a given finite field and k, it is easy to check computationally that all n values of Λ are different and that all possible d × d sub-matrices of Ψ are invertible. The construction above is valid for k ∈ [2, …, 39] in the default Galois Field GF(2⁸) from GF-Complete and k ∈ [2, …, 64] in GF(2¹⁶). Notice that the number of matrices to check computationally is small, so that checking all the conditions for all k takes 0.6 seconds in GF(2⁸) and 19 seconds for checking up to k = 64 in GF(2¹⁶).
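The kind of exhaustive computational check described here can be sketched in Python. The GF(2⁸) arithmetic below assumes the reduction polynomial 0x11b and generator 3 (GF-Complete's default polynomial may differ), and, for concreteness, checks conditions b)-d) on the Vandermonde-based Ψ of equation (6); the same procedure applies unchanged to the Cauchy-based matrices.

```python
from itertools import combinations

# GF(2^8) with polynomial x^8 + x^4 + x^3 + x + 1 (0x11b): an assumed
# field representation for illustration.
POLY = 0x11b

def gmul(a, b):
    """Carry-less multiplication in GF(2^8) with reduction by POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r

def gpow(a, e):
    r = 1
    for _ in range(e):
        r = gmul(r, a)
    return r

def ginv(a):
    return gpow(a, 254)  # a^254 = a^-1 in GF(2^8)

def invertible(A):
    """Gaussian-elimination invertibility test (char 2: subtraction is XOR)."""
    A = [row[:] for row in A]
    m = len(A)
    for c in range(m):
        p = next((r for r in range(c, m) if A[r][c]), None)
        if p is None:
            return False
        A[c], A[p] = A[p], A[c]
        iv = ginv(A[c][c])
        A[c] = [gmul(iv, v) for v in A[c]]
        for r in range(c + 1, m):
            if A[r][c]:
                f = A[r][c]
                A[r] = [v ^ gmul(f, w) for v, w in zip(A[r], A[c])]
    return True

n, d, alpha = 5, 4, 2
x = [gpow(3, i) for i in range(n)]                 # distinct field points
Psi = [[gpow(xi, j) for j in range(d)] for xi in x]
Phi = [row[:alpha] for row in Psi]
Lam = [gpow(xi, alpha) for xi in x]                # diagonal of Lambda

assert len(set(Lam)) == n                                        # condition d)
assert all(invertible([Psi[i] for i in s])
           for s in combinations(range(n), d))                   # condition b)
assert all(invertible([Phi[i] for i in s])
           for s in combinations(range(n), alpha))               # condition c)
```

With n = 5 and d = 4 there are only C(5,4) = 5 d × d sub-matrices to test, which illustrates why the exhaustive check is cheap.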

[0067] The Cauchy matrix based Sparse-PM regenerating codes have the same performance as the Vandermonde matrix based construction. The main advantage is that Cauchy-based codes have stronger theoretical guarantees. For example, the Vandermonde matrix based codes do not exist (may not be repaired in some cases) for k = 9 and W = 8 (where W defines a finite field Fq of size q = 2^W), due to properties specific to Vandermonde matrices in finite fields. In addition, Table 1 also applies to Cauchy-based sparse-PM codes. Hence, a systematic linear generator matrix equivalent to the sparse-PM encoding matrix based on the Cauchy matrix is a sparse matrix, particularly when compared to the original (vanilla) PM regenerating codes. This is important since most codes deployed are systematic. It gives an indication that the sparse-PM regenerating code based on the Cauchy matrix is faster than the original PM regenerating codes.

Regeneration/Repair

[0068] For the codes presented in equations (6)-(10) above, exact-regeneration or repair of any failed node can be achieved by connecting to any d of the remaining n − 1 nodes, called helper nodes, {h₁, h₂, …, h_d}. Assuming that φ_f is the row of Φ corresponding to the failed node f, the α symbols previously stored in the failed node are defined by:

v_{R3} = [φ_fᵗ λ_f φ_fᵗ] M = φ_fᵗ S₁ + λ_f φ_fᵗ S₂        (11)

[0069] Upon being contacted by the replacement node, the helper node h_j computes the inner product v_{R1,j} = ψ_{h_j}ᵗ M φ_f for j = 1, …, d and passes on this data value to the replacement node. Thus, in the present construction, the vector μ_f equals φ_f. The replacement node thus obtains the d symbols and groups them into a data vector v_{R1} = Ψ_repair M φ_f from the helper nodes, where:

Ψ_repair = [ψ_{h₁} ψ_{h₂} ⋯ ψ_{h_d}]ᵗ        (12)

[0070] By construction, the d × d matrix Ψ_repair is invertible, i.e., Ψ_repair⁻¹ exists. Thus the replacement node now has access to an intermediate matrix:

v_{R2} = Ψ_repair⁻¹ v_{R1} = M φ_f = [(S₁φ_f)ᵗ (S₂φ_f)ᵗ]ᵗ

[0071] Since S₁ and S₂ are symmetric matrices, the replacement node has thus acquired, through transposition, both φ_fᵗS₁ and φ_fᵗS₂, allowing it to obtain the data v_{R3} previously stored in the failed node, as described in equation (11) above.

[0072] One skilled in the art will recognize that the operation of regeneration described above also applies to the linear and systematic versions of a regenerating PM code, since it consists in repairing the stored encoded data in one or more storage nodes. Although the systematic version of the code creates an equivalent code, it corresponds to pre-processing the original input data X to obtain precoded data Z, followed by encoding Z to obtain Y, such that Y_[1…kα] = X. Encoding Z is performed with the non-systematic version of the code, by utilizing the generator matrix G (in the linear version) or by utilizing the encoding matrix Ψ and the message matrix M. Therefore, regenerating or repairing one or more nodes of Y can still be accomplished through the regeneration method described above.
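The repair procedure of equations (11)-(12) can be sketched end-to-end. This is an illustrative example over the stand-in prime field GF(257), with a hypothetical Vandermonde-style Ψ = [Φ ΛΦ] (not the Cauchy-based construction of the present principles):

```python
P, g = 257, 3  # stand-in prime field and a generator

def inv(a):
    return pow(a, P - 2, P)

def mat_inv(A):  # Gauss-Jordan over GF(P)
    m = len(A)
    M = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if M[r][c])
        M[c], M[p] = M[p], M[c]
        iv = inv(M[c][c])
        M[c] = [v * iv % P for v in M[c]]
        for r in range(m):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(v - f * w) % P for v, w in zip(M[r], M[c])]
    return [row[m:] for row in M]

n, k, d, alpha = 5, 3, 4, 2
x = [pow(g, i, P) for i in range(n)]
Phi = [[1, xi] for xi in x]                   # any alpha rows independent
Lam = [xi * xi % P for xi in x]               # distinct diagonal of Lambda
Psi = [Phi[i] + [Lam[i] * v % P for v in Phi[i]] for i in range(n)]

S1, S2 = [[1, 2], [2, 3]], [[4, 5], [5, 6]]   # symmetric message halves
M = S1 + S2                                   # d x alpha message matrix
C = [[sum(Psi[i][l] * M[l][j] for l in range(d)) % P
      for j in range(alpha)] for i in range(n)]  # row i stored on node i

f, helpers = 0, [1, 2, 3, 4]                  # node 0 fails; d helpers
phi_f = Phi[f]
# each helper h sends the inner product psi_h^t M phi_f = c_h . phi_f
v = [sum(C[h][j] * phi_f[j] for j in range(alpha)) % P for h in helpers]
# the replacement node inverts Psi_repair to get M phi_f = [S1 phi_f; S2 phi_f]
Mphi = [sum(r * w for r, w in zip(row, v)) % P
        for row in mat_inv([Psi[h] for h in helpers])]
s1f, s2f = Mphi[:alpha], Mphi[alpha:]
# by symmetry of S1, S2: lost row = (S1 phi_f)^t + lambda_f (S2 phi_f)^t
recovered = [(s1f[j] + Lam[f] * s2f[j]) % P for j in range(alpha)]
assert recovered == C[f]
```

The final assertion checks that the replacement node reconstructs exactly the α symbols the failed node stored, having downloaded only d symbols.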

Reconstruction/Decoding

[0073] Similarly, regarding reconstruction or decoding for the codes presented in equations (6)-(8) above, all the B message symbols can be recovered/reconstructed/decoded by connecting to any k nodes, i.e., the message symbols can be recovered through linear operations on the entries of any k rows of the code matrix C. Assuming that Ψ_DC = [Φ_DC Λ_DC Φ_DC] is the k × d sub-matrix of Ψ containing the rows which correspond to the nodes to which the data collector connects, the data collector obtains the symbols of a k × α data matrix:

v_{D1} = Ψ_DC M = [Φ_DC S₁ + Λ_DC Φ_DC S₂]        (12)

[0074] The data collector can post-multiply this term with Φ_DCᵗ to obtain an intermediate matrix:

v_{D2} = v_{D1} Φ_DCᵗ = P + Λ_DC Q = Φ_DC S₁ Φ_DCᵗ + Λ_DC Φ_DC S₂ Φ_DCᵗ        (13)

[0075] Since S₁ and S₂ are symmetric, the same is true for the intermediate sub-matrices P = Φ_DC S₁ Φ_DCᵗ and Q = Φ_DC S₂ Φ_DCᵗ. The data collector has access to the symbols of the matrix above. Due to the symmetry of P and Q, the (i,j)-th, 1 ≤ i, j ≤ k, element of this matrix is:

v_{D2,(i,j)} = P_{i,j} + λ_i Q_{i,j}        (14)

[0076] By construction, all the λ_i are distinct, so the data collector can use the equations above to solve for the values of P_{i,j}, Q_{i,j} for all i ≠ j. Hence, all the non-diagonal elements of P and Q are known.
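The solving step relies on symmetry: the (i,j)-th element gives P_{i,j} + λ_i Q_{i,j} while the (j,i)-th gives P_{i,j} + λ_j Q_{i,j}, a 2 × 2 linear system with a unique solution when λ_i ≠ λ_j. A minimal sketch over a stand-in prime field (the helper name solve_offdiag and the toy values are illustrative):

```python
P_FIELD = 257  # stand-in prime field for GF(2^8)

def inv(a):
    return pow(a, P_FIELD - 2, P_FIELD)

def solve_offdiag(u, v, lam_i, lam_j):
    """Recover (P_ij, Q_ij) from u = P_ij + lam_i*Q_ij and
    v = P_ij + lam_j*Q_ij; valid whenever lam_i != lam_j."""
    q = (u - v) * inv((lam_i - lam_j) % P_FIELD) % P_FIELD
    p = (u - lam_i * q) % P_FIELD
    return p, q

# toy check with known P_ij = 7, Q_ij = 11 and distinct lambdas 3, 5
lam_i, lam_j = 3, 5
u = (7 + lam_i * 11) % P_FIELD
v = (7 + lam_j * 11) % P_FIELD
assert solve_offdiag(u, v, lam_i, lam_j) == (7, 11)
```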

[0077] Assuming Φ_DC = [φ₁ φ₂ ⋯ φ_{α+1}]ᵗ, the elements in the i-th row of P (excluding the diagonal element) are given by:

v_{D3} = φ_iᵗ S₁ [φ₁ ⋯ φ_{i−1} φ_{i+1} ⋯ φ_{α+1}]        (15)

[0078] However, the matrix to the right is non-singular by construction and hence the data collector can obtain v_{D4} = φ_iᵗ S₁ for 1 ≤ i ≤ α + 1. By selecting the first α of these, the data collector has access to v_{D5} = [φ₁ φ₂ ⋯ φ_α]ᵗ S₁. Since the matrix on the left of S₁ is also non-singular by construction, the data collector can recover S₁. Similarly, by using the values of the non-diagonal elements of Q, the data collector can recover S₂. Hence, the entire message matrix M is recovered/reconstructed.

[0079] Rashmi et al. have proven that MSR PM regenerating codes for d = 2k − 2 can be used to obtain MSR codes for all d > 2k − 2. Therefore, a skilled artisan will appreciate that this fact is also applicable to all sparse-PM regenerating codes of the present principles.

[0080] For a linearized version of the code, reconstructing large amounts of data through the inverse generator matrix G̃⁻¹ may be more efficient than the reconstruction described above, even if the initialization cost is higher. Obtaining the inverse generator matrix may be accomplished, for example, through Gaussian Elimination (GE). Hence, as long as what is gained on reconstructing exceeds the initialization cost, the linearized reconstruction is more efficient.

[0081] GE consists in performing elementary operations on the rows and columns of a generator matrix in two major steps: a triangularization step and a back-substitution step. In the triangularization step, the goal is to convert the given matrix, using row operations and row and column re-orderings, into an upper triangular square matrix having 1's along the diagonal and 0's below the diagonal, after discarding bottom rows if any. If the triangularization step is successful, then the back-substitution step can proceed by converting the triangular matrix into the identity matrix, after which the GE is successfully finished. Once the identity matrix is obtained, the corresponding symbols can be obtained.

[0082] Although there is an equality between a regenerating code and its linear generator matrix G in equation (5), the same is not the case for the systematic version of the code derived from G. The linear operations necessary to transform G into a systematic generator matrix G' create an equivalent code with similar properties, but not the same code. However, as previously described, the operation of obtaining a systematic linear code consists in the concatenation of two operations: pre-processing the original input data X by pre-decoding it by the linear code to obtain precoded data Z, followed by encoding by G to obtain the encoded data Y = GG̃⁻¹X, wherein G̃ corresponds to the kΔ first rows of G. The systematic code is then described by the generator matrix G' = GG̃⁻¹ = [I G''ᵗ]ᵗ. As a result, decoding becomes simple, since the input data X is directly available in the encoded message. However, if the data X cannot be completely and directly recovered/accessed due to node failures, it can still be retrieved by decoding some of the remaining encoded symbols encoded by G' (or more specifically, G'').

[0083] A skilled artisan will understand that decoding the systematic codes described above may be accomplished by applying the inverse of the systematic generator matrix G'. Equivalently, decoding may be accomplished by obtaining the inverse of the generator matrix G to recover Z, followed by a post-processing step of applying the inverse systematic transformation G̃ to Z in order to recover the original data X. Furthermore, decoding of a systematic regenerating code above may be accomplished by the traditional decoding method of equations (14)-(17) above, followed by the above described post-processing step.

Evaluation

[0084] An evaluation of the computational cost of product-matrix regenerating codes was performed by implementing them using the libraries GF-Complete (for the specific version) and Jerasure 2.0 (for the linearized version). Benchmarks were run on a Xeon E5-2640 (2.5 GHz) which supports the SSE (up to 4.2) and AVX instruction sets. Each result shown is the average over 1000 runs. All benchmarks correspond to single-threaded computation (i.e., use a single core).

[0085] For each operation, the total running time was separated into: (i) the initialization time of the encoding/decoding procedures (e.g., building the generator or encoding matrix, inverting it) and (ii) the time needed to apply the encoding/decoding procedures to the data (reported as the corresponding throughput in MBps - Mega Bytes per second). This distinction is important as, in a practical deployment, depending on the operation, the initialization phase could either be pre-computed and its result stored once for all (e.g., encoding), or must be computed anew for each operation as it depends on the failure pattern (e.g., decoding), but can be reused across stripes for large objects. The encoding/decoding throughput is the amount of data to encode/decode (i.e., kα) divided by the time taken to perform the operation. The repair throughput is the amount of data to repair (i.e., α) divided by the time taken to perform the operation.

[0086] The following codes were evaluated:

• Specific Vanilla Product-Matrix (PMXSVan = systematic; PMXNVan = non-systematic) are the prior art codes of equation (6).

• Specific Sparse Product-Matrix (PMXSSpa = systematic; PMXNSpa = non-systematic) are the Cauchy-based sparse PM regenerating codes according to the present principles with encoding based on the matrices Ψ and M (equation (7)).

• Linear Vanilla Product-Matrix (PMLSVan = systematic; PMLNVan = non-systematic) are the linearized versions of the prior art codes of equation (6).

• Linear Sparse Product-Matrix (PMLSSpa = systematic; PMLNSpa = non-systematic) are the linearized versions of the Cauchy-based sparse PM regenerating codes according to the present principles (equation (7)).

• Reed-Solomon Codes (RSLSSpa) are Vandermonde-based Reed-Solomon codes implemented in Jerasure and are used as a baseline for comparison.

[0087] Figure 2 shows the performance of encoding operations. Overall, for typical settings such as k = 8, an appropriate choice of implementation (i.e., systematic, sparse and linearized) allows encoding at about 790 MBps using product-matrix codes, whereas Reed-Solomon codes for the same setting achieve about 1640 MBps and the systematic vanilla PM codes achieve 210 MBps. Hence, the performance of the optimized product-matrix codes is good and they are a reasonable alternative to Reed-Solomon codes for practical deployment. Converting to a bit-matrix and computing a schedule is less efficient than using SIMD for Galois Field multiplication, leading to slower systematic encoding at 263 MBps for vanilla PM codes, and 314 MBps for sparse PM codes, for k = 8 and blocks of 16 KB.

[0088] Using sparser codes has a positive impact on both non-systematic and systematic codes, doubling the throughput. Linearizing the code is useful for systematic codes; it allows encoding directly without performing the pre-decoding step, thus doubling the throughput again. Notice that the performance of sparse systematic codes is higher than that of sparse non-systematic codes because the kα first symbols need not be computed for systematic codes, as they are equal to uncoded symbols.

[0089] As can be seen in Figures 2C and 2D, the gain observed in throughput for using linearized systematic product-matrix codes comes at the price of a higher initialization cost (i.e., the time needed to compute the systematic generator matrix). However, since the generator (encoding) matrix is constant, it can be computed once and reused for all other encoding. Figures 2C and 2D show that, for non-systematic and systematic codes, the specific codes have a negligible initialization time compared with the linear ones. Also, although not shown in Figure 2D, the RS codes also have negligible initialization time.

[0090] The individual effects of the various optimizations can be seen in Figure 3. From Figures 3A and 3B, it can be observed that the linearized version is less efficient than the specific one for large values of k when using the vanilla code. In addition, the linearized and specific versions remain very close in performance throughout, except for small values of k. When using the sparse-PM code, the performance is substantially increased by linearization, except for large values of k. This is due to the fact that the generator matrix is not sparse for the systematic vanilla code. Hence, the two optimizations described (sparsity and linearity of the code) should be applied together: even if they increase the performance on their own, their combination provides a significant enhancement, since sparsification allows to better leverage the gains from linearization. The performance of the linearized and specific versions converges to similar values for large values of k.

[0091] Figure 4 shows the impact of block size on encoding performance for various implementations of MSR PM regenerating codes and RS codes with k = 8. Figure 4A is for non-systematic codes and Figure 4B is for systematic codes. These curves show a preferred block size around 16 KB to 32 KB for all codes. This is consistent with the size of the L1 cache of the processor, which is 32 KB. Similar results can be observed for decoding, repair and also for MBR codes. Hence, all other experiments use 16 KB as the block size. The figures also show the advantage of utilizing systematic and linear sparse-PM codes and how close these codes come to the RS codes in computational performance.

[0092] Figure 5 shows the performance of the repair operation for a single failure. In order to measure the performance, the computation, which consists of several independent sub-computations, is performed sequentially on a single core. Sparse PM codes are slightly faster than vanilla codes and operate at 480 MBps, whereas Reed-Solomon codes operate at 1830 MBps.

[0093] Figure 6 shows the performance of decoding (averaged over many failure patterns with 1 to n − k failures) using data from systematic devices if available. The initialization cost does not include the generation of the generator matrix, which corresponds to the initialization of the encoding operation and can be pre-computed once for all. The sparse PM codes are faster than the vanilla PM codes. Using the linearized version of sparse PM codes is more interesting for low values of k (k < 12 for non-systematic codes, k < 21 for systematic codes), and for large data (due to the initialization cost). Notice that the decoding method is independent of the encoding method. Therefore, it is possible to encode using the linear algorithm and to decode using the specific algorithm of the seminal paper.

[0094] Figure 7A shows a flowchart 700A of a method of encoding data for fault-tolerance in a distributed storage system according to the present principles. The distributed storage system contains a plurality of storage nodes or devices connected in a network. The method includes: accessing 710 a message matrix M of dimension d × α including kα symbols to be distributed over k nodes; accessing 720 a sparse-PM encoding matrix Ψ for an (n, k, d, α) MSR sparse-PM regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; and encoding 730 the data message according to the message matrix M and the sparse-PM encoding matrix Ψ of the sparse-PM regenerating code, wherein the encoded data is C = ΨM and is distributed over n nodes. The message matrix M may generate a systematic regenerating code. The sparse-PM code is such that the data message has kα symbols, the encoded data has nα symbols and d is the number of symbols required for regeneration of a node.
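The encoding step C = ΨM described above can be sketched in a few lines. The sketch below is illustrative only: it uses the small prime field GF(13) in place of the binary extension fields GF(2^w) used in practice, and a toy (n, k, d, α) = (5, 3, 4, 2) parameter set; the particular Cauchy evaluation points x_i and y_j are arbitrary choices, not taken from the present description.

```python
# Illustrative sketch of the encoding step C = Psi * M (Figure 7A).
# Arithmetic is over the prime field GF(13) as a stand-in for GF(2^w);
# parameters n = 5, k = 3, d = 2k - 2 = 4, alpha = k - 1 = 2.

P = 13  # field modulus

def inv(a):
    """Multiplicative inverse in GF(P) via Fermat's little theorem."""
    return pow(a, P - 2, P)

def mat_mul(A, B):
    """Matrix product over GF(P)."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % P
             for j in range(len(B[0]))] for i in range(len(A))]

# Cauchy encoding matrix Psi (n x d): Psi[i][j] = 1 / (x[i] + y[j]),
# with the x[i] and y[j] pairwise distinct and x[i] + y[j] != 0.
x, y = [0, 1, 2, 3, 4], [5, 6, 7, 8]
Psi = [[inv((xi + yj) % P) for yj in y] for xi in x]

# Message matrix M (d x alpha) holding the k * alpha = 6 data symbols,
# stacked as two symmetric alpha x alpha blocks S1 and S2 (MSR layout).
S1 = [[1, 2], [2, 3]]
S2 = [[4, 5], [5, 6]]
M = S1 + S2

C = mat_mul(Psi, M)  # encoded data, n x alpha; node i stores row C[i]
```

Each of the n = 5 nodes stores one row of C, i.e. α = 2 symbols, so the kα = 6 data symbols are expanded to nα = 10 stored symbols.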

[0095] Figure 7B shows a flowchart 700B of another method of encoding data for fault-tolerance in a distributed storage system according to the present principles. The distributed storage system contains a plurality of storage nodes or devices connected in a network. The method includes: accessing 750 a data message X including kα symbols to be distributed over k nodes; accessing a systematic linear generator matrix G' which is sparse and equivalent to an encoding matrix Ψ for an (n, k, d, α) sparse-PM MSR code based on the Cauchy matrix; and encoding the data message according to the systematic linear generator matrix G', wherein the encoded data is Y = G'X and is distributed over n nodes.

[0096] Figure 8 shows a flowchart 800 of yet another method of encoding data for fault-tolerance in a distributed storage system according to the present principles. The distributed storage system contains a plurality of storage nodes or devices connected in a network. The method includes: accessing 810 a data message X including kα symbols to be distributed over k nodes; accessing 820 an index matrix L of dimension 2α × α; accessing 830 a sparse-PM encoding matrix Ψ for an (n, k, d, α) MSR sparse-PM regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; transforming 840 the encoding matrix Ψ into a sparse-PM linear generator matrix G; and encoding 860 the message data according to the sparse-PM generator matrix G of the regenerating code, wherein the encoded data is Y = GX and is distributed over n nodes. The method may also include transforming 850 the sparse-PM generator matrix G into a systematic matrix prior to encoding the data message. The matrix G is a matrix of dimension nα × kα. The data message X has dimension kα × 1 and the encoded data Y has dimension nα × 1.
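The transformation of step 840, which turns the encoding matrix Ψ into the nα × kα linear generator matrix G by way of the index matrix L, can be sketched as follows. This is a hedged illustration over GF(13): the entries of L are assumed here to be 1-based indices of the kα message symbols (the description does not fix the indexing convention at this point), and repeated indices, which occur because the symmetric blocks of M reuse symbols, are handled by summing the corresponding Ψ entries.

```python
# Illustrative sketch of step 840: transforming Psi into the linear
# generator matrix G using the index matrix L, over GF(13).
# Assumption: L[l][j] is the 1-based index of the message symbol placed
# at position (l, j) of the message matrix M.

P = 13

def inv(a):
    """Multiplicative inverse in GF(P)."""
    return pow(a, P - 2, P)

def to_generator(Psi, L, k, alpha):
    """Build G (n*alpha x k*alpha) with G[i*alpha + j][m-1] = Psi[i][l]
    whenever m = L[l][j]; entries are summed when a symbol index repeats."""
    n, d = len(Psi), len(Psi[0])
    G = [[0] * (k * alpha) for _ in range(n * alpha)]
    for i in range(n):
        for j in range(alpha):
            for l in range(d):
                m = L[l][j]
                G[i * alpha + j][m - 1] = (G[i * alpha + j][m - 1] + Psi[i][l]) % P
    return G

# Toy Cauchy Psi (n = 5, d = 4), with k = 3, alpha = 2.
x, y = [0, 1, 2, 3, 4], [5, 6, 7, 8]
Psi = [[inv((xi + yj) % P) for yj in y] for xi in x]

# Index matrix L (2*alpha x alpha): M = [[x1,x2],[x2,x3],[x4,x5],[x5,x6]].
L = [[1, 2], [2, 3], [4, 5], [5, 6]]

G = to_generator(Psi, L, k=3, alpha=2)

# Sanity check: Y = G X must equal the flattening of C = Psi M.
X = [1, 2, 3, 4, 5, 6]                                    # message symbols
M = [[X[L[l][j] - 1] for j in range(2)] for l in range(4)]
Y = [sum(G[r][c] * X[c] for c in range(6)) % P for r in range(10)]
C = [[sum(Psi[i][l] * M[l][j] for l in range(4)) % P for j in range(2)]
     for i in range(5)]
assert Y == [C[i][j] for i in range(5) for j in range(2)]
```

The sanity check at the end verifies that encoding with the linearized generator matrix (Y = GX) and encoding with the matrix product (C = ΨM) produce the same stored symbols.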

[0097] Figure 9 shows a flowchart 900 of a method of regenerating or repairing data for fault-tolerance in a distributed storage system according to the present principles. The distributed storage system contains a plurality of storage nodes or devices connected in a network. The method includes: accessing 910 a sparse-PM encoding matrix Ψ for an (n, k, d, α) MSR sparse-PM regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; receiving 920 at least d data values from at least d nodes; and regenerating 930 a failed storage node in the distributed storage system based on the data values and the sparse-PM encoding matrix Ψ. The data may have been encoded using an equivalent linear systematic code derived from the sparse-PM regenerating code. The regenerating may further include: grouping 932 the data values into a data vector V_R1; determining 934 an intermediate matrix V_R2 from the data vector; and determining 936 the data stored in the failed storage node from the intermediate matrix, wherein the data includes α symbols.
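The regeneration of steps 910 to 936 can be illustrated with the classical product-matrix repair procedure of Rashmi et al., on which the sparse-PM construction builds. The sketch below is a simplification: it works over GF(13), uses a Vandermonde-style Ψ = [Φ | ΛΦ] chosen only so that the MSR conditions hold (it is not the Cauchy-based sparse matrix of the present description), and repairs node 0 from the d = 4 remaining nodes.

```python
# Illustrative sketch of repairing one failed node (Figure 9) with the
# product-matrix MSR procedure, over GF(13). Psi = [Phi | Lambda*Phi] is a
# Vandermonde-style stand-in, not the Cauchy-based sparse-PM matrix.

P = 13
n, k = 5, 3
d, alpha = 2 * k - 2, k - 1          # d = 4, alpha = 2

def solve(A, b):
    """Solve A v = b over GF(P) by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if T[r][c] % P)
        T[c], T[piv] = T[piv], T[c]
        t = pow(T[c][c], P - 2, P)
        T[c] = [v * t % P for v in T[c]]
        for r in range(m):
            if r != c and T[r][c]:
                g = T[r][c]
                T[r] = [(T[r][j] - g * T[c][j]) % P for j in range(m + 1)]
    return [T[i][m] for i in range(m)]

xs = [1, 2, 3, 4, 5]
Phi = [[1, xv] for xv in xs]                     # any alpha rows independent
lam = [xv * xv % P for xv in xs]                 # distinct lambda_i
Psi = [[1, xv, xv * xv % P, pow(xv, 3, P)] for xv in xs]

# Message matrix M = [S1; S2] with S1, S2 symmetric (kα = 6 data symbols).
M = [[1, 2], [2, 3], [4, 5], [5, 6]]
C = [[sum(Psi[i][l] * M[l][j] for l in range(d)) % P for j in range(alpha)]
     for i in range(n)]                          # node i stores C[i]

f = 0                                            # failed node
helpers = [1, 2, 3, 4]                           # d helper nodes
phi_f = Phi[f]

# Each helper h sends one symbol: the inner product of its row with phi_f.
s = [sum(C[h][j] * phi_f[j] for j in range(alpha)) % P for h in helpers]

# The collector solves Psi_rep * v = s, so v = M phi_f = [S1 phi_f; S2 phi_f].
v = solve([Psi[h] for h in helpers], s)

# By symmetry of S1 and S2, the lost row is phi_f^T S1 + lambda_f phi_f^T S2.
recovered = [(v[j] + lam[f] * v[alpha + j]) % P for j in range(alpha)]
assert recovered == C[f]
```

Each helper transfers a single symbol, so the repair downloads d = 4 symbols to rebuild α = 2 symbols: the network-bandwidth saving that motivates regenerating codes over classical erasure codes.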

[0098] Figure 10 shows a flowchart 1000 of a method of decoding data for fault-tolerance in a distributed storage system according to the present principles. The distributed storage system contains a plurality of storage nodes or devices connected in a network. The method includes: accessing 1010 a sparse-PM encoding matrix Ψ for an (n, k, d, α) MSR sparse-PM regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse; receiving 1020 at least k data vectors from at least k nodes; and decoding 1030 the data of the n storage nodes in the distributed storage system based on the data vectors and the sparse-PM encoding matrix Ψ. The code may be a systematic sparse-PM regenerating code. The decoding may further include: grouping 1032 the data vectors into a data matrix V_D1; determining 1034 an intermediate matrix V_D2 from the data matrix, wherein V_D2 includes data sub-matrices P and Q; determining 1036 the symbols of data sub-matrices P and Q from the intermediate matrix; and determining 1038 a message matrix M of the distributed storage system from the data sub-matrices and the sparse-PM encoding matrix Ψ. The method may further include post-processing 1040 the decoded data to recover an original data, by applying the inverse of a systematic transformation, wherein the original data was pre-processed by the systematic transformation prior to encoding.
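The structured decoder of steps 1032 to 1038 exploits the shape of Ψ for speed. As a hedged baseline, the sketch below instead recovers the kα message symbols generically, by solving the kα × kα linear system formed by the rows of the linearized generator matrix G that correspond to any k surviving nodes (the MDS property of the MSR code guarantees this system is solvable). The field GF(13), the Vandermonde-style Ψ and the index matrix L are illustrative stand-ins, not the Cauchy-based construction of the present description.

```python
# Illustrative sketch of decoding from any k nodes (Figure 10) as a generic
# linear solve over GF(13), using the linearized generator matrix G.

P = 13
n, k = 5, 3
d, alpha = 2 * k - 2, k - 1

def solve(A, b):
    """Solve A v = b over GF(P) by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if T[r][c] % P)
        T[c], T[piv] = T[piv], T[c]
        t = pow(T[c][c], P - 2, P)
        T[c] = [v * t % P for v in T[c]]
        for r in range(m):
            if r != c and T[r][c]:
                g = T[r][c]
                T[r] = [(T[r][j] - g * T[c][j]) % P for j in range(m + 1)]
    return [T[i][m] for i in range(m)]

xs = [1, 2, 3, 4, 5]
Psi = [[1, xv, xv * xv % P, pow(xv, 3, P)] for xv in xs]
L = [[1, 2], [2, 3], [4, 5], [5, 6]]       # index matrix (2*alpha x alpha)

# Linearized generator G: row i*alpha+j sums Psi[i][l] into column L[l][j]-1.
G = [[0] * (k * alpha) for _ in range(n * alpha)]
for i in range(n):
    for j in range(alpha):
        for l in range(d):
            sym = L[l][j]
            G[i * alpha + j][sym - 1] = (G[i * alpha + j][sym - 1] + Psi[i][l]) % P

X = [7, 1, 9, 4, 2, 5]                     # the k*alpha message symbols
Y = [sum(G[r][c] * X[c] for c in range(k * alpha)) % P
     for r in range(n * alpha)]

# A data collector contacts any k nodes, here nodes {0, 2, 4}.
nodes = [0, 2, 4]
rows = [i * alpha + j for i in nodes for j in range(alpha)]
decoded = solve([G[r] for r in rows], [Y[r] for r in rows])
assert decoded == X                        # all message symbols recovered
```

This generic decoder costs a full Gaussian elimination; the structured procedure of steps 1032 to 1038 is what makes decoding competitive in practice.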

[0099] It is to be understood that the present principles may be implemented in various forms of hardware, software, firmware, special purpose processors, computer-readable medium or a combination thereof. Preferably, the present principles are implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.

[00100] Figure 11 shows a block diagram of a minimum computing environment 1100 within which the present principles can be implemented and executed. The computing environment 1100 includes a processor 1102, and at least one (and preferably more than one) I/O interface 1104. The I/O interface can be wired or wireless and, in the wireless implementation, is pre-configured with the appropriate wireless communication protocols to allow the computing environment 1100 to operate on a global network (e.g., the internet) and communicate with other computers or servers (e.g., cloud-based computing or storage servers) so as to enable the present principles to be provided, for example, as a Software as a Service (SaaS) feature remotely provided to end users. One or more memories 1106 and/or storage devices (HDD) 1108 are also provided within the computing environment 1100. The computing environment may be used to implement a node or device, and/or a controller or server that operates the storage system.

[00101] Furthermore, aspects of the present principles can take the form of a computer readable storage medium. Any combination of one or more computer readable storage medium(s) may be utilized. A computer readable storage medium can take the form of a computer readable program product embodied in one or more computer readable medium(s) and having computer readable program code embodied thereon that is executable by a computer. A computer readable storage medium as used herein is considered a non-transitory storage medium given the inherent capability to store the information therein as well as the inherent capability to provide retrieval of the information therefrom. A computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.

[00102] It is to be appreciated that the following, while providing more specific examples of computer readable storage mediums to which the present principles can be applied, is merely an illustrative and not exhaustive listing as is readily appreciated by one of ordinary skill in the art: a portable computer diskette, a hard disk, a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

[00103] According to one embodiment of the present principles, a computer readable memory is provided including a product matrix minimum storage regenerating code for providing fault-tolerance in a distributed storage system, the code including an encoding matrix Ψ based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse.

[00104] According to one embodiment of the present principles, a computer readable memory is provided including a code for providing fault-tolerance in a distributed storage system, the code including a linear generator matrix G characterized by:

G_{iα+j, i'} = Ψ_{i,l}, if there exists l such that i' = L_{l,j}; G_{iα+j, i'} = 0, otherwise,

wherein L is an index matrix of dimension 2α × α and Ψ is an encoding matrix of a product matrix minimum storage regenerating code based on the Cauchy matrix, the encoding matrix having an equivalent systematic linear generator matrix that is sparse.

[00105] According to one embodiment of the present principles, a computer readable memory is provided including a code for providing fault-tolerance in a distributed storage system, the code including a systematic linear generator matrix G', wherein G' is a systematic transformation of a linear generator matrix G characterized by:

G_{iα+j, i'} = Ψ_{i,l}, if there exists l such that i' = L_{l,j}; G_{iα+j, i'} = 0, otherwise,

wherein L is an index matrix of dimension 2α × α and Ψ is an encoding matrix of a product matrix minimum storage regenerating code based on the Cauchy matrix, wherein the systematic linear generator matrix is sparse.
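The systematic transformation mentioned in [00105] can be realized, under one common convention, by multiplying G on the right by the inverse of its first kα rows, so that the first k nodes store the message symbols verbatim. The sketch below is a hedged illustration over GF(13) with a Vandermonde-style Ψ and an assumed 1-based index matrix L (stand-ins, not the Cauchy-based matrices of the description; the Cauchy construction is what makes the resulting G' sparse, which this stand-in does not demonstrate).

```python
# Illustrative sketch of one systematic transformation G' = G * inv(G_top),
# over GF(13): after the transform, the top k*alpha x k*alpha block of G'
# is the identity, so the first k nodes store the message symbols verbatim.

P = 13
n, k = 5, 3
d, alpha = 2 * k - 2, k - 1
ka = k * alpha

def mat_inv(A):
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [int(i == j) for j in range(m)] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if T[r][c] % P)
        T[c], T[piv] = T[piv], T[c]
        t = pow(T[c][c], P - 2, P)
        T[c] = [v * t % P for v in T[c]]
        for r in range(m):
            if r != c and T[r][c]:
                g = T[r][c]
                T[r] = [(T[r][j] - g * T[c][j]) % P for j in range(2 * m)]
    return [row[m:] for row in T]

def mat_mul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % P
             for j in range(len(B[0]))] for i in range(len(A))]

xs = [1, 2, 3, 4, 5]
Psi = [[1, xv, xv * xv % P, pow(xv, 3, P)] for xv in xs]
L = [[1, 2], [2, 3], [4, 5], [5, 6]]

# Linearized generator G as in the formula above.
G = [[0] * ka for _ in range(n * alpha)]
for i in range(n):
    for j in range(alpha):
        for l in range(d):
            sym = L[l][j]
            G[i * alpha + j][sym - 1] = (G[i * alpha + j][sym - 1] + Psi[i][l]) % P

# Systematic G': right-multiply by the inverse of the top k*alpha rows.
Gp = mat_mul(G, mat_inv([G[r] for r in range(ka)]))
identity = [[int(r == c) for c in range(ka)] for r in range(ka)]
assert Gp[:ka] == identity
```

Right-multiplication by an invertible matrix changes only the message basis, not the set of codewords, so G' generates the same code as G while being systematic.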

[00106] It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present principles.

[00107] Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.