Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
ENCODING AND DECODING OF GENERALIZED CONCATENATED CODES WITH INNER PIGGYBACKED CODES FOR DISTRIBUTED STORAGE SYSTEMS
Document Type and Number:
WIPO Patent Application WO/2017/061892
Kind Code:
A1
Abstract:
The invention relates to the encoding of data for storage on storage devices of n storage nodes such that the data is recoverable after a failure of up to r storage nodes and a failure of up to s failures of storage devices of the storage nodes, wherein the approach is based on the construction of generalized concatenated codes (GCC) where inner codes are piggybacked codes. The GCC construction enables one to provide protection against node and device failures, while inner piggybacked codes enable one to reduce the amount of data transmitted over the network during a node rebuild phase. Another aspect of the invention relates to the recovering of part-erased encoded data, wherein the encoded data has been encoded with said GCC construction using inner piggybacked codes.

Inventors:
TRIFONOV PETER VLADIMIROVICH (CN)
WANG YUANGANG (CN)
CHEN CHEN (CN)
Application Number:
PCT/RU2015/000656
Publication Date:
April 13, 2017
Filing Date:
October 09, 2015
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HUAWEI TECH CO LTD (CN)
International Classes:
H03M13/03; G06F11/10; H03M13/29; H03M13/37; H03M13/15
Domestic Patent References:
WO2013164228A12013-11-07
Foreign References:
US20140380114A12014-12-25
US20100138717A12010-06-03
Other References:
RASHMI K V ET AL: "A piggybacking design framework for read-and download-efficient distributed storage codes", PROC., IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, ISIT 2013 IEEE, 7 July 2013 (2013-07-07), pages 331 - 335, XP032496947, ISSN: 2157-8095, [retrieved on 20131003], DOI: 10.1109/ISIT.2013.6620242
ZHANG YONGZHE ET AL: "PCM: A Parity-Check Matrix Based Approach to Improve Decoding Performance of XOR-based Erasure Codes", PROC., IEEE 34TH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRD, SRDS 2015, 28 September 2015 (2015-09-28), pages 182 - 191, XP032846075, DOI: 10.1109/SRDS.2015.15
RASHMI K V ET AL: "A hitchhiker's guide to fast and efficient data reconstruction in erasure-coded data centers", PROC., SIGCOMM, ACM, NEW YORK, 17 August 2014 (2014-08-17), pages 331 - 342, XP058053871, ISBN: 978-1-4503-2836-4, DOI: 10.1145/2619239.2626325
SIDDHARTHA KUMAR ET AL: "A Family of Erasure Correcting Codes with Low Repair Bandwidth and Low Repair Complexity", 13 May 2015 (2015-05-13), pages 1 - 7, XP055286364, Retrieved from the Internet [retrieved on 20160706]
K. V. RASHMI; N. B. SHAH; K. RAMCHANDRAN: "A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes", PROC. OF IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2013
K. V. RASHMI; N. B. SHAH; D. GU; H. KUANG; D. BORTHAKUR; K. RAMCHANDRAN: "A ''Hitchhiker's'' Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers", ACM SIGCOMM, August 2014 (2014-08-01)
K. V. RASHMI; N. B. SHAH; K. RAMCHANDRAN: "A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes", IN PROC. OF IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2013
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD (RU)
Download PDF:
Claims:
CLAIMS

1. Method for encoding data for storage on storage devices (222, 224, 232, 234, 360) of n storage nodes (220, 230) such that the data is recoverable after a failure of up to r storage nodes and a failure of up to s storage devices, wherein the method comprises:

outer-coding (110, 320) the data with one or more outer codes to obtain outer-coded data, and

encoding (120, 340) the outer-coded data with one or more piggybacked inner codes to obtain encoded data.

2. The method of claim 1, wherein the one or more outer codes and/or the one or more inner codes are maximum-distance separable codes.

3. The method of claim 1 , wherein the outer codes are Ct (v, Kit v— Kt + 1) codes over GF(2OT), v≤2m for 0 < i < n , wherein Kx = ... = Kr = 0 , Kj -

[v - + lj , r + 1≤j < n and (v - Kj + - r) > s. 4. The method of one of the previous claims, wherein the encoded data

(350) is representable as a n xmv matrix (400), and wherein the method comprises storing an z-th row (410) of the matrix on an z'-th storage node and storing one or more symbols from a (pm+j)-t column (422, 424) in a -th block of a p-ih storage device.

5. Method for recovering data from part-erased encoded data, wherein the encoded data has been encoded using a method according to one of the previous claims.

6. The method of claim 5, wherein the method comprises, if ts + tb' ≤ r, where Ζ$ is the number of failed storage nodes and tb' = max0≤i<v and is the number of storage nodes where storage device z has failed: applying an inner code decoding algorithm to each column of a codeword of the part-erased encoded data.

7. The method of claim 5 or 6, wherein the method comprises

constructing a check matrix H = (I\A)P that corresponds to a concatenation of the inner and outer codes, where P is a permutation matrix which maps an identity submatrix onto positions of one or more erased symbols of the part-erased data; and

- recovering one or more erased symbols c; of a codeword c as Cj—

∑i≠j ciHji-

8. The method of claim 7, wherein the check matrix is obtained by Gaussian

elimination from the equation GHT = 0 , where G = C(1) Θ Glf_ , C« i is a

generator matrix of an -th outer code, and Gi . is an i-th row of a generator matrix of the one or more piggybacked inner codes.

9. An apparatus (210) for encoding data for storage on storage devices (222,

224, 232, 234) of n storage nodes (220, 230) such that the data is recoverable after a failure of up to r storage nodes and a failure of up to 5 storage devices, the apparatus comprising:

a first encoder (212) for outer-coding the data with one or more outer codes to obtain outer-coded data and

a second encoder (214) for encoding the outer-coded data with one or more piggybacked inner codes to obtain encoded data,

wherein in particular the apparatus is configured to carry out the method of one of claims 1 to 4.

10. The apparatus of claim 9, wherein the first encoder and/or the second encoder is implemented in hardware, in particular as ASIC and/or FPGA.

11. An apparatus for recovering part-erased encoded data, comprising a decoder for decoding the part-erased encoded data, wherein the encoded data has been encoded using a concatenation of one or more outer codes and one or more inner codes, wherein the one or more outer codes and the one or more inner codes are piggybacked maximum distance separable codes, wherein in particular the apparatus is configured to carry out the method of one of claims 5 to 8.

12. A computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of one of claims 1 to 8.

Description:
ENCODING AND DECODING OF GENERALIZED CONCATENATED CODES WITH INNER PIGGYBACKED CODES FOR DISTRIBUTED STORAGE SYSTEMS

TECHNICAL FIELD

The present invention relates to an apparatus and a method for encoding data for storage on storage devices of n storage nodes such that the data is recoverable after a failure of up to r storage nodes and a failure of up to s storage devices. The present invention also relates to a method and an apparatus for recovering data from part-erased encoded data.

The present invention also relates to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out a method for encoding data or for recovering data from part-erased encoded data.

BACKGROUND

Consider a storage system consisting of n servers (nodes), where each server includes v storage devices. Both servers and devices may fail. Erasure coding techniques are commonly used to protect the data against such failures. In order to recover the data after failures, erasure decoding is performed, which involves reading data from operational devices, and computing some linear combinations of it. This involves transmission of the data over the network. Network data transfer is the most expensive operation in such systems.

Minimum storage regenerating codes provide the lowest possible redundancy for a given number of recoverable server (node) failures and minimize the amount of data to be transmitted over the network during the rebuild phase. However, these codes do not provide protection against device failures.

K. V. Rashmi, N. B. Shah and K. Ramchandran, "A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes," In Proc. of IEEE International Symposium on Information Theory, 2013 and K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur and K. Ramchandran, "A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers," ACM SIGCOMM, Aug, 2014 introduced a so-called piggybacking framework. Accordingly, a number of codewords of a systematic maximum distance separable codes are taken and combined so that substripe i stores a sum of its codeword and some linear functions of the data in substripes 1,2,... , t— 1. These linear combinations are selected in such a way that check symbols within a substripe depend on all information symbols within the corresponding substripe, and a few information symbols in other substripes.

Neither the piggybacked codes nor the existing constructions of minimum storage regenerating codes, which are designed to provide protection against node (server) failures, provide protection against device failures.

Further, some of the prior art methods involve a large amount of data being transmitted over the network during a repair operation.

SUMMARY OF THE INVENTION

The objective of the present invention is to provide an apparatus and a method for encoding data for storage on storage devices, wherein the apparatus and the method overcome one or more of the above-mentioned problems of the prior art.

A first aspect of the invention provides a method for encoding data for storage on storage devices of n storage nodes such that the data is recoverable after a failure of up to r storage nodes and a failure of up to s storage devices, wherein the method comprises:

outer-coding the data with one or more outer codes to obtain outer-coded data, and

encoding the outer-coded data with one or more piggybacked inner codes to obtain encoded data.

The proposed approach is based on the construction of generalized concatenated codes (GCC), where inner codes are piggybacked codes. The parameters of the inner and outer codes can be selected so that the desired protection level is achieved. The GCC construction enables one to provide protection against node and device failures, while inner piggybacked codes enable one to reduce the amount of data transmitted over the network during a node rebuild phase.

A codeword of a piggybacked code of length n over GF(q) m is obtained by taking m codewords of a base code of length n, and adding to the i-th codeword of the base code some linear combination of the information symbols of base code codewords 0,...,i-l . The particular coefficients are selected in such way, so that the amount of data to be transmitted over the network in case of repair of a node is minimized.

In a first implementation of the method according to the first aspect, the one or more outer codes and/or the one or more inner codes are maximum-distance separable codes.

Maximum-distance separable (MDS) codes are linear block codes that achieve equality in the Singleton bound. Examples of relevant MDS codes that can be used in the first implementation include Reed-Solomon codes, generalized Reed-Solomon, Cauchy Reed-Solomon and their extended versions. It is also possible to apply the GCC obtained as described here in other concatenated constructions, in order to provide pro- tection against other types of failures.

In a further implementation of the method according to the first aspect, the inner codes are

Ci(n, n— i, i + 1) nested codes over GF(2" ! ) for 0 < i≤ n, i.e. inner codes are maximum distance separable ones

In a second implementation of the method according to the first aspect, the outer codes are Ci(v, /Q, - K t + 1) codes over GF(2 OT ), v < 2 m for 0 < i < n , wherein

K x = ... = K r = 0, K j - [v - + lj , r + 1 < < n and (v - K j + - r) > s.

This has the advantage that protection against r server failures and s further device failures is provided.

In a third implementation of the method according to the first aspect, the encoded data is representable as a n xmv matrix, and the method comprises storing an z ' -th row of the matrix on an z ' -th storage node and storing one or more symbols from a column pm+j in a 7-th block of a p-t storage device.

This represents a particularly efficient and practical implementation of the method of the first aspect.

A second aspect of the invention refers to a method for recovering data from part-erased encoded data, wherein the encoded data has been encoded using a method according to the first aspect or one of the implementations of the first aspect.

In a first implementation of the method of the second aspect, the method com- prises, if t s + t b '≤ r , where t s is the number of failed storage nodes and t b ' = max 0≤ j <v and is the number of storage nodes where storage device z has failed: applying an inner code decoding algorithm to each column of a codeword of the part-erased encoded data.

This has the advantage that the inner code is a piggybacked MDS code. Erasure decoding of such code can be performed using a procedure with reduced network traffic.

In a second implementation of the method of the second aspect, the method comprises constructing a check matrix H = (I\A)P that corresponds to a concatenation of the inner and outer codes, where P is a permutation matrix which maps an identity submatrix onto positions of one or more erased symbols of the part-erased data; and

recovering one or more erased symbols Cj of a codeword c as Cj =

This has the advantage that all possible erasure patterns which are recoverable by this code can be reconstructed by this method.

The identity matrix / does not need to occupy first r columns of r x (nv) matrix H. Instead, a permutation can be applied, so that a j-ih column of the identity matrix is placed on the position of the y-th erased symbol.

The permutation matrix P can be constructed such that the columns of the identity submatrix in the check matrix Hare placed on the positions corresponding to erased symbols. Thus, although formally the sum in the expression Cj =∑i≠j CiHji may in- elude erased symbols, they are multiplied by H ,=0, so that the value of erased symbols is not required.

In a third implementation of the method of the second aspect, the check matrix is obtained by Gaussian elimination from the equation GH T = 0 , where G =

is a generator matrix of an z ' -th outer code, and G ir . is an i-t row of a generator matrix of the one or more piggybacked inner codes.

The method of the first and/or the second aspect can be implemented within a controller of a storage system. In particular, the apparatus of the third aspect can be a controller of a storage system. To this end, the controller can be implemented either in software, or in hardware (e.g. ASIC, FPGA).

The controller can be directly connected to the storage devices or it can be connected to the storage devices through a network connection, wherein e.g. the storage devices are connected to the network through a further controller.

A third aspect of the invention relates to an apparatus for encoding data for storage on storage devices of n storage nodes such that the data is recoverable after a failure of up to r storage nodes and a failure of up to s storage devices, the apparatus comprising:

a first encoder for outer-coding the data with one or more outer codes to obtain outer-coded data and

- a second encoder for encoding the outer-coded data with one or more piggybacked inner codes to obtain encoded data,

wherein in particular the apparatus is configured to carry out the method of the first aspect or one of its implementations.

In a first implementation of the apparatus of the third aspect, the first encoder and/or the second encoder is implemented in hardware, in particular as ASIC and/or FPGA.

This has the advantage that certain computation operations that occur often during encoding can be implemented more efficiently in hardware.

A fourth aspect of the invention relates to an apparatus for recovering part-erased encoded data, comprising a decoder for decoding the part-erased encoded data, wherein the encoded data has been encoded using a concatenation of one or more outer codes and one or more inner codes, wherein the one or more outer codes and the one or more inner codes are piggybacked maximum distance separable codes, wherein in particular the apparatus is configured to carry out the method of the second aspect or one of its im- plementations.

A fifth aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the second aspect or one of the implementations of the second aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the technical features of embodiments of the present invention more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description are merely some embodiments of the present invention, but modifications on these embodiments are possible without departing from the scope of the present inven- tion as defined in the claims.

FIG. 1 is a flow chart of a method for encoding data according to an embodiment of the invention, FIG. 2 is a schematic illustration of an apparatus for encoding data for storage on storage devices of storage nodes according to an embodiment of the invention,

FIG. 3 is a schematic illustration of a method according to an embodiment of the present invention,

FIG. 4 is a schematic illustration of a matrix that stores encoded data,

FIG. 5A presents the ratio of the amount of data which needs to be transmitted over the network in the case of piggybacked and non-piggybacked codes, wherein the base code redundancy is varied, and

FIG. 5B presents the ratio of the amount of data which needs to be transmitted over the network in the case of piggybacked and non-piggybacked codes, wherein m is varied.

Detailed Description of the Embodiments

FIG. 1 is a flow chart of a method according to an embodiment of the invention. The method comprises a first step 110 of outer-coding data with one or more outer codes to obtain outer-coded data. The method further comprises a second step 120 of encoding the outer-coded data with one or more piggybacked inner codes to obtain encoded data.

FIG. 2 is a schematic illustration of a storage system 200 comprising an apparatus 210 in accordance with an embodiment of the invention. The apparatus is configured to encode data for storage on a plurality of storage devices. To this end, the apparatus 210 is optionally connected to a first storage node 220 and a second storage node 230, indicated with dashed lines in FIG. 2. The first storage node 220 comprises a first storage device 222 and a second storage device 224. The second storage node 230 is connected to external storage devices 232, 234. The apparatus 210 can be connected to the storage nodes 220, 230 e.g. via a network connection. The storage devices 222, 224, 232, 234 can be connected to their storage nodes 220, 230 e.g. via a storage bus, such as SCSI, SAS, S-ATA or PCIe. The storage devices 222, 224, 232, 234 can be configured to store blocks of data, e.g. blocks of data of a predetermined size.

The apparatus 210 comprises a first encoder 212 and a second encoder 214. Preferably, the first encoder is configured to carry out the first step 110 of the method shown in FIG. 1 and the second encoder 214 is configured to carry out the second step 120 of the method shown in FIG. 1.

Optionally, the one or more inner codes that are used in second step 120 are a family of piggybacked nested inner codes (n, n— i, d ), i.e. C i+1 3 Q and the outer codes are a family of outer codes Q(v, K ir D t ) over GF(q), 0 < i < n.

A generalized concatenated code (GCC) is defined by a family of nested inner codes Cj (n, n— i, d ) , i.e. C i+1 => Q and a family of outer codes Ci(y, Ki, Di) over GF(q), 0 < i < n. A codeword of the code can be obtained by arranging the data into a n x v rectangular table, so that /q symbols of data are stored in the /-th row, encoding each row with the corresponding outer code, and encoding each column with inner code C 0 . The dimension of the obtained code is given by K , wherein the length is N = v n, and the minimum distance is δ≥ minjdiDj.

Assuming that both inner and outer codes are maximum distance separable ones with di = i + 1 and Z¾ = v— /Q + 1, one obtains that the minimum distance of the corresponding GCC is given by d = min

Generalized concatenated codes can be naturally used to implement protection against node and device failures. Assume that the system should survive r node failures and s device failures. This can be achieved by employing generalized concatenated code of length vn with inner piggybacked MDS codes Cj (n, n— + 1), and outer codes Q(v , K b v - K t + 1), 0 < i < n, where Kj = 0, 0 < < r, and

(v - Kj + 1)0 ' - + 1) > s, r < j < n (1) The /-th row of a table corresponding to a GCC codeword should be stored in the i-th node. Generalized concatenated codes and, in particular, 2-dimensional Reed-Solomon codes, can be naturally used to implement protection against device and block failures. Assume that the system should survive r device failures and s block failures. This can be achieved by employing a generalized concatenated code of length vn with inner piggybacked Reed-Solomon codes <Ci (n, n— i, i + 1), and outer codes C f (v , K v - Ki + 1), 0 < i < n, where Kj = 0, 0≤j < r, and (v — Kj + i)(j - r + 1) > s, r < j < n. The i-t row of a table corresponding to a GCC codeword should be stored in the i-t device, as illustrated in FIG. 3.

FIG. 3 is a schematic illustration of a method according to an embodiment of the present invention. Payload data, indicated with reference number 310, comprises symbols K0, KI, K2. In a first processing step, indicated with arrows 320 in FIG. 3, these symbols are outer-encoded with outer codes. This outer-encoding step is per- formed in a systematic manner, i.e., the payload symbols are contained in the resulting codewords 330 of the outer codes. In other words, the matrix 330 of outer-coded codewords comprises the payload data 332 and parity data 334.

In a step indicated with reference number 340, the codewords of the outer codes are encoded with piggybacked inner codes to obtain codewords 350 of a generalized concatenated code. These codewords are then stored on a plurality of storage devices

360.

Indeed, if r devices fail, i.e. r row erasures occur, then the minimum distance of the inner codes drops to j— r + l,j≥ r. Hence, the codewords with r erased columns still differ in min ≥r (v— K j + 1)( — r + 1) > s positions, i.e. the code is able to recover at least s block failures.

Consider one of the piggybacking schemes, and apply it to some base MDS (e.g. Reed-Solomon) code {n, n - r, r + 1) over GF(q). Let G b be the generator matrix of base code. This matrix must be chosen in such way, so that its bottom n-i rows generate (n, n-i, i+1) MDS codes. We take m codewords of the base code and apply the piggybacking scheme to them. In other words, the parameter m indicates the number of instances in the piggybacked code. The piggybacked codewords can be considered as a

trix, which maps i+nj onto j+mi, 0 < i < n, 0 <j < m, and M P is the linear transformation matrix, which defines the piggybacking linear transformation.

Since the piggybacking transformation preserves the MDS property both of the code being transformed and its subcodes, one can use this code over GF(q) m as an inner code for construction of (N, K, D) GCC. Namely, one should select outer ( , K if v— Ki + 1) MDS codes with dimensions satisfying equation (1).

The codewords of the obtained GCC can be represented as an n *mv matrix 400 with elements over GF(q) as shown in FIG. 4. Here, columns im, im + 1, ..., im + m - 1 of the codeword represent piggybacked codewords of the base code. In particular, FIG. 4 shows the case of m=2, i.e., a stripe 420 comprises a first substripe 422 and a second substripe 424. The first substripe 422 is stored on a first block of a device, and the second substripe 424 is stored on a second block of the device.

An z ' -th row, indicated in FIG. 4 with reference number 410, of the codeword table should be stored on an z ' -th server, and the symbols from column sm + j should be stored on an 5-th device in a y ' -th block.

The MDS codes used in this scheme can be, for example, Reed-Solomon codes. One particular way to construct matrix G b , so that it satisfies the nested code requirements, is to take it equal to n-r last rows and n first columns of matrix

g ij

9 i ( ) = U' j i i* - < * ») = Σ½= 0 < ¾^ and a i s a primitive element of GF(2"), 2" > n. Observe that this matrix corresponds to a non-systematic encoding of Reed-Solomon code, while the piggyback construction assumes systematic encoding. Nevertheless, one can still use the transformations described in K. V. Rashmi, N. B. Shah and K. Ramchandran, "A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes," In Proc. of IEEE International Symposium on Information Theory, 2013 and K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur and K. Ramchandran, "A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers," ACM SIGCOMM, Aug, 2014.

For erasure recovery, i.e. decoding, one can proceed as follows:

• If t s + t b ' ≤ r , where t s is the number of failed servers and t b ' = maxo < i <v and t is the number of servers where device z has failed, then apply the inner code decoding algorithm to each column of the codeword. This algorithm can be implemented the same way as in the above-mentioned publications by Rashmi et al.

Otherwise, construct a check matrix H = Q\A)P for the obtained GCC, where P is a permutation matrix which maps the identity submatrix onto the positions of erased symbols. Then the erased symbols of codeword c can be recovered as Cj = ∑ ≠i CiHji . The check matrix can be obtained by Gaussian elimination from the equation GH T

code, and G t - _ is the z-th row of the piggybacked generator matrix of the inner code.

FIGs. 5a and 5b present the ratio of the amount of data which needs to be transmitted over the network in the case of piggybacked and non-piggybacked (i.e. m = 1) codes in the event of recovery after a failure of a single server. The results are presented for codes of various length and dimension. In FIG. 5a, the base code redundancy is varied, and in FIG. 5b the code rate is kept approximately constant and m is varied. It can be seen that in both cases increasing the absolute number of check symbols in the code results in lower amount of data to be transmitted during the rebuild phase.

The foregoing descriptions are only implementation manners of the present invention, the protection of the scope of the present invention is not limited to this. Any variations or replacements can be easily made through person skilled in the art. Therefore, the protection scope of the present invention should be subject to the protection scope of the attached claims.