Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
SYSTEMATIC CODING TECHNIQUE FOR ERASURE CORRECTION
Document Type and Number:
WIPO Patent Application WO/2017/109220
Kind Code:
A1
Abstract:
Disclosed herein is a method for determining how to encode data in accordance with a systematic coding technique and encoding data in accordance with the determined systematic coding technique, the method comprising: determining the code parameters n, k, r and α, wherein n is the total number of nodes, k is the total number of source data nodes, r is the total number of redundant nodes, such that n = k + r, α is the number of substripes of data in one of the nodes and each of the source data nodes and redundant nodes comprise the same number of substripes, and wherein α is determined so that it satisfies either the condition <α < rm or both of the conditions α = rm and (k/r) is not an integer, where m = ceiling(k/r); determining source data nodes that comprise source data that is not encoded by the systematic coding technique; for each of the redundant nodes, determining to generate each of the substripes of data in dependence on a combination of a different substripe from each of the source data nodes such that each of the substripes is generated in dependence on a combination of k substripes of source data and the α substripes of the redundant node are generated in dependence on all of the (α x k) substripes of source data; and determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, wherein said determination comprises selecting a further substripe of source data for a redundant node to be further dependent on with the further substripe being selectable from any one of the k source nodes; and encoding data in accordance with the determined systematic coding technique. Embodiments of the invention provide a systematic coding technique with one or more of the advantageous properties of being Maximum-Distance Separable, systematic, having a flexible sub- packetisation level, providing minimum repair bandwidth, access optimality and fast decoding.

Inventors:
GLIGOROSKI DANILO (NO)
KRALEVSKA KATINA (NO)
Application Number:
PCT/EP2016/082656
Publication Date:
June 29, 2017
Filing Date:
December 23, 2016
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
NORWEGIAN UNIV OF SCIENCE AND TECH (NO)
International Classes:
H03M13/03; G06F11/10; H03M13/13; H03M13/37
Other References:
AGARWAL GAURAV KUMAR ET AL: "An alternate construction of an access-optimal regenerating code with optimal sub-packetization level", PROC., TWENTY FIRST NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), IEEE, 27 February 2015 (2015-02-27), pages 1 - 6, XP032763750, DOI: 10.1109/NCC.2015.7084873
KRALEVSKA KATINA ET AL: "General Sub-Packetized Access-Optimal Regenerating Codes", IEEE COMMUNICATIONS LETTERS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 20, no. 7, 1 July 2016 (2016-07-01), pages 1281 - 1284, XP011616340, ISSN: 1089-7798, [retrieved on 20160708], DOI: 10.1109/LCOMM.2016.2561287
VIVECK R CADAMBE ET AL: "Polynomial length MDS codes with optimal repair in distributed storage", PROC., IEEE 45TH ASIMOLOAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, 2011, 6 November 2011 (2011-11-06), pages 1850 - 1854, XP032172399, ISBN: 978-1-4673-0321-7, DOI: 10.1109/ACSSC.2011.6190343
BIRENJITH SASIDHARAN ET AL: "A High-Rate MSR Code With Polynomial Sub-Packetization Level", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 January 2015 (2015-01-27), XP080675355
ZHIYING WANG ET AL: "Long MDS codes for optimal repair bandwidth", PROC., IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, ISIT 2012, 1 July 2012 (2012-07-01), pages 1182 - 1186, XP032225451, ISBN: 978-1-4673-2580-6, DOI: 10.1109/ISIT.2012.6283041
ITZHAK TAMO ET AL: "Zigzag Codes: MDS Array Codes With Optimal Rebuilding", IEEE TRANSACTIONS ON INFORMATION THEORY, IEEE PRESS, USA, vol. 59, no. 3, 1 March 2013 (2013-03-01), pages 1597 - 1616, XP011493657, ISSN: 0018-9448, DOI: 10.1109/TIT.2012.2227110
Attorney, Agent or Firm:
GILL JENNINGS & EVERY LLP (GB)
Download PDF:
Claims:
Claims:

1. A method for determining how to encode data in accordance with a systematic coding technique and encoding data in accordance with the determined systematic coding technique, the method comprising: determining the code parameters n, k, r and a, wherein n is the total number of nodes, k is the total number of source data nodes, r is the total number of redundant nodes, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant nodes comprise the same number of substripes, and wherein a is determined so that it satisfies either the condition 1 < a < rm or both of the conditions a = rm and (k/r) is not an integer, where m = ceiling(k/r); determining source data nodes that comprise source data that is not encoded by the systematic coding technique; for each of the redundant nodes, determining to generate each of the substripes of data in dependence on a combination of a different substripe from each of the source data nodes such that each of the substripes is generated in dependence on a combination of k substripes of source data and the a substripes of the redundant node are generated in dependence on all of the (a x k) substripes of source data; and determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, wherein said determination comprises selecting a further substripe of source data for a redundant node to be further dependent on with the further substripe being selectable from any one of the k source nodes; and encoding data in accordance with the determined systematic coding technique. The method according to claim 1 , wherein: each of the substripes within each node is identifiable by substripe index level, i, where 1≤ i≤ a and i is an integer; for at least one of the redundant nodes, said determining to generate each of the substripes of data in dependence on a combination of a different substripe from each of the source data nodes comprises determining, for each substripe of said at least one of the redundant nodes, the substripe of the redundant node to be a combination of a single substripe from all of the source data nodes with the substripe of the redundant node and the substripes from each source data nodes all having the same substripe index level; and in said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, the determination comprises selecting a further substripe of source data for a redundant node to be further dependent on with the further substripe being selectable from any one of the a substripe index levels; and, preferably, the determination comprises selecting substripes from the source data nodes as further substripes of source data that one or more of the redundant nodes are further dependent on with the selection comprising at least one substripe from each of the a substripe index levels.

The method according to claim 2, wherein, for each substripe index level, there is at least one substripe of a redundant node that is dependent on an additional substripe of source data with a different substripe index level from the substripe of said at least one substripe of the redundant node. The method according to any preceding claim, wherein in said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, the determination comprises, for all but one of redundant nodes, the redundant nodes to be determined in dependence on at least one additional substripe of source data.

The method according to any preceding claim, wherein in said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, the determination comprises selecting at least one substripe from each of the k source data nodes as further substripes of source data that one or more of the redundant nodes are further dependent on.

The method according to any preceding claim, wherein in said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, the determination comprises selecting at least one substripe from each of the k source data nodes as further substripes of source data that one of the redundant nodes is further dependent on. The method according to any preceding claim, wherein in said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, the determination comprises, for each of two or more of the redundant nodes, selecting at least one substripe from each of the k source data nodes as further substripes of source data that the redundant node is further dependent on.

8. The method according to any preceding claim, wherein (k/r) is not an integer.

9. The method according to any preceding claim, wherein either r is 2 or more, or r is 3 or more.

10. The method according to any preceding claim, wherein either a is 2 or more, or a is 3 or more. 1 1. The method according to any preceding claim wherein a is determined so that it satisfies the condition 1 < a < rm and/or (k/r) is not an integer.

12. The method according to any preceding claim, wherein one of the redundant nodes is determined to have each of its substripes dependent on exactly k substripes of source data and the substripes of the redundant node are generated in dependence on all of the (a x k) substripes of source data.

13. The method according to any preceding claim, wherein said determining of each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on is performed for r-1 redundant nodes. 14. The method according to any preceding claim, wherein determining each of one or more of the substripes of a redundant node to be further dependent on at least one further substripe of source data that it is not currently dependent on is performed for all of the substripes of the node. 15. The method according to any preceding claim, the method comprising performing a balanced selection of the substripes that the redundant nodes are further dependent on such that substantially the same number of read operations are required to recover each source node.

16. The method according to any preceding claim, wherein the method is computer-implemented.

17. The method according to any preceding claim, wherein the determined systematic coding technique is MDS.

18. The method according to any preceding claim, wherein the combining of substripes of source data to generate a substripe of a redundant node is by linear combinations over finite fields.

19. The method according to any preceding claim, wherein the systematic coding technique is an erasure resilient systematic coding technique.

20. The method according to any preceding claim apart from claim 8, wherein (k/r) is an integer.

21. The method according to any preceding claim, wherein said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on is performed in accordance with any of a random determination, a pseudo-random determination, a pre-determined technique and/or an algorithm.

22. The method according to any preceding claim, wherein, when determining each of one or more of the substripes of a redundant node to be further dependent on at least one further substripe of source data that it is not currently dependent on, the selection of each further substripe of source data is independent from the order of writing and reading the previous substripes of source data.

23. The method according to any preceding claim, wherein: said step of determining source data nodes comprises determining k source data nodes d2, ... , dk} where each data node d, comprises an indexed set of a substripes { 1; ,a2 ,— , a ) as a two-dimensional array

Data with a rows and k columns such that Data

and the generation of the redundant nodes comprises: determining r redundant data nodes {p1,p2, -,pr} where each redundant node pi, where 1≤ I≤r, comprises of an indexed set of a substripes determining r two-dimensional index arrays Pi,..., Pr; determining the index array for Pi to have a rows and k columns, where each cell in Pi is a pair of indexes with the following values:

(1,1) (1,2) ... (l.fc)

(2,1) (2,2) ... (2,k)

( ,Ι) (a, 2) (a, k determining the index arrays P2,..., Pr to have a rows and k + m columns, and where each cell in Pl where 2≤ /≤ r, is a pair of indexes with the following values:

(1,1) (1,2) d,k (?,?) (?.?)

(2,1) (2,2) (2,k) (?,?) (?.?)

where the pairs with (?.?)

( ,Ι) (a, 2) (oU) (?,?) (?,?)

values (?,?) are further determined according to an algorithm; and for each of the redundant nodes {p1( p2, ...,pr}, determining to generate each of the substripes pi where 1≤ i≤ a and 1 < Z < r, in dependence on a combination of different source data substripes d(Ji j2), where the pair 1J2) is present in the t-th row of the index array P;.

24. A method for determining how to generate a systematic code and encoding data in accordance with the determined systematic code, the method comprising: receiving the code parameters n, a, k and/or r; configuring an algorithm with the received code parameters; determining, by the algorithm, how to generate a systematic code in accordance with the method of any preceding claim; and encoding data in accordance with the determined systematic code. 25. The method according to claim 24 when dependent on claim 23, wherein n, k, a are inputs to the algorithm and index arrays P1 ..., Pr are outputs that define how to generate each of the redundant nodes, and wherein the algorithm performs the steps of: initialising P1 ; ..., Pr as arrays P = . i,f) axk; appending additional m = [Vrj columns to P2 , - - -, Pr all initialized to (0,0); setting portion <- setting ValidPartitions <- 0; setting j <- 0; repeating the steps of: setting ;' <- ;' + 1; setting v j/r

setting run

setting step run; determining

Dd . = ValidPartitioning (ValidPartitions, k, r, portion, run, step, J v); setting ValidPartitions = ValidPartitions U Dd ; and determining one DpA . e Dd . such that its elements correspond to row indexes in the (/c + v)-th column in one of the arrays P2, ..., Pr, that are all zero pairs (0, 0), wherein the indexes in Dp d . are the row positions where the pairs (ί,;) with indexes i e D\Dp d . are assigned in the (k + v)-th column of P2, - - -, Pr', until (run > 1) AND (j ≠ 0 mod r); while j < k, performing the steps of: setting j <- j + 1;

setting v j/r

setting run <- 0; determining

Dd . = ValidPartitioning (ValidPartitions, k, r, portion, run, step,Jv); setting ValidPartitions = ValidPartitions U Dd ; determining one Dpd. £ Dd. such that its elements correspond to row indexes in the (/c + v)-th column in one of the arrays P2,...,Pr, that are all zero pairs (0, 0), wherein the indexes in Dpd. are the row positions where the elements (i,j) with indexes i e D\Dpd. are assigned in the (k + v)-th column of P2,---, Pr', and when the condition j < k is no longer satisfied, outputting the determined P1:..., Pr; wherein, the steps of the algorithm further comprise: partitioning the set Nodes = {dlt ... , dk] of k data disks in disjunctive

subsets ]x, where \JV\ = r and where if r does not divide k then the last subset has k mod r elements and where Nodes = ! wherein the function ValidP artitioning is called by the algorithm and r, portion, run, step, J v as inputs and outputs the ValidP artitioning function comprises the steps of: setting D = {1, 2, ...,a}; if run≠ 0 then finding Ddthat satisfies Condition 1 and Condition 2; else finding Dd .that satisfies Condition 2; where Condition 1 is that at least one subset Dp d .has portion elements with runs of run consecutive elements separated with a distance between the indexes equal to step, wherein the elements of that subset correspond to row indexes in the (k + v)-th column in one of the arrays P2, ..., Pr, that are all zero pairs (0, 0), and the distance between two elements in one node is computed in a cyclical manner such that the distance between the elements α_! and a2 is 2; and

Condition 2 is that a necessary condition for the valid partitioning of the elements in the systematic nodes to achieve the lowest possible repair bandwidth is Dd = ¾,2for all d7iand dj2 in Jv and Dd ≠ ¾,2for all d7iand dj2 systematic nodes in the system, and if portion divides a, then Dp,dj for all dj in the /v-th subset are disjunctive, i.e., D Dp d . = {1, 2, ... , a}.

26. The method according to claim 25, wherein the combining of substripes of source data to generate a substripe pM of the redundant node p; is by linear combinations over finite fields according to the index arrays p , ... , p wherein ρί ; =∑cl i 2 Ji j2 , where 1 < i < a , 1≤ I≤r and the pair O1J2) exists in the t-th row of the index array P; and co-efficient ¾lj2 is a non-zero element in the finite field.

27. A method for storing data in a data storage system, wherein n is the total number of data storage nodes of the data storage system, k is the total number of source data nodes of the data storage system, r is the total number of redundant data nodes of the data storage system, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant data nodes comprise the same number of substripes, and wherein a satisfies either the condition 1 < a < rm or both of the conditions a = rm and (k/r) is not an integer, where m = ceiling(k/r), the method comprising: determining how to encode the source data for storing in the data storage system in dependence on the systematic coding technique of any preceding claim; determining the redundant data by encoding the source data in accordance with the determined systematic coding technique; and storing the source data and the redundant data in the data storage nodes of the data storage system.

28. The method according to claim 27, further comprising performing a mapping operation on the source data and encoded source data such that one or more of the data storage nodes stores both source data and encoded source data.

29. A method of coding source data, the method comprising: obtaining source data; and encoding the source data in accordance with a systematic coding technique determined according to any of claims 1 to 26.

30. The method according to claim 29, further comprising transferring the source data and redundant data over a network.

31. A coding technique for generating coded data from source data, the coding technique being equivalent to generating the coded data in dependence on the method of any preceding claim.

32. A generator matrix for defining how to generate coded data from source data, the generator matrix defining a code that is equivalent to a code that has been generated by the method of any preceding claim.

33. A generator matrix, G, for defining how to generate coded data from source data, wherein G has (a x n) rows and (a x k) columns with elements in a finite field in accordance with claim 26, and G defines a code that is equivalent to a code that has been generated by the method of any preceding claim and where G has the following form G = ^ , where / is an identity matrix with dimensions (a k) x (a x k) and where P is a matrix with dimensions (a x r) x (a x k) .

34. A computing system configured to perform the method of any of claims 1 to 30.

35. A computer program that, when executed by a computing system, causes the computing system to perform the method of any of claims 1 to 30.

36. A data storage system, wherein n is the total number of data storage nodes of the data storage system, k is the total number of source data nodes of the data storage system, r is the total number of redundant data nodes of the data storage system, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant data nodes comprise the same number of substripes, and wherein a satisfies either the condition 1 < a < rm or both of the conditions a = rm and (k/r) is not an integer, where m = ceiling(k/r), the data storage system configured to store data in accordance with the method of claim 27.

37. A method of recovering a node that is one of a plurality of systematically coded source and redundant nodes, the method comprising applying a decoding method that is the inverse of the method according to any of claims 1 to 30.

38. A decoding technique for recovering a node that is one of a plurality of systematically coded source and redundant nodes, wherein the decoding technique is the inverse of the coding technique according to claim 31.

39. A method of recovering one of a plurality of nodes, wherein the plurality of nodes have been coded according to the method of any of claims 1 to 30, the method comprising: obtaining a set of |-| substripes of data from nodes of the plurality of nodes other than the node being recovered; obtaining one or more further substripes of data from nodes of the plurality of nodes other than the node being recovered; using the obtained set of ^ stripes to recover one or more substripes of the node being recovered; and recovering all of the other substripes of the node being recovered in dependence on the one or more further substripes and a re-use of the obtained set of substripes.

40. A method for recovering one of a plurality of nodes that have been coded according to the method of any of claims 1 to 30, the method comprising: receiving data defining how the plurality of nodes were coded; configuring an algorithm with the received data; and determining, by the algorithm, how to recover the node in dependence on data in the other of the plurality of nodes.

41. The method according to claim 40, wherein the algorithm recovers a node, dl by performing the following steps: accessing and transferring (k— 1) |-| elements ay from all k failed systematic nodes and [-1 elements p from px where i e repairing au e Dp4l; accessing and transferring (r - 1) |-| elements py from p2, ..., pr where i e DPid ; accessing and transferring from the systematic nodes the elements ay indexed in the i -th row of the index arrays P2 , ... , Pr where i e p>d that are different from said accessed and transferred (k - 1) ^ elements ay from all k - \ non-failed systematic nodes and ^ elements ρί from p and repairing αί ; where i e D\Dp dl ; wherein: Pi, ... , pr are redundant data nodes with respective index arrays Pl t ... , Pr; and the parameters of the code are as defined in claim 26. 42. The method of any of claims 37 to 41 , wherein the nodes are nodes of a data storage system.

43. A method of decoding data, the method being equivalent to decoding data in dependence on the inverse of the generator matrix generated according to claim 32 or 33.

44. A method of reading data from a data storage system, the method comprising reading data in dependence on the method of any of claims 37 to 41. 45. A method of recovering up to r failed nodes from a plurality of systematically coded source and redundant nodes, the method being equivalent to decoding data in dependence on the inverse of the generator matrix according to claim 32 or 33. 46. The method according to any of claims 37 to 45, wherein the method is computer-implemented.

47. A computing system configured to perform the method of any of claims 37 to 46.

48. A computer program that, when executed by a computing system, causes the computing system to perform the method of any of claims 37 to 46.

Description:
SYSTEMATIC CODING TECHNIQUE FOR ERASURE CORRECTION

Field

The field of the invention is the systematic coding of data. A particularly preferred application for coded data according to embodiments is in a data storage system. The flexible code construction allows the coding to be adapted to the underlying hardware of the storage system configuration. In addition, the amount of data that is accessed and transferred in order to reconstruct unavailable data is significantly reduced from other coding techniques, such as Reed-Solomon coding.

Background

An ever increasing amount of data is being stored in large capacity distributed data storage systems. It is normal for redundancy to be introduced into stored data. The redundancy allows data within a node to be obtained when the node is unavailable, for example due to maintenance of the node or failure of part of the data storage system. The traditional approach to redundancy is replication of the stored data and one or more copies of the original data are stored in other node(s). Triple replication of stored data is an accepted industry standard. However, a problem with replication is that it has a high storage overhead. There is therefore an increasing use of erasure codes to store data as the property of being able to recover data is maintained and the storage overhead greatly reduced. Reed-Solomon (RS) codes, for example, are a widely employed erasure coding technique.

In addition to the extent to which unavailable data can be reconstructed, there are other desirable properties in a distributed data storage system, such as a small repair bandwidth and access-optimality. Repair bandwidth is the amount of transferred data required to repair a data storage unit in a data storage system in which the data is unavailable, referred to herein as a failed node. Access- optimality is achieved when the amount of accessed and transferred data during the repair process is equal. There are two types of repair of a failed node, functional and exact repair. Under functional repair, a failed node is replaced by a new node such that the resulting system continues to possess the data- reconstruction and repair properties. With exact repair, the new node has exactly the same content as the lost data. Exact repair is preferred over functional repair from a practical point of view.

In a (n, k, d) systematic erasure code for a data storage system, there are n nodes that store data. Of the n nodes, k of the nodes store source data that is data that has not been encoded by the erasure coding technique. There are r redundant nodes, where r = n - k. Each of the r redundant nodes stores data that has been coded according to the erasure coding technique. The file size, i.e. amount of source data that is stored, is B. The amount of data stored in each node is A N , where A N = B/k. The data from a failed node is recovered by transferring data from d non-failed nodes (i.e. nodes that are available). The repair bandwidth β is greater than or equal to the lower bound that is A N .

A large number of erasure coding techniques exist with differing properties. The codes are designed to have one or more of the desired properties of being Maximum-Distance Separable (MDS), systematic, achieving optimal repair bandwidth, and offering access optimality.

An erasure coding technique is presented in the paper G.K. Agarwal, B. Sasidharan, and P. Vijay Kumar, 'An alternate construction of an access-optimal regenerating code with optimal sub-packetisation level', National Conference on Communications (NCC), pages 1-6, Feb 2015, referred to herein as the Agarwal paper. In the Agarwal paper, the data in each node is stored in a substripes (i.e. sub-packets). The reconstruction of unavailable data is performed by operations on substripes of data within available nodes. The sub-packetisation level represents the minimum dimension over which all operations are performed. When the sub-packetisation level is 1 , as it is for standard RS codes, then each node is recovered by transferring data of size B/k from k nodes, i.e., the total amount of the transferred data is B. Thus, the repair bandwidth is equal to the file size when RS codes are used. By using a sub-packetisation level a > 1 , the repair bandwidth can be decreased from B. The Agarwal paper discloses the use of a sub-packetisation level of a = f 1 where m = klr. An essential condition to be able to construct the disclosed codes in the Agarwal paper is that m is an integer. The Agarwal paper does not disclose any technique for generating codes with any other sub-packetisation level than f 1 . In addition, the Agarwal paper does not disclose any technique for generating a codes for which m is not an integer.

A different coding technique from that in the Agarwal paper is disclosed in the paper Ά "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centres' by K.V. Rashimi et al, SIGCOMM 2014, Computer Communication Review, August 2014, referred to herein as the Hitchhiker paper. The Hitchhiker paper discloses a technique for improving on RS coding by using a sub-packetisation level of exactly 2 only.

There is a need to improve known erasure coding techniques. Summary According to a first aspect of the invention, there is provided a method for determining how to encode data in accordance with a systematic coding technique, the method comprising: determining the code parameters n, k, r and a, wherein n is the total number of nodes, k is the total number of source data nodes, r is the total number of redundant data nodes, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant data nodes comprise the same number of substripes, and wherein a is determined so that it satisfies the condition 1 < a < r m , where m = ceiling(k/r); determining source data nodes that comprise source data that is not encoded by the systematic coding technique; for each of the redundant nodes, determining to generate each of the substripes of data in dependence on a combination of a different substripe from each of the source data nodes such that each of the substripes is generated in dependence on a combination of k substripes of source data and the a substripes of the redundant node are generated in dependence on all of the (a x k) substripes of source data; and determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on. Preferably, (k/r) is not an integer.

Preferably, r is 2 or more, and/or r is 3 or more.

Preferably, a is 2 or more, and/or a is 3 or more.

Preferably, a is determined so that it satisfies the condition 1 < a < r m and/or (k/r) is not an integer.

Preferably, one of the redundant nodes is determined to have each of its substripes dependent on exactly k substripes of source data and the substripes of the redundant node are generated in dependence on all of the a x k substripes of source data.

Preferably, said determining of each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on is performed for r-1 redundant nodes.

Preferably, determining each of one or more of the substripes of a redundant node to be further dependent on at least one further substripe of source data that it is not currently dependent on is performed for all of the substripes of the node.

Preferably, the method further comprises performing a balanced selection of the substripes that the redundant nodes are further dependent on such that substantially the same number of read operations are required to recover each source node.

Preferably, the method is computer-implemented. Preferably, the determined systematic coding technique is MDS.

Preferably, the combining of substripes of source data to generate a substripe of a redundant node is by linear combinations over finite fields.

Preferably, the systematic coding technique is an erasure resilient systematic coding technique.

Preferably, (k/r) is an integer.

Preferably, said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on is performed in accordance with any of a random determination, a pseudo random determination, a pre-determined technique and/or an algorithm.

Preferably, when determining each of one or more of the substripes of a redundant node to be further dependent on at least one further substripe of source data that it is not currently dependent on, the selection of each further substripe of source data is independent from the order of writing and reading the previous substripes of source data.

Preferably: said step of determining source data nodes comprises determining k source data nodes {d lt d 2 , ... , d k } where each data node d j comprises an indexed set of a substripes { 1 ; , a 2 ,— , a ) as a two-dimensional array

Data with a rows and k columns such that Data

and the generation of the redundant nodes comprises: determining r redundant data nodes {p , p 2 , ...,p r } where each redundant node pi, where 1≤ I≤ r, comprises of an indexed set of a substripes determining r two-dimensional index arrays P 1: ..., P r ; determining the index array for P t to have a rows and k columns, where each cell in P 1 is a pair of indexes with the following values:

(1,1) (1,2) (W

(2,1) (2,2) (2,k)

( ,Ι) (a, 2) (a, k) determining the index arrays P 2 ,..., P r to have a rows and k + m columns, and where each cell in P l where 2≤ /≤ r, is a pair of indexes with the following values:

(1,1) (1,2) ... (l.fc) (?,?) - (?,?)- (2,1) (2,2) ... (2,fc) (?,?) - (?,?)

where the pairs with

: : ··. : : ' ·· (?,?)

L(a,l) (a, 2) ... (a,k) (?,?) - (?,?).

values (?,?) are further determined according to an algorithm; and for each of the redundant nodes {p lt p 2 , -,p r ], determining to generate each of the substripes p i where 1≤ i≤ a and 1≤ /≤ r, in dependence on a combination of different source data substripes d (Ji j2) , where the pair Ί. ' 2 ) is present in the t-th row of the index array P ; .

According to a second aspect of the invention, there is provided a method for determining how to generate a systematic code, the method comprising: receiving the code parameters n, a, k and/or r; configuring an algorithm with the received code parameters; determining, by the algorithm, how to generate a systematic code in accordance with the method of the first aspect. Preferably, n, k, a are inputs to the algorithm and index arrays Pi, ..., P r are outputs that define how to generate each of the redundant nodes, and wherein the algorithm performs the steps of: initialising Pi, ..., P r as arrays P = ((ί,/)) α χ ¾ ; appending additional m = \ /A columns to P 2 , ..., P r all initialized to (0,0); setting portion setting ValidPartitions setting j <- 0; repeating the steps of: setting ; ' <- ; ' + 1;

setting v J' /r

setting run

setting step run; determining

D d . = V alidP artitioning (ValidPartitions, k, r, portion, run, step, J v ); setting ValidPartitions = ValidPartitions U D d . and determining one D pd . £ D d . such that its elements correspond to row indexes in the (/c + v)-th column in one of the arrays P 2 ,...,P r , that are all zero pairs (0, 0), wherein the indexes in D pd . are the row positions where the pairs (i,j) with indexes i £ D\D pd . are assigned in the (k + v)-th column of P 2 ,...,P r ; until (run > 1) AND (j ≠ 0 mod r); while j < k, performing the steps of: setting ;<-; + !;

setting v Vr

setting run <- 0; determining

D d . = ValidPartitioning(yalidPartitions, k, r, portion, run, step,J v ); setting ValidPartitions = ValidPartitions U D d . determining one D pd . £ D d . such that its elements correspond to row indexes in the (/c + v)-th column in one of the arrays P 2 ,...,P r , that are all zero pairs (0, 0), wherein the indexes in D pd . are the row positions where the elements (ί,;) with indexes i £ D\D pd . are assigned in the (k + v)-th column of P 2 ,---, P r ' , and when the condition j < k is no longer satisfied, outputting the determined Pi,..., P r ; wherein, the steps of the algorithm further comprise: partitioning the set Nodes = {d lt ... , d k ] of k data disks in | k / r | disjunctive subsets ] x , where | / v | = r and where if r does not divide k then the last subset /pj has k mod r elements and where Nodes = U^ f-1 / v ! wherein the function V alidP artitioning is called by the algorithm and r, portion, run, step, J v as inputs and outputs the ValidP artitioning function comprises the steps of: setting D = {1, 2, a}; if run≠ 0 then finding D d .that satisfies Condition 1 and Condition 2; else finding D^.that satisfies Condition 2; where Condition 1 is that at least one subset D p d .has portion elements with runs of run consecutive elements separated with a distance between the indexes equal to step, wherein the elements of that subset correspond to row indexes in the (k + v)-th column in one of the arrays

P 2 , ..., P r , that are all zero pairs (0, 0), and the distance between two elements in one node is computed in a cyclical manner such that the distance between the elements α α _ χ and 2 is 2; and Condition 2 is that a necessary condition for the valid partitioning of the elements in the systematic nodes to achieve the lowest possible repair bandwidth is D d = D d . or all d 7i and d j2 in J v and D d ≠ D d . or all d ;i and d j2 systematic nodes in the system, and if portion divides a, then D p d . for all d j in the / v -th subset are disjunctive, i.e., D =u^ =1 D p d . = {1, 2, ... , a}.

Preferably, the combining of substripes of source data to generate a substripe Pi of the redundant node p ; is by linear combinations over finite fields according to the index arrays P 1 ..., P r ; wherein Pu =∑ i 1 2 a jl 2 , where l≤ i≤ a , 1≤ I≤ r and the pair (J 1 ,j 2 ) exists in the i-th row of the index array P ; and co- efficient c, , , is some nonzero element in the finite field.

According to a third aspect of the invention, there is provided a method for storing data in a data storage system, wherein n is the total number of data storage nodes of the data storage system, k is the total number of source data nodes of the data storage system, r is the total number of redundant data nodes of the data storage system, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant data nodes comprise the same number of substripes, and wherein a satisfies the condition 1 < a < r m , where m = ceiling(k/r), the method comprising: determining how to encode the source data for storing in the data storage system in dependence on the systematic coding technique of the first or second aspect; determining the redundant data by encoding the source data in accordance with the determined systematic coding technique; and storing the source data and the redundant data in the data storage nodes of the data storage system.

Preferably, the method further comprises performing a mapping operation on the source data and encoded source data such that one or more of the data storage nodes stores both source data and encoded source data. According to a fourth aspect of the invention, there is provided a method of coding source data, the method comprising: obtaining source data; determining how to encode the source data in accordance with the systematic coding technique of any of the first aspect; and encoding the source data in accordance with the determined systematic coding technique.

Preferably, the method further comprises transferring the source data and redundant data over a network.

According to a fifth aspect of the invention, there is provided a coding technique for generating coded data from source data, the coding technique being equivalent to generating the coded data in dependence on the method of any of the first to fourth aspects.

According to a sixth aspect of the invention, there is provided a generator matrix for defining how to generate coded data from source data, the generator matrix defining a code that is equivalent to a code that has been generated by the method of any of the first to fifth aspects.

According to a seventh aspect of the invention, there is provided a generator matrix, G, for defining how to generate coded data from source data, wherein G has (a x n) rows and (a x k) columns with elements in a finite field in accordance with an above aspect, and G defines a code that is equivalent to a code that has been generated by the method of any of the first to sixth aspects and where G has the following form G = ^ , where / is an identity matrix with dimensions (a k) x (a x k) and where P is a matrix with dimensions

(a x r) x (a x k) .

According to an eighth aspect of the invention, there is provided computing system configured to perform the method of any of the first fourth aspects.

According to a ninth aspect of the invention, there is provided a computer program that, when executed by a computing system, causes the computing system to perform the method of any of the first to fourth aspects.

According to a tenth aspect of the invention, there is provided a data storage system, wherein n is the total number of data storage nodes of the data storage system, k is the total number of source data nodes of the data storage system, r is the total number of redundant data nodes of the data storage system, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant data nodes comprise the same number of substripes, and wherein a satisfies the condition 1 < a < r m , where m = ceiling(k/r), the data storage system configured to store data in accordance with the method of the third aspect.

According to an eleventh aspect of the invention, there is provided method of recovering a node that is one of a plurality of systematically coded source and redundant nodes, the method comprising applying a decoding method that is the inverse of the method according to any of the first to fourth aspects.

According to a twelfth aspect of the invention, there is provided a decoding technique for recovering a node that is one of a plurality of systematically coded source and redundant nodes, wherein the decoding technique is the inverse of the coding technique according to the fifth aspect.

According to a thirteenth aspect of the invention, there is provided a method of recovering one of a plurality of nodes, wherein the plurality of nodes have been coded according to the method of any of the first to fourth aspects, the method comprising: obtaining a set of ^ substripes of data from nodes of the plurality of nodes other than the node being recovered; obtaining one or more further substripes of data from nodes of the plurality of nodes other than the node being recovered; using the obtained set of ^ stripes to recover one or more substripes of the node being recovered; and recovering all of the other substripes of the node being recovered in dependence on the one or more further substripes and a re-use of the obtained set of [-] substripes.

According to a fourteenth aspect of the invention, there is provided a method for recovering one of a plurality of nodes that have been coded according to the method of any of the first to fourth aspects, the method comprising: receiving data defining how the plurality of nodes were coded; configuring an algorithm with the received data; and determining, by the algorithm, how to recover the node in dependence on data in the other of the plurality of nodes.

Preferably, the algorithm recovers a node, d l by performing the following steps: accessing and transferring (k— 1) |-| elements a ij7 - from all k failed systematic nodes and [-1 elements p u from pi where i e repairing a u e D p l ; accessing and transferring (r - 1) |-| elements p ij7 - from p 2 , ... , p r where

accessing and transferring from the systematic nodes the elements a j indexed in the i -th row of the index arrays P 2 , ... , P r where i e D p d . that are different from said accessed and transferred (k— 1) ^ elements a itj from all k— 1 non-failed systematic nodes and ^ elements ρ ίΛ from p ; and repairing a u where i e D\D p dl ; wherein:

Pi,..., p r are redundant data nodes with respective index arrays Pi, ... , P r ; and the parameters of the code are as defined in the second aspect.

Preferably, the nodes are nodes of a data storage system.

According to a fifteenth aspect of the invention, there is provided a method of decoding data, the method being equivalent to decoding data in dependence on the inverse of the generator matrix generated according to the sixth or seventh aspects.

According to a sixteenth aspect of the invention, there is provided a method of reading data from a data storage system, the method comprising reading data in dependence on the method of any of claims eleventh to fifteenth aspect.

According to a seventeenth aspect, there is provided a method of recovering up to r failed nodes from a plurality of systematically coded source and redundant nodes, the method being equivalent to decoding data in dependence on the inverse of the generator matrix according to the sixth or seventh aspects.

Preferably, the method is computer-implemented. According to an eighteenth aspect, there is provided a computing system configured to perform the method of any of the eleventh to seventeenth aspects.

According to a nineteenth aspect, there is provided a computer program that, when executed by a computing system, causes the computing system to perform the method of any of the eleventh to seventeenth aspects.

According to a twentieth aspect, there is provided a method for determining how to encode data in accordance with a systematic coding technique and encoding data in accordance with the determined systematic coding technique, the method comprising: determining the code parameters n, k, r and a, wherein n is the total number of nodes, k is the total number of source data nodes, r is the total number of redundant nodes, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant nodes comprise the same number of substripes, and wherein a is determined so that it satisfies either the condition 1 < a < r m or both of the conditions a = r m and (k/r) is not an integer, where m = ceiling(k/r); determining source data nodes that comprise source data that is not encoded by the systematic coding technique; for each of the redundant nodes, determining to generate each of the substripes of data in dependence on a combination of a different substripe from each of the source data nodes such that each of the substripes is generated in dependence on a combination of k substripes of source data and the a substripes of the redundant node are generated in dependence on all of the (a x k) substripes of source data; and determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, wherein said determination comprises selecting a further substripe of source data for a redundant node to be further dependent on with the further substripe being selectable from any one of the k source nodes; and encoding data in accordance with the determined systematic coding technique.

Preferably, each of the substripes within each node is identifiable by substripe index level, i, where 1 < i < a and i is an integer; for at least one of the redundant nodes, said determining to generate each of the substripes of data in dependence on a combination of a different substripe from each of the source data nodes comprises determining, for each substripe of said at least one of the redundant nodes, the substripe of the redundant node to be a combination of a single substripe from all of the source data nodes with the substripe of the redundant node and the substripes from each source data nodes all having the same substripe index level; and in said step of determining each of one or more of the substripes of at least one of the redundant nodes to be further dependent on at least one further substripe of source data that it is not currently dependent on, the determination comprises selecting a further substripe of source data for a redundant node to be further dependent on with the further substripe being selectable from any one of the a substripe index levels; and, preferably, the determination comprises selecting substripes from the source data nodes as further substripes of source data that one or more of the redundant nodes are further dependent on with the selection comprising at least one substripe from each of the a substripe index levels.

Preferably, for each substripe index level, there is at least one substripe of a redundant node that is dependent on an additional substripe of source data with a different substripe index level from the substripe of said at least one substripe of the redundant node. Preferably, in said step of determining each of one or more of the substhpes of at least one of the redundant nodes to be further dependent on at least one further substhpe of source data that it is not currently dependent on, the determination comprises, for all but one of redundant nodes, the redundant nodes to be determined in dependence on at least one additional substhpe of source data.

Preferably, in said step of determining each of one or more of the substhpes of at least one of the redundant nodes to be further dependent on at least one further substhpe of source data that it is not currently dependent on, the determination comprises selecting at least one substhpe from each of the k source data nodes as further substhpes of source data that one or more of the redundant nodes are further dependent on.

Preferably, in said step of determining each of one or more of the substhpes of at least one of the redundant nodes to be further dependent on at least one further substhpe of source data that it is not currently dependent on, the determination comprises selecting at least one substhpe from each of the k source data nodes as further substhpes of source data that one of the redundant nodes is further dependent on.

Preferably, in said step of determining each of one or more of the substhpes of at least one of the redundant nodes to be further dependent on at least one further substhpe of source data that it is not currently dependent on, the determination comprises, for each of two or more of the redundant nodes, selecting at least one substhpe from each of the k source data nodes as further substhpes of source data that the redundant node is further dependent on. List of Figures

Figure 1 shows a coding technique according to an embodiment; Figure 2 shows a coding technique according to an embodiment; Figure 3 demonstrates the advantageous performance of coding techniques according to embodiments over known coding techniques; and Figure 4 is a flowchart of a method for determining how to encode data in accordance with a systematic coding technique according to an embodiment.

Description Embodiments of the invention provide an advantageous systematic erasure coding technique. It is shown that a code with (n, k, d = n - 1), that is constructed according to an embodiment, can have one or more of the advantageous properties of being Maximum-Distance Separable (MDS), systematic, having a flexible sub-packetisation level, providing minimum repair bandwidth, access optimality and fast decoding. A particularly advantageous property of the coding technique is the flexibility in the sub-packetisation level. That is to say, the number of substripes, i.e. sub-packets, that the data within a node is divided into can be flexibly chosen. Moreover, the sub-packetisation level can approach and/or include the lower bound as defined in the paper 'Network coding for distributed' by A.G. Dimakis et al, IEEE Transactions on information Theory, September 2010.

A problem with the codes disclosed in the Agarwal paper is that it is essential for k/r to be an integer in order for the codes to be constructed. This excludes the use of many widely used data storage systems, such as those with (n,k) = (14, 10).

Embodiments do not experience this problem and the codes can be constructed both when k/r is an integer and when k/r is not an integer. According to embodiments, the sub-packetisation level, a, can be equal or lower than r fml where m = k/r and [ l is the ceiling function. The ceiling function is equivalently represented as ceiling( ). The ceiling function rounds a non-integer value up to the next integer. Advantageously, the coding technique can be flexibly applied in data storage systems, such as those with (n,k) = (14, 10).

The construction of a systematic erasure code according to embodiments is described in detail below.

In a data storage system, it is quite rare for there to be a failure of a node and very rare for multiple node failures to occur simultaneously. During maintenance of the data storage system, disruption is usually minimised by only one node being offline at any given time. Accordingly, by far the most frequent requirement in a data storage system is the recovery of only one node.

The code according to embodiments is a systematic (n,k) code. That is to say, there are n nodes that store data. Of the n nodes, k of the nodes store source data that is data that has not been encoded by the erasure coding technique. There are r redundant nodes, where r = n - k. The performance of the code is presented with (n,k,d = n-1). The performance of the code is therefore demonstrated in the situation with only one node being recovered as this is by far the most common practical situation experienced. However, the code can recover up to r simultaneous failures.

The codes according to embodiments define redundant nodes as each comprising a linear combination of all of the source data packets. At least one of the redundant nodes is defined so that its substhpes are defined a linear combination of all of the substhpes of source data but with each substhpe of source data used only once in the construction of the substhpes of the redundant node. This is the left hand redundant node in Figures 1 and 2.

For the other redundant nodes, one or more of their substhpes are further defined as being dependent on additional substhpes of the source data. There is the condition that the introduced substhpes into the construction of a substhpe of the redundant node are not substhpes of source data that that particular substhpe of the redundant node is already dependent on. The introduced substhpes, and into which substhpes of the redundant nodes they are added, may otherwise be selected randomly, pseudo-randomly, or according to a predetermined technique. If the introduced substhpes are selected randomly, or pseudo-randomly, then preferably the determined code is tested so that it still meets the properties of being MDS. The introduction of substhpes may then be repeatedly changed and tested until a code with the desired properties is obtained.

Such a definition of redundant nodes can be seen in Figures 1 and 2. Advantageously, the introduction of the additional substhpes into the definition of redundant nodes greatly reduces the amount of data that needs to be read in order to reconstruct an unavailable node. After data has been read to reconstruct a first substripe of the node, the re-construction of the other substhpes can be performed with a substantial re-use of the already read data. A technique for constructing codes according to embodiments is set out below.

Consider a file of size B = kA N symbols from a finite file F q stored in k systematic nodes d j of data capacity A N . The capacity of a node can also be expressed by the sub-packetisation level, a, that represents the number of substhpes, i.e. sub-packets, or symbols, of data stored by the node. The substhpes are the smallest blocks of data transferred in operations to both encode and recover (i.e. decode) a node.

We define a code according to embodiments in the following way:

The data from the k source data nodes {d lt d 2 , ... , d k } where each data node d j comprises of an indexed set of a substhpes {a 1:j , a 2 , - , a ) is presented as a two-dimensional array Data with a rows and k columns

Let P = be an index array of size a x k where a≤ r iml and m = r- We define r two-dimensional index arrays P 1 ..., P r for the r redundant data nodes {p 1 ,p 2 , - ,p r ] where each redundant node p where 1≤ l≤r, comprises of an indexed set of a substripes {p 1 ,i,p 2 ,i, -,p a ,i}- The symbols p i in the parity nodes, where 1≤ i≤ a and 1≤ I≤r, are generated as a combination of the elements from the source data nodes a^ ^, where the pair (J 1 ,j 2 ) is present in the ί-th row of the index array P ; . We refer to the elements of the nodes as elements or symbols and we use the terms interchangeably. The elements of each of the a rows are linearly combined with coefficients from the finite field F q . These linear relations are determined according to known techniques so that they provide an MDS code, i.e., to have the property that the entire source data can be recovered from any k nodes (systematic or parity).

The index array for P 1 has a rows and k columns, and each cell in P 1 is a pair of indexes with the following values: P

The index arrays P 2 ,..., P r have a rows and k + m columns, and each cell in P where 2≤ /≤ r, is a pair of indexes with the following values:

(1,1) (1,2) ... (1,/c) (?,?) - (?,?)

(2,1) (2,2) ... (2,/c) (?,?) - (?,?)

P, = , where the pairs with values (?,?)

: : ··. : : "·. (?,?)

L(a,l) (a, 2) ... (cU) (?,?) - (?,?)J

are further determined with Algorithm 1. We use the following terms and variables in Algorithm 1: Algorithm 1 Algorithm to generate the index arrays

Input: n, k,a

Output: Index arrays P r

Initialization: P ,..., P r are initialized as index arrays P

2. Append additional |Vr| c °l umns to P 2 ,---, P r all initialized to (0,0); . Set portion . Set ValidPartitions <- 0; . Set 7 <- 0; . # Phase 1 . repeat . Set j<-j + 1;

10. Set run <-

11. Set step run:

12. D d . = ValidPartitioning(ValidPartitions,k,r, portion, run, step, J v );

13. Set ValidPartitions = ValidPartitions U D d .

Determine one D pd . e D d . such that its elements correspond to row indexes in the (k + v)-th column in one of the arrays P 2 ,---, P r , that are all zero pairs (0, 0);

The indexes in D pd . are the row positions where the pairs with indexes i£D\D pd . are assigned in the {k + v)-th column of 16. until (run > 1) AND (j ≠ 0 mod r)

17. #Phase2

18. while j < k do

19. Set7 <-7 + l;

21. Set run <- 0;

22. D d . = V alidPartitioning(ValidPartitions,k,r, portion, run, step, / v );

Set ValidPartitions = ValidPartitions U D,

24. Determine one Z ) Pjd e D d . such that its elements correspond to row indexes in the (k + v)-th column in one of the arrays P 2 ,---, P r , that are all zero pairs (0, 0);

25. The indexes in D pd . are the row positions where the pairs with indexes i e D\D pd . are assigned in the (k + v)-th column of

P 2 Pr ' 26. end while

27. Return Ρ^.,.,Ρ,..

Algorithm 1 further calls the function V alidP artitioning where ValidPartitions, k,r,portion,run, step, J v are inputs to the function and D d . =

{D dj ,...,Dr idj } is an output; 1. Set D = {1, 2, ... , a};

2. If run≠ 0 then

3. Find ¾ that satisfies Condition 1 and Condition 2;

4. Else 5. Find D d . that satisfies Condition 2;

6. End if

7. Return D d ; where P 1 ..., P r and a are global variables;

The above algorithm receives as inputs n, k, . The input n is typically the largest number of available nodes for storing source data and redundant data in a data storage system. The input k, which is the number of nodes storing source data, is a matter of design choice. Decreasing the value of k increases the number of nodes that can be recovered but also increases the storage overhead. The input a is a matter of design choice. It was shown in the Agarwal paper that, under specific conditions, if a had certain specific values determined by n and k then improvements were made over RS codes in which a = 1. Embodiments advantageously allow a to be flexibly chosen. The choice of a may be motivated by a value that is particularly appropriate for the underlying hardware of the data storage system and/or achieving performance gains over RS coding.

The output of the algorithm are index arrays that determine the way of combining substripes of source data to form all of the redundant nodes, also referred to as parity nodes, of a data storage system. The terms and variables in Algorithm 1 and the function V alidP artitioning are described further below:

• partitioning the set Nodes = {d lt ... , d k } of k nodes in \ i r \ disjunctive subsets A, where I AI = r (if r does not divide k then the last f-l

subset /p| has k mod r elements) and Nodes = \J 1 1 J V - Embodiments include this partitioning being any selection of k nodes, including random selections. Without loss of generality we use the natural ordering as follows: A = {<A, ... , d r ], = {d r+1 , ... , d 2r ], ...

• Each node d j comprises an indexed set of a symbols { 1 ; a 2 j , a a j }; portion , the set of all symbols in d, is partitioned in disjunctive subsets where at least one subset has portion number of elements.

• The algorithm has two Phases. Phase 1 ends when the value of run becomes 1 and the indexes of all nodes from a specific J v have been scheduled. In Phase 2, the indexes from the remaining nodes are scheduled in the index arrays. run for values of v e l, ... , j

• step = [^] - run, for the subsequent (/c + v)-th column, where v e l, ... , j, the scheduling of the indexes corresponding to the nodes in is done in subsets of indexes from a valid partitioning.

• A valid partitioning D d . = ^D l d ., ... , D r dj ^ of a set of indexes D = {1, 2, ... , a], where the i -th symbol in d j is indexed by i in D, is a partitioning in r disjunctive subsets D d . =u p=1 D p dj . If r divides a, then the valid partitioning for all nodes in J v is equal. If r does not divide a, then the valid partitioning has to contain at least one subset D p d . with portion pairs that correspond to row indexes in the (k + v)-th column in one of the arrays P 2 , ..., P r , that are all zero pairs.

Condition 1 : At least one subset D p d .has portion pairs with runs of run consecutive elements separated with a distance between the indexes equal to step. The elements of that subset correspond to row indexes in the (/c + v)-th column in one of the arrays P 2 , ..., P r , that are all zero pairs (0, 0). The distance between two elements in one node is computed in a cyclical manner, i.e., the distance between the elements α _ χ and 2 is 2; Condition 2: A necessary condition for the valid partitioning of the index pairs in the systematic nodes to achieve the lowest possible repair bandwidth is D dh = D dj or all d ji and d in ] v and D dh ≠ D dj or all d jl and d systematic nodes in the system. If portion divides a, then D p d . for all d j in the / v -th subset are disjunctive, i.e., D =uj =1 D p d . = {1, 2, ... , a}. The corresponding algorithm for the repair, i.e. recovery, of a single systematic node di is given in Algorithm 2. A set of ^ symbols are accessed and r l k transferred from each of the n— 1 non-failed nodes. If a≠ r |m | , where m = -, r then additional elements may be required as described in Step 4 of Algorithm 2. Note that when reading data in from a data storage system, specific elements of the data are transferred from their nodes just once and then stored in a buffer. For every subsequent use of that element the element is read from the buffer and a further transfer operation to obtain the element from the data storage system is not required. Advantageously, the amount of read data is less than, for example, that required for RS coding.

Algorithm 2 Repair of a systematic node d l 1. Access and transfer (fc— 1) - elements ί ; from all k— 1 non-failed systematic nodes and [-1 elements ρ ί from p x where i e D p dl ;

2. Repair a u e D pA

3. Access and transfer (r - 1) - elements p ij7 - from p 2 , ..., p r where i e D Pid ;

4. Access and transfer from the systematic nodes the elements a i with indexes listed in the i -th row of the index arrays P 2 , ..., P r where i e D p d . that have not been read in step 1 ;

5. Repair a u where i e D\D p dl ;

Proposition 1 : The repair bandwidth β to repair a single systematic node is bounded between the following values (lower and upper bound of the repair traffic):

(n - l)

+

The optimality of the proposed Algorithm is captured in the following Proposition.

Proposition 2: The indexes (i, I) of the elements i where i e D\D p dl for each group of r systematic nodes are scheduled in one of the additional columns in the index arrays P 2 , ... P r .

Next we show that there always exists a set of non-zero coefficients from F the linear combinations so that the code is MDS. We adopt Theorem 4.1 in Agarwal paper: Theorem 1 : There exists a choice of non-zero coefficients c l i j where / = 1, ..., r, i = 1, ..., a and j = 1, ..., k from F q such that the code is MDS if q≥

Examples of the generation of a code according to the Algorithms presented herein are provided below. Embodiments are first demonstrated with a (9,6,8) code that has the unusual sub-packetisation level equal of 7. Embodiments are then demonstrated with a (14, 10, 9) code. This is a practical code that is used, for example, in the data storage system of Facebook™.

The first embodiment constructs a (n,k,d) = (9,6,8) code. The optimal sub- packetisation level to achieve the lower bound of the repair traffic given in the

Agarwal paper for this code is = 9 and this is the only sub-packetisation level possible with the codes in the Agarwal paper. The present embodiment demonstrates the use of a flexible sub-packetisation level with = 7.

The algorithms disclosed herein allow any sub-packetisation level to be used that is less than or equal to r^ . Figure 1 shows the structure of the systematic code generated according to the embodiment. The nodes d 1 ..., d 6 are the systematic nodes, also referred to as source data nodes, and store source data. The nodes p 1 p 2 and p 3 are the redundant nodes, also referred to as parity nodes. The file size B is 42 symbols. The elements of p x are linear combinations of the row elements from the systematic nodes multiplied by coefficients from a finite field, while the parity nodes p 2 and p 3 have further elements besides the row sum. The embodiment schedules the indexes of the elements of the systematic nodes that do not belong to D p d . at portion = 3 positions on the (6 + v)-th column, v = 1, 2, of P 2 and P 3 .

The following steps are performed to determine the further elements in the parity nodes p 2 and p 3 besides the row sum. Initialize P t P 2 , P 3 as arrays P

Append additional = 2 columns to P 2 and P 3 initialized to (0, 0).

3. We use the notation D p d ., j = !, ...,&, to denote the subset with its elements corresponding to row indexes in the (6 + v)-th column, v = 1 , 2, in one of the arrays P 2 and P 3 , that are all zero values (0, 0).

4. Set portion equal to 3 and ValidPartitions to an empty set.

5. # Phase 1 :

For the systematic nodes d Xl d 2 and d 3 in J 1 run is equal to 3 and step is equal to 0,

6. We call the function ValidPartitioning() with appropriate parameters for this phase. In this phase both Condition 1 and Condition 2 have to be fulfilled. For the node d x as the output from the function ValidPartitioningQ we get D di = {{1,2,3}, {4,5,6}, {7}}. Further on with steps 14 and 15 of Algorithm 1 we determine D p di = {1, 2, 3}. This is due to the fact that the first 3 zero elements in the 7-th column of P 2 are at the positions (i, 7) where i = 1,2,3. Note that the run length is 3 and the distance between the indexes is 0. The i indexes of the remaining pairs (i, 1) where i = 4, ...,7 belong to 2 other subsets D\D p di , i.e., D 2 di = {4, 5, 6} and D 3 di = {7}. The pairs (i, 1) for i e D\D p di are added in the 7- th column of P 2 , P 3 . Similarly, for the node d 2 as the output from the function ValidPartitioningQ we get D di = {{1,2,3}, {4,5,6}, {7}} and after steps 14 and 15 in Algorithm 1 we get D p d2 = {4, 5, 6}. For the node d 3 we get D d3 = {{1,2,3}, {5,6,7}, {4}} and D p>d3 = {5, 6, 7}.

7. At the end of this phase ValidPartitions = {{1,2,3}, {4,5,6}, {7}, {5,6,7}, {4}} 8. At the end of this phase a snapshot from three index arrays P t P 2 , P 3 is like this:

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)

Ρ = 1 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) ( ) (1,2) (1,3) (1,4) (1,5) (1,6) (4,1) (ο,ο;

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (5,1) (ο,ο;

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (6,1) (ο,ο;

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (1,2) (ο,ο;

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (2,2) (ο,ο;

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (3,2) (ο,ο;

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (4,3) (ο,ο;

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (7,1) (ο,ο;

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (0,0) (ο,ο;

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (0,0) (ο,ο;

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (7,2) (ο,ο;

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (1,3) (ο,ο;

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (2,3) (ο,ο;

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (3,3) (ο,ο;

9. # Phase 2:

10. For the systematic nodes d 4 , d 5 and d 6 in ] 2 set run equal to 0. Now, when we call the function ValidPartitioningQ with appropriate parameters for this phase, only the Condition 2 has to be fulfilled. For the node d 4 as the output from the function ValidPartitioningQ we get D d4 = {{1,4,7}, {2,5}, {3,6}} and after steps 24 and 25 in Algorithm 1 we get Dp.d t = {1, 4, 7}. A similar steps and outputs are obtained for d 5 and d 6 and they are: D pAs = {2, 5, 1} and D pAb = {3, 6, 2}.

1 1. At the end of this phase ValidPartitions =

{{1,2,3}, {4,5,6}, {7}, {5,6,7}, {4}, {1,4,7}, {2,5}, {3,6}, {2,5,1}, {3,6,2}}

12. At the end of this phase a snapshot from three index arrays P t P 2 , P3 is like this:

r(l,l) (1,2) (1,3) (1,4) (1,5) (1,6)1

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)

i = (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

L(7,l) (7,2) (7,3) (7,4) (7,5) (7,6)J ( ) (1,2) (1,3) (1,4) (1,5) (1,6) (4,1) (2,4)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (5,1) (3,5)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (6,1) (1,6)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (1,2) (5,4)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (2,2) (6,5)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (3,2) (5,6)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (4,3) (0,0)

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (7,1) (3,4)

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (0,0) (4,5)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (0,0) (4,6)

(4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (7,2) (6,4)

(5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (1,3) (7,5)

(6,1) (6,2) (6,3) (6,4) (6,5) (6,6) (2,3) (7,6)

(7,1) (7,2) (7,3) (7,4) (7,5) (7,6) (3,3) (0,0)

The above demonstrates how the indexes of the substripes used to generate each of the redundant nodes of a systematic erasure code are determined according to an embodiment.

Next we demonstrate how the systematic node d x is recovered, according to an embodiment, in the event of this node being unavailable. First, we repair the elements a i>1 i e D p di , i.e., α ίΛ where i = 1,2,3.

We therefore access and transfer 3 ί ; elements where i = 1,2,3 and j = 2, ... , 6 from the non-failed systematic nodes and 3 p i elements where i = 1,2,3 from p 1 . In order to recover the rest of the elements a , i e D\D p di , we need to access and transfer 3 symbols p i 2 where i = 1,2,3 from p 2 and 1 symbol p 1 3 from p 3 . This demonstrates that the data from d x can be recovered by accessing and transferring in total 22 elements from the 8 non-failed nodes. The same amount of data, 22 symbols, is needed to recover any of the other systematic nodes in the event of the failure of a single node. Accordingly, codes generated according to algorithm 1 are balanced in that the same amount of data needs to read in order to recover any one of the nodes.

In preferred embodiments, one of the redundant nodes is generated in dependence on all of the substripes of the source data nodes with all of the substripes of the redundant node comprising different substripes of the source data nodes such that no two substripes of the redundant node are generated in dependence on the same substripe of source data. The coding advantages of embodiments are realised by the generation of one or more further redundant nodes that are also generated in dependence on all of the substripes of the source data nodes but with one or more substripes within each node being generated in further dependence of source data packets such that the same substripe(s) of source data are used in the generation of different substripes of the same redundant node. This technique allows the re-use of already read data when reconstructing a node and therefore less data needs to be read than in RS coding.

Figure 2 shows the structure of the systematic (14, 10, 13) code generated according to a second embodiment. Note that this code cannot be generated with the method presented in the Agarwal paper. The optimal sub-packetisation level to achieve the lower bound of the repair traffic given in the paper 'Network coding for distributed' by A.G. Dimakis et al, IEEE Transactions on information

Theory, September 2010 is 4·Μ = 64 but constructing a (14, 10, 13) code with this sub-packetisation level is not possible with the codes disclosed in the Agarwal paper.

We demonstrate the advantage of embodiments of being able to construct an access-optimal code for any sub-packetization level, thus we set a = 8. The nodes d 1 ..., d 10 are the systematic nodes, also referred to as source data nodes, and store source data. The nodes p 1 p 2 , p 3 and p 4 are the redundant nodes, also referred to as parity nodes. The file size B is 80 symbols. The elements of p x are linear combinations of the row elements from the systematic nodes multiplied by coefficients from a finite field, while the parity nodes ρ 2 , ρ 3 and p 4 have extra elements besides the row sum.

The above described Algorithm 1 is used to determine the additional substripes of source data that all but one of the parity nodes is to be generated in dependence on. The embodiment schedules the elements a itj from a specific d j where i e D\D p d . at portion positions in ί-th row, i e D p d . and the (10 + v)-th column, v = 1, 2, 3 of p 2 , p 3 and p 4 . The determined combinations of substripes for each of the parity nodes is shown in Figure 2.

To demonstrate the improved performance of embodiments, we compare the performance for an (14, 10) code under RS, those in the Hitchhiker paper and codes generated according to embodiments. Figure 3 depicts the correlation between the average repair bandwidth and the sub-packetisation level. Average repair bandwidth is defined as the ratio of the total repair bandwidth to repair all systematic nodes to the file size B. The average repair bandwidth is equal to the file size for a RS code and this corresponds to the value of the repair bandwidth when the sub-packetisation level is 1 . A Hitchhiker code with a sub-packetisation level equal to 2 reduces the repair bandwidth by 35% from that of the RS code. The rest of the values for the average repair bandwidth are for different sub-packetisation levels and determined for the codes according to embodiments. The lowest average repair bandwidth is 3.25 and is reached for a = r m = 64, where m = [Vr|- The codes according to embodiments reduce the repair bandwidth for any systematic node by 67.5% from that of the RS code.

Embodiments therefore provide a general construction of access-optimal regenerating codes for any systematic node when the sub-packetisation level a is less than or equal to r m , where m = |Vr| and r ' s n °t restricted to being an integer. The wide range for sub-packetisation levels together with the two Phases in Algorithm 1 in the code construction lead to a high flexibility of constructing codes for different code rates and for different bottlenecks caused by underlying hardware and system configurations. The lower bound of the repair bandwidth is achieved for a = r^, while the repair bandwidth is close the lower bound when a < r^l. The repair process of a failed systematic node is linear and highly parallelized. A set of \ a / r ] symbols is independently repaired first and used along with the accessed data from other nodes to repair the remaining symbols.

Codes according embodiments include codes in which Vr is an integer. For such codes, a may be less than r m , where m = Vr- Embodiments do also include codes in which Vr ' s an integer and a is equal to r m , where m = Vr- There are advantages of codes according to these embodiments over the codes disclosed in the Agarwal paper. For example, the codes according to embodiments have improved properties such as providing access optimality, fast decoding, consecutiveness of stripes and others. In addition, as described in more detail later, the techniques and algorithm disclosed herein for constructing codes according to embodiments can be used to construct a large number of codes and the codes with the most appropriate properties can be selected for a particular application.

Accordingly, the embodiment provides an advantageous structure of code. Embodiments are not limited to the construction of codes according to the specific algorithms disclosed herein and embodiments extend to any systematic erasure coding technique that determines how to generate substripes of redundant nodes in a way that improves over RS coding whilst providing advantageous flexible code construction according to embodiments.

As shown in Figures 1 and 2, the first of the redundant nodes is generated as a linear combination of substripes within the source data nodes. The other redundant nodes are based on a similar combination of source nodes as the first redundant node, though the linear coefficients of each node may differ. The other redundant nodes may be based on the same combination of substripes as for the first redundant node and only differ due to the additional substripes that are introduced into the combination. The algorithms presented herein determine how to generate such codes and to achieve significant advantages over RS coding. However, in practice, the addition of any additional substripes into the combination improves on RS coding so long as the introduced substripe is not a substripe that the substripe of the redundant node is already determined to be dependent on. The operation at the substripe level allows the re-use of already obtained data when recovering an unavailable node and thereby reduces the amount of data that needs to be read from RS coding. In particular, when determining which additional substripes are included in a combination for generating a substripe of a redundant node, there is a lot of flexibility on which substripes can be included. For example, the additional substripes that are determined for inclusion in a combination are not restricted by the condition of being dependent on message symbols from previous instances. That is to say, the selection of each additional substripe is independent from the order of writing and reading the previous substripes of source data. Preferably, the selection of each additional substripe is also independent from the previous determinations of substripe combinations. For example, an additional substripe may be selected for any of the substripe levels of any of the source nodes. This flexibility allows codes with a wide range of properties may be generated so as to realise, for example, an advantageous repair bandwidth.

Particularly preferred determinations of additional substripes for generating one or more substripes of a redundant node in dependence on include one or more of the following:

- At least one additional source data substripe being selectable from any one of the source nodes;

- At least one additional source data substripe from each substripe index level being included in the redundant nodes, wherein each of the substripes within each node is identifiable by substripe index level, i, where 1 < i < a and i is an integer;

For each substripe index level, there is at least one substripe of a redundant node that is dependent on an additional substripe of source data with a different substripe index level from the substripe of said at least one substripe of the redundant node;

- At least one additional source data substripe being selectable from any one of the substripe index levels; for all but one of redundant nodes, the redundant nodes to be determined in dependence on at least one additional substripe of source data;

selecting at least one substripe from each of the source data nodes as further substripes of source data that one or more of the redundant nodes are further dependent on;

selecting at least one substripe from each of the source data nodes as further substripes of source data that one of the redundant nodes is further dependent on; and

for each of two or more of the redundant nodes, selecting at least one substripe from each of the source data nodes as further substripes of source data that the redundant node is further dependent on.

Accordingly, embodiments provide a systematic erasure coding technique with a flexible sub-packetisation level that improves on RS coding. Improvements are made by determining each of one or more of the substripes of at least one other redundant node than the first redundant node to be further dependent on at least one further substripe of source data that it is not currently dependent on.

Figure 4 is a flowchart of a method for determining how to encode data accordance with a systematic coding technique according to an embodiment.

In step 401 , the process begins.

In step 403, the code parameters n, k, r and a are determined, wherein n is the total number of nodes, k is the total number of source data nodes, r is the total number of redundant data nodes, such that n = k + r, a is the number of substripes of data in one of the nodes and each of the source data nodes and redundant data nodes comprise the same number of substripes, and wherein a satisfies the condition 1 < a≤ r m , where m =

In step 405, source data nodes are determined that comprise source data that is not encoded by the systematic erasure coding technique. In step 407, it is determined to generate, for each of the r redundant nodes, each of the a substripes of data in dependence on a combination of a different substhpe from each of the k source data nodes such that each of the a substripes is generated in dependence on a combination of k substripes of source data and the a substripes of the redundant node are generated in dependence on all of the x k substripes of source data.

In step 409, each of one or more of the a substripes of at least one of the r redundant nodes are determined to be further dependent on at least one further substhpe of source data that it is not currently dependent on.

In step 41 1 , the process ends.

Embodiments of the invention also include a number of modifications and variations to the embodiments as described above.

Preferably, each coding of source and redundant nodes that are generated according to embodiments is tested to determine that it has the property of being MDS. This preferably is part of an iterative process with each iteration changing one or more of the coefficients of one of the additional substripes of source data that a redundant node is made to be dependent on, the actual substhpe of source data or the substhpe of the redundant node that an additional substhpe is added to. The iterative process would be stopped as soon as the generator matrix was determined to be MDS. Advantageously, the coding of source and redundant nodes according to this embodiment is always MDS.

An advantageous property of the coding of source and redundant nodes according to embodiments is that, when the coding is expressed as a generator matrix, if all of the columns of the generator matrix are linearly independent, the generator matrix defines an MDS coding technique. The process of determining if a code is MDS is therefore straightforward for a skilled person to implement.

Table 1

Example values for k and r for implementation in embodiments are shown in Table 1 . The codes shown in Table 1 cover many common disk array sizes. Particularly preferable are codes with k a lot greater than r. These are the most efficient as only one source node reconstruction at a particular time is likely to be required. The underlying finite field may be GF(256). Embodiments also include using more complex GF(256) operations and this would allow larger configurations to be realised. The codes according to embodiments can be constructed over any GF field of size GF(2 n ) so long as n is large enough for a MDS erasure code to be generated. A typical implementation would have a GF field size of GF(16) or GF(256). Embodiments are particularly appropriate for generating data for storage in nodes of a data storage system. In a typical implementation of such a system the number of redundant data nodes is a lot fewer than the number of source data nodes. In a preferred implementation of an embodiment in a data storage system, the number of substripes in each node is less than or equal to r m , where m

A particularly advantageous property of the codes according to embodiments is that there is a lot of flexibility when generating codes with the same parameters of k, r, a. That is to say, a large number of different codes can be generated with all of the codes being designed for systems with the same number of source nodes, the same number of redundant nodes and the same sub-packetisation level. However, the generated codes will differ in their properties and by selecting codes for use that have the most appropriate properties for a particular application the used codes can be optimised for the particular application. Examples of properties that the codes can be optimised with respect to are repair bandwidth, the number of Input/Output operations (i.e. the number of reads and writes) and the latency of the repair. The code optimisation with respect to a particular property may be performed as part of a stand alone iterative process for optimising MDS codes, or the code optimisation may be performed within the above-described iterative process for determining if a generator matrix is for an MDS code.

The actual generation of coded data in dependence on source data according to embodiments can be performed with known techniques and using known hardware. The processes required to use the techniques according to embodiments to generate a plurality of source nodes and redundant nodes in a data storage system/network of a data centre storing the coded data would be a straightforward task for the skilled person. The skilled person would also be able to use known hardware to reconstruct one or more source nodes in order to implement embodiments. The nodes according to embodiments include single data disks, or drives, or groups or data disks, or drives. A node includes any form of data storage element, a part of such an element or multiple such elements. In particular, a node can be any logical entity where data can be stored, and can be anything from a whole, a group of or parts of physical storage devices or locations including but not limited to memory based storage devices such as RAM and SSDs, hard drives, tape storage, optical storage devices, servers and data centers. The method according to embodiments may be performed within a single SSD disk. The method according to embodiments may be performed between chips inside a SSD, or between banks inside (flash) chips.

The coding technique according to embodiments can therefore be used for each file/object/entity of data being stored on storage nodes/storage mediums. For example, say that the storage nodes comprise 14 hard drives and the encoding scheme has 10 data nodes and four redundancy nodes. The coding technique of embodiments can be applied to each file being stored on the hard drives so that each file is split into 10 segments and four redundancy segments of data is generated for each file. The encoding pattern is therefore found multiple times in the stored data (as many times as there are files being stored) on the storage nodes and not necessarily once for all of the storage nodes. The pattern is therefore repeated for each file.

The storage of the data in a data storage system is not limited to the data storage system having nodes, i.e. data drives or sections of a data drive, that are only for use as a store of source data node or redundant data. The systematic property may be maintained but a mapping introduced so that a data drive may store redundant data within a source data node and vice-versa. This interleaving of data changes the mapping of coded data to stored data and can be used to control the read operations from a data storage system, for example to ensure that the network traffic is balanced across the data storage system.

Although data storage is a particularly preferable application for the coding techniques disclosed herein embodiments include the generation of codes for any application, such as data transmission. For example, the nodes may correspond to data packets for transmission over a network.

The systematic coding techniques according to embodiments include erasure resilient systematic coding techniques.

The flow charts and descriptions thereof herein should not be understood to prescribe a fixed order of performing the method steps described therein. Rather, the method steps may be performed in any order that is practicable. Although the present invention has been described in connection with specific exemplary embodiments, it should be understood that various changes, substitutions, and alterations apparent to those skilled in the art can be made to the disclosed embodiments without departing from the spirit and scope of the invention as set forth in the appended claims. Methods and processes described herein can be embodied as code (e.g., software code) and/or data. Such code and data can be stored on one or more computer-readable media, which may include any device or medium that can store code and/or data for use by a computer system. When a computer system reads and executes the code and/or data stored on a computer-readable medium, the computer system performs the methods and processes embodied as data structures and code stored within the computer-readable storage medium. In certain embodiments, one or more of the steps of the methods and processes described herein can be performed by a processor (e.g., a processor of a computer system or data storage system). It should be appreciated by those skilled in the art that computer-readable media include removable and non-removable structures/devices that can be used for storage of information, such as computer-readable instructions, data structures, program modules, and other data used by a computing system/environment. A computer-readable medium includes, but is not limited to, volatile memory such as random access memories (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only-memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM), and magnetic and optical storage devices (hard drives, magnetic tape, CDs, DVDs); network devices; or other media now known or later developed that is capable of storing computer-readable information/data. Computer-readable media should not be construed or interpreted to include any propagating signals.