

Title:
METHOD AND APPARATUS FOR IMPROVED DATA RECOVERY IN DATA STORAGE SYSTEMS
Document Type and Number:
WIPO Patent Application WO/2019/246527
Kind Code:
A1
Abstract:
A method and apparatus for improved data recovery in data storage systems is described. When errors occur while retrieving a plurality of codewords from a plurality of storage devices, a long vector may be formed from the plurality of codewords and decoded by a special, long parity check matrix to re-create data stored on the plurality of storage devices when normal decoding efforts fail.

Inventors:
VARANASI CHANDRA (US)
YEO ENGLING (US)
Application Number:
PCT/US2019/038479
Publication Date:
December 26, 2019
Filing Date:
June 21, 2019
Assignee:
GOKE US RES LAB (US)
International Classes:
G06F11/10; G06F12/14; G11B20/14; H03M5/00; H03M7/00
Foreign References:
US20050193312A1 2005-09-01
US20040225944A1 2004-11-11
US20080140686A1 2008-06-12
US6229458B1 2001-05-08
US20020007438A1 2002-01-17
Other References:
YUAN ET AL.: "A Randomly Expandable Method for Data Layout of RAID Storage Systems", ARXIV PREPRINT, 22 August 2017 (2017-08-22), XP055667282, Retrieved from the Internet [retrieved on 20190825]
Attorney, Agent or Firm:
JACKSON, Juneko et al. (US)
Claims:
CLAIMS

We claim:

1. A method for data storage and retrieval, comprising:

generating a plurality of data blocks from a quantity of unencoded data;

receiving, by a plurality of encoders, a respective one of the plurality of data blocks;

encoding the plurality of data blocks by the plurality of encoders, respectively, to generate a plurality of codewords;

logically combining the plurality of codewords to generate an encoded parity block;

storing the plurality of codewords and the encoded parity block in a plurality of data storage media, respectively;

retrieving the plurality of codewords and the encoded parity block from the data storage media;

decoding each of the plurality of codewords;

when at least two of the codewords cannot be decoded correctly:

generating a long vector from the plurality of codewords and the encoded parity block; and

decoding the long vector using a long parity check matrix to re-create the data blocks.

2. The method of claim 1, wherein generating the long vector comprises:

concatenating the plurality of codewords and the encoded parity block.

3. The method of claim 1, wherein generating the long vector comprises:

applying a long generator matrix GL to the plurality of data blocks.

4. The method of claim 3, wherein the long generator matrix comprises a matrix having a number of rows equal to a number of the plurality of data storage media multiplied by a length of each of the plurality of data blocks, and a number of columns equal to the number of the plurality of data storage media plus one, multiplied by a length of each of the plurality of codewords.

5. The method of claim 3, wherein the long generator matrix comprises a tensor product of a) a matrix comprising an identity matrix and a column vector of all 1’s, and b) a generator matrix G used to encode the data blocks by the encoders.

6. The method of claim 3, wherein encoding the plurality of data blocks by the plurality of encoders comprises:

encoding each of the plurality of data blocks by a respective encoder, each encoder applying a generator matrix G to a respective data block;

wherein the long generator matrix GL comprises a tensor product of a) a matrix comprising an identity matrix augmented with a column vector comprising all 1’s, and b) the generator matrix G.

7. The method of claim 1, wherein decoding each of the codewords comprises: decoding each of the plurality of codewords by a plurality of decoders, respectively.

8. The method of claim 1, wherein decoding each of the plurality of codewords by a plurality of decoders, respectively, comprises:

applying a same parity check matrix by the plurality of encoders to the plurality of codewords, respectively.

9. The method of claim 1, wherein the long parity check matrix comprises:

a matrix comprising a tensor product of a) an identity matrix augmented with a column vector comprising all 0’s, and b) a parity check matrix H used to decode the codewords; and

a second tensor product of a) a transpose of a column vector comprising all 1’s, and b) a second identity matrix.

10. The method of claim 1, wherein the long vector comprises a long systematic vector comprising information bits, from the plurality of data blocks, and parity bits, the method further comprising:

forming the long systematic vector by re-arranging the information bits to occupy two or more least or most significant bits of the long systematic vector; wherein the long parity check matrix comprises a systematic matrix.

11. The method of claim 10, wherein the systematic matrix comprises

$[\,I_{n(m+1)-km} \mid Q'\,]$, where n is a number of bits in the long vector, m is a number of the plurality of codewords, k is a number of bits of each of the plurality of data blocks and

Q comprises a transpose of a tensor product of a) a matrix comprising $I_m$ augmented with a column vector of all 1’s of length m, and b) a matrix comprising $P_{k \times (n-k)}$ augmented with a tensor product of the column vector of all 1’s of length m and an identity matrix having $k \times m$ rows and columns.

12. A data retrieval system, comprising:

a plurality of decoders;

a memory for storing processor-executable instructions and a long parity check matrix; and

a processor coupled to the plurality of decoders and the memory for executing the processor-executable instructions that causes the data retrieval system to:

retrieve, by the plurality of decoders, a plurality of codewords from a plurality of data storage media, respectively;

decode each of the plurality of codewords by the plurality of decoders; when at least two of the codewords cannot be decoded correctly:

retrieve an encoded parity block from a first data storage media;

generate, by the processor, a long vector from the plurality of codewords and the encoded parity block; and

decode the long vector using the long parity check matrix to re-create a plurality of data blocks that were stored by the data retrieval system.

13. The data retrieval system of claim 12, wherein the processor-executable instructions for decoding each of the plurality of codewords comprises instructions that cause the data retrieval system to:

apply a same parity check matrix by the plurality of encoders to the plurality of codewords, respectively.

14. The data retrieval system of claim 12, wherein the processor-executable instructions for generating the long vector comprises instructions that causes the data retrieval system to:

concatenate the plurality of codewords and the encoded parity block.

15. The data retrieval system of claim 12, wherein the long parity check matrix comprises:

a matrix comprising a tensor product of a) an identity matrix augmented with a column vector comprising all 0’s, and b) a parity check matrix H used to decode the codewords; and

a second tensor product of a) a transpose of a column vector comprising all 1’s, and b) a second identity matrix.

16. The data retrieval system of claim 12, wherein the long vector comprises a long systematic vector comprising information bits, from the plurality of data blocks, and parity bits, wherein the processor-executable instructions further comprise instructions that cause the data retrieval system to:

form, by the processor, the long systematic vector by re-arranging the information bits to occupy two or more least or most significant bits of the long systematic vector;

wherein the long parity check matrix comprises a long systematic matrix.

17. The data retrieval system of claim 16, wherein the long systematic matrix comprises $[\,I_{n(m+1)-km} \mid Q'\,]$, where n is a number of bits in the long vector, m is a number of the plurality of codewords, k is a number of bits of each of the plurality of data blocks and Q' comprises a transpose of a tensor product of a) a matrix comprising $I_m$ augmented with a column vector of all 1’s of length m, and b) a matrix comprising $P_{k \times (n-k)}$ augmented with a tensor product of the column vector of all 1’s of length m and an identity matrix having $k \times m$ rows and columns.

Description:
METHOD AND APPARATUS FOR IMPROVED DATA RECOVERY IN DATA STORAGE SYSTEMS

BACKGROUND

I. Field of Use

[001] The present invention relates to the field of digital storage systems and more specifically to improving the error correction capabilities of data storage systems.

II. Description of the Related Art

[002] RAID storage systems have been used for years to redundantly store large amounts of data, used in applications such as online storage. Error Correcting Codes have been developed that recover erasures of stored data (as opposed to errors). Some codes, such as the well-known Reed-Solomon code, are forward error correction (FEC) codes that transform messages of k bits into longer codewords of n symbols such that the original message can be recovered from a subset of the n symbols. The fraction r = k/n is called the code rate, where k and n are the information packet length and codeword length, respectively. Codewords are stored on a number of independent storage media, such as pages in a flash memory, blocks, LUNs, planes or media, any of which can be considered to have independent error statistics.

[003] More recently, other coding techniques have been used, including low-density parity check (LDPC) codes. In a storage system having m independent storage media, an $i$-th medium may store an $i$-th codeword $v_i$, encoded using a generator matrix $G_i$ and decoded using a corresponding parity check matrix $H_i$, where $1 \leq i \leq m$. In standard RAID systems, the codewords are XORed with one another prior to storage, forming a parity codeword $v_{m+1}$.

[004] If any single codeword $v_i$ fails, $1 \leq i \leq m+1$, it is recovered by performing an XOR function of all the remaining m successfully-decoded codewords $\{v_{i_1}, v_{i_2}, \ldots, v_{i_m}\}$. Such a scheme cannot recover more than one failed medium.
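The single-erasure recovery described above can be sketched in a few lines. This is only an illustration, assuming codewords are held as Python lists of bits; the helper name `xor_words` and the sample codewords are illustrative choices, not part of the disclosure.

```python
def xor_words(words):
    """Bit-wise XOR (sum mod 2) of equal-length lists of bits."""
    return [sum(bits) % 2 for bits in zip(*words)]

# Three stored codewords (m = 3) and their XOR parity codeword v_{m+1}.
codewords = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1]]
parity = xor_words(codewords)

# Suppose the second medium fails: XOR the m surviving words to rebuild it.
survivors = [codewords[0], codewords[2], parity]
recovered = xor_words(survivors)
print(recovered)
```

If two media fail at once, the XOR of the survivors yields only the XOR of the two missing codewords, which is why this scheme alone cannot recover more than one failed medium.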

SUMMARY

[005] The embodiments herein describe methods and apparatus for retrieving data by a data storage and retrieval system. In one embodiment, a method is described, comprising generating a plurality of data blocks from a quantity of unencoded data, receiving, by a plurality of encoders, a respective one of the plurality of data blocks, encoding the plurality of data blocks by the plurality of encoders, respectively, to generate a plurality of codewords, logically combining the plurality of codewords to generate an encoded parity block, storing the plurality of codewords and the encoded parity block in a plurality of data storage media, respectively, retrieving the plurality of codewords and the encoded parity block from the data storage media, and decoding each of the plurality of codewords. When at least two of the codewords cannot be decoded correctly, generating a long vector from the plurality of codewords and the encoded parity block, and decoding the long vector using a long parity check matrix to re-create the data blocks.

[006] In another embodiment, a data retrieval system is described, comprising a plurality of decoders, a memory for storing processor-executable instructions and a long parity check matrix, and a processor coupled to the plurality of decoders and the memory for executing the processor-executable instructions that cause the data retrieval system to retrieve, by the plurality of decoders, a plurality of codewords from a plurality of data storage media, respectively, and decode each of the plurality of codewords by the plurality of decoders. When at least two of the codewords cannot be decoded correctly, retrieve an encoded parity block from a first data storage media, generate, by the processor, a long vector from the plurality of codewords and the encoded parity block, and decode the long vector using the long parity check matrix to re-create a plurality of data blocks that were stored by the data retrieval system.

BRIEF DESCRIPTION OF THE DRAWINGS

[007] The features, advantages, and objects of the present invention will become more apparent from the detailed description as set forth below, when taken in conjunction with the drawings in which like referenced characters identify correspondingly throughout, and wherein:

[008] FIG. 1 is a block diagram of one embodiment of a data storage and retrieval system for storing and retrieving data with increased data recovery characteristics;

[009] FIG. 2 is a block diagram of one embodiment of an encoder portion of the data storage and retrieval system as shown in FIG. 1;

[0010] FIG. 3 is a block diagram of one embodiment of a decoder portion of the data storage and retrieval system as shown in FIG. 1; and

[0011] FIGs. 4A and 4B are flow diagrams illustrating one embodiment of a method performed by the data storage and retrieval system as shown in FIG. 1 to store and retrieve data with increased data recovery characteristics.

DETAILED DESCRIPTION

[0012] Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments may be applied independently and some of them may be applied in combination as would be apparent to those of skill in the art. In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive.

[0013] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.

[0014] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

[0015] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

[0016] The term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, RAM, ROM, etc. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like.

[0017] Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium. A processor(s) may perform the necessary tasks.

[0018] The embodiments described herein provide specific improvements to a data storage system. For example, the embodiments allow the storage system to recover more data in the event of erasures or errors due to, for example, media failures or noise.

[0019] FIG. 1 is a simplified block diagram of one embodiment of a data storage and retrieval system 100 used for storing large amounts of data in accordance with the teachings herein. In the embodiment shown in FIG. 1, numerous hosts 102 provide unencoded data to data storage server 104 via wide-area network 106, such as the Internet, and data storage server 104 processes the data for storage in a plurality of data storage media 108a-108m. Such data storage systems are used in cloud storage models, in which digital data may be stored in logical pools, physical storage may span multiple servers (and often locations), and the physical environment is typically owned and managed by a hosting company. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment protected and running. People and organizations buy or lease storage capacity from the providers to store user, organization, or application data. Examples of such cloud storage include Amazon’s S3, Google’s Cloud Storage and Microsoft’s Azure storage platforms.

[0020] The data received from hosts 102 is stored in data storage media 108 using forward error correcting codes to mathematically calculate extra redundant data. In some embodiments, data storage and retrieval system 100 may, in addition, replicate data for redundant storage. Erasure encoding and replication are well-known methods for protecting data from loss, due to noisy transmission channels or failure of one or more storage media.

[0021] FIG. 2 is a block diagram of one embodiment of an encoder portion of data storage and retrieval system 100. Processor 200 executes processor-executable instructions stored in memory 202 to store data in accordance with the teachings herein. Some of the functionality to store and retrieve data may be performed by processor 200, while others may be performed by the various functional blocks shown in FIG. 2. Each of the functional components shown in FIG. 2 may be integrated into a custom integrated circuit, such as an Application Specific Integrated Circuit (ASIC) or a custom System on a Chip (SoC). In other embodiments, one or more of the components shown in FIG. 2 may be integrated into one or more custom integrated circuits, while other functional components may comprise integrated circuits, discrete components, or a combination thereof. Each of the functional blocks shown in FIG. 2 may utilize either separate or shared processing and memory resources. Processor 200 comprises one or more microprocessors, microcontrollers and/or custom integrated circuits, and/or discrete components able to carry out the functionality required for operation of data storage and retrieval system 100. Processor 200 may be selected based on processing capabilities, power-consumption properties, and/or cost and size considerations. Memory 202 comprises one or more information storage devices, such as RAM, ROM, Flash, and/or virtually any other type of electronic memory device. Typically, memory 202 comprises more than one type of memory. For example, a ROM may be used to store static processor-executable instructions, while a RAM memory or flash memory may be used to store variable data. Memory 202 may be used to store a variety of matrices used to encode and decode data.

[0022] Data from hosts 102 is provided to input data transfer logic 200, typically via wide-area network 106, where input data transfer logic 200 may apportion a predetermined number of bits, bytes or symbols of the data (i.e., “segments”) into a plurality of m unencoded data blocks, each data block comprising a predetermined number of bits, bytes or symbols and, in general, k bits. Input data transfer logic 200 comprises circuitry for receiving data from a large number of hosts 102, such as cellular telephones, personal computers, cloud-based servers, etc. The unencoded data blocks $u_i$ are typically generated in parallel as data is received by input data transfer logic 200. In the example shown in FIG. 2, a predetermined number of bits, bytes or symbols of the data is apportioned into m data blocks (where $m \geq 2$), and the data blocks are provided to encoders 206a-206m, respectively. It should be understood that in other embodiments, fewer encoders may be used to encode the data blocks. For example, if four parallel data blocks are generated by input data transfer logic 200, two encoders could be used to encode the data blocks, each encoder responsible for encoding two of the four data blocks.

[0023] Encoders 206a-206m each comprise a forward error correction encoder, such as a low-density parity check (LDPC) encoder that encodes the data blocks using a low-density generator matrix G. In one embodiment, each of the encoders uses the same G matrix to encode respective data blocks, while in another embodiment, two or more of the encoders use different G matrices.

[0024] Encoders 206a-206m generate encoded data blocks, or codewords, $v_i$, respectively, where $1 \leq i \leq m$, and the codeword $v_i$ from each encoder is stored in a respective independent data storage medium 108a-108m. Data storage media 108a-108m comprise two or more independent hard drives, solid state drives, pages in a memory device, logical unit numbers (LUNs), or any other electronic, optical or mechanical information storage device or portion thereof. Each data storage media 108 typically comprises different error statistics, i.e., the likelihood of a failure due to factors such as technology, age, temperature, or humidity. In one embodiment, data storage media 108 are part of data storage and retrieval system 100 and co-located therewith, for example, as part of a rack-mounted data storage system located in a physical data center. In other embodiments, one or more of the data storage media 108 are remotely located from data storage server 104, for example, located in a different physical location than data storage server 104 and accessed via the internet.

[0025] The codewords $v_i$ are additionally used to create a parity codeword $v_{m+1}$ by logically combining the codewords from each of the encoders by logic circuitry 208. Logic circuitry 208 typically comprises hardware and/or firmware to combine the codewords using an exclusive OR, or XOR, function on a bit-wise basis. The parity codeword can be used to recover from erasures when retrieving the codewords from the data storage units 108 by XORing the parity codeword with each of the available codewords from data storage units 108, as will be explained in greater detail later herein.

[0026] Each of encoders 206a-206m generates a codeword comprising information bits and parity bits and having a length greater than the data block. In coding terminology, R = k/n, where R is the encoding rate, n is the length of each codeword and k is the length of the original, unencoded data block. The codewords are generated as a result of applying a generator matrix G to each of the data blocks, in an embodiment where the encoders comprise LDPC encoders.
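As a rough illustration of applying a generator matrix to a data block, the sketch below multiplies a k-bit block by G using arithmetic modulo 2. It borrows the small 2x4 matrix from the worked example given later in this description; the function name `encode` is an illustrative assumption, and a practical LDPC generator matrix would be far larger.

```python
# Generator matrix from the worked example in this description (k = 2, n = 4).
G = [[1, 1, 1, 0],
     [0, 1, 0, 1]]

def encode(u, G):
    """Return the codeword u*G over GF(2) for a k-bit data block u."""
    return [sum(bit * g for bit, g in zip(u, column)) % 2 for column in zip(*G)]

k, n = len(G), len(G[0])
codeword = encode([1, 1], G)   # n bits: parity bits followed by the k information bits
rate = k / n                   # encoding rate R = k/n
print(codeword, rate)
```

With this particular G the last k bits of the codeword equal the data block, so the code is systematic and the remaining n - k bits are parity.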

[0027] FIG. 3 is a block diagram of one embodiment of a decoder portion of data storage and retrieval system 100. As before, processor 200 executes processor-executable instructions stored in memory 202, in this case to retrieve data from data storage media 108 in accordance with the teachings herein. Some of the functionality to retrieve data may be performed by processor 200, while others may be performed by the various functional blocks shown in FIG. 3.

[0028] The m codewords are retrieved, generally in parallel, from data storage media 108a-108m. In the embodiment shown in FIG. 3, the codewords are each decoded using an independent decoder 300a-300m, respectively. In other embodiments, fewer decoders may be used to decode the m codewords. An $i$-th decoder 300 applies an H parity check matrix to the $i$-th codeword to re-create data block $u_i$. The H parity check matrices used by the decoders may all be the same, or two or more may be different from one another, depending on whether different generator matrices G were used to encode the data. Each of the decoders uses an iterative decoding technique, such as a message passing algorithm, to try to re-create the data blocks $u_i$, respectively.
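The role of the parity check matrix can be illustrated with a syndrome computation: a received word is a valid codeword only if every parity check evaluates to zero. The sketch below is not the iterative message passing decoder described above; it only shows the check that such a decoder tries to satisfy. It assumes the small H matrix from the worked example later in this description, and the function name `syndrome` is illustrative.

```python
# Parity check matrix from the worked example in this description (n = 4, n - k = 2).
H = [[1, 0, 1, 0],
     [0, 1, 1, 1]]

def syndrome(H, word):
    """Evaluate each parity check of H on the received word, modulo 2."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

valid = [1, 0, 1, 1]       # a codeword from the example: all checks pass
corrupted = [1, 0, 1, 0]   # the same word with its last bit flipped

print(syndrome(H, valid), syndrome(H, corrupted))
```

A non-zero syndrome signals that the word read from the medium contains errors and that decoding has not yet converged.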

[0029] The decoder shown in FIG. 3 can detect and correct errors caused by failed storage media and/or noise in the channel between the storage media and the decoders 300a-300m in three ways.

[0030] First, errors from each data storage media may be corrected by decoders 300a-300m, respectively. That is, decoder 300a may be able to correct errors due to noise in the transmission medium between data storage media 108a and decoder 300a using well-known error correction decoding techniques, such as the use of LDPC decoding techniques. As previously mentioned, each of decoders 300a-300m utilizes an iterative decoding technique, such as a message passing algorithm, to provide hard or soft decoding of a respective codeword.

[0031] Second, erasures may be detected and corrected using logic circuitry 302 when one of the data storage media fails. Logic circuitry 302 comprises hardware and/or firmware to combine the codewords from all of the other data storage media plus parity media 108m+1 using an exclusive OR, or XOR, function on a bit-wise basis. The decoded codeword, data block $u_i$, is provided to output data transfer logic 304, where it is combined with the other successfully decoded data blocks to form the original data provided by one of the host(s) 102.

[0032]

[0033] Third, errors from any of the data storage media may be detected and corrected by constructing a “long vector” $V_L$ as a way to utilize information in all of the received codewords to aid in the error detection and correction process.

[0034] In one embodiment, the long vector $V_L$ is formed as a concatenation of the m individual codewords and $v_{m+1}$ as they are generated by encoders 206a-206m and logic circuitry 208 during the encoding process. In matrix notation, $V_L$ may be expressed as:

$$V_L = [\,v_1 \;\; v_2 \;\; \cdots \;\; v_m \;\; v_{m+1}\,] = [\,u_1 G_1 \;\; u_2 G_2 \;\; \cdots \;\; u_m G_m \;\; (u_1 G_1 \oplus u_2 G_2 \oplus \cdots \oplus u_m G_m)\,]$$

[0035] where $G_1, G_2, \ldots, G_m$ are each a different generator matrix used by respective ones of the encoders 206a-206m to encode data blocks $u_1$-$u_m$, respectively.

[0036] $G_L$ may be defined as:

$$G_L = \begin{bmatrix} G_1 & 0 & \cdots & 0 & G_1 \\ 0 & G_2 & \cdots & 0 & G_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & G_m & G_m \end{bmatrix}$$

[0037] The matrix $G_L$ comprises mk rows by n(m+1) columns, and it can be alternatively used to generate the n(m+1)-long vector $V_L$, as $V_L = [\,u_1 \;\; u_2 \;\; \cdots \;\; u_m\,]\,G_L$. In one embodiment, where the same generator matrix is used by each of the encoders 206a-206m to encode the data blocks, i.e., $G_i = G$, $1 \leq i \leq m$, $G_L$ can be more succinctly expressed as a tensor product as follows:

$$G_L = [\,I_m \mid 1_m\,] \otimes G$$

[0038] where $I_m$ is an m x m identity matrix, $1_m$ is an (m x 1) column vector of all 1s, G is the generator matrix common to all the encoders, the vertical bar | denotes a partition of the matrix, and $\otimes$ represents a tensor product function. Since the vector $v_{m+1}$ is simply a linear combination of codewords $v_1, v_2, \ldots, v_m$, each generated from the same generator matrix G, $v_{m+1}$ itself is a codeword in the space of codewords generated by G, and, as such, could be decoded using the decoding algorithm used to decode $v_i$, where $1 \leq i \leq m$.

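[0039] Regarding a tensor product of two matrices A and B of size (m x n) and (p x q) respectively, the tensor product between them, denoted by $A \otimes B$, is defined as follows: every element of A individually multiplies the whole matrix B and, therefore, the size of the resulting matrix is (mp x nq). Note that A and B need not be matrix-multiplication-compatible in the conventional sense to compute their tensor product. For example, if:

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

[0040] Then,

$$A \otimes B = \begin{bmatrix} 1 \cdot B & 0 \cdot B & 1 \cdot B \\ 0 \cdot B & 1 \cdot B & 1 \cdot B \end{bmatrix}$$

[0041] which results in

$$A \otimes B = \begin{bmatrix} 1&1&1&0&0&0&1&1&1 \\ 0&1&1&0&0&0&0&1&1 \\ 0&0&1&0&0&0&0&0&1 \\ 0&0&0&1&1&1&1&1&1 \\ 0&0&0&0&1&1&0&1&1 \\ 0&0&0&0&0&1&0&0&1 \end{bmatrix}$$

[0042] Clearly, $A \otimes B \neq B \otimes A$.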
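The tensor (Kronecker) product defined above is straightforward to compute. The following sketch uses plain Python lists and the matrices A and B of the example; the helper name `kron` is an illustrative choice.

```python
def kron(A, B):
    """Tensor (Kronecker) product: every element of A multiplies the whole of B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

A = [[1, 0, 1],
     [0, 1, 1]]
B = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]

AB = kron(A, B)   # (2*3) x (3*3) = 6 x 9
BA = kron(B, A)   # same dimensions here, but different entries
for row in AB:
    print("".join(map(str, row)))
```

Comparing `AB` with `BA` shows that the product is not commutative, even though both results happen to be 6 x 9 for these particular sizes.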

[0043] A long parity check matrix $H_L$ may be formed to decode data blocks encoded using the long $G_L$ matrix. Such an $H_L$ matrix is formed such that $G_L H_L'$ is an all-zero matrix of size mk x (n(m+1) - mk), which ensures that $V_L$ is a codeword generated from $G_L$, allowing decoding of $V_L$ using the $H_L$ matrix.

[0044] Since the vector $v_{m+1}$ is an XOR-combination of individual codewords generated by encoders 206a-206m using the same generator matrix G, it does not contain any independent bits, and as such, could potentially constitute some of the parity bits of $V_L$. Since the only independent bits in constructing $V_L$ using $G_L$ are $\{u_1, u_2, \ldots, u_m\}$, where each $u_i$ is k bits long, $1 \leq i \leq m$, assuming that an $H_L$ exists for $G_L$, the code rate $R_L$ of the long codeword $V_L$ is given by

$$R_L = \frac{mk}{n(m+1)} = \left(\frac{m}{m+1}\right) R < R$$

The higher the value of m, the closer $R_L$ is to R. The lower the value of m, the lower the rate $R_L$ and the better the error-correction performance of the long code will be.
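The rate relationship can be checked numerically. The short sketch below evaluates the long-code rate for a few values of m, assuming k = 2 and n = 4 as in the worked example of this description; the function name `long_code_rate` is illustrative.

```python
from fractions import Fraction

def long_code_rate(m, k, n):
    """Rate of the long code: m*k information bits in n*(m+1) stored bits."""
    return Fraction(m * k, n * (m + 1))

k, n = 2, 4                 # values from the worked example in this description
R = Fraction(k, n)          # rate of each individual codeword
rates = {m: long_code_rate(m, k, n) for m in (3, 10, 100)}
for m, r_long in rates.items():
    print(f"m={m:3d}  R_L={float(r_long):.4f}  R={float(R):.4f}")
```

The printed values approach R from below as m grows, consistent with the factor m/(m+1) in the expression above.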

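[0045] As an example, a data storage and retrieval system may be defined that uses three codewords for each segment of unencoded data from host 102, i.e., m = 3. A generator matrix G may be defined and stored in memory 202 as a 2x4 matrix as shown below, to encode data blocks of 2 bits into codewords 4 bits long (i.e., k = 2, n = 4).

$$G = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}$$

[0046] Since G is in systematic form $G = [\,P \mid I_2\,]$, where $I_2$ denotes a 2x2 identity matrix and P denotes a k x (n-k) matrix, the corresponding H matrix is given by $H = [\,I_{n-k} \mid P'\,] = [\,I_2 \mid P'\,]$ (where ' represents “the transpose of”):

$$H = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$

[0047] The product $G H'$ yields a 2x2 all-zero matrix, as expected. $G_L$ may now be formed as a matrix having 6 rows and 16 columns:

$$G_L = [\,I_3 \mid 1_3\,] \otimes G = \left[\begin{array}{cccc|cccc|cccc|cccc} 1&1&1&0&0&0&0&0&0&0&0&0&1&1&1&0 \\ 0&1&0&1&0&0&0&0&0&0&0&0&0&1&0&1 \\ 0&0&0&0&1&1&1&0&0&0&0&0&1&1&1&0 \\ 0&0&0&0&0&1&0&1&0&0&0&0&0&1&0&1 \\ 0&0&0&0&0&0&0&0&1&1&1&0&1&1&1&0 \\ 0&0&0&0&0&0&0&0&0&1&0&1&0&1&0&1 \end{array}\right]$$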

[0048] $V_L$ may now be determined using one of two methods, as shown in the following example. Given three data blocks $u_1$ = [1 1]; $u_2$ = [1 0]; $u_3$ = [0 1], and the G matrix shown above, the codeword for each data block is shown below:

$v_1$ = [1 1]*G = 1 0 1 1;

$v_2$ = [1 0]*G = 1 1 1 0;

$v_3$ = [0 1]*G = 0 1 0 1;

[0049] In the codewords above, the information bits are the last two bits of each codeword, because $G = [\,P \mid I_2\,]$. $V_L$ may now be determined by multiplying $u\,G_L$, where u is a concatenation of the information bits of the codewords, or a concatenation of simply the data blocks, i.e., $[\,u_1 \;\; u_2 \;\; u_3\,]$, which yields:

$V_L$ = [1 1 1 0 0 1]*$G_L$ = [1 0 1 1 1 1 1 0 0 1 0 1 0 0 0 0]
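The long vector of this example can be reproduced with a short script. The sketch below builds the long generator matrix as a tensor product of an identity matrix augmented with a column of ones and G, multiplies the concatenated data blocks by it, and, for comparison, also concatenates the individual codewords with their bit-wise XOR. All helper names are illustrative assumptions, and arithmetic is modulo 2.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def vec_mat(u, M):
    """Row vector times matrix over GF(2)."""
    return [sum(x * y for x, y in zip(u, col)) % 2 for col in zip(*M)]

m = 3
G = [[1, 1, 1, 0],
     [0, 1, 0, 1]]

# [I_m | 1_m]: identity matrix augmented with a column of ones.
aug = [[int(i == j) for j in range(m)] + [1] for i in range(m)]
G_L = kron(aug, G)                                  # 6 x 16

blocks = [[1, 1], [1, 0], [0, 1]]                   # u_1, u_2, u_3
u = [bit for block in blocks for bit in block]
v_long = vec_mat(u, G_L)                            # long vector from u * G_L

# Alternative: concatenate the codewords and their XOR parity codeword.
codewords = [vec_mat(block, G) for block in blocks]
parity = [sum(bits) % 2 for bits in zip(*codewords)]
v_concat = [bit for word in codewords + [parity] for bit in word]

print(v_long)
print(v_long == v_concat)
```

Both constructions give the same 16-bit vector, which is what allows the decoder side to assemble the long vector from the stored codewords without ever applying the long generator matrix.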

[0050] In a second method, $V_L$ can be determined simply by concatenating each of the codewords $v_1$, $v_2$, and $v_3$ with $v_4$, where $v_4$ is equal to $v_1 \oplus v_2 \oplus v_3$. The term “concatenating,” as used herein, means forming a row vector comprising codewords arranged in successive alignment.

[0051] Referring back to FIG. 3, decoder 300L uses a long parity check matrix $H_L$ to decode the long vector $V_L$. In one embodiment, $H_L$ comprises a tensor product matrix. As there are only mk independent bits in $V_L$, $H_L$ should comprise (m+1)n - mk independent rows. Since each bit $v_{ij}$ in $V_L = \{v_{ij}\}$, $1 \leq i \leq n$, $1 \leq j \leq m$, is already checked by (n-k) parity bits in the H matrix, and since each $v_{ij}$ participates in generating a bit $b_i$, $1 \leq i \leq n$, in $v_{m+1}$, a candidate $H_L$ matrix can be constructed as follows:

$$H_L = \begin{bmatrix} H & 0 & \cdots & 0 & 0 \\ 0 & H & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & H & 0 \\ I_n & I_n & \cdots & I_n & I_n \end{bmatrix}$$

[0052] We can now compute G_L*H_L' and verify that the product is an all-zero matrix, H_L being of size ((m+1)n-mk) x n(m+1), when (m+1) is even. H_L can be written in tensor product notation as:

\[
H_L = \begin{bmatrix} [\, I_m \mid 0_m \,] \otimes H \\ 1'_{m+1} \otimes I_n \end{bmatrix}
\]

[0053] where I_m is an mxm identity matrix, 0_m is a column vector of all 0's of length m, 1'_(m+1) is a row vector of all 1's of length (m+1), and I_n is an nxn identity matrix. The vertical bar | denotes a partition of the matrix, i.e., I_m is augmented with 0_m. To demonstrate that this construction generates a valid H_L matrix for G_L, the following example is given, continuing from the example given above, where m = 3, n = 4, and k = 2:

\[
H_L = \left[\begin{array}{cccc|cccc|cccc|cccc}
1&0&1&0&0&0&0&0&0&0&0&0&0&0&0&0\\
0&1&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&1&0&1&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&1&1&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&1&0&1&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&1&1&1&0&0&0&0\\
1&0&0&0&1&0&0&0&1&0&0&0&1&0&0&0\\
0&1&0&0&0&1&0&0&0&1&0&0&0&1&0&0\\
0&0&1&0&0&0&1&0&0&0&1&0&0&0&1&0\\
0&0&0&1&0&0&0&1&0&0&0&1&0&0&0&1
\end{array}\right]
\]

[0054] A verification of the product G_L*H_L' generates a 6x10 all-zero matrix. This confirms that V_L is indeed a codeword in the space of codewords spanned by G_L, and H_L can be used to decode V_L.
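The tensor product construction of H_L and the all-zero check can be sketched as follows. The code builds H_L from the two block rows described in paragraph [0053]; the layout of G_L as [I_m | 1_m] ⊗ G is an assumption made for this illustration, as are all function names.

```python
# Sketch: build H_L = [ [I_m | 0_m] (x) H ; 1'_(m+1) (x) I_n ] and verify that
# every row of G_L satisfies every check in H_L (arithmetic modulo 2).

def kron(A, B):
    """Kronecker (tensor) product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def identity(size):
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul_gf2(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

G = [[1, 1, 1, 0], [0, 1, 0, 1]]
H = [[1, 0, 1, 0], [0, 1, 1, 1]]
m, n, k = 3, 4, 2

I_m_zero = [[1 if c == r else 0 for c in range(m)] + [0] for r in range(m)]
I_m_ones = [[1 if c == r else 0 for c in range(m)] + [1] for r in range(m)]

top = kron(I_m_zero, H)                      # per-codeword checks
bottom = kron([[1] * (m + 1)], identity(n))  # XOR checks across codewords
H_L = top + bottom

G_L = kron(I_m_ones, G)                      # assumed layout of G_L
product = matmul_gf2(G_L, transpose(H_L))

for row in H_L:
    print("".join(map(str, row)))
print(len(product), "x", len(product[0]), "all zero:",
      all(v == 0 for row in product for v in row))
```

The printed rows match the 10x16 matrix listed above, and the product has mk = 6 rows and (m+1)n-mk = 10 columns.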

[0055] Although each individual codeword stored in the data storage media is systematic, in the sense that all the information bits are grouped at one end of each codeword, that property no longer holds for V_L if it is generated from G_L as shown above. However, if the codewords in the individual data storage media are systematic, V_L can be made systematic using one of two methods, to form V_LS. In a first embodiment, a systematic generator matrix G_LS is formed, based on G_L, by placing the columns in G_L that correspond to the information bits of each codeword at one end of the G_LS matrix. Then, G_LS is applied to a concatenation of the information bits of each of the codewords, or a concatenation of the data blocks themselves, as explained previously. In a second embodiment, the long codeword V_L can be made systematic even if it is formed using G_L, by rearranging the information bits in V_L so that they occupy either the least significant bits or the most significant bits of V_LS. However, rearranging the columns of G_L and generating G_LS improves the performance of data storage and retrieval system 100 by allowing a simplified generation of H_LS from H_L, as shown in the following example, which expands on the examples discussed above, where three data storage media are used, i.e., m = 3, n = 4, and k = 2.

[0056] In the example above, the 4-bit codeword stored on each medium is systematic, with the information bits occupying the last two bits of the codeword, i.e., the 3rd and 4th bits of each codeword, or the 3rd, 4th, 7th, 8th, 11th, and 12th bits of the concatenated codewords. In the G_L matrix, columns 3 and 4, 7 and 8, and 11 and 12 are therefore moved, in sequence, to the right-most columns of the matrix, so that all the information bits in the resulting long codeword V_LS occupy the right-most, or least significant, bits of V_LS:

\[
G_{LS} = \left[\begin{array}{cccccccccc|cccccc}
1&1&0&0&0&0&1&1&1&0&1&0&0&0&0&0\\
0&1&0&0&0&0&0&1&0&1&0&1&0&0&0&0\\
0&0&1&1&0&0&1&1&1&0&0&0&1&0&0&0\\
0&0&0&1&0&0&0&1&0&1&0&0&0&1&0&0\\
0&0&0&0&1&1&1&1&1&0&0&0&0&0&1&0\\
0&0&0&0&0&1&0&1&0&1&0&0&0&0&0&1
\end{array}\right]
\]

[0057] Since G_LS is now in systematic form, i.e., G_LS = [P_L | I_6], we obtain H_LS = [I_(n-k) | P_L'] = [I_10 | P_L'], where n = 16 and k = 6 for V_L.

[0058] The product G_LS*H_LS' equals a 6x10 all-zero matrix, confirming that H_LS is, indeed, the proper parity check matrix for G_LS. By using the information vector u = [1 1 1 0 0 1] from the example above, V_LS = u*G_LS = [1 0 1 1 0 1 0 0 0 0 1 1 1 0 0 1]. The long codeword is now in systematic form. It can be noted from V_LS that the parity check bits in it, i.e., bits 1-10 from the left, are the concatenation of the parity check bits of the codewords (i.e., the first two bits of each codeword), plus the entire codeword v_4, generated as the XOR of codewords v_1, v_2, and v_3. Understanding this gives processor 200 a more efficient way of generating the systematic version of the long codeword V_L: concatenate the parity bits from each codeword, follow them with the XOR-ed version of the individual codewords, and then append a concatenation of the information bits from the individual codewords.
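The shortcut described in paragraph [0058] can be sketched without any matrix multiplication. The function below is illustrative only (its name and signature are not from the specification) and assumes each codeword stores its n-k parity bits first and its k information bits last, as in the running example.

```python
# Sketch: build the systematic long codeword V_LS directly from the codewords.
# Layout: [parity bits of v_1..v_m | XOR codeword v_(m+1) | info bits of v_1..v_m]

def systematic_long_codeword(codewords, k):
    n = len(codewords[0])
    parity_bits = [bit for cw in codewords for bit in cw[:n - k]]
    xor_codeword = [sum(bits) % 2 for bits in zip(*codewords)]
    info_bits = [bit for cw in codewords for bit in cw[n - k:]]
    return parity_bits + xor_codeword + info_bits

codewords = [[1, 0, 1, 1],   # v_1
             [1, 1, 1, 0],   # v_2
             [0, 1, 0, 1]]   # v_3
V_LS = systematic_long_codeword(codewords, k=2)
print(V_LS)  # [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]

# Cross-check: the same result follows from permuting the bits of V_L, moving
# positions 3, 4, 7, 8, 11 and 12 (1-based) to the right-hand end.
V_L = [1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
info_positions = [2, 3, 6, 7, 10, 11]        # 0-based
others = [i for i in range(len(V_L)) if i not in info_positions]
permuted = [V_L[i] for i in others] + [V_L[i] for i in info_positions]
print(permuted == V_LS)  # True
```

Both routes give the same vector as the product u*G_LS quoted above.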

[0059] Tensor Product Interpretation of G_LS and H_LS

[0060] In tensor product terms, G_LS may be represented by:

\[
G_{LS} = \Big[\; [\, I_m \mid 1_m \,] \otimes P \;\Big|\; 1_m \otimes I_k \;\Big|\; I_{km} \;\Big]
\]

[0061] where n is the number of bits in each codeword, m is the number of data blocks/codewords generated from each data segment received from host 102, k is the number of bits in each data block, I_m is an mxm identity matrix, 1_m is a column vector of all 1's having a length m, P is the portion of G that generates the parity bits of a codeword, I_k is a kxk identity matrix, and I_km is an identity matrix of km rows and columns.

[0062] If Q is defined as:

\[
Q = \Big[\; [\, I_m \mid 1_m \,] \otimes P \;\Big|\; 1_m \otimes I_k \;\Big]
\]

[0063] then

\[
G_{LS} = [\, Q \mid I_{km} \,]
\]

[0064] where Q is a matrix comprising a) a tensor product of a matrix comprising I_m augmented with a column vector of all 1's of length m and the matrix P of size k x (n-k), augmented with b) a tensor product of the column vector of all 1's of length m and an identity matrix having k rows and columns.

[0065] This allows formation of H_LS as:

\[
H_{LS} = [\, I_{n(m+1)-km} \mid Q' \,]
\]

[0066] where Q' is the transpose of the matrix Q. Therefore, knowing P from the G matrix of one of the encoders 206 (assuming that each encoder utilizes the same G generator matrix), as well as k, n and m, allows formation of both G_LS and H_LS for the long codeword in systematic form.
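One block form of Q that is consistent with the worked example is Q = [ [I_m | 1_m] ⊗ P | 1_m ⊗ I_k ]. The sketch below builds that form, derives G_LS and H_LS from it, and checks the result against the 6x16 matrix listed earlier. Two points are assumptions for this illustration: the identity in the second term is taken to be kxk (the size needed for the dimensions to agree), and all helper names are hypothetical.

```python
# Sketch: Q, G_LS = [Q | I_km] and H_LS = [I_(n(m+1)-km) | Q'] over GF(2).

def kron(A, B):
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def identity(size):
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def hstack(A, B):
    """Place B to the right of A (same number of rows)."""
    return [a + b for a, b in zip(A, B)]

def matmul_gf2(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

G = [[1, 1, 1, 0], [0, 1, 0, 1]]
m, n, k = 3, 4, 2
P = [row[:n - k] for row in G]               # parity-generating part of G

I_m_ones = [[1 if c == r else 0 for c in range(m)] + [1] for r in range(m)]
ones_col = [[1] for _ in range(m)]

# Assumed form: Q = [ [I_m | 1_m] (x) P | 1_m (x) I_k ]
Q = hstack(kron(I_m_ones, P), kron(ones_col, identity(k)))
G_LS = hstack(Q, identity(k * m))
H_LS = hstack(identity(n * (m + 1) - k * m), transpose(Q))

product = matmul_gf2(G_LS, transpose(H_LS))
for row in G_LS:
    print("".join(map(str, row)))
print(all(v == 0 for row in product for v in row))  # True
```

Because G_LS*H_LS' reduces to Q + Q, which is zero modulo 2, the check holds for any P, not only for the example values.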

[0067] Decoding Algorithm for the long codeword

[0068] If H is a low-density parity check (LDPC) matrix, constructing H_L in accordance with the teachings herein ensures that it retains a low-density structure. This is because the density of 1's in H_L in the first m block rows of the block matrix shown after paragraph [0051] is only 1/(m+1) times the density of 1's in H. (The contribution of the last block row to the density of 1's is 1/n, which is negligible.) If a decoding algorithm tailored for low-density H matrices, such as a message passing decoding algorithm, is used on H, such an algorithm can also be used on H_L to decode V_L, since H_L also retains the low-density character. Relative to H, the maximum column weight of H_L increases by 1.
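The density and column-weight statements can be checked on the small running example. This is only a numerical illustration: the 2x4 H used here is not a realistic LDPC matrix, and the helper names are hypothetical.

```python
# Sketch: compare the density of 1's and the maximum column weight of H and H_L.
from fractions import Fraction

def kron(A, B):
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def identity(size):
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]

def density(M):
    """Fraction of entries equal to 1."""
    return Fraction(sum(map(sum, M)), len(M) * len(M[0]))

def max_column_weight(M):
    return max(sum(col) for col in zip(*M))

H = [[1, 0, 1, 0], [0, 1, 1, 1]]
m, n, k = 3, 4, 2

I_m_zero = [[1 if c == r else 0 for c in range(m)] + [0] for r in range(m)]
top = kron(I_m_zero, H)                      # first m block rows of H_L
bottom = kron([[1] * (m + 1)], identity(n))  # last block row of H_L
H_L = top + bottom

print(density(top) / density(H))   # 1/4, i.e. 1/(m+1)
print(density(bottom))             # 1/4, i.e. 1/n
print(max_column_weight(H), max_column_weight(H_L))  # 2 3
```

For a practical LDPC code n is large, so the 1/n density of the last block row becomes small, as the paragraph above notes.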

[0069] If H is an LDPC matrix, in one embodiment, a particular scheduling can be used to decode V_L using the message-passing decoding algorithm executed by decoder 300L, by taking advantage of the fact that only the last n rows of H_L have 1's in them that span all the m individual codewords: each of those rows has (m+1) 1's in it. Decoder 300L can periodically update the log-likelihood-ratio (LLR) values of the bits connected to each of those parity check bits based on the LLR values that were handed to them from the individual codewords at that time. The updated LLR values can then be used by the decoders to continue their decoding. In other words, in this embodiment, the codebits in one of the data storage media interact with, or obtain information from, codebits stored in other data storage media at regular time intervals, not continuously in every iteration. Such a scheduling can also be used in decoding V_L using H_L.
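The periodic exchange across codewords might look like the following sketch. It is not taken from the specification: it assumes the common min-sum approximation for the check-node update, the convention that a positive LLR favors bit value 0, and a hypothetical `period` parameter that sets how often the cross-codeword checks are applied.

```python
# Sketch: periodic LLR exchange over the last n rows of H_L. Row i of that
# block ties bit i of every codeword (including the XOR codeword) together.

def cross_codeword_extrinsic(llrs):
    """Min-sum check-node update for one XOR check over m+1 bits."""
    out = []
    for j in range(len(llrs)):
        others = llrs[:j] + llrs[j + 1:]
        sign = 1
        for value in others:
            if value < 0:
                sign = -sign
        out.append(sign * min(abs(value) for value in others))
    return out

def scheduled_update(llr_per_codeword, iteration, period):
    """Every `period` iterations, add cross-codeword extrinsic LLRs."""
    if iteration % period != 0:
        return llr_per_codeword              # decoders keep working alone
    n = len(llr_per_codeword[0])
    updated = [row[:] for row in llr_per_codeword]
    for i in range(n):
        column = [row[i] for row in llr_per_codeword]
        for j, extrinsic in enumerate(cross_codeword_extrinsic(column)):
            updated[j][i] = column[j] + extrinsic
    return updated

# One bit position, m + 1 = 4 codewords.
llrs = [[2.0], [-1.0], [3.0], [0.5]]
print(cross_codeword_extrinsic([2.0, -1.0, 3.0, 0.5]))  # [-0.5, 0.5, -0.5, -1.0]
print(scheduled_update(llrs, iteration=3, period=2))    # unchanged
print(scheduled_update(llrs, iteration=4, period=2))    # [[1.5], [-0.5], [2.5], [-0.5]]
```

Between exchanges each individual decoder would run its own iterations on H; only the exchange step is shown here.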

[0070] FIG. 4 is a flow diagram illustrating one embodiment of a method performed by data storage and retrieval system 100 to store and retrieve data with increased data recovery characteristics. The method is implemented by one or more processors, executing processor-executable instructions stored in one or more memories, such as processor 200 and memory 202, respectively. It should be understood that in some embodiments, not all of the steps shown in FIG. 4 are performed and that the order in which the steps are carried out may be different in other embodiments. It should be further understood that some minor method steps have been omitted for purposes of clarity.

[0071] At block 400, information may be pre-stored in memory 202 and/or the memory of the encoders and/or decoders in order to encode data blocks and decode codewords. For example, one or more of a parity check matrix H, a long parity check matrix H_L, a long systematic parity check matrix H_LS, a generator matrix G, a long generator matrix G_L, and a long systematic generator matrix G_LS may be stored.

[0072] At block 402, data is received by input data transfer logic 204 from one of numerous hosts 102 remotely coupled to data storage and retrieval system 100, typically via the Internet.

[0073] At block 404, a predetermined amount of the data from host 102 is divided into m equal-sized data blocks, generally provided in parallel to a plurality of encoders 206a-206m by input data transfer logic 204. Each of the data blocks comprises k bits.

[0074] At block 406, each encoder 206a-206m encodes a respective one of the data blocks using, in one embodiment, LDPC encoding, with each encoder 206a-206m using the same generator matrix G stored in memory 202 or locally by each of the encoders. The result is m codewords, each comprising k information bits and n-k parity bits, where m is the number of codewords generated for each segment of unencoded data from host 102. In one embodiment, each of the codewords is made systematic by a respective encoder.

[0075] At block 408, the codewords from the encoders are logically combined with each other using an XOR function, as provided by logic circuitry 208, to generate a parity codeword having a length equal to the length of each of the codewords.

[0076] At block 410, each of the codewords is stored in data storage media 108a-108m, and the parity codeword is stored in data storage media 108m+1. Each of the data storage media may be co-located with data storage server 104 or they may be remotely distributed in the cloud.
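Blocks 404 through 410 can be sketched end to end on the small example code (k = 2, n = 4, m = 3). The function names and the list standing in for the storage media are illustrative assumptions; a real system would use a much larger LDPC generator matrix.

```python
# Sketch of blocks 404-410: split, encode, XOR, and lay out for storage.

def split_into_blocks(bits, m):
    """Block 404: divide a data segment into m equal-sized blocks."""
    if len(bits) % m != 0:
        raise ValueError("segment length must be a multiple of m")
    k = len(bits) // m
    return [bits[i * k:(i + 1) * k] for i in range(m)]

def encode(block, G):
    """Block 406: codeword = block * G over GF(2)."""
    return [sum(b * g for b, g in zip(block, col)) % 2 for col in zip(*G)]

def xor_codewords(codewords):
    """Block 408: bitwise XOR of all codewords gives the parity codeword."""
    return [sum(bits) % 2 for bits in zip(*codewords)]

G = [[1, 1, 1, 0],
     [0, 1, 0, 1]]
segment = [1, 1, 1, 0, 0, 1]
m = 3

blocks = split_into_blocks(segment, m)
codewords = [encode(block, G) for block in blocks]
parity_codeword = xor_codewords(codewords)

# Block 410: one entry per storage medium, the parity codeword going last.
media = codewords + [parity_codeword]
print(media)  # [[1, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1], [0, 0, 0, 0]]
```

The list has m+1 entries, one for each of data storage media 108a through 108m+1.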

[0077] At some later time, one of the hosts 102 requests retrieval of data that was previously provided to data storage and retrieval system 100. In response, at block 412, a set of codewords is retrieved in parallel from data storage media 108a-108m.

[0078] At block 414, each codeword retrieved from the data storage media is provided to a respective one of the decoders 300a-300m. In one embodiment, each of the decoders decodes a respective one of the codewords using the same parity check matrix H, stored in memory 202 or locally by each of the decoders. An iterative two-step decoding algorithm known as a message passing algorithm may be employed by each of the decoders, where a number of check nodes are first updated based on messages received from all or some variable nodes. In a second step, all or some of the variable nodes may be updated based on messages received from all or some of the check nodes. The process may be repeated until either a codeword has been decoded or until a threshold number of iterations or sub-iterations has been reached.

[0079] The messages used in the message passing algorithm may be log-likelihood-ratio (LLR) messages, also known as soft information. The iterative decoders 300a-300m may calculate the LLR messages to correct or detect errors in a received codeword. Prior to the first iteration of the message passing algorithm, for example, each of the variable nodes may receive an LLR message based on information from the original codeword as provided to any one of the decoders 300a-300m.

[0080] The result of the iterative decoding process is a re-creation of the original data blocks created at block 404, assuming that each codeword is decoded without errors.

[0081] At block 416, each of the decoded codewords is provided to output data transfer logic 304, where they are combined to re-create the original data that was provided by the requesting host 102. In the event that one or more of the decoders 300a-300m cannot decode a respective codeword, in one embodiment, traditional XOR techniques may be used to recover the "bad" codeword.
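The traditional XOR recovery mentioned above works when exactly one codeword is bad and all the others, plus the parity codeword, are known. A minimal sketch, with an illustrative function name and `None` marking the codeword that failed to decode:

```python
# Sketch: rebuild one failed codeword from the survivors and the parity codeword.

def recover_missing(codewords, parity_codeword, bad_index):
    """XOR of the parity codeword with every surviving codeword."""
    survivors = [cw for i, cw in enumerate(codewords) if i != bad_index]
    return [sum(bits) % 2 for bits in zip(parity_codeword, *survivors)]

codewords = [[1, 0, 1, 1], None, [0, 1, 0, 1]]   # v_2 failed to decode
parity_codeword = [0, 0, 0, 0]                   # v_1 XOR v_2 XOR v_3

recovered = recover_missing(codewords, parity_codeword, bad_index=1)
print(recovered)  # [1, 1, 1, 0]
```

If two codewords are bad at once, each XOR equation has two unknowns and this method alone cannot resolve them, which is the situation addressed at block 418.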

[0082] At block 418, in the event that two or more codewords are not properly decoded by decoders 300a-300m, an attempt may be made to decode all of the codewords together by having processor 200 create a long vector V_L. In one embodiment, processor 200 creates V_L as a concatenation of codewords v_1-v_m plus the result of an XOR function of all of the codewords v_1-v_m. Alternatively, V_L may be created by multiplying a concatenation of the data blocks u_1-u_m by a long generator matrix G_L stored in memory 202 or by decoder 300L, as discussed previously.

[0083] In one embodiment, at block 420, V_L may be made systematic (V_LS) by processor 200, by multiplying a systematic generator matrix G_LS by an information vector comprising a concatenation of data blocks u_1-u_m. G_LS may be formed from the G_L matrix by processor 200 or decoder 300L, or it may be pre-stored in memory 202 or a memory of decoder 300L. In one embodiment, G_LS is formed by rearranging the columns of the G_L matrix, placing the columns in G_L that correspond to the information bits of each codeword at one end of the G_LS matrix.

[0084] In a second embodiment, V_L can be made systematic even if it is formed using G_L, by rearranging the information bits in V_L so that they occupy either the least significant bits or the most significant bits of V_LS.

[0085] At block 422, either the long vector V_L or the long systematic vector V_LS may be decoded by decoder 300L. In the case of V_L, a long parity check matrix H_L, stored in either memory 202 or decoder 300L, is applied to V_L, using the iterative decoding process discussed above. In the case of V_LS, a long, systematic parity check matrix H_LS, stored in either memory 202 or decoder 300L, is applied to V_LS, using the iterative decoding process discussed above. It should be noted that the decoding algorithm used to decode V_L or the long systematic vector V_LS could be the same as the decoding algorithm used by decoders 300a-300m. During the decoding process, processor 200 or decoder 300L periodically updates the log-likelihood-ratio (LLR) values of the bits connected to each of the parity check bits based on the LLR values that were provided to the parity check bits from the individual codewords at that time. In one embodiment, the updated LLR values are used by the individual decoders to continue their decoding.
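The benefit of decoding the long vector can be illustrated with a deliberately simplified decoder. The sketch below is not the LLR-based message passing decoder described above: it handles erasures only (bits marked `None`), repeatedly solving any parity check that has exactly one unknown bit. The erasure pattern and function names are assumptions chosen for the illustration.

```python
# Sketch: erasure-only "peeling" decoder applied to H (per codeword) and H_L.

def kron(A, B):
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def identity(size):
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]

def peel_erasures(received, H):
    """Solve checks with exactly one erased bit until no progress is made."""
    word = list(received)
    progress = True
    while progress:
        progress = False
        for row in H:
            idx = [i for i, h in enumerate(row) if h]
            unknown = [i for i in idx if word[i] is None]
            if len(unknown) == 1:
                known_sum = sum(word[i] for i in idx if word[i] is not None)
                word[unknown[0]] = known_sum % 2
                progress = True
    return word

H = [[1, 0, 1, 0], [0, 1, 1, 1]]
m, n = 3, 4
I_m_zero = [[1 if c == r else 0 for c in range(m)] + [0] for r in range(m)]
H_L = kron(I_m_zero, H) + kron([[1] * (m + 1)], identity(n))

V_L = [1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
E = None  # erased bit
received = [1, E, 1, E,  E, E, E, 0,  0, 1, 0, 1,  0, 0, 0, 0]

alone_1 = peel_erasures(received[0:4], H)   # v_1 on its own: stays incomplete
alone_2 = peel_erasures(received[4:8], H)   # v_2 on its own: stays incomplete
joint = peel_erasures(received, H_L)        # long vector: fully recovered

print(alone_1, alone_2)
print(joint == V_L)  # True
```

Here two codewords fail individually, so whole-codeword XOR recovery does not apply, yet the checks in the last n rows of H_L supply enough bits for the per-codeword checks to finish the job.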

[0086] At block 424, processor 200 determines whether the long vector was decoded correctly, using techniques well known in the art. If so, the decoded codewords are provided to output data transfer logic 304, where they are combined to form the original data segment that had been provided by one of the hosts 102. The decoded codewords from decoders 300a-300m are ignored. Thus, by creating the long vector and decoding it with a long parity check matrix, data storage and retrieval system 100 may be able to better tolerate errors and erasures, because the codebits of each codeword in the long vector interact with or obtain information from each other.

[0087] The methods or algorithms described in connection with the embodiments disclosed herein may be embodied directly in hardware or embodied in processor-readable instructions executed by a processor. The processor-readable instructions may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components.

[0088] Accordingly, an embodiment of the invention may comprise a computer-readable medium embodying code or processor-readable instructions to implement the teachings, methods, processes, algorithms, steps and/or functions disclosed herein.

[0089] It is to be understood that the decoding apparatus and methods described herein may also be used in other communication situations and are not limited to RAID storage. For example, compact disk technology also uses erasure and error-correcting codes to handle the problem of scratched disks and would benefit from the use of the techniques described herein. As another example, satellite systems may use erasure codes in order to trade off power requirements for transmission, purposefully allowing for more errors by reducing power, and chain reaction coding would be useful in that application. Also, erasure codes may be used in wired and wireless communication networks, such as mobile telephone/data networks, local-area networks, or the Internet. Embodiments of the current invention may, therefore, prove useful in other applications such as the above examples, where codes are used to handle the problems of potentially lossy or erroneous data.

[0090] While the foregoing disclosure shows illustrative embodiments of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the embodiments of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.