

Title:
METHOD AND APPARATUS FOR ERROR CORRECTION DECODING
Document Type and Number:
WIPO Patent Application WO/2007/082626
Kind Code:
A2
Abstract:
Methods and apparatuses are disclosed for decoding LDPC coded data, employing in iteration likelihood values in a memory, check node update steps and symbol node update steps. For accelerating convergence, before performing the check node update, symbol node update steps are performed in which updated likelihood values are used in matrix rows above and likelihood values from a previous iteration are used in matrix rows below. For efficiently decoding structured LDPC coded data, an accumulator is employed for each symbol node, new extrinsic information is computed using the accumulator and the memory, intrinsic information is computed by subtracting old extrinsic information from the accumulator, and a-posteriori information is computed by adding the new extrinsic information to the accumulator. Cyclic shifting is performed at the accumulator.

Inventors:
MUELLER STEFAN (DE)
SCHREGER MANUEL (DE)
KABUTZ MARTEN (DE)
Application Number:
PCT/EP2006/070101
Publication Date:
July 26, 2007
Filing Date:
December 21, 2006
Assignee:
THOMSON LICENSING (FR)
MUELLER STEFAN (DE)
SCHREGER MANUEL (DE)
KABUTZ MARTEN (DE)
International Classes:
H03M13/11; H04L1/00
Foreign References:
EP 1 610 466 A1 (2005-12-28)
Other References:
KIENLE, F. et al., "A Synthesizable IP Core for DVB-S2 LDPC Code Decoding", Design, Automation and Test in Europe (DATE), Munich, Germany, 7-11 March 2005, IEEE, pages 100-105, XP010780248, ISBN 0-7695-2288-2
"Digital Video Broadcasting (DVB); Second generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broadband satellite applications", ETSI Standards, European Telecommunications Standards Institute, Sophia-Antipolis, FR, June 2004, pages 1-74, XP002311764, ISSN 0000-0001
BLANKSBY, A. J. et al., "A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-Density Parity-Check Code Decoder", IEEE Journal of Solid-State Circuits, vol. 37, no. 3, March 2002, XP011061705, ISSN 0018-9200
Attorney, Agent or Firm:
THIES, Stephan (European Patent OperationsKarl-Wiechert-Allee 74, Hannover, DE)
Claims:

Claims

1. A method for decoding LDPC coded data, the method employing in iteration likelihood values in a memory associated to nonzero elements of the parity check matrix of the LDPC code, and a check node update step generating check node updated likelihood values being performed for consecutive rows of the parity check matrix, the method characterized in that before performing the check node update step for a row of the parity check matrix, symbol node update steps are performed in columns where the row of the parity check matrix has set elements, in which symbol node update steps the check node updated likelihood values are used from the memory associated with matrix rows above the current row and likelihood values from a previous iteration are used from the memory associated with the matrix rows below the current row.

2. The method of claim 1, wherein the symbol node update steps are performed immediately before consecutive ones of the check node update steps.

3. The method of claim 1 or 2, wherein the check node updated likelihood values from the check node update step are stored using memory positions associated with set elements of the subsequent row of the parity check matrix.

4. The method of claim 1 or 2, applied on a structured code the parity check matrix of which has submatrices consisting of one or more permutation matrices, wherein for those sub-matrices consisting of two or more permutation matrices, an additional buffer of a size equalling the number of permutation matrices minus one is provided, and the intrinsic information is computed by the second step of the two phase message passing, using the additional buffer of the sub-matrix.

5. The method of claim 3, wherein an accumulator is used, wherein the accumulator is initialized with the associated channel information, wherein the check node updated likelihood values are added to the accumulator and the contents at the memory positions associated with set elements of the subsequent row of the parity check matrix is subtracted from the accumulator, and wherein the check node updated likelihood values are then stored to the memory positions associated with set elements of the subsequent row of the parity check matrix.

6. A method for decoding structured LDPC coded data having check node groups and bit node groups, and employing one accumulator for each bit node and an edge memory, where a decoding subiteration is performed for each of the check node groups, the method characterised by steps of

- computing for each Check Node Group new extrinsic information using the accumulator values and information from the edge memory;
- computing the intrinsic information of each Check Node Group for each sub-matrix by subtracting old extrinsic information associated with the permutation matrices constituting the sub-matrix from the accumulator;

- computing a-posteriori information by adding the new extrinsic information to the accumulator.

7. The method of claim 2, wherein the symbol node update steps immediately before consecutive ones of the check node update steps are performed only in contiguous columns of the parity check matrix that have column weight 2, wherein a single buffer is used for each of the contiguous columns, wherein the intrinsic information is computed anew for each row of the parity check matrix by column-wise accumulating the a-priori information and the extrinsic information, and by writing the new resulting extrinsic information over the old extrinsic information, and wherein the new resulting extrinsic information is used in the next check node update step.

8. The method of claim 6, wherein the edge memory is organised in words associated to the permutation matrices constituting the submatrices of the parity check matrix, each edge memory word comprising likelihood elements to store one likelihood value for every bit node of a bit node group, the method additionally characterised by steps of: - storing in the first likelihood element of each edge memory word the likelihood corresponding to the first sub equation of the associated bit node group,

- storing in the i-th likelihood element of an edge memory word the likelihood associated to the i-th row of the permutation matrix associated to the edge memory word,

- in the step of computing the intrinsic information, perform a cyclic shift at the accumulator.

9. An apparatus for decoding LDPC coded data, characterized by being equipped and configured to execute the decoding method of one of claims 1 to 8.

10. The apparatus of claim 9, having accumulators and associated accumulator buffers associated to received data bits.

Description:

Method and apparatus for error correction decoding

Field of the invention

The present invention relates to Error Correction Codes, Soft Decision Decoding, Low Density Parity Check Codes (LDPC), DVB-S2, irregular repeat accumulate codes, structured codes.

Background of the invention

Low Density Parity Check (LDPC) codes are the most powerful codes today; they achieve a very good error correction performance, especially for long codes.

To describe decoding of linear codes, let x represent a received vector assumed to be an N-tuple of symbols from GF(2). A vector y is a valid code word if and only if

y·H^T = 0,

where H is called the Parity Check Matrix. The goal of decoding is to derive from each vector x a vector y which minimizes the Hamming distance d_Hamming(x,y) between x and y. The Hamming distance is defined as d_Hamming(x,y) = |{i : x_i ≠ y_i}|, i.e. as the cardinality of the set of those vector components where the two vectors to be compared differ.

In soft decision decoding, each bit is represented by a conditional probability. Specifically, the probability that bit i of a received vector was zero is denoted as p_i. This probability is called the a-priori probability. Advantageously the bits of the received vector x can be represented by so-called Log-Likelihood-Ratio values or LLRs or likelihood values for short, defined as

L(i) = ln( p_i / (1 − p_i) ).

This function maps the open interval (0,1) to the entire set of real numbers. In order to be able to represent probabilities p over the practically relevant range, a number range of at least −30 < L(i) < 30 must be foreseen for integer-valued LLR values. In the following, the a-priori information is called the channel information or channel LLR value and is denoted as L_c(i), where i denotes the bit number within the received vector. Additionally we define a sub-vector L_c(n) as

L_c(n) = ( L_c(nS), ..., L_c((n+1)S − 1) ),

where S is a dimension of sub-matrices as explained in the following.
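For illustration only (this snippet is not part of the application), the LLR mapping and the quoted range of roughly ±30 can be checked in a few lines of Python:

```python
import math

def llr(p0: float) -> float:
    """Log-Likelihood-Ratio of a bit, given the probability p0 that it is zero."""
    return math.log(p0 / (1.0 - p0))

print(llr(0.5))           # 0.0: completely uncertain bit
print(llr(1.0 - 1e-13))   # ~ +29.9: almost certainly zero
print(llr(1e-13))         # ~ -29.9: almost certainly one
```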

The principle of Structured Codes

For structured codes, the parity matrix H is divided into identically sized, square submatrices. The rows and columns are arranged into groups called check node groups and symbol node groups, respectively. In DVB-S2, the submatrix dimension is S = 360, so that one check node group consists of 360 check equations and one symbol node group consists of 360 columns. The resulting submatrices, sometimes denoted as π_i, define the connection between the nodes, and for structured codes they consist of permutation matrices. In case of DVB-S2, permutation matrices are denoted as I^j to indicate an identity matrix I cyclically left shifted j times. Some of the submatrices may as well consist of a sum of two permutation matrices, for which a shorthand notation I^(i+j) is sometimes used, meaning I^i + I^j with i ≠ j.

Examples of permutation matrices of size S = 4 are given below.
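The application's original example matrices are not preserved in this text. As an illustrative sketch under the left-shift convention stated above (note that Eq. (F1) further below assumes a right-shift convention instead), such matrices can be generated as:

```python
import numpy as np

def perm_matrix(S: int, j: int) -> np.ndarray:
    """I^j: identity matrix of size S with each row cyclically left shifted j times."""
    return np.roll(np.eye(S, dtype=int), -j, axis=1)

I1 = perm_matrix(4, 1)                               # one permutation matrix
I0_plus_2 = perm_matrix(4, 0) + perm_matrix(4, 2)    # I^(0+2) = I^0 + I^2
```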

A parity matrix of a structured code may look like:

Another way of representing structured codes is the so-called Tanner graph. A Tanner graph of a structured code is depicted in Figure 4. In the Figure, each of the lines joining a symbol node group and a check node group stands for a number "S" of individual connections between individual check equations and individual symbols.

The idea behind structured codes is that this special structure of dependencies between the check nodes, also called parity nodes, and the symbol nodes, also called bit nodes, allows updating a certain number of parity and symbol nodes simultaneously. The result is a partially parallel hardware. The resulting hardware is a good compromise between memory and complexity. With the structured code approach, long codes can be realised with moderate complexity.

Implementation of long LDPC codes in hardware often becomes the bottleneck of a design. The memory consumption of conventional LDPC decoding is quite high, and it requires a lot of memory accesses. The high number of write accesses yields high power consumption. Even if a structured code is used, the power consumption is often quite high and proportional to the number of iterations.

Invention

The LDPC decoding method and apparatus according to the invention reduce the overall size of memory and reduce the number of write accesses. Hence the power consumption is decreased. Additionally it converges faster, which also contributes to a power reduction.

For an example where a known LDPC decoder for DVB-S2 requires at least a so-called "edge memory" of size 792*360 Log-Likelihood-Ratio or LLR words, the method and apparatus according to the invention require an edge memory of at least 612*360 LLR words. Hence the resulting reduction is approximately 22 percent. Also the write accesses are reduced to approximately 50%.

Let H denote the parity matrix of an LDPC code. Let M denote the number of parities or parity equations or parity digits, K the number of information symbols or information digits, and N the total number of symbols or digits in one code word. The row and column weights of H are denoted as w_r and w_c, respectively.

The channel log likelihood ratios of received code word bits cw_i are expressed as L_i, where the log likelihood ratio is defined as

L_i = ln( P(cw_i = 0) / P(cw_i = 1) ),

i.e. L_i represents the received conditional probability that information bit cw_i is zero.

Same as the known "Two Phase Message Passing" or TPMP decoding method, LDPC decoding according to the invention uses check node update steps and symbol node update steps, but it does so in a different order, which has consequences for the memory size and memory access needs.

Let F(n) = {m : H_mn = 1}, corresponding to F(n) = {m : H_mn ≠ 0} for the associated likelihood matrix, be the set of checks that are connected to variable n = 0,...,N−1, in other words the set of positions of those check equations that depend (among others) on a specific information digit or bit n = 0,...,N−1. Correspondingly, let C(m) = {n : H_mn = 1}, corresponding to C(m) = {n : H_mn ≠ 0} for the likelihood matrix, be the set of variables that are connected to check m = 0,...,M−1, in other words the set of positions of those information digits or bits that have influence on a specific check equation m = 0,...,M−1. Also, according to usual mathematical convention, F(n)\m denotes the set F(n) without one specific member m from amongst its member checks or check equations, and correspondingly, C(m)\n denotes the set C(m) without a specific one n of its member variables or information digits or bits.
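The two index sets are straightforward to compute; the following sketch (with a hypothetical toy matrix, not one from the application) illustrates F(n), C(m) and the "\" exclusion:

```python
import numpy as np

# Hypothetical toy parity check matrix, only to illustrate F(n) and C(m):
H = np.array([[1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [1, 0, 1, 0, 1]])

def F(n: int) -> list:
    """Set of checks connected to variable n."""
    return [m for m in range(H.shape[0]) if H[m, n] == 1]

def C(m: int) -> list:
    """Set of variables connected to check m."""
    return [n for n in range(H.shape[1]) if H[m, n] == 1]

# F(n)\m and C(m)\n simply exclude one specific member:
F_without = lambda n, m: [mm for mm in F(n) if mm != m]
C_without = lambda m, n: [nn for nn in C(m) if nn != n]
```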

Check node update or Horizontal step

For each check node m = 0,...,M−1 and for each n ∈ C(m), in other words for the non-zero positions of the rows of the parity check matrix, compute

R_mn = 2·atanh( ∏_{n' ∈ C(m)\n} tanh( Q_mn' / 2 ) ).

This computation goes horizontally through the matrix; it amounts to first reading the set of existing terms Q_mn corresponding to a specific row m of the parity matrix H, next calculating a set of terms R_mn therefrom, and then writing the set of terms R_mn back to the same memory positions that held the Q_mn before.
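A sketch of this horizontal step (the application leaves the exact check node arithmetic open; tanh-based and BCJR/MAP implementations are mentioned later in the text):

```python
import math

def check_node_update(Q_row: dict, n: int) -> float:
    """R_mn from the Q_mn' of all other variables n' of the same row m,
    using the tanh rule sketched above."""
    prod = 1.0
    for n_prime, q in Q_row.items():
        if n_prime != n:
            prod *= math.tanh(q / 2.0)
    prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
    return 2.0 * math.atanh(prod)
```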

Symbol node update or vertical step

For each symbol node n = 0,...,N−1 and for each m ∈ F(n), in other words for the non-zero positions of the columns of the parity check matrix, compute

Q_mn = L_c(n) + Σ_{m' ∈ F(n)\m} R_m'n.

This computation goes vertically through the matrix. In the known TPMP decoding method, the same memory is used to hold the R_mn and the Q_mn terms, so that R_mn is overwritten with the new Q_mn for the next iteration step. The calculation amounts to first reading the set of existing terms R_mn corresponding to one specific column n of the parity check matrix H, next calculating a set of terms Q_mn therefrom, and then writing the set of terms Q_mn back to the same memory positions that held the R_mn before.
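The corresponding vertical step, again as a sketch:

```python
def symbol_node_update(R_col: dict, Lc_n: float, m: int) -> float:
    """Q_mn from the channel LLR L_c(n) and the R_m'n of all other checks m'
    of the same column n; in TPMP the result overwrites R_mn in place."""
    return Lc_n + sum(r for m_prime, r in R_col.items() if m_prime != m)
```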

In the known TPMP decoding, first the Horizontal step is performed for all rows of the parity check matrix, and after that the Vertical step is performed for all columns of the matrix. The memory requirement in this case is proportional to the number of non-zero entries in the parity check matrix; hence if we denote as w_c^j the column weight of the j-th column of the parity check matrix, the overall memory consumption can be calculated as

Σ_{j=0}^{N−1} w_c^j = w_H,

with w_H the weight of the parity check matrix H. For the example of DVB-S2, the memory requirement using this conventional decoding algorithm corresponds to 792*360 = 285120 LLRs.

Assuming we have a regular structured code, i.e. a code with constant column and row weights w_c and w_r, the numbers of memory read and write accesses during the check node updates are w_r·M + N and w_r·M, respectively. During the symbol node updates, again, the numbers of memory read and write accesses are w_r·M + N and w_r·M, respectively, so the total numbers of read and write accesses are 2(w_r·M + N) and 2·w_r·M, respectively.

The idea of the invention is to interpret each check equation in the parity matrix and associated likelihood matrix independently as a turbo code and to apply the turbo decoding principle directly. In other words, in a decoding method according to the invention, which employs, in iteration, likelihood values in a memory associated to nonzero elements of the parity check matrix of an LDPC code, and where a check node update step generating check node updated likelihood values is performed for consecutive rows of the parity check matrix, symbol node update steps are performed before the check node update step for a row of the parity check matrix, in those columns where that row has set elements; in these symbol node update steps the check node updated likelihood values are used from the memory associated with matrix rows above the current row, whereas likelihood values from a previous iteration are used from the memory associated with the matrix rows below the current row. This has the advantage of a faster convergence.

Advantageously, the symbol node update steps are performed immediately before consecutive ones of the check node update steps; this maximises the convergence speedup.
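A minimal sketch of this row-by-row schedule, reusing check_node_update from the earlier sketch (the in-place memory makes the above/below mixing happen automatically):

```python
import numpy as np

def tdmp_iteration(H: np.ndarray, Lc: np.ndarray, R: np.ndarray) -> None:
    """One iteration of the schedule described above (a sketch, not the
    claimed hardware). R holds one LLR per set element of H (zero elsewhere)
    and is updated in place; because rows are processed top to bottom, the
    symbol node update for row m automatically uses check node updated values
    from rows above and previous-iteration values from rows below."""
    M, _ = H.shape
    for m in range(M):
        cols = np.flatnonzero(H[m])
        # Symbol node update, only in columns where row m has set elements:
        Q = {int(n): Lc[n] + R[:, n].sum() - R[m, n] for n in cols}
        # Check node update for row m, written back in place:
        new_R = {n: check_node_update(Q, n) for n in Q}
        for n, r in new_R.items():
            R[m, n] = r
```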

DECODING PRINCIPLE A

We can now define a Decoding Principle A, where after each decoding or evaluating step, the extrinsic information Le(i) is accumulated, and the new resulting extrinsic information is then saved to the next extrinsic memory position. This overwrites the information stored there, but this is allowable and possible because that information is not needed any more, neither in the next decoding or evaluating step nor in any of the subsequent decoding or evaluating steps. As a consequence, the memory requirement is one less than the column weight for each column of the likelihood matrix.

In other words, in a decoding method according to Decoding Principle A, the check node updated likelihood values from any one of the check node update steps are stored using those memory positions that are associated with set elements of the respective next or subsequent row of the parity check matrix.

DECODING PRINCIPLE B

Decoding Principle A described above constitutes a way to reduce the overall memory consumption employing what can loosely be called a "turbo decoding principle". It is applicable where the sub-matrices of the parity matrix of a structured code all are permutation matrices. However, the parity matrix of the DVB-S2 standard has some sub-matrices which are sums of two permutation matrices. Categorically using the above described Turbo Decoding principle results in problems on those check equation groups where a sub-matrix consists of two or more permutation matrices.

Let us consider a section of the parity matrix and the associated likelihood matrix H, where the sub-matrices have dimension S×S = 4×4, or for short S = 4.

The full matrix H is not shown here. In this, BNG_i is the i-th Bit Node Group of variables or information digits b_iS,...,b_((i+1)S−1), and CNG_i is the i-th Check Node Group of check digits c_iS,...,c_((i+1)S−1). Each of the sub-matrices joining or relating a Check Node Group to a Bit Node Group consists of one or more cyclically left shifted identity matrices I^j, where j identifies the number of shifts. L_ij is the extrinsic information; due to the memory allocation described above, i can be seen as the address and j as the j-th LLR value at that address. In the example shown, the position of the non-zero entries of the sub-matrix for BNG_0 and CNG_0 is defined by two permutation matrices, whereas all other shown sub-matrices of H are each defined by a single permutation matrix of individual shift. The idea of Decoding Principle A above was to interpret each equation as one independent turbo code. That means that for updating one equation the intrinsic information for that code is obtained by adding together the channel LLR values and the extrinsic information of the other "decoders". The result of one decoding step is saved at the location of the next unused information set; this reduces the overall memory size.

The buffer for BNG_0 looks like:

and the depth of the buffer corresponds to the column weight minus one, as mentioned above.

In order to decode, the following computation has to be done, where L_x denotes the input LLR values of the check equations. For C_0, the following must be accumulated:

For C_1, the following must be accumulated:

Performing these computations according to Eqq. (B3) and (B4) in parallel results in two difficulties:

First: Accumulating takes place over different words, see L_10 and L_03 in Eq. (B3). This is not really a difficulty and can be solved. Second: The result from C_0 is directly used or needed in the accumulation step of C_1.

In order to cope with the second difficulty, according to Decoding Principle B the decoding algorithm is extended. The extension consists in not using the output of C_0 in the other decoder, C_1. That means that instead of computing Eqq. (B3) and (B4), one computes

Hence the first input values, namely those of check equations C_0 and C_1, are the same, like in the first iteration, where all check nodes are fed with the same input values L_c. By using this approach the data flow of extrinsic information from C_0 to C_1 is avoided, and the extrinsic information from C_0 flows to C_1 via C_4 and C_9 and so on. This amounts to appropriately mixing the decoding approach known as "Two Phase Message Passing" or "TPMP" with Decoding Principle A, also called "Turbo Decode Message Passing" or "TDMP" in the following.

For this, according to Decoding Principle B, an additional buffer is introduced for those locations where one sub-matrix is defined via two or more permutation matrices. With that additional buffer it is possible to use the normal two phase message passing approach at those locations. With this extension, we have a bit accurate implementation of the Tanner graph, while the performance is still increased compared to the known two phase message passing method. The idea here is to mix the turbo decoding principle with the two phase message passing principle. Submatrices which are defined via one permutation matrix are decoded via the described Turbo Decoding principle, and submatrices which are defined via two or more permutation matrices are decoded using the known Two Phase Message Passing. Of course, the overall memory size is slightly increased according to the number of sub-matrices defined via two or more permutation matrices.

In other words, in a decoding method according to Decoding Principle B, when applied on a structured code the parity check matrix of which has submatrices consisting of one or more permutation matrices, for those sub-matrices consisting of two or more permutation matrices an additional buffer of a size equalling the number of permutation matrices minus one is provided, and the intrinsic information is computed by the second step of the two phase message passing, using the additional buffer of the sub-matrix.

The general idea of method B is applicable to any LDPC decoding where at least some of the submatrices are defined by more than one permutation matrix.

DECODING PRINCIPLE C

The decoding architecture of Decoding Principle B can be seen as less suitable for applications requiring a high data throughput. The following therefore describes another extension into a Decoding Principle C that is especially suited for high throughput applications. A high number of memory read accesses can be seen as a disadvantage of Decoding Principle B, so its architecture is changed in order to make it applicable for high throughput applications. The idea of Decoding Principle C is to reduce the high number of read accesses that occur when mixing the so-called "TPMP" and "TDMP". In order to avoid recomputing the intrinsic or input information of the next check node group, an accumulator is used. It is initialized with the associated channel information. Before the buffer is overwritten with the decoding results, the decoder result is added to the accumulator and the buffer contents of the next check node group are subtracted. After that the decoding result is saved at the buffer location of the next check node group.

From a mathematical point of view, Decoding Principle C results in the same performance and the same computation as Decoding Principle B, but the buffer handling and timing scheme are different. Therefore it requires more memory, albeit less than conventional two phase message passing. The main advantages are the fast convergence and the high throughput.

In other words, in a decoding method according to Decoding Principle C, an accumulator is used; the accumulator is initialized with the associated channel information; the check node updated likelihood values are added to the accumulator and the contents at the memory positions associated with set elements of the subsequent row of the parity check matrix are subtracted from the accumulator; and then the check node updated likelihood values are stored to the memory positions associated with set elements of the subsequent row of the parity check matrix.
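A minimal sketch of this accumulator bookkeeping, with scalar stand-ins for the S-wide memory words (the data layout is hypothetical):

```python
def principle_c_step(A: dict, P_next: dict, R_new: dict) -> None:
    """Accumulator update of Decoding Principle C for one check node update
    (sketch). A[n] starts as the channel LLR of bit n; P_next[n] is the buffer
    content at the position associated with the set element of the subsequent
    row in column n."""
    for n, r in R_new.items():
        A[n] += r            # add the check node updated likelihood value
        A[n] -= P_next[n]    # subtract the subsequent row's old buffer content
        P_next[n] = r        # then store the new value at that position
```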

DECODING PRINCIPLE D

The idea of Decoding Principle D is to apply the turbo decoding principle to single steps of the known "Two Phase Message Passing" or "TPMP" decoding approach without a reduction of memory. Decoding Principle C, above, is applicable for high speed applications and achieves some memory reduction compared to a straightforward TPMP implementation. But it can be seen to have the disadvantage of being applicable only for structured LDPC codes where some submatrices of the Parity Check Matrix are defined via one permutation matrix and others via more than one permutation matrix.

In Decoding Principle D it has been found that the idea of the turbo decoding principle can, on its own, be advantageously applied within the Two Phase Message Passing (TPMP) approach. The basic idea is to transfer extrinsic information from one equation to the next equation. Decoding Principle D does not aim at memory reduction, but it can advantageously be applied to every LDPC code, whereas Decoding Principle C is only applicable for structured codes where sub-matrices consist of one or more permutation matrices.

From a mathematical point of view Decoding Principle D realises the same computation as Decoding Principles B and C. For each bit an accumulator is used, for which a buffer is necessary. The main advantage is that it is applicable for every LDPC code, but it requires more hardware and memory than B and C.

In order to avoid recomputing the intrinsic or input information to be used in the calculations of the next check node group, accumulators together with accumulator buffers are used. In other words, a method for decoding structured LDPC coded data according to Decoding Principle D performs a decoding subiteration for each of the check node groups, computes for each Check Node Group new extrinsic information using the accumulator values and information from the edge memory, computes the intrinsic information of each Check Node Group for each sub-matrix by subtracting the old extrinsic information associated with the permutation matrices constituting the sub-matrix from the accumulator, and computes the a-posteriori information by adding the new extrinsic information to the accumulator.
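One consistent reading of these three steps, as a sketch (the data layout and the helper `cpu` are hypothetical stand-ins, not the application's hardware):

```python
def principle_d_subiteration(acc: dict, edge_mem: dict, edges: list, cpu) -> None:
    """One sub-iteration of Decoding Principle D for one check node group.
    acc[n]       accumulator of bit n
    edge_mem[e]  old extrinsic information of edge e = (perm_matrix, bit)
    edges        edges of the current check node group
    cpu          check processing function yielding new extrinsic values"""
    # Intrinsic information: accumulator minus the old extrinsic information
    # associated with the permutation matrices constituting the sub-matrix.
    intrinsic = {e: acc[e[1]] - edge_mem[e] for e in edges}
    new_ext = cpu(intrinsic)
    for e in edges:
        edge_mem[e] = new_ext[e]                # store the new extrinsic values
        acc[e[1]] = intrinsic[e] + new_ext[e]   # a-posteriori information
```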

DECODING PRINCIPLE E

Applying a "turbo decoding principle" to the entire parity matrix calculation leads to highly complex timing schemes, as seen in the previous Decoding Principles. Often it is not necessary to have that high a throughput and convergence speed; something in between is sufficient. Decoding Principle E applies the turbo decoding principle to the TPMP basis in a different way and without increasing the complexity of the system. Even so, it increases the convergence speed compared to TPMP and decreases the overall memory consumption.

Decoding Principle E is based on the fact that the parity check matrix H of the LDPC code of DVB-S2 can be described as a juxtaposition of three partial matrices:

H = ( A^(1)  A^(2)  T ).   (E1)

The (j) in A^(j) is an index and not a "power-of" operation. The sizes of all three matrices depend on the code configuration of the LDPC code. The row weight w_r of H is constant within one code configuration and depends on the code configuration. The column weight (number of ones in a column) of A^(1) is w_c^(1) and also depends on the code configuration. The column weight of A^(2) always is w_c = 3 and is constant in all code configurations. The important fact to build upon is that the column weight of T invariably is w_c = 2, regardless of the code configuration. The matrix T is referred to as the Triangular part of H.

According to Decoding Principle E, the turbo decoding principle as described above, i.e. performing symbol node update steps immediately before performing the check node update step, is only applied on T, i.e. on those columns that belong to T. As mentioned, the column degree of T is 2.

§ Conventional or TPMP decoding of T would imply that first all check equations are decoded and the resulting extrinsic information Le is saved in the edge memory. In the subsequent accumulation, for each column the a-priori information and all extrinsic information are added together. The resulting new intrinsic information to be used in the next iteration is computed and saved in the edge memory. This would imply that one edge buffer is needed for each Le, so that at the given column weight 2 of T, two buffers per column are needed.

§ With Decoding Principle E, only one buffer is needed for each column of T. The intrinsic information in one column is computed anew for each check equation. This is done by accumulating the a-priori information and the extrinsic information in each decoding step inside a column of T; the new resulting extrinsic information replaces the old extrinsic information. In that way the new resulting extrinsic information is directly used in the next check equation, which results in a faster convergence.

In summary, Decoding Principle E has the following effects:

§ It needs less memory: for DVB-S2, a conventional decoder needs to store all 792 edges, whereas with E, only 640 edges need to be stored.
§ It has a faster convergence, which leads to a higher throughput.
§ It needs less computation time per iteration, which also contributes to the higher throughput.
§ It has a much easier timing scheme.

Let us consider the following parity matrix:

Conceptually, using this parity check matrix for decoding can be visualised with a Log-Likelihood matrix or LLR matrix where one LLR value is associated to every set element of the parity check matrix. For convenience, the LLR matrix is also called H here and in the following. It is clear from the respective context whether H denotes the parity check matrix proper or the LLR matrix.

In this, the first column group SNG_0 represents the matrix part A^(1), the second column group SNG_1 represents the matrix part A^(2), and the following column groups SNG_j, j = 2,3,4 represent the matrix part T. Note that this example has a different dimension than the code of DVB-S2. The submatrix denoted as I^1* has a special structure, namely that of a non-cyclically left shifted identity matrix, i.e. a left shifted identity matrix without the wrap-around ones.

In other words, a method for decoding structured LDPC coded data according to Decoding Principle E performs the symbol node update steps immediately before consecutive ones of the check node update steps only in contiguous columns of the parity check matrix that have column weight 2, uses a single buffer for each of the contiguous columns, computes the intrinsic information anew for each row of the parity check matrix by column-wise accumulating the a-priori information and the extrinsic information and by writing the new resulting extrinsic information over the old extrinsic information, and uses the new resulting extrinsic information in the next check node update step.
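A sketch of the single-buffer handling of the weight-2 columns of T (`cpu_row` is a hypothetical stand-in for the check node update of one row):

```python
def principle_e_row_step(buf: dict, Lc: dict, row_cols: list, cpu_row) -> None:
    """Decoding Principle E handling for one check equation (sketch).
    buf[n] is the single extrinsic buffer of weight-2 column n of T."""
    # Intrinsic information, computed anew for this row: a-priori (channel)
    # information plus the extrinsic information currently in the buffer.
    intrinsic = {n: Lc[n] + buf[n] for n in row_cols}
    new_ext = cpu_row(intrinsic)
    for n in row_cols:
        buf[n] = new_ext[n]   # overwrite the old extrinsic information; it is
                              # used directly by the next check equation
```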

DECODING PRINCIPLE F

According to Decoding Principle F, a principle similar to turbo decoding is applied on nodes wherever possible, and a principle similar to two phase message passing is applied on nodes in those submatrices that are defined via two or more permutation matrices. The way this is done results in a straightforward timing scheme. It also allows a partitioning of the computation cores and therefore simplifies the routing inside an integrated circuit realizing the Decoding Principle.

In summary, Decoding Principle F has at least the following advantages:

- same memory consumption as conventional decoders (For DVB-S2 storing 792 groups of 360 edges each in a FIFO);

- allows to partition the computation units and edge memories;

- the check processing units can be located at the memories;

- faster convergence => higher throughput;

- less computation time per iteration => higher throughput;

- easy timing scheme.

Assuming the following parity check matrix and associated LLR matrix:

H = …   (F1)

NB: Here, Eq. (F1) assumes the permutation matrix I^1 to be defined as a cyclic right shift. NB2: Eq. (F1) and the following, illustrating the case of binary symbols, use "Bit Node Group" (BNG) as a synonym for "Symbol Node Group" (SNG).

Again, the LLR values are stored in a so-called "edge memory" where one memory word, wide enough to store S = 4 LLR values, is provided and associated to each permutation matrix that is a constituent of the LLR matrix. But according to Decoding Principle F they are stored in a different order. For the above example, the edge memory layout results as:

In other words, as before, one memory place exists for each entry in the LLR matrix, and the overall memory consumption is 7·S LLR values. But, specifically for Decoding Principle F, with respect to the order in which the LLR values are stored in the cells of the edge memory,

§ the first element at each address of the edge memory corresponds to the first sub equation of the associated bit node group, and

§ the LLR values corresponding to the parity check matrix are ordered row wise, i.e. the i-th element at any address contains the LLR value in the i-th row of the permutation matrix associated to the address.

Decoding Principle F amounts to mixing TPMP and TDMP using a-posteriori summation.

The Decoding Principle F is applicable for all LDPC codes, even for those where some submatrices of the parity check matrix are defined by more than one permutation matrix.

In other words, a method for decoding structured LDPC coded data according to Decoding Principle F uses an edge memory organised in words associated to the permutation matrices constituting the submatrices of the parity check matrix, each edge memory word comprising likelihood elements to store one likelihood value for every bit node of a bit node group. In the first likelihood element of each edge memory word, the likelihood corresponding to the first sub equation of the associated bit node group is stored; in the i-th likelihood element, the likelihood associated to the i-th row of the permutation matrix associated to the edge memory word is stored; and during computing of the intrinsic information, a cyclic shift of likelihood values is performed at the accumulator.
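A sketch of this row-wise edge word layout and of the cyclic shift at the accumulator (illustrative only; the shift direction depends on the permutation convention, cf. the right-shift note for Eq. (F1)):

```python
import numpy as np

S = 4  # sub-matrix dimension of the running example

# One edge memory word per permutation matrix: element 0 holds the likelihood
# of the first sub equation of the bit node group, element i the likelihood
# associated to the i-th row of that permutation matrix.
edge_word = np.zeros(S)

def shifted_accumulator(acc_word: np.ndarray, offset: int) -> np.ndarray:
    """Cyclic shift performed at the accumulator while computing the
    intrinsic information (sketch)."""
    return np.roll(acc_word, offset)
```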

The following Figures, together with the subsequent description, help to understand the invention.

Fig. 1 shows an example decoder having 3 check equations;
Fig. 2 shows the structure of the parity matrix of DVB-S2;
Fig. 3 shows an architecture for Turbo decoding according to the invention for the case of unstructured code;
Fig. 4 shows a Tanner graph of a structured LDPC code;
Fig. 5 shows decoding according to the invention for the case of structured code;
Fig. 6 shows a principle diagram of computing the extrinsic information;
Fig. 7 shows the buffer contents at successive iterations and subiterations;
Fig. 8 shows the buffer contents for Decoding Principle C at the times of iteration and sub-iteration;
Fig. 9 shows the accumulator values in the iteration and sub-iteration steps;
Fig. 10 shows the final decision values at the end of the iteration;
Fig. 11 shows the block diagram of an LDPC decoder hardware according to Decoding Principle E;
Fig. 12 shows an example of a memory initialization;
Fig. 13 shows a suggested hardware architecture for Decoding Principle F;
Fig. 14 shows accumulator memory content for an example of Decoding Principle F;
Fig. 15 shows edge memory content for an example of Decoding Principle F;
Fig. 16 shows inputs and outputs of key elements of the Decoding Principle F hardware.

The conventional decoding algorithm for LDPC codes is described in

[1] Robert G. Gallager, "Low-Density Parity Check Codes", 1963

[2] D. MacKay, R. Neal, "Good Codes Based On Very Sparse Matrices", Cryptography and Coding, 5th IMA Conf., Lecture Notes in Computer Science, 1995

DECODING PRINCIPLE A

Fig. 1 shows 3 check equations corresponding to 3 rows of a parity check matrix. All of Lc, Li, Le0, Le1 and Le2 are sets of LLR values. In conventional decoding, first all check equations are decoded or evaluated, for instance using dedicated Check Processing Units, also denoted as CPUs. The resulting extrinsic information Le(i) is saved in a so-called edge memory. This corresponds to "step I" as described above. Afterwards the accumulation is performed and the new intrinsic information Li is updated and saved again in the edge memory, corresponding to step II. Hence an edge buffer is needed for each of the sets Le0, Le1 and Le2, corresponding to a total of three buffers.

In the decoding method and apparatus according to Decoding Principle A, only two edge buffers are needed. Instead, after each decoding or evaluating step, the extrinsic information Le(i) is accumulated, and the new resulting extrinsic information is then saved to the next extrinsic memory position.

For the example of decoding check equation 0, the channel information Lc and the extrinsic information Le1 from check equation 1 and Le2 from check equation 2 are accumulated. After decoding or evaluating check equation 0, the resulting extrinsic information Le0 is saved to the position of Le1. Of course, this overwrites the previous decoding information of Le1, but this is allowable and possible because that previous decoding information of Le1 is not needed any more in the next decoding or evaluating procedure of check equation 1, nor in any of the other subsequent decoding or evaluating procedures.

Correspondingly, in the next decoding or evaluating procedure of check equation 1, the resulting extrinsic information Le1 is saved to the position of Le2, and so on.

Advantageously, this results in less memory and in fewer write accesses to the buffers. As a further advantage, due to the fact that the decoding results of previous check equations are used immediately, i.e. are used already in evaluating the next check equation, the algorithm converges faster than conventional decoding, which also contributes to decreasing the number of buffer write accesses.

The decoding according to Decoding Principle A of the invention

The decoding is best explained with an example:

For the sake of illustration, we assume here an irregular and unrealistic code with column weights w_c^0 = 4 and w_c^j = 3 for j = 1,...,4, row weights w_r^0 = 3, w_r^1 = 4, w_r^2 = 5, w_r^3 = 4, and K = 1, M = 4, N = 5. The associated parity check matrix is:

        1 1 1 0 0
H  =    1 1 0 1 1        (4)
        1 1 1 1 1
        1 0 1 1 1

• Storage

For each column j we have one memory buffer P_j, the size of which corresponds to the column weight minus one. Hence N = 5 memory buffers of sizes {3,2,2,2,2}, denoted by P_0, P_1, P_2, ..., P_4, are needed for the example of Eq. (4); their content is initialized to zero. In the following, P_j^i shall denote the i-th element of P_j. To the N = 5 memory buffers, a set of N = 5 counters ctr_j is associated. Each of these counters points to the head of its associated buffer P_j and is initialized with ctr_j := 0. Individual counters are needed because of the individually different sizes of the buffers. Even if after initialization the counters all start at 0, since each of them points to a position that cyclically traverses all positions of the associated buffer, they will lose synchronism very soon and regain it only occasionally. In total, we have the following memory locations or cells: P_0^0, P_0^1, P_0^2, P_1^0, P_1^1, P_2^0, P_2^1, P_3^0, P_3^1, P_4^0, P_4^1, which is a total of 11 locations or cells. This shows the memory reduction, keeping in mind that the parity matrix has a total of 16 non-zero entries.

• At each decoding iteration k, as many decoding sub-iterations are carried out as there are check equations in case of decoding an unstructured code, or as there are check equation groups (check node groups) in case a structured code is being decoded.
o For each decoding sub-iteration t, t = 0,...,M−1, the extrinsic information Q_n, n = 0,...,4 is computed using the channel information L_n and the extrinsic information saved in P_n:

The calculated Q_n are saved in P_n^(ctr_n) and the pointer is advanced: ctr_n = (ctr_n + 1) mod (w_c^n − 1).

§ Example iteration k = 0 (a bookkeeping sketch of this example in code follows after the final-iteration step below):

• Subiteration k = 0, t = 0 (check equation 0, corresponding to the first line of the parity check matrix H):

Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c^n − 1): ctr_0 = 1, ctr_1 = 1, ctr_2 = 1.

• Subiteration k = 0, t = 1 (check equation 1, 2nd line of the parity matrix):

Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c^n − 1): ctr_0 = 2, ctr_1 = 0, ctr_3 = 1, ctr_4 = 1.

• Subiteration k = 0, t = 2 (check equation 2):

• Subiteration k = 0, t = 3 (check equation 3):

• Repeat the last step for the next iteration step k = k + 1. Also, perform interleaving if a structured code is used.

• Final Iteration
o Compute the a-posteriori information Q_n, n = 0,...,4:

Make hard decisions x_n = sgn(Q_n), n = 0,...,4.
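The buffer and pointer mechanics of this example can be reproduced with the following sketch (the LLR arithmetic is stubbed out; the matrix is the reconstruction of Eq. (4) given above, and the printed counters match the values quoted in the sub-iterations):

```python
# Bookkeeping sketch of the Decoding Principle A example.
H = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 1],
     [1, 1, 1, 1, 1],
     [1, 0, 1, 1, 1]]
wc = [sum(col) for col in zip(*H)]      # column weights {4, 3, 3, 3, 3}
P = [[0.0] * (w - 1) for w in wc]       # buffer sizes {3, 2, 2, 2, 2}
ctr = [0] * len(wc)                     # head pointer of each buffer

for t, row in enumerate(H):             # sub-iterations t = 0,...,M-1
    for n, h in enumerate(row):
        if h:
            Q_n = 0.0                   # placeholder for the computed Q_n
            P[n][ctr[n]] = Q_n          # save Q_n at the head position
            ctr[n] = (ctr[n] + 1) % (wc[n] - 1)
    print(t, ctr)  # t=0 -> [1, 1, 1, 0, 0]; t=1 -> [2, 0, 1, 1, 1]
```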

Memory requirement of the decoding according to the invention

The memory requirement depends on the number of ones or non-zero entries per column of the parity check matrix H. If, in the parity matrix, we denote as w_c^j the column weight of the j-th column, the overall memory consumption can be calculated as

Σ_{j=0}^{N−1} ( w_c^j − 1 ) = w_H − N.

For the example of DVB-S2, the memory requirement using the decoding according to the invention corresponds to 612*360 = 220320 LLRs. The memory reduction against a conventional decoding is N LLR memory words, here N = 180.

Memory bandwidth or number of write accesses

Assuming we have a regular structured code with column and row weights w_c and w_r, the number of memory read accesses per subiteration is w_c·w_r. The number of write accesses per subiteration is w_r. Since each iteration consists of M subiterations, the total numbers of read and write accesses are w_c·w_r·M and w_r·M, respectively.

This compares to the total numbers of memory read and write accesses in the conventional decoding, which are 2(w_r·M + N) and 2·w_r·M, respectively.

For the example of DVB-S2, the code is structured and has q·180 submatrices, where q represents the number of row groups and there are 180 column groups. Each submatrix is processed in parallel. The width of one word corresponds to the number of ones in the permutation matrices and is in this case S = 360.

Figure 2 shows the structure of the parity matrix of DVB-S2.

According to the invention, we have the following memory accesses:

Read: 3486
Write: 11·72 = 792
Overall: 792 + 3486 = 4278

With the conventional decoding, we have the following memory accesses:

Read: 2·(11·72 + 180) = 1944
Write: 2·11·72 = 1584
Overall: 1584 + 1944 = 3528

One can see that the reduction in write accesses is approximately 50 percent.

Fig. 3 shows a hardware architecture of decoding according to the invention for the case of unstructured code.

The memory holds the channel LLR values and the extrinsic information. The edge configuration memory holds the parity check matrix configuration; it realises the sparse parity check matrix in a compact memory. The accumulator is simply an adder, which accumulates extrinsic information and can be reset by the controller. The controller controls the process flow of all units. The hard decision unit performs the hard decision according to the sign of the LLR value. The CPU (check processing unit) computes the extrinsic information; it can be implemented using tanh and its approximation, or using BCJR [3]/MAP/log-MAP based on a trellis structure.

[3] L. R. Bahl, J. Cocke, F. Jelinek, J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate", IEEE Trans. Information Theory, vol. IT-20, pp. 284-287, Mar. 1974.


• Storage

For each column we have one memory buffer, the size of which corresponds to the column weight minus one. Hence N memory buffers, denoted as P_0, P_1, P_2, ..., P_(N−1), are initialized to zero. P_j^i denotes the i-th element of P_j. A set of N counters that point to the heads of these buffers is initialised as ctr_j := 0. Hence we have the following memory locations: P_0^0, P_0^1, ..., P_0^(w_c^0 − 2), P_1^0, P_1^1, ..., P_1^(w_c^1 − 2), ..., P_(N−1)^0, P_(N−1)^1, ..., P_(N−1)^(w_c^(N−1) − 2).

• Iteration Step
o Compute sub-iteration t = 0,...,M−1, one sub-iteration per equation:
§ Initialize the accumulator to zero.
§ Compute the extrinsic information Q_n, n = 0,...,4, using the channel information L_n and the extrinsic information stored in memory P_n:

§ Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c^n − 1).
§ Repeat that step according to the number of equations.

• Final Iteration
o Compute the a-posteriori information Q_n, n = 0,...,4:

Make hard decisions.

Hardware architecture of decoding according to the invention for the case of structured code

The longer a code is, the more efficient it is; only long codes come close to the Shannon bound. The idea of a structured code is to design a code in which several parity and variable nodes are updated simultaneously. The result is a partially parallel hardware. The resulting hardware is a good compromise between memory requirement and computational complexity. The idea behind that structure is to divide the variable nodes and check nodes into groups of size S. That means that the parity check equation is divided into submatrices; the submatrices are connected to other submatrices via edges. The submatrix itself defines the permutation π_i between the nodes.

Fig. 4 depicts the Tanner graph of such a code.

Such structured codes can also be expressed via a parity matrix

which links the symbol node groups and the check node groups by permutation submatrices. Each check node group (CNG) and each symbol node group (SNG) consists of S = 4 check nodes and S = 4 symbol nodes, e.g. SNG_0 consists of the bits c_0, c_1, c_2, c_3. Hence each submatrix of H is of size 4×4. With I^i we denote a permutation matrix which is obtained by a number of i left cyclic shifts of an identity matrix of size S = 4. It is also possible to define one entry of H via two permutation matrices like I^(i+j), which is a shorthand notation for I^i + I^j, where i ≠ j. Example:

The principle now is to update S = 4 check nodes or symbol nodes simultaneously. Each memory word consists of S = 4 LLR values, which corresponds to one permutation matrix. Considering the check node update of group 0, we get

The meaning of these matrices is that in check equation c_0, the LLR values of s_3 have to be combined with s_6; in c_1, the LLR values of s_0 have to be combined with s_1, and so on. In hardware, the LLR values of one permutation matrix are saved into one word, and the memory is organized as follows

Here, the bit width of one word is S·nb_bits_per_LLR. Assuming 6 bits per LLR value, the total bit width of one word in our example is 4·6 = 24 bits. For updating check node group 0, word 0 has to be cyclically shifted right by one symbol and has to be combined with word 1, which is cyclically shifted right by two symbols. Now both words are combined using S = 4 check node update units; then they are shifted back and saved in memory.
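A sketch of this word alignment (stand-in values only; the shift offsets are those of the example just described):

```python
import numpy as np

S = 4
word0 = np.zeros(S)   # stand-in LLR word of one symbol node group
word1 = np.zeros(S)   # stand-in LLR word of the other group

# Align the words so that LLR values belonging to the same check equation sit
# at the same vector position: word 0 shifted right by one symbol, word 1
# shifted right by two symbols.
aligned0 = np.roll(word0, 1)
aligned1 = np.roll(word1, 2)
# ...feed the S parallel check node update units, then shift back and save:
# word0 = np.roll(updated0, -1); word1 = np.roll(updated1, -2)
```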

This procedure can also be performed sequentially, due to the fact that the computation in the CPU is associative.

The update procedure is performed for each check node group.

Decoders for structured or architecture-aware codes are easily obtained from those for normal decoding using additional interleavers and deinterleavers.

Fig. 5 depicts appropriate hardware for decoding structured code using the decoding according to the invention.

The decoding algorithm is:

• Storage

For each column we have one memory buffer; the size of the memory buffer corresponds to the column weight minus one. The word width corresponds to the submatrix size S. Hence the resulting bit-width of one word is S·nb_bits_per_LLR. Physically, the memory buffers can be in one physical memory.

• At iteration k: Carry out decoding sub-iterations according to the number of check node groups.
o Perform sub-iteration t: Compute for each check node group the extrinsic information Q_n using the channel information L_n and the extrinsic information saved in P_n:

Save Q_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c^n − 1).
§ This computation is done in the following procedure:

• Initialize the CPU units to zero.
• For each entry in the set i ∈ C(m), compute:
§ Initialize the accumulator with L_i.
§ Feed the accumulator with all LLR values contained in buffer P.
o Perform interleaving by shifting the accumulator result according to the permutation sub-matrix.
o Feed the CPU units and update.
• For each entry in the set i ∈ C(m), compute:
o Perform deinterleaving by shifting the result of the CPU units according to the permutation sub-matrix.
o Save the result and update (advance) the pointer.

• Repeat what has been described for iteration k for the next iteration k = k + 1; perform interleaving, because a structured code is assumed here.
• Final Iteration
o Compute the a-posteriori information Q_n of group n:

• Make hard decisions for each bit of group n: x_n = sgn(Q_n).

DECODING PRINCIPLE B

In order to understand Decoding Principle B, the following parity matrix is decoded exemplarily with the known "Two Phase Message Passing" or "TPMP" and with the Decoding Principle B or "TDMP", i.e. the mixture of Decoding Principle A and TPMP. The example again uses a section of the parity matrix and the associated likelihood matrix H, where the sub-matrices have dimension S = 4.

We again use F(n) and C(m) as defined above; the channel values of the i-th variable are denoted as L_c(i).

B.1 Decoding with conventional two phase message passing

For each permutation matrix contributing to the sub-matrix pattern of H, one memory word is foreseen in a so-called edge memory, the memory word consisting of S = 4 Log-Likelihood-Ratio or LLR words. For the above parity matrix, the following memory configuration results:

Hence for each entry in the parity matrix one memory place L_mn exists, L_ij denoting the j-th LLR value of the extrinsic information stored at address i. The overall memory consumption in this example is 7·S LLR values.

B.1.1 Initialization:

Each variable node n is assigned an a-posteriori LLR value. For every position (m,n) where H_mn ≠ 0, or such that H_mn = 1,

B.1.2 Phase (I), the so-called Horizontal Step:

For each check equation or matrix row c_m, m = 0,...,M−1, and for each n ∈ C(m), compute

where k denotes the time or iteration step.

In practice, a subset of S = 4 equations is computed at the same time, e.g. updating the 4 equations in the first check node group simultaneously. This can be achieved by reading the memory and cyclically shifting the memory words, then feeding the appropriate check node processing units. After computation the result is shifted back and saved.

B.1.3 Phase (II), the so-called Vertical step:

For each symbol node n = 0,...,N−1, and for each m ∈ F(n), compute

where k denotes the time or iteration step. The computed new L_mn^(k) then serve as the input of Phase (I). For example, the new input values for Phase (I) of the equation for c_0 and Bit Node Group BNG_0 are:

For c_1, we have

In practice, S = 4 intrinsic information values are updated at the same time, but this time no shifting is necessary, see the above memory configuration.

B.1.4 Decision:

For the final decision, all a-posteriori probabilities are to be included in the final computation, hence also the variable node itself. For each n = 0,...,N−1, and for each m ∈ F(n), compute

The decision is made on the sign of R_n, i.e. u_n = 1 if R_n < 0. If u·H^T = 0, then the algorithm is halted with u as the decoder output; otherwise processing is continued by starting another iteration at step or phase (I).

B.2 Extended decoding "B", mixing two phase message passing and turbo decoding principle

For each column group we have one memory buffer; the depth of the memory buffer corresponds to the column weight minus one.

For the example this corresponds to N = 2 memory buffers of size {3,3}, denoted by P_0 and P_1 and initialized to zero. P_j^i denotes the i-th element of P_j. The bitwidth or size corresponds to the dimension of the sub-matrix and the bits per LLR value; in our case one memory word consists of S = 4 LLR values. A set of N = 2 counters that point to the heads of these buffers is initialised as ctr_j := 0. For the first column group, we have the following memories:

For the second column group we have the following memories:

In this, t designates the sub-iteration, and k again designates the iteration step.

Exactly for those sub-matrices where we have two or more permutation matrices we introduce an additional buffer, the buffer size corresponding to the number of permutation matrices minus one. Let B_i denote the buffer for the i-th sub-matrix having two or more permutation matrices; here the depth of the buffer is one. In our example we have one additional buffer:

The overall memory consumption in this example is 6·S LLR values. Hence the memory reduction in this simple example is one memory word, but it can be very large if fewer additional buffers are introduced.

Let here F(n) = {m : H(m,n) ≠ 0} be the set of check node groups that are connected to a bit node group x_n, n = 0,...,(N/S − 1), and let C(m) = {n : H(m,n) ≠ 0} be the set of bit node groups that are connected to a check node group c_m, m = 0,...,(M/S − 1). The size of each sub-matrix H(m,n) is S×S.

• At iteration k: Carry out 3 decoding sub-iterations corresponding to the number of check node groups.

o Perform sub-iteration t, t = 0,...,(M/S − 1). Compute for each check node group the extrinsic information L_n, n ∈ C(m), using the channel information L_c^BNG and the extrinsic information saved in P_j or B_j. For this we first have to compute the intrinsic or input information of each check node group for each sub-matrix; we denote this information as L_x. During the computation of L_x it has to be distinguished whether the extrinsic information of L_x belongs to a sub-matrix with one or with more than one permutation matrix. Perform for each entry in the check node group the following computation (a code sketch of this case distinction follows after this algorithm).

§ Principle of turbo decoding:

If bit node group i from check node group m corresponds to a sub-matrix defined via one permutation matrix, L_x is computed as

§ Principle of two phase message passing:

If bit node group i from check node group m corresponds to a sub-matrix defined via more than one permutation matrix, L_x is computed by using the second step of the two phase message passing.

Now the additional buffer of the sub-matrix is used; let B_j denote the additional buffer in that sub-matrix. First compute the a-posteriori probability

Now subtract the extrinsic information corresponding to that node to get the corresponding intrinsic or input information; for the first permutation matrix within a submatrix, compute

For the following permutation matrices within the submatrix compute

until all intrinsic or input information of the current check node group is computed.

o After the last computation, the intrinsic information of the check equations is obtained. Now the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the L_x values; the resulting values are denoted as L_x^*.

Compute the extrinsic information according to equation (B21):

For each check equation entry a new extrinsic information value is obtained.
o Shift back the new extrinsic information according to the permutation matrix.

o Save the new extrinsic information into memory, distinguishing whether the resulting extrinsic information belongs to a sub-matrix with one or with more permutation matrices.

§ One permutation matrix:

Save L_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c − 1).

§ Two or more permutation matrices:

• Save all L_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c − 1). Take care of the order.
• Save the extrinsic information L_n for the first permutation matrix in B_j.

• Repeat the last step for the next iteration step k = k + 1.

• Final Iteration o The a-posteriori information is computed in the last iteration of one bit node group, e.g. in the last computation of one column, equation (B16) is replaced with the following equation (B23) :

This computation is done if L_n is not required in any other check equation group.

• Make hard decisions for every bit in bit group: x_n = sgn(L_n^post).
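The buffer bookkeeping used in the steps above, with the circular write pointer ctr_n advanced modulo (w_c - 1), can be sketched in Python as follows (an illustrative model only; class and variable names are assumptions, not the disclosed implementation):

    class EdgeBuffer:
        # Per bit node group: w_c - 1 extrinsic slots and a circular pointer.
        def __init__(self, w_c):
            self.slots = [0.0] * (w_c - 1)  # extrinsic words P_n^(0..w_c-2)
            self.ctr = 0                    # circular pointer ctr_n

        def read(self):
            # Read the extrinsic word at the current pointer position.
            return self.slots[self.ctr]

        def write_and_advance(self, value, step=1):
            # Overwrite the pointer position, then advance modulo (w_c - 1).
            self.slots[self.ctr] = value
            self.ctr = (self.ctr + step) % len(self.slots)

    buf = EdgeBuffer(w_c=3)              # column weight 3 -> 2 buffer slots
    buf.write_and_advance(0.7)           # single advance of the pointer
    buf.write_and_advance(-1.2, step=2)  # double advance, as when two values are saved
    print(buf.slots, buf.ctr)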

For the given example, we have:

Updating the First Check Node Group 0

For computing the first check node group we have the following buffer contents:

The contents of the additional buffer are

Buffer contents of second column group:

with ctr_1 = 0

B.2.1. Computing intrinsic or input information of CNG 0 and BNG 0:

First we compute L^post by adding the contents of P and B:

Now we subtract the extrinsic information in order to get the input values for our equation. First permutation matrix:

These values correspond to the intrinsic or input information at the respective locations; compare equation (B10) in the two phase message passing, it is exactly the same. In the next step we compute

B.2.2. Computing intrinsic or input information of CNG 0 and BNG 1:

The intrinsic information from CNG 0 and BNG 1 is computed according to the turbo decoding principle

B.2.3. Now one can compute the new extrinsic information from the check equation

Fig. 6 shows a principle diagram of computing the extrinsic information, i.e. the principle of this concatenation. After computation of the tanh function we get the new extrinsic information. The values are shifted back according to the sub-matrix permutation offset and saved in memory. For example, shifting the first transposed column of the above equation yields our new extrinsic information L_0,1 (in this case no shifting is necessary).

L_0 is saved twice, namely in P_0^(cntr_0) and in B_0(0); L_1 is saved in P_0^((cntr_0 + 1) mod (w_c - 1)), and cntr_0 is incremented according to cntr_0 = (cntr_0 + 2) mod (w_c - 1), hence after computation we have

The contents of the additional buffer is

The new extrinsic information of L_4 is saved in P_1^(cntr_1) and cntr_1 is advanced as cntr_1 = (cntr_1 + 1) mod (w_c - 1). Hence our buffer contents of the second column group are

Fig. 7 shows the buffer contents at successive iterations and sub-iterations. "P_0^0" stands for P_0^(0), and so on. The shaded table cells with bold font inscription symbolize the pointer positions of the counters. These positions are overwritten in the next sub-iteration step.

DECODING PRINCIPLE C

We also call this Decoding Principle "High Throughput Turbo Decoding Message Passing" or "HT-TDMP".

We again consider the example of Eq. (B1). Now, for each received or data bit an accumulator A_n is introduced. The accumulator value is saved in an accumulator buffer. The accumulator is initialized with the channel information,

A_n = L_n^c ,  (C3)

the memories P are of course initialized with zero. The accumulator group for channel information group L_BNG_n of bit node group n is denoted as A_BNG_n.

• At iteration k: Carry out 3 decoding sub-iterations, corresponding to the number of check equation groups:
o Perform sub-iteration t, t = 0,...,(M/S - 1). Compute for each check node group the extrinsic information L_n, n ∈ C(m), using the channel information L and the extrinsic information saved in P or B.

For this we first have to compute the intrinsic or input information of each check node group for each sub-matrix; we denote this information as L_x. During the computation of L_x it has to be distinguished whether the extrinsic information of L_x belongs to a sub-matrix with one or with more than one permutation matrix.

Perform for each entry in the check node group the following computation.
§ Principle of turbo decoding:

If bit node group i from check equation group m corresponds to a sub-matrix defined via one permutation matrix, L_x is directly obtained from the accumulator: L_x = A_BNG_i  (C4)

§ Principle of two phase message passing: If bit node group i from check equation group m corresponds to a sub-matrix defined via more than one permutation matrix, L_x is computed by using the second step of the two phase message passing; now the additional buffer of the sub-matrix is used; let B_j denote the additional buffer in that sub-matrix.

First compute the a-posteriori probability by

Now subtract the extrinsic information corresponding to that node to get the corresponding intrinsic or input information; for the first permutation matrix compute

For the next permutation matrix compute

And for the third compute

And so on, until the computations for all l permutation matrices in the sub-matrix of bit node group n are done.

Prepare the accumulator to compute the a-posteriori probability; l is the number of permutation matrices in the sub-matrix of bit node group n, and w_c is the column weight of the current bit node group n. Compute

Note: this value is not included in the accumulator; hence it is not necessary to subtract it.

Do this step until all intrinsic or input information of the current check node group is computed.

o After the last computation, the input information of the check equations is obtained. Now the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the L_x values; the resulting values are denoted as L*.

o Compute extrinsic information:

For each check equation entry a new extrinsic information value is obtained.

o Shift back the new extrinsic information according to the permutation matrix

o Compute the a-posteriori value by adding the decoding result to the accumulator; for this we have to distinguish whether the extrinsic information belongs to one or to more permutation matrices.
§ In the case of one permutation:

§ In the case of more than one permutation, add the decoding results belonging to bit node group n:

o Now we have to prepare the accumulator to decode the next equation group; therefore we have to subtract the extrinsic value of the next pointer position. For this we again have to distinguish whether we have one permutation or more:

§ In the case of one permutation, subtract the extrinsic information of the next check node group:

§ In the case of more than one (two) permutations, denoting by l the number of permutation matrices in the sub-matrix of the bit node group (Note: in DVB-S2, l = 2):

Now save the new extrinsic information into memory, distinguishing whether the resulting extrinsic information relates to a sub-matrix with one or with more than one permutation matrix.

§ In the case of one permutation matrix: Save L_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c - 1).

§ In the case of two or more permutation matrices:
• Save all L_n in P_n^(ctr_n) and advance ctr_n = (ctr_n + 1) mod (w_c - 1). Take care of the order.
• Save the extrinsic information L_n for the first permutation in B_j.

• Final Iteration
o The a-posteriori information is computed in the last iteration of one bit node group. It is simply obtained by omitting the computation of equations (C15) and (C16). In other words, the computation of equations (C15) and (C16) is not performed if the accumulator group A_BNG is not used anymore.

• Make hard decisions for every bit in bit group: x_n = sgn(A_BNG_n).
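A rough Python sketch of one such accumulator-based update, for the simple case of a sub-matrix with one permutation matrix, is given below. The tanh-rule check node function, the LLR values and the buffer sizes are assumptions for illustration only, not the disclosed implementation:

    import math

    def boxplus(values):
        # Check node rule: 2*atanh(prod(tanh(v/2))) over the given inputs.
        p = 1.0
        for v in values:
            p *= math.tanh(v / 2.0)
        p = max(min(p, 0.999999), -0.999999)  # numeric guard
        return 2.0 * math.atanh(p)

    A = 0.8         # accumulator A_n, initialized with the channel LLR
    P = [0.0, 0.0]  # w_c - 1 extrinsic slots with circular pointer ctr
    ctr = 0

    L_x = A                            # intrinsic input, Eq. (C4) analogue
    L_new = boxplus([0.5, -1.1, 0.9])  # extrinsic value from the other edges
    A = A + L_new                      # a-posteriori update
    A = A - P[(ctr + 1) % len(P)]      # prepare for the next check node group
    P[ctr] = L_new                     # save new extrinsic and advance pointer
    ctr = (ctr + 1) % len(P)
    print(L_x, A, P, ctr)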

For the given example, we have:

Updating the First Check Node Group 0

For computing the first check node group we have the following buffer contents:

Buffer contents of second column group:

The accumulator values are

Or detailed:

C.1. Computing intrinsic or input information of CNG 0 and BNG 0

The sub-matrix joining CNG 0 and BNG 0 is defined via two permutation matrices; therefore we have to add the additional buffer contents to the accumulator group A_BNG_0 and have to compute it with the two phase message passing approach.

First we compute the a-posteriori probability by adding the contents according to Eq. (C18):

Now we subtract the extrinsic information in order to get the input values for our equation. For the first permutation matrix we get, according to Eq. (C19):

These values correspond to the intrinsic or input information at the locations L_0,0, L_0,1, L_0,2, L_0,3, the same as in the Two Phase Message Passing.

In the next step compute the next permutation:

Prepare Accumulator Buffer A_BNG_0:

C.2. Computing intrinsic or input information of CNG 0 and BNG 1

The intrinsic information from CNG 0 and BNG 1 is computed according to the turbo decoding principle:

Accumulator Buffer A_BNG_1 does not have to be prepared, because the bit node group is defined via one permutation.

C.3. Compute the new extrinsic information

The new extrinsic information can be computed as in B.2.3 above. After the step corresponding to Eq. (B24) we continue with:

C.4. Compute a-posteriori probabilities by adding the new extrinsic information to the Accumulator

C.5. Prepare Accumulator

For BNG 0 we use:

For BNG 1 we use:

C.6. Save new extrinsic values

For the values of BNG 0, L_0 is saved twice, namely in P_0^(cntr_0) and in B_0(0); L_1 is saved in P_0^((cntr_0 + 1) mod (w_c - 1)), and cntr_0 is incremented as cntr_0 = (cntr_0 + 2) mod (w_c - 1), hence after computation we have


The new extrinsic information of L_4 is saved in P_1^(cntr_1) and cntr_1 is advanced as cntr_1 = (cntr_1 + 1) mod (w_c - 1). Hence our buffer contents of the second column group are

Fig. 8 shows the buffer contents for Decoding Principle C at the times of iteration and sub-iteration. As in Fig. 7, the shaded boxes with the bold type inscriptions symbolize the pointer positions of the counters. These positions are overwritten in the next sub-iteration step.

Fig. 9 shows the accumulator values in the iteration and sub-iteration steps. By omitting the computation of Eqs. (C15) and (C16) of the intrinsic value for the next check equation group, we get the final decision values shown in Fig. 10.

As for Decoding Principle B, Decoding Principle C is applicable to decoding apparatuses and methods for any LDPC code where at least some of the sub-matrices are defined by more than one permutation matrix.

DECODING PRINCIPLE D

In the following description of Decoding Principle D, each set of an accumulator (an arithmetic-logic unit) and an associated accumulator buffer (a memory cell) will together be called "accumulator".

We again consider the example described under Decoding Principle B above, Eq. (B26).

For Decoding Principle D, as for Decoding Principle C, an accumulator is introduced for each received or data bit, and the accumulator value is saved in an Accumulator Buffer A_n. The accumulator is initialized with the associated channel information:

A_n = L_n^c ,  (D3)

and the edge memories P are initialized with zero.

In the following, the Log-Likelihood-Ratios in the n-th Bit Node Group BNG_n are denoted as a Channel Information Group L_BNG_n, for which the associated accumulators are grouped into a corresponding Accumulator Group denoted as A_BNG_n. It should be mentioned that the accumulator buffer can be the a-priori/channel buffer L^c. The accumulator always contains the a-posteriori information.

For the decoding according to Decoding Principle D:

• At iteration k: Carry out decoding sub-iterations corresponding to the number of Check Node Groups:
o Perform sub-iterations t, t = 0,...,(M/S - 1). Compute for each Check Node Group the extrinsic information L_n, n ∈ C(m), using the accumulator values A_BNG and the extrinsic information saved in the edge memory. Additionally compute a new a-posteriori information of BNG_n. For this, first the intrinsic or input information, denoted as L_x, of each Check Node Group for each sub-matrix has to be computed. These values are obtained by subtracting the extrinsic information associated with the permutation matrix participating in the Check Node Group from the accumulator:

Do this step until all intrinsic or input information of the current Check Node Group is computed.

In the following, these L_x values are denoted as L_x^(old).

o After the last of these computations, the input information of the check equations is obtained. Now the information is permuted according to the permutation matrices. This is accomplished by cyclically shifting the L_x values; the resulting values are denoted as L*.

L* = cyclic shift of L_x  (D5)

o Then compute the extrinsic information as

For each non-zero entry in the parity check matrix row corresponding to a check equation, a new extrinsic information value is obtained.

o Shift back the new extrinsic information according to the permutation matrix:

L_n = cyclic shift of L_n*  (D7)

o Subtract the old extrinsic values from the accumulator:

§ In the case where the sub-matrix is defined by one permutation:

§ In the case where the sub-matrix is defined by more than one permutation, subtract the old extrinsic values belonging to Bit Node Group n:

o Compute the a-posteriori value by adding the decoding result to the accumulator. For this it must be distinguished whether the extrinsic information belongs to one or to more permutation matrices.
§ In the case of one permutation matrix:

§ In the case of more than one permutation, add the decoding results belonging to Bit Node Group n:

o Save back the new extrinsic information into the edge memory, where each L_n has its own memory location.

• Final Iteration
o The a-posteriori information is available after each sub-iteration and corresponds to the accumulator buffer values A_BNG.

• Make hard decisions for every bit in the Bit Node Group:

Decoding Principle D is applicable to every LDPC code, even to unstructured ones, where it is simpler.
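To make the data flow of Decoding Principle D concrete, the following Python sketch traces one edge group through one sub-iteration. The min-sum approximation of the check node rule, the shift offset and all values are illustrative assumptions, not the disclosed hardware:

    import numpy as np

    S = 4
    A = np.array([0.8, -0.3, 1.5, 0.2])      # accumulator word of one BNG
    P_old = np.array([0.1, -0.2, 0.3, 0.0])  # old extrinsic word (edge memory)
    other = np.array([1.2, -0.9, 0.7, 1.1])  # combined input of the other edges

    L_x = A - P_old                          # intrinsic input (Eq. (D4) analogue)
    L_shift = np.roll(L_x, -1)               # cyclic shift (Eq. (D5) analogue)
    # min-sum approximation of the check node rule, one value per equation:
    L_new_shift = (np.sign(L_shift) * np.sign(other)
                   * np.minimum(np.abs(L_shift), np.abs(other)))
    L_new = np.roll(L_new_shift, 1)          # shift back (Eq. (D7) analogue)
    A = A - P_old + L_new                    # subtract old, add new extrinsic
    P_old = L_new                            # save back to the edge memory
    print(A)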

DECODING PRINCIPLE E

Decoding Principle E is described in the following by comparing conventional TPMP decoding with Principle E decoding.

E.1 Conventional TPMP Decoding

For each permutation matrix, one edge memory word is foreseen, which in this example consists of S = 4 LLR words. For the Parity Check matrix of Eq. (E8), the edge memory layout is:

One memory place for storing an LLR value is associated to each non-null entry in the Parity Check matrix, i.e. the overall memory consumption is (13 · S) LLR values.

E.1.1 Initialization

At time step k = 0, each symbol node n is assigned an a-posteriori LLR value. For every position (m,n) in the Parity Check matrix where H_m,n = 1, the associated LLR value in the associated LLR matrix is initialised:

After that, time step k is advanced by one.

E.1.2 Phase (I) or "Horizontal" step

For each check equation c_m, m = 0,...,M - 1, i.e. for each matrix row, and for each n ∈ C(m), compute

where k denotes the time step. More specifically, in the case of check equation c_0, first the set of existing terms L_m,n^(k-1), corresponding to the row of c_0, is read, a set of terms L_m,n^(k) is calculated according to above Eq. (E13), and then the set of terms L_m,n^(k) is written back to the same memory positions that held the L_m,n^(k-1) before. The computed new L_m,n^(k) are the input of Phase (II).
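A compact Python sketch of this check node update for a single row is shown below; the tanh-rule form follows the usual TPMP formulation, and the numbers are invented for illustration:

    import math

    def check_node_update(row_llrs):
        # For each edge, combine all *other* edges of the row (tanh rule).
        out = []
        for i in range(len(row_llrs)):
            p = 1.0
            for j, v in enumerate(row_llrs):
                if j != i:              # exclude the edge's own value
                    p *= math.tanh(v / 2.0)
            p = max(min(p, 0.999999), -0.999999)
            out.append(2.0 * math.atanh(p))
        return out

    old = [0.9, -1.4, 0.5, 2.0]   # L_m,n^(k-1) for one row
    new = check_node_update(old)  # L_m,n^(k), written back in place
    print(new)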

For the example, we get:

For the first parity check equation c_0, we compute:

Because of the structured nature of the code, subsets of S = 4 equations are computed at the same time, which amounts to updating 4 equations in the first check node group at the same time. This can be achieved by reading the memory and cyclically shifting the memory words, which are then fed to the appropriate check node processing units. After their computation, the results are shifted back and saved.
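The shift-based parallelization can be sketched as follows (illustrative only; the shift offset and data are assumptions, and the check node processing is left as a placeholder):

    import numpy as np

    S = 4
    word = np.array([0.9, -1.4, 0.5, 2.0])  # memory word of one sub-matrix
    offset = 1                              # permutation I^1 => shift by one

    aligned = np.roll(word, -offset)        # align values to the S check node units
    processed = aligned                     # placeholder for the S parallel CNU results
    restored = np.roll(processed, offset)   # shift back before saving to memory
    print(restored)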

Updating the first check node group is done in the following way. Reading address 0:

Permuting the LLR word L_0 according to the permutation of the sub-matrix; in this case the permutation matrix is I^0. The new LLR word is called

Reading address 1:

Permuting the LLR word L_1 according to the permutation of the sub-matrix; in this case the permutation matrix is I^1 (cyclic shift). The new LLR word is

Reading L_4, L_7 and L_11 and permuting in the same way. According to the parity matrix, in the first check equation L_0,0 has to be combined with L_0,3, with L_0,4 and with L_0,8. As one can see, the first value of L_0 and the first value of L_1 are the correct values, which are to be combined. Now the whole vector can be computed at once. Hence all four equations, i.e. the whole check node group, can be updated in parallel with

L* = L'_0 ⊕ L'_1 ⊕ L'_4 ⊕ L'_7 ⊕ L'_11 = ((L_0,3 ⊕ L_0,4 ⊕ L_0,8), (L_1,0 ⊕ L_1,5 ⊕ L_1,9 ⊕ L_1,16), ..., ...)

Note: The fact that L_0,19 does not exist in the parity check matrix has to be taken into account in the computation of the extrinsic information. The first column corresponds to the computation of above Eq. (E14).

Cyclically shift back the extrinsic values:

and save the values back to memory: at address 0 write

and at address 1 write:

Addresses 4, 7 and 11 are processed accordingly.

Increase time step k by one.

E.1.3 Phase (II) or "Vertical" step

For each symbol node n = 0,...,N - 1, and for each m ∈ F(n), compute

where k denotes the time step. More specifically, in the case of column x_0, first the set of existing terms L_m,n^(k-1), corresponding to one specific column n of the parity matrix H, is read, then a set of terms L_m,n^(k) is calculated according to above Eq. (E25), and then the set of terms L_m,n^(k) is written to the same memory positions that held the L_m,n^(k-1) before.
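In Python, this symbol node update for one column can be sketched as follows (the channel LLR and edge values are invented for illustration):

    def symbol_node_update(channel_llr, col_llrs):
        # New edge value = channel LLR + sum of the *other* edges of the column.
        total = sum(col_llrs)
        return [channel_llr + total - v for v in col_llrs]

    L_c = 0.4               # channel LLR of symbol node n
    col = [1.1, -0.6, 0.8]  # L_m,n^(k-1) for all m in F(n)
    print(symbol_node_update(L_c, col))  # new L_m,n^(k) values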

For example, the new input values for Phase (I) of equation c_0 and bit node group zero are

For c_1 we have

In practice, S = 4 intrinsic information values are updated at the same time, but this time no shifting is necessary; see the above memory configuration.

E.1.4 Decision

For the final decision, all a-posteriori probabilities are to be included in the final computation, hence also the symbol node itself. For each n = 0,...,N - 1, and for each m ∈ F(n), compute

The decision is made on the sign of R_n (i.e. u_n = 1 if R_n < 0). If uH^T = 0, then halt the algorithm with u as the decoder output; otherwise go to Phase (I) and increase time step k by one.
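The decision and stopping rule can be sketched as follows (a toy parity check matrix and invented LLRs, for illustration only):

    import numpy as np

    H = np.array([[1, 1, 0, 1],
                  [0, 1, 1, 1]])           # toy parity check matrix
    L_c = np.array([0.4, -1.0, 0.7, 0.9])  # channel LLRs
    L_edges = np.where(H == 1, 0.3, 0.0)   # toy edge LLRs L_m,n

    R = L_c + L_edges.sum(axis=0)          # a-posteriori value per symbol node
    u = (R < 0).astype(int)                # u_n = 1 if R_n < 0
    syndrome = H.dot(u) % 2                # uH^T, one result per check equation
    print(u, "halt" if not syndrome.any() else "continue")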

E.2 Principle E decoding, employing turbo decoding only on the triangular part of the parity check matrix

For each permutation matrix, one memory word is provided. For the example considered here, a memory word consists of S = 4 LLR words. Conceptually, the memory is divided into different regions. The edges of the parity matrix parts corresponding to A_1 and A_2 are saved in addresses:

so that for each entry in A_1 and A_2 of the parity check matrix, one memory place exists.

The T matrix, however, is saved in

One can see that only one memory place is necessary for each column of the T matrix. The overall memory consumption in this example is (10 · S) LLR values, instead of the (13 · S) LLR values of conventional decoding.

Further, we use F(n) = {m : H_m,n = 1} and C(m) = {n : H_m,n = 1} as defined above. Additionally, N_A1 shall denote the number of symbol nodes in matrix part A_1, N_A2 the number of symbol nodes contained in matrix part A_2, and N_A = N_A1 + N_A2. In our example, N_A1 = 4 and N_A2 = 4.

E.2.1 Initialization

Each symbol node n of parity matrix parts A_1 and A_2 is assigned an a-posteriori LLR value. In other words, for every position (m,n) where H_m,n = 1 and n ≤ N_A - 1, at time step k = 0,

For n > N_A - 1, the corresponding memory addresses of the T matrix are initialized with zero:

Increase time step k by one.

For the example, this initialization yields the memory contents:

E.2.2 Phase (I): Horizontal step

The following is carried out for each check equation, i.e. for each row of the parity check matrix. For each equation c_m, m = 0,...,M - 1, and for each n ∈ C(m), we have to compute the new extrinsic value. Therefore we load the set of existing terms L_m,n^(k-1), n ∈ C(m), for the specific row m from memory. In Decoding Principle E, this set is a mixture of intrinsic and extrinsic information. For the matrix part where n ≤ N_A - 1, the information is intrinsic and is part of the matrix A_1 or A_2; for the part where n ≥ N_A, the information is extrinsic and is part of the matrix part T. Therefore, in the case of extrinsic information, we have to build the intrinsic information by adding the a-priori value:

Now we can compute the new extrinsic values by evaluating

The old extrinsic values are overwritten with the new extrinsic values. Note that the columns in matrix part T each share one memory location.
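The distinction between the two memory regions when loading a row can be sketched in Python as follows (column indices, N_A and the LLR values are assumptions for illustration):

    def build_row_intrinsic(row_edges, N_A, channel_llr):
        # row_edges: dict column index -> stored LLR of one matrix row.
        intrinsic = {}
        for n, v in row_edges.items():
            if n < N_A:
                intrinsic[n] = v                   # A_1/A_2 part: already intrinsic
            else:
                intrinsic[n] = v + channel_llr[n]  # T part: add the a-priori value
        return intrinsic

    channel_llr = {0: 0.4, 3: -0.6, 8: 1.2}
    row = {0: 0.9, 3: -0.1, 8: 0.5}  # stored values; column 8 lies in T
    print(build_row_intrinsic(row, N_A=8, channel_llr=channel_llr))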

For the example, and without parallelization, this amounts to the following. For the first check equation c_0, compute:

In practice, a subset of S = 4 equations is computed simultaneously, e.g. updating 4 equations in the first check node group at the same time. This can be achieved by reading the memory, cyclically shifting the memory words, feeding the appropriate check node processing units, and, after their computation, shifting back and saving the result.

With parallelization, this amounts to:

Updating the first check node group is done in the following way. Reading address 0:

L_0 = (L_0,0, L_1,1, L_2,2, L_3,3)  (E38)

All n of L_m,n in L_0 satisfy n < N_A ⇒ L_m,n^intr = L_m,n, so that

Permute the LLR word L_0^intr according to the permutation of the sub-matrix; in this case the permutation matrix is I^0. The resulting LLR word is

Reading addresses 1 and 4 works exactly the same for L_1^intr and L_4^intr. The handling of addresses 7 and 11 is different. Read address 7:

Due to address sharing, the memory contains, at address 7, always the extrinsic value for the next check node update of that column.

All n of L_m,n in L_7 satisfy n ≥ N_A ⇒ L_m,n^intr = L_m,n^(k-1) + L_n^c, so that

As one can see, L_4,8^intr directly corresponds to Eq. (E27). Hence the vertical process of the matrix part T is included in the horizontal process. Permute the LLR word L_7^intr according to the permutation of the sub-matrix; in this case the permutation matrix is I^0. The resulting LLR word is

The same computation is done for L_11^intr.

According to the parity check matrix, in the first check equation L_0,0 has to be combined with L_0,3, with L_0,4 and with L_0,8.

As one can see, the first value of L_0 and the first value of L_1 are the correct values, which are to be combined. Now the whole vector can be computed at once. Hence all four equations, i.e. the new extrinsic information of the whole check node group, can be updated in parallel with

Note that the case that L_0,19 does not exist in the parity check matrix has to be taken into account in the computation of the extrinsic information.

The first column corresponds to the computation of above Eq. (E33). Cyclically shift back the extrinsic values

And save the values back to memory, at address 0 write

And at address 1:

Addresses 4, 7 and 11 are processed accordingly.

Note that, due to the fact that intrinsic values are directly computed again in the symbol node update process, the values (L_4,8, L_5,9, L_6,10, L_7,11) are not needed anymore. Therefore they are overwritten with the extrinsic values (L_0,8, L_1,9, L_2,10, L_3,11) of the next equation group. In that way, the resulting new extrinsic information is directly used in the next computation, which is again an implementation of the "turbo decoding principle" of this invention, and leads to two effects, namely less memory and faster convergence.

E.2.3 Phase (II): Vertical step

For each symbol node n = 0,...,N_A - 1, and for each m ∈ F(n), compute

This computation is only needed for matrix parts A_1 and A_2; it is not necessary for the triangular part T of the parity check matrix / associated likelihood matrix H. More specifically, in the case of column x_0, first the set of existing terms L_m,n^(k-1), corresponding to one specific column n of the parity check matrix H, is read, then a set of terms L_m,n^(k) is calculated according to above Eq. (E50), and then the set of terms L_m,n^(k) is written to the same memory positions that held the L_m,n^(k-1) before. The computed new L_m,n^(k) are then used as the input of another Phase (I).

For our example, the new input values for Phase (I) of check equation c_0 and bit node group zero are

For c_1, we have

In practice, S = 4 intrinsic information values are updated at the same time, but this time no shifting is necessary; see the above memory configuration.

E.2.4 Decision

For the final decision, all a-posteriori probabilities are to be included in the final computation, hence also the symbol nodes themselves. For each n = 0,...,N_A - 1, and for each m ∈ F(n), compute

The decision is made on the sign of R_n, i.e. u_n = 1 if R_n < 0. If uH^T = 0, which can be computed during Phase (I), then the algorithm is halted with u as the decoder output. Otherwise, increase time step k by one and loop back to Phase (I).

E.3 Decoding Principle E hardware

Figure 11 shows the block diagram of an LDPC decoder hardware according to Decoding Principle E. The data is saved in the input memory "DEINT". The LDPC decoder is started via the start signal "START Decoder" once one frame has been completely received in the "DEINT" buffer. The iteration control starts the individual processes of the LDPC decoder. The processes consist of three phases, namely an initialization, a decoding and an output phase. The initialization phase is done by the block "load_cont", the decoding is done by the "run_cont" and "bpu_cont" blocks, and the output phase is done by the block "bit_stream". For all this, the iteration control block sets up the corresponding data path via the multiplexers MUX1, DEMUX3, MUX4, MUX6 and MUX7.

E.3.1 Initialization phase

The iteration control block "iter_control" sets the multiplexer to MUXl=O ;MUX6=0,MUX7=0, thus enabling path 1. Now the block

"load_cont" has control of the input memory and the edge memory. The iteration control block sends a start pulse to the block "load_cont". The load_cont block reads the data contents of the input memory and writes the data to the edge memory. As known, the parity check matrix of the LDPC code can be split into

where q depends on the code configuration and code rate and represents the number of check node groups. The matrix A can be further split into two matrices

where the i in A_i is an index and not a "power-of" operation. The edge memory is initialized in a column-oriented way. That means that for each column the corresponding channel LLR value is written w_c times into successive memory locations in the edge memory. This is done for A_1 and A_2. A conventional LDPC decoder would also initialize the matrix T. However, in our Decoding Principle E, the turbo decoding principle is used on these nodes, so that the memory necessary for that part of the matrix is reduced by a factor of 2 and is initialized with zero. This reduces the overall memory consumption of the LDPC decoder. In this approach, the memory consumption necessary for the edges is 720x2160 bits, whereas a conventional decoder needs 792x2160 bits.

Figure 12 shows an example of a memory initialization as described, for a code of code rate 1/3 and q = 120. BNG in the figure means symbol or bit node group and denotes the actual column group of the parity matrix. CNG means check node group and denotes the row group of the parity matrix. N_A1 = bitgrp1 and N_A2 = bitgrp2 denote the start of the next part of the parity check matrix. The even addresses correspond to edge memory 1 and the odd addresses correspond to edge memory 2. This code configuration comprises 600 edges; a conventional decoder would require 600 memory locations. The new approach needs only 540 memory locations, without losing any edges, by exploiting the fact that one memory location is used for two edges. After initialization, the block "load_cont" sends an acknowledge back, indicating to the iteration control block that the decoder is ready for decoding.
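The column-oriented initialization can be sketched as follows (sizes, weights and values are illustrative assumptions; the real decoder works on S-wide memory words):

    def init_edge_memory(channel_llrs, w_c, n_t_columns):
        edge_mem = []
        for llr in channel_llrs:          # columns of A_1 and A_2
            edge_mem.extend([llr] * w_c)  # one copy per edge of the column
        # One shared, zero-initialized location per T column (instead of two):
        edge_mem.extend([0.0] * n_t_columns)
        return edge_mem

    mem = init_edge_memory([0.4, -1.0, 0.7], w_c=3, n_t_columns=4)
    print(len(mem), mem)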

E.3.2 Decoding

As usual, the decoding process comprises a check node update and a bit node update.

E.3.2.1 Check Node Update, performed in "Check Processing Units"

The iteration control block "iter_control" sets the multiplexer to MUXl=I ; MUX4=0, MUX6=2 ; MUX7=2 ; DEMUX3=0. Now the block

"run_cont" has control of the input memory, edge memory, adder, shifter and cpu/bpu units. The iteration control block sends a start pulse to the block "run_cont". The block updates now the intrinsic information in the edge memory with the extrinsic information. This is done in sequential manner by reading and writing check node group by check node group. For the example, check node group 0 (CNG 0) comprises the addresses 0, 157, 480, 539. The block reads the word of address 0 (edge memory 1) feeds the value through the shifter to the check processing units (CPU) , together with a synchronization signal, indicating start

of check node group. Then it feeds the check processing units with word of address 157 (edge memory 2) . The address of 480 is in the T part of the matrix where turbo decoding is applied. Therefore the channel value (a-priori) of bit node group 120 is read from input memory and added to the word of address 480 from the edge memory. The result is feed through the shifter to the CPU unit. The same is done for address 539. The new extrinsic information is saved back to the memory location 0, 157, 480, 539. According to the invention, the result of this computation is used in the following check node updates. For instance check node group 1 comprises the addresses 13, 158, 480, 481. Again, the address of 480 is in the T part of the matrix. The channel value of bit node group 120 is read from input memory and the previous written word of address 480 is read from memory and added together. That means that the result in the T part of the matrix for check node group 0 is used in the subsequent check node updates. As a consequence the algorithm converges faster than the conventional algorithms. To handle all the timing and groups, the block "run_cont" reads an internal ROM memory, containing information of edge configuration and timing information of memories, shifter and adder. Due to the partition of the memory it is possible to read one memory, while the other is being written and vice versa.

E.3.2.2 Bit/Symbol Node Update, performed in Bit Processing Units (BPU)

For the bit node update, the iteration control block "iter_control" sets the multiplexers to MUX1=1; MUX4=1; MUX6=1; MUX7=1; DEMUX3=1. Now the block "bpu_cont" has control of the input memory, edge memory and cpu/bpu units. The iteration control block sends a start pulse to the block "bpu_cont". The block now updates the extrinsic information in the edge memory with the intrinsic information. This is done in a sequential manner by reading and writing bit/symbol node group by bit/symbol node group. For the example, bit/symbol node group 0 (BNG 0) comprises the addresses 0, 1, ..., 12. The block reads the word at address 0 (edge memory 1) and feeds the value to the bit/symbol processing units (BPU), together with a synchronization signal indicating the start of the bit node group. The BPU computes the a-posteriori and the intrinsic value. The a-posteriori value is used by the block "bit_stream". The intrinsic values are saved at the same locations where the extrinsic values were located before, hence addresses 0, 1, ..., 12.

This computation is done for matrix parts A_1 and A_2. Matrix part T is omitted.

E.3.3. Output phase, i.e. output of data (bit_stream)

During the BPU process, the bit decisions are available at the output of the BPU units. If the block "iter_control" decides to stop the iteration, it sends a pulse on the signal "start output" before the last BPU is started. This causes the decisions during the last BPU process to be saved in the output buffer and sent to the output.

DECODING PRINCIPLE F

According to Decoding Principle F, for each bit, i.e. for each column of the parity check matrix / LLR matrix, an accumulator is introduced. The accumulator value is saved in an accumulator buffer. The accumulator is initialized with the associated channel information:

A_n = L_n^c ,  (F3)

and the edge memory is initialized with zero. The accumulator group for the channel information subvector L_BNG_n of bit node group n is denoted as A_BNG_n. The accumulator buffer can be the a-priori/channel buffer L^c. During the decoding, the accumulators always contain the a-posteriori information.

At every iteration k, decoding sub-iterations corresponding to the number of check node groups are carried out:

• Perform sub-iteration t. Compute for each check node group the extrinsic information using the accumulator values A_BNG and the extrinsic information saved in the edge memory. Additionally compute a new a-posteriori information of BNG_n. For this we first have to compute the intrinsic or input information of each check node group for each sub-matrix. These values are obtained by subtracting the extrinsic information belonging to the permutation matrix participating in the check node group from the accumulator. Specifically, according to this invention, the cyclic shift is done at the accumulator.
o Read the accumulator buffer and shift to perform the permutation:

according to the associated permutation matrix.
o Compute the intrinsic information by subtracting the extrinsic information of the edge memory from the accumulator:

Do this step until all intrinsic or input information of the current check node group is computed. In the following, these L_x values are denoted as L_x^(old). (In hardware it is recommended to store these values in a FIFO.)
o Compute new extrinsic information according to equation

For each check equation entry a new extrinsic information value is obtained.

§ Update the a-posteriori information

§ Shift back the a-posteriori information, according to the permutation matrix

§ Distinguish now between one or more permutation matrices in BNG_i:
• In case of one permutation, calculate

• In case of more than one permutation in BNG_i, add the decoding results belonging to the bit node group i. Perform a backward shift of the corresponding extrinsic information and update the a-posteriori sum:

§ Save the new a-posteriori information A_BNG and the new extrinsic information L_n^(new,extr).
• Final Iteration
o The a-posteriori information is available after each sub-iteration and corresponds to the accumulator buffer values A_BNG. Due to the fact that the a-posteriori sum is updated during each sub-iteration, the decoding process can be stopped at any time.

• Make hard decisions for every bit in bit group: x_n = sgn(A_BNG_n).

As one can see, the extrinsic information is not shifted; the shifter is relocated to the accumulator buffer, which has several advantages for the hardware.
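The relocated shifter can be illustrated with the following Python sketch (offsets, values and the scalar placeholder for the CNU are assumptions; only the data flow is the point):

    import numpy as np

    A = np.array([0.8, -0.3, 1.5, 0.2])      # a-posteriori sums of one BNG
    E_old = np.array([0.1, -0.2, 0.3, 0.0])  # stored extrinsic word (never shifted)
    offset = 1                               # permutation offset of the sub-matrix

    A_shift = np.roll(A, -offset)            # Eq. (F4) analogue: shift the accumulator
    L_x = A_shift - E_old                    # Eq. (F5) analogue: intrinsic input
    E_new = 0.9 * L_x                        # placeholder for the CNU result, Eq. (F6)
    A_new = np.roll(L_x + E_new, offset)     # Eq. (F7) analogue: shift back the sum
    print(A_new)                             # E_new is written back to the edge memory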

Figure 13 shows a suggested hardware architecture for Decoding Principle F. The accumulator memory contains the a-posteriori sum and is initialized with the channel LLR values according to Eq. (F3). These values are updated at each sub-iteration. For this, the accumulator values are loaded into the shifter and shifted according to the associated permutation matrices according to Eq. (F4). The corresponding extrinsic information is loaded from an edge memory, which can be implemented as a FIFO where read underflows are treated as zero readings. The extrinsic information is subtracted from the shifted a-posteriori value according to Eq. (F5). The resulting intrinsic value is fed to the CNU unit and to a demultiplexer. If the sub-matrix consists of one permutation matrix, the intrinsic value from the demultiplexer is fed to the FIFO. If the sub-matrix consists of more than one permutation matrix, the first intrinsic value is fed to the FIFO and afterwards the FIFO is fed with the corresponding negative extrinsic values for that sub-matrix; this prepares the computation inside the brackets of Eq. (F10). Now the CNU computes the new extrinsic values according to Eq. (F6). These values are saved in the edge memory. At the output of the CNU, the new extrinsic value is added to the output of the FIFO and passed to the shifter according to Eq. (F7). The output of the shifter is fed to an adder. Again it is distinguished between one or more permutation matrices in one sub-matrix. If the sub-matrix is defined via one permutation matrix, the output value of the shifter is directly loaded into the a-posteriori register (no adding). If the sub-matrix is defined via more than one permutation matrix, the first value is loaded into the a-posteriori register and the following extrinsic update values are added to that value, as in Eq. (F10). The old a-posteriori sum in the accumulator memory is overwritten with the new a-posteriori value in the a-posteriori register.

As an example, let us assume again the parity check matrix and associated LLR matrix of Eq. (F1). Figures 14 and 15 show the associated accumulator memory content and edge memory content, respectively.

Figure 16 shows inputs and outputs of key elements of the Decoding Principle F hardware. As one can see, in order to update the sum for bit 0, one has to compute

where L^(new) indicates the newly computed extrinsic value. In Fig. 16, the step (table row) "write A0" corresponds to that computation.