Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
ENCRYPTION METHOD
Document Type and Number:
WIPO Patent Application WO/2007/049066
Kind Code:
A1
Abstract:
The invention provides a method of creating a hash value using a modified hash function, e.g. of the MD4 family. The method comprising the steps of inputting the data into a hash function which comprises a plurality of computational steps, after one or more of said computational steps, applying to a variable output from the step a quasi-group operation in order to fold the variable, and then performing a further said computational step using the folded variable. The quasi-group has the combinatorial structure of a Latin square. The folding operation comprises dividing the variable into a plurality of fewer-bit variables and then using the values of pairs of these to determine a new value from a Latin square representing the quasi-group operation. The new value then forms part (i.e. a number of bits) of the folded variable.

Inventors:
GLIGOROSKI DANILO (NO)
MARKOVSKI SMILE (MK)
KNAPSKOG SVEIN J (NO)
Application Number:
PCT/GB2006/004046
Publication Date:
May 03, 2007
Filing Date:
October 30, 2006
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
NTNU TECHNOLOGY TRANSFER AS (NO)
JACKSON ROBERT (GB)
GLIGOROSKI DANILO (NO)
MARKOVSKI SMILE (MK)
KNAPSKOG SVEIN J (NO)
International Classes:
H04L9/32; H04L9/30
Other References:
SMILE MARKOVSKI, DANILO GLIGORSKI, VERICA BAKEVA: "Quasigroups and Hash Functions", SIXTH INTERNATIONAL CONFERENCE ON DISCRETE MATHEMATICS, 31 August 2001 (2001-08-31), Bansko, Bulgaria, pages 1 - 8, XP002419057, Retrieved from the Internet [retrieved on 20070207]
DANILO GLIGOROSKI: "Candidate One-Way Functions and One-Way Permutations Based on Quasigroup String Transformations", IACR, 7 October 2005 (2005-10-07), http://eprint.iacr.org/complete/, pages A,1 - 14, XP002419058, Retrieved from the Internet [retrieved on 20070205]
Attorney, Agent or Firm:
FRANK B. DEHN & CO. (10 Salisbury Square, London EC4Y 8JD, GB)
Download PDF:
Claims:
Claims

1 A method of encryption using a hash function, wherein the hash function comprises a step of nonlinear bijective transformation of a variable that was created in a previous step of the hash function.

2 A method as claimed in claim 1, comprising the use of a quasi-group in the performance of the transformation step

3 A method as claimed in claim 2, wherein the quasi-group is non- commutative, non-associative, non-idempotent and without either a left or right neutral element.

4 A method as claimed in claim 3, wherein the quasi-group also non- involutory

5 A method as claimed in any preceding claim wherein the quasi-group is randomly generated.

6 A method as claimed in claim 5, wherein the quasi-group is generated in real time.

7 A method as claimed in claim 5 or 6, wherein the quasi-group is generated using a probabilistic algorithm.

8 A method as claimed in claim 7, wherein the algorithm comprises the multiplication of polynomial functions.

9 A method as claimed in any of claims 2 to 8, wherein the quasi-group is used to perform a folding operation on the variable

10 A method as claimed in any of claims 2 to 9, wherein the combinatorial

structure of the quasi-group is defined by a Latin square.

11 A method as claimed in any of claims 2 to 10 wherein the quasi-group is of order 16.

12 A method as claimed in any of claims 9 to 11 wherein the folding operation is repeatedly applied.

13 A method of finding a hash value of data comprising the steps of: a) inputting the data into a hash function which comprises a plurality of computational steps, b) after a said computational step, applying to a variable output from the step a quasi-group operation in order to fold the variable, and then c) performing a further said computational step using the folded variable.

14 A method as claimed in any preceding claim employed as a modification for hash functions of the MD4 family.

15 A data processing system configured to carry out one or more of the method of any preceding claim.

16 A data carrier comprising software code configured to find a hash value according to the method of claim 13.

Description:

Encryption Method

The present invention relates to encryption of data using hash functions and specifically, but not exclusively, to encryption methods that involve the use of the MD4 family of hash functions. It also relates more generally to improved hash functions and their applications.

A hash function His a reproducible method of turning data (usually a message or a file) into a number suitable for handling by a computer. These functions provide a way of creating a small digital "fingerprint" from any kind of data. The function chops and mixes (i.e., substitutes or transposes) the data to create the fingerprint, often called a hash value. It represents concisely the longer document or other data from which it was computed.

Expressed mathematically, H is a transformation that takes a variable-size input m and returns a fixed-size string, which is called the hash value h. Thus, h = H(m). Such functions are commonly used in cryptography where they should be carefully chosen to have certain additional properties. In particular, H(m) should be relatively easy to compute for any given m, it should be one-way and collision- free. However, they are also used in hash tables to enable the fast look-up of data, error correction where the hash value of data may be used to check if an error has occurred during data transmission, digital signatures, etc.

Ease of computation is necessary to enable the functions to be implemented on conventional computers without excessively slowing down their operation. Particularly in the case of cryptographic hash functions, there are two other important requirements relate to the security of the encryption. A hash function H is said to be one-way if it is hard to invert, where "hard to invert" means that given a hash value h, it is computationally infeasible to find some input m such that H(m) = h. Collision free means that there should not be two non-identical inputs m; and m2 with the same hash value (i.e. where ), or at least it should be computationally infeasible to find such inputs. Finding such collisions is an important step in attacking a hash function used in an encryption method.

As noted above, cryptographic hash functions may be used in the provision

of digital signatures. Since hash functions are generally faster than digital signature algorithms, it is typical to compute the digital signature of a document by computing the signature on the document's hash value, which is small compared to the document itself. Such digital signatures are not just used for financial transactions, but also, for example, when logging on to networks. Passwords are often kept in a file which is "hashed", i.e. encrypted using a hash function.

Examples of well-known hash functions are the MD4 family, which includes MD4, MD5, SHA-I, SHA-256, SHA-512, RIPEMD, RIPEMD-160 and others. These can be found in hundreds of thousands of installed pieces of software. This includes: all popular operating systems (UNIX-like family - System V, BSD, Linux, HP-UX, IBM's AIX, Solaris, Mac OS X, and Microsoft Windows family - Windows NT, Windows 2000, Windows XP, Windows Server 2003) and many others, in their access control and password management operations; all popular database management systems (e.g. MySQL, Oracle, Sybase, SAP, and Microsoft SQL); and all popular programming languages and development platforms (e.g. Java and .Net). A description of how these known hash functions operate is given in Appendix I. Unfortunately, however, during the 15 years of their existence, it has been proved for many of them that they cannot be considered as one-way hash functions that are collision free. Numerous attacks, starting by the attacks of den Boer, Bosselaers and Dobbertin, on MD4 and MD5, and lately polished to excellence by Wang et al. on SHA-I, have shown that the MD4 family of hash functions can no longer defend their status as one-way functions. Further information on these attacks and a summary of the inventors' analysis of them is found in Appendix II.

As will be seen from that appendix, the inventors have recognized that all the above-mentioned attacks exploit one common characteristic of the MD4 family: in every iterative step of the functions, the assignment of a new (32-bit) working variable is done by applying addition modulo 2 32 and left rotation. Both of these operations are invertible, and furthermore, addition modulo 2 32 is a group operation. More generally, the inventors have recognized that the use of steps involving Boolean functions, group functions or rotation will give rise to linear invertibility and thereby compromise the security of the encryption.

Thus, no matter how complicated the unfolding and tracing of all equations

involved in the definition of the hash function might seem, in the search for collisions the equations can be unfolded. Then, by applying classical algebraic routines for reduction of variables and tracing the differences, the equations can be solved as a linear system of equations in a finite field. The inventors have further recognized that if this common characteristic of the MD4 family of hash functions could be removed, then the existing differential attacks on all hash functions from this family would be ineffective.

According to a first aspect of the invention there is provided a method of encryption using a hash function wherein the function comprises a step of nonlinear bijective transformation of a variable that was created in a previous step of the hash function. A bijective transformation is a one-to-one and onto transformation.

Thus, by means of the invention, the above-described types of attack can be prevented because the transformation step is not linearly invertible. Since these attacks are the only ones presently known to be successful against the MD 4 family, the invention is clearly of particular application to encryption methods using such hash functions.

In principle, any transformation can be used to map nonlinearly the temporal 32-bit variable, provided it meets the requirement of being bijective. (But it should not involve the same Boolean functions such as XOR, addition and rotation, since they are already present in the design of MD4 family.) Preferably, however, the invention comprises the use of a quasi-group in the performance of the transformation step.

As used herein, a quasi-group is a groupoid (Q, *) satisfying the law:

(Vu, v G Q)(3 !x, ye Q)(u * x = v, y * u = V).

Hence, a quasi-group satisfies the cancellation laws:

x * y = x * z => y = z , y * x : = z * x -=> y = z

and the equations a * x = b, y * a = b have unique solutions x, y for each a, b e Q. If (Q, *) is a quasi-group, then * is called a quasi-group operation. A more detailed treatment is found in Appendix III.

- A -

The quasi-group (Q, *) is preferably non-commutative, non-associative, non- idempotent and/or without either a left or right neutral element. It preferably has all of these attributes and is preferably also non-involutory. If the quasi-group were, for example, associative then there could be a significant reduction and elimination of variables in a large system of group equations, which would greatly reduce the workload in attack in which collisions are sought. However, provided that the quasi- group is of a sufficiently high order (e.g. 16), it may be generated randomly with an extremely low probability of its being associative. Thus, in many applications it may be desirable to generate the quasi-group randomly. Suitable quasi-groups may be generated using a probabilistic algorithm.

However, it is highly preferred that non-linear processes be used to ensure security, and so preferably the algorithm comprises the multiplication of polynomial functions.

Alternatively, the method may read from a look up table containing a previously prepared nonlinear permutation and use this in the transformation.

However, it is appreciated that where quasi-groups of high order are employed, this will incur significant memory and processing resources.

Preferably the quasi-group is used to perform a folding operation on the variable. (Such an operation will herein be referred to as a "quasi-group fold"). It is particularly preferred that the combinatorial structure of the quasi-group is defined (at least in part) by a Latin square. Thus, in the case of a quasi-group of order 16, the square will consist of sixteen permutations in sixteen rows and sixteen permutations in sixteen columns. The quasi-group fold of 32-bit variable is therefore a permutation of 2 32 elements. The order of the quasi-group maybe selected to suit a given application. Higher orders, such as 32, 128 or 256 can be applied, but there is a cost in terms of memory usage.

The quasi-group fold can be applied to a wide variety of encryption techniques and hash functions. It may be applied in the creation of completely new algorithms, but it is particularly well suited as a modification or "fix" for existing hash functions, and in particular (as noted above) to those of the MD4 family. Such an algebraic structure also has the advantage of enormous combinatorial and structural potential

The variable on which the transformation is applied will most typically be 32 bits in length, in which case, as will be discussed further below, the quasi-group should preferably be of order 16.

The transformation may be applied to as many steps of an algorithm as required. It is possible to apply it just once, but this may not be satisfactory in many cases because an attack could be focused on the remainder, which is not folded. However, in less critical applications it may be applied after say, three, four or five of the other steps of the function. Preferably though, it is repeatedly applied during the algorithm and it may be applied at most or every step where a variable is created. Thus, in the case of one fix for the MD4 family of hash functions, the nonlinear quasi-group folding of variables is preferably applied to the result of every step of the definition of the hash function (other than the quasi-group folding steps themselves). However, in another fix to the same algorithm, the folding technique is applied to just six of the internal iterative steps and on ten steps of the message expansion part. This enables the speed of the function to be comparable with the known function, whilst being secure enough to withstand all known successful attacks.

Such a fix makes infeasible all known successful attacks used for finding collisions in the MD4 family and it gives security guarantees, according to the current level of mathematical knowledge. However, it does not increase the hash output length or change other "known and good" statistical properties of the hash functions. As an additional effect, it makes some other popular attacks (such as the dictionary attack — impracticable). By using a "fix" rather than a completely new algorithm it reduces the need for intervention in millions of lines of source code of the currently used software.

The folding operation preferably comprises dividing a many-bit (e.g. 32 bit) variable into a plurality of fewer-bit variables. For example, a 32-bit variable a in a register may be seen as a concatenation of eight, 4-bit variables au ...,αj.

The quasi-group operation can then be applied to pairs of the fewer-bit variables. For example, in the example above it may be applied to the first and fifth, second and sixth, etc. as shown below:

In this diagram: b\= a\ * as bf= as * α 2 by= as * aη

etc. can be obtained from looking up the value of α / *αj etc. in a Latin square in the well-known manner.

Such an operation of quasi-group folding on the variable a may be termed QFOLD:

QFOLD(α)

From the definition of quasi-group folding operation and from the properties of the quasi-groups the following proposition follows: the operation QFOLD as defined above, is a bijection on the set {0, 1 } 32 .

Thus, viewed from another aspect, the invention provides a method of encrypting data comprising the steps of: a) inputting the data into a hash function which comprises a plurality of computational steps,

b) after a said computational step, applying to a variable output from the step a quasi-group operation in order to fold the variable, and then c) performing a further said computational step using the folded variable. As previously discussed, preferably the quasi-group has the combinatorial structure of a Latin square. The folding operation preferably comprises dividing the variable into a plurality of fewer-bit variables and then using the values of pairs of these to determine a new value from a Latin square representing the quasi-group operation. The new value then forms part (i.e. a number of bits) of the folded variable. It has been seen that, by means of the present invention, fixes to the MD4 family of hash functions may be provided by means of quasi-group transformations. The introduced quasi-group folding will make any algebraic attempt to set up a system of equations that will be easily solved infeasible. As far as the inventors are aware, there are no algebraic methods for solving non-linear systems of quasi-group equations in general (and it is easy to make sure that the quasi-group itself does not have any algebraic property that makes solving the equations easy). Thus, all of the techniques that have proved to be effective for breaking existing hash functions from the MD4 family will become ineffective.

As is always the case, adding a new operation to a hash function will increase the number of computations. However, in the present invention, an optimum balance can be obtained between making the modified MD4 family of hash functions resistant to the current successful attacks and requirements not to decrease the speed of computation of the hash functions too much.

Although the invention is especially useful as a fix for existing hash functions, the undesirable property of algebraic tractability of the differences can be removed, by applying the technique of quasi-group folding, in the design of any new hash function. Moreover, there is made available by the invention, not only one particular one-way hash function, but rather a huge family with more than 2 430 oneway hash functions. The inventors recognise that the invention may present opportunities for fast hardware implementation in the next generation of 32- and 64-bit CPUs.

The invention extends to an a data processing system configured to carry out

one or more of the hash functions described above, such as a computer operating in accordance with the above methods. Thus, as well as providing a cryptographic method, the invention provides a method of data processing comprising the use of a hash function as described above for any purpose including digital signatures, error checking, databases, audio file identification, etc.

It also extends to the hash function of the invention in the form of software loaded onto a computer, or a CPU, which contains instructions to carry out such a method or in the form of a software product (or CPU alone). The software product may be in the form of a physical medium, such as a disk, or in the form of signals transmitted from a remote location, for example by a download over the internet. In addition, the invention extends to a method of storing and/or transmitting data (and corresponding apparatus) in which the data is hashed by means of a hash function of the kind described herein.

Certain embodiments of the invention will now be described, by way of example only, and with reference to the accompanying drawing:

Figure 1 is a graph comparing the speeds of hash functions that are embodiments of the invention with prior hash functions.

The embodiments described here are all fixes for the MD4 family of hash functions in which quasi-group folding of variables a takes place at every step of the definition of the hash function.

Thus, in the first embodiment (using the terminology from Appendix 1) the modified version of MD4 would incorporate the following:

FF : a := QFOLD((a + F (b, c, d) + Z) <<s ), GG : a := QFOLD((a + G(b, c, d) + Z) <<s ),

HH : a := QFOLD((a + H (b, c, d) + Z) <<s ).

The second embodiment is a fix for MD5 in which:

FF : a := QFOLD(b + (a + F (b, c, d) + Z + Y) <<s ),

GG : a := QFOLD(b + (a + G(b, c, d) + Z + Y) <<s ), HH : a := QFOLD(b + (a + H (b, c, d) + Z + Y) <<s ),

II : a := QFOLD(b + (a + 1 (b, c, d) + Z + Y) <<s ).

The third embodiment, known as SHAl-Q, is a fix for SHA-I wherein:

t:=QFOLD(a <<5 +F(b,c,d) + e + W + K),

(a,b,c,d,e):=(t,a,b <<3 °,c,d),

t := QFOLD(a <<5 + H (b, c, d) + e + W + K), (a,b,c,d,e):=(t,a,b <<3 °,c,d),

t := QFOLD(a <<5 + G(b, c, d) + e + W + K), (a, b, c, d, e) := (t, a, b <<30 , c, d),

t := QFOLD(a <<5 + H (b, c, d) + e + W + K) 5 (a,b,c,d,e):=(t 5 a,b <<30 ,c,d)

The forth embodiment, termed SHA -1Q2 is a development of SHAl-Q which has been devised in order to combine the advantages of the invention with the speed of the original SHA-I function. It applies quasi-group folding on ten steps of a modified message expansion part of the SHA-I compression function and then the same folding technique is applied to just six iterative steps,

In this embodiment the notation used by NIST for the description of SHA-I will be employed: FIPS 180-2.

The following operations are applied to 32-bit words in SHA- 1Q2: 1. Bitwise logical word operations: λ, V, θ, - (AND, OR, XOR, NOT).

2. Addition modulo 232.

3. The rotate left (circular left shift) operation, ROTL" (x), where x is a 32-bit word and. n is an integer with 0 < n < 32.

4. The operation of quasi-group folding of a 32-bit word x — x 1X2X3X4X5X5X7X8 (represented as a concatenation of eight 4-bit variables (X 1 , ... , x & ) is defined by the following equations:

QFOLD(Jt) *6 X? X8 ,

x 'l - X] *%5 > χ ' 2 = Xfi *X2 , ϊ 3 = ^ *X7 , x '4 ~ X§ *X4 (1) where the operation * is a l6 x l6 quasi-group operation (here the quasi-group shown in Appendix III is used).

SHA-1Q2 uses a sequence of logical functions,./^ ,/21, /22 ,/23 ,fi4 ,fi5 ■ Each function^? , where 20 < t < 25, operates on three 32-bit words, x, y, and z , and produces a 32-bit word as output. The function 7> (x, y, z ) is defined as follows:

(2) SHA-I Q2 does not use any sequence of predefined constant 32-bit words.

Pre-processing in SHA-I Q2 is completely the same as that of SHA-I . That means that this three steps: padding the message M, parsing the padded message into message blocks, and setting the initial hash value, Bf 0) are the same as in SHA- 1. In the parsing step the message is parsed into Nblocks of 512 bits, and the i-th block of 512 bits is denoted as a concatenation of 16, 32-bit words denoted as Mp , Mp . . , M 15 (i) .

The initial hash value, JH® for SHA- 1Q2 is the same as that of SHA-I and consists of the following five 32-bit words:

Ho (o) = 67452301 H/ 0) = efcdab89 Hf ) = 98badcfe H 3 (0) = 10325476

H 4 (0) = c3d2elfθ. (3)

The SHA- 1Q2 hash computation uses the functions and constants defined above. Addition (+) is performed modulo 232. After pre-processing is completed,

each message block, M® , M® , . . . , M ^ , is processed in order, using the following steps:

For i = 1 to N:

{ 1. the message schedule, {W t } :

2. Initialize the five working variables a, b, c, d, and e, with the (i - l) st hash value and the values of W 25 , W 24 , W 23 , W 22 , W 21 :

d =# 3 (M) +W 22 T , w rr 3. For t=20 to 25:

T = QFOLD(ROTL 5 (a) +^(b,c,d) + e + W t ) e = d d = c c = ROTL 30 (b) b = a a = T

4. Compute the t intermediate hash value H^ . H U =b + H?'- l) H?> = c + H™

}

In Figure 1 we give a comparison between SHA-I, SHA-IQ, SHA-1Q2 and

SHA-2. To obtain the values for SHA-2 we have taken the relative speed comparison between SHA-I and SHA-2 in Crypto++ library (see http://www.cryptopp.com). The label SHA-I Q2CPU is an estimation of what would be the speed of SHA-I Q2 if the operation of Quasi-group Folding were

implemented in the microcode of the CPU as an assembler instruction, i.e. as logical bitwise operations NOT, XOR, AND and OR are implemented in every modern CPU as a single assembler instructions, the same can be done for the operation of Quasi-group Folding in future versions of the modern CPUs. The operation is univariate, and can be implemented in parallel fashion and its execution can take only one or two CPU cycles.

It will be noted that SHA- 1Q2 compares favourably in terms of speed with the existing algorithm SHA-I and the comparison is highly favourable if the embodiment were to be implemented in the microcode of the CPU. Analysis of this embodiment by the inventors has shown it to be highly resistant against collisions, preimages and second preimages after the first iterative step of the compression function. It is pseudo-collision resistant too.

Since by the proposition set out above, the operation QFOLD is a permutation, it is guaranteed that these fixes for the MD4 family of hash functions will not introduce undesirable statistical effects such as shrinking the space of possible outcomes of the hash function or introducing bias of any kind. On the other hand, quasi-group folding will make any algebraic attempt to set up a system of equations that can be easily solved infeasible. The only possible way to trace the equations would be by setting up a system of non-linear quasi-group equations in a quasi-group of order 16 that is non-commutative, non-associative and without a neutral element.

This will now be illustrated by considering one of the equations obtained in Dobbertin, K: Cryptanalysis ofMD4, J. Cryptology (1998) 11: 253271. The equation (4) in that paper is:

F(W 5 V , U ) - F (W 5 V , U ) = z <<13 - Z <<13 It is obtained by these two equations that are applied in step 15 of MD4:

Z := (B + F (W, V , U ) + X 15 ) <<19 ,

Z - (B + F (W 5 V ,U ) + X 15 ) <<19 .

If quasi-group folding is applied, then the equations in that 15th step will be:

Z := QFOLD((B + F (W, V , U ) + X 15 ) <<l9 ),

1 := QFOLD((B + F (W ,V ,ϋ ) + X 15 ) <<19 ).

Let us denote 32-bit variables Z and Z as 8 concatenated 4-bit variables, i.e. Z = Z 1 Z 2 Z 3 Z 4 Z 5 Z 6 Z 7 Z 8 and Z = Z 1 z 2 z 3 Z 4 Z 5 Z 6 Z 7 Zg . Then, to obtain the original values of the variables Z and Z , first these eight quasi-group equations would have to be set:

Z 2 = Z 6 \ Z' 2 ,

Z 2 = Z 6 \ Z \ ,

Z 3 = Z ' 3 / Z 7 , Z 4 = Z 8 \ Z ' 4 ,

and then the equation

F(W 5 V ,U)-F(W 5 V 5 U)= Z <<13 -Z <<13

would hold. More specifically, from the definition of the quasi-group fold, it follows that every referencing of an equation used in some step of the execution of an MD4 hash function introduces at least 4 nonlinear quasi-group equations with 8 unknown variables from the set {0, I 5 ..., 15}. Thus, by applying quasi-group folding in every further step, the final system will end up as a huge nonlinear system of quasi-group equations. Its solution would demand a huge combinatorial effort to try every possible value of the involved variables.

An additional positive effect that should be noted is that, when applying the quasi-group folding technique, there is an enormous number of possibilities for choosing a quasi-group of order 16. Their approximate number can be calculated from [14] and it is more then 2 430 . Thus by applying the quasi-group folding technique on MDx with different and randomly chosen quasi-groups of order 16, we are dealing not only with several one-way hash functions but with several huge families of one-way hash functions. In some protocols for network access this fact can be used to eliminate the well-known dictionary attack since attackers in such a case, in addition to the knowledge of which particular MDx hash function was used, will have to know the quasi-group that is used too, and this can be kept secret.

The quasi-groups used in the embodiments above may be stored in memory or they may be generated as required using a suitable probabilistic algorithm, which is favoured for large quasi-groups. The approach described below is based on R. L. Rivest, Permutation polynomials modulo T ', Finite Fields and Their Appl. 7 (2001), pp. 287-292.

Let the generated quasi-group (Q, *) is of order 2 n . Let take Q = Z 2n . We will find two polynomials P] 5 P 2 -Xi 2n+ I →2-' 2 » + i °f order not bigger then 2" "1 and with coefficients from 2^ n+1 such that:

χ * y == hq*H-l)* f 2 (2X+l)mod 2" + ' j ^

Let us denote by H the following set H= {1 } u {2 ' + 1 \ i = l, n} and the image of H by polynomials P 1 and P 2 by P 1 (H) = {t \ t = P 1 (x) mod 2 " +1 , x e H } andP 2 (H) = {φ = P 2 (» mod 2 n+1 , x e H}.

. The basis for an algorithm that will generate a quasi-group of huge order is the following theorem:

Theorem 1: If the polynomials P x , P 2 :X 2 . + , — >][\ γM of order not bigger then 2 n-1 and with coefficients fromX 2n+ i satisfy the following criterion:

IP; (H)| = \P 2 (H)\ = n + 1 (5)

then if the operation * is defined by the equation (1), the obtained structure (Q, *) is quasi-group of order 2n.

Example 1 : Let us take the order of quasi-group to be 216 = 65536 and let . us choose polynomials of order 3, i.e. polynomials with 4 coefficients from the set/. • Let Pi (x) = 18078;c 3 + 36532/ + 39147x + 91186 mod 2 17 and P 2 (x) = 117675/

+109772/ + 80712 , y + 90990 mod 2 17 . Then if we check the images of the set#=

{1, 3, 5, 9, 17} by polynomials Pi and P% we will get the following sets:

P 1 (H) = {53871, 108017, 52099, 66391, 123711} and P 2 (H) = {5933, 41851,

122433, 73589, 4669. Since \Pi(H)\ = \P(H)\ = 5 i.e. the criterion (5) is satisfied, we conclude that

Jj(2j+l) x f 2 (2y+l)mod 2" x * y ≡ (4)

defines a quasi-group of order 2 .

Appendix I

Operation of MD4 Family

In this section we will start with a brief description of the hash functions MD4, MD5 and SHA-I. The description is not detailed, and the reader can find complete description of them inRϊvest, R.: The MD4 message-digest algorithm, Request for Comments (RFC) 1320, Internet Activities Board, Internet Privacy Task Force, April 1992; Rivest, R.: The MD 5 message-digest algorithm, Request for Comments (RFC) 1321, Internet Activities Board, Internet Privacy Task Force, April 1992; and "Secure Hash Standard", United States of American, National Institute of Science and Technology, Federal Information Processing Standard (FIPS) 180-1, April 1993. We will use the notation used in Dobbertin, R: Cryptanalysis ofMD4, J. Cryptology (1998) 11: 253271. The compression function of MD4 uses three Boolean functions:

F(U, V, W) = (U λ V) v (HLJ λ W), G(U, V , W) = (U λ V ) v (U λ W) v (V λW), H(U, V, W) = U ® V θ W

that act bitwise on 32-bit Boolean vectors U, V and W. It uses two additive constants and it updates the value of the variable a (which is one of four working 32- bit variables a, b, c and d) by one of the three assignments FF (a, b, c, d, Z, s), GG(a, b, c, d, Z, s) and HH (a, b, c, d, Z, s) that are defined as follows:

FF : a := (a + F (b, c, d) + Z ) <<s , GG : a := (a + G(b, c, d) + Z ) <<s , HH : a := (a + H (b, c, d) + Z ) <<s .

MD4 has 3 rounds of 16 steps each, and in each step one of the working 32- bit variables is updated. The compression function of MD5 uses four Boolean functions:

F(U, V, W) = (U λ V) v (-UA W),

G(U, V , W) = (U λ W) v (V λ -W), H(U,V,W) = UθVθW, I(U, V, W) = (U v --W) θ V.

It uses 64 additive constants and it updates the value of the variable a (which is one of four working 32-bit variables a, b, c and d) by one of the four assignments FF(a, b, c, d, Z, Y , s), GG(a, b, c, d, Z 3 Y , s), HH(a, b, c, d, Z, Y , s) and II(a, b, c, d, Z, Y , s) that are defined as follows:

FF:a:=b + (a + F(b,c,d) + Z + Y) <<s ,

GG:a:=b + (a+G(b,c,d) + Z + Y) <<s , HH:a:=b + (a + H(b, c, d) + Z + Y) <<s , II:a:=b + (a + I(b,c,d) + Z + Y) <<s .

MD5 has 4 rounds of 16 steps each, and in each step one of the working 32- bit variables is updated.

Finally, we give brief description of the main components in the definition of SHA-I. It uses the same three Boolean functions as MD4, but in 4 rounds and in the following order:

F(U, V, W) = (U λ V) v (-U λ W), H(U,V,W) = UθVθW, G(U, V, W) = (U λ V) v (U λ W) v (V λ W), H(U, V,W) = U®VθW

It uses four additive constants and has five working 32-bit variables: a, b, c, d and e. Every round has 20 steps, and it uses an internal procedure for message expansion from 16 to 8032-bit variables. Its output is 160 bits. The assignments of working variables are done by the following functions:

t := a <<5 + F (b, c, d) + e + W + K,

(a,b,c,d,e)"(t,a,b <<3 °,c,d),

t:=a <<5 + H(b,c,d) + e + W + K, (a,b,c,d,e):=(t,a,b <<30 ,c,d),

t - a <<5 + G(b, c, d) + e + W + K, (a 3 b,c,d,e):=(t,a,b <<30 ,c ; d),

t := a «5 D + H (b, c, d) + e + W + K, (a,b,c,d,e):=(t,a,b <<3 °,c,d)

where W is a 32-bit variable obtained by message expansion and K can have one of four predefined values of additive constants.

Appendix II

Attacks on the MD4 family

The &st published successful attack on the last two rounds of MD4 was described in den Boer, B., and Bosselaers, A.: An attack on the last two rounds of MD4, Advances in Cryptology, CRYPTO91, Lecture Notes in Computer Science, vol. 576, Springer-Verlag, Berlin, 1992, pp. 194203. One of the crucial parts of their attack is solving 32 equations with 48 unknown 32-bit variables, and then again solving a system of 16 equations with 16 unknown 32-bit variables. Setting up and successfully solving such a system of equations is possible due to the fact that operations GG and HH use easily invertible operations of addition modulo 2 32 and left rotation.

In his attack on the full MD4, Dobbertin, H.: Cryptanalysis ofMD4, J. Cryptology (1998) 11: 253271 used the same principle as den Boer and Bosselaers, with several crucial observations on weak development of differences (caused by the operations of addition modulo 2 32 ) in steps 12-19. In his paper he sets up 8 equations with 14 unknown 32-bit variables for finding inner almost-collisions. Again, obtaining those 8 equations is very easy having in mind invertibility of addition modulo 2 32 and left rotation. Further on in the paper all other analysis and computing efforts to find collisions of MD4 are again expressed by systems of equations heavily exploiting the fact that every step in MD4 involves invertible operations of addition modulo 2 32 and left rotation.

The successful series of attacks on the MD5 hash function started early in 1993 as described in den Boer, B., and Bosselaers, A.: Collisions for the compression function ofMDS, Advances in Cryptology, EUROCRYPT93, Lecture Notes in Computer Science, vol. 765, Springer-Verlag, Berlin, 1994, pp. 293304. Again, with clever techniques of choosing "magic" numbers as a target values to obtain, they analyze the first two rounds of MD5 finding collisions by "walking forward" and "walking backwards" - i.e. tracing the obtained differences, and solving simple linear equations for some parts of the processed message. Part 2.2 of their algorithm explicitly describes equations that are obtained directly from the

equations of MD5 that involves invertible operations of addition modulo 2 32 and left rotation.

Then in 1996, at the rump session of Eurocrypt '96, by the similar techniques as den Boer and Bosselaers, Dobbertin provided even better results on finding collisions for MD5. Again, the crucial point was the fact that setting up equations for tracing the differences was, although complicated, possible due to the fact that the compression function of MD 5 uses the invertible operations of addition modulo 2 32 and left rotation. In 2004 at the rump session of CRYPTO 2004 (as well as at Cryptology ePrint Archive) Wang et al. presented concrete results of breaking MD5, HAVAL-128 and RIPEMD hash functions (see Wang, X, Feng, D., Lai, X, Yu, H: Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD, rump session, CRYPTO 2004 and Wang, X, Feng, D., Lai, X, Yu, H.: Collisions for Hash Functions MD4, MD5, HAVAL- 128 and RIPEMD, Cryptology ePrint Archive, Report 2004/199, first version (August 16, 2004), second version (August 17, 2004), http://eprint.iacr.org/2004/199.pdj)

Shortly after that, Wang at ah, in February 2005, gave a note about their latest findings about SHA-I and the possibility of finding collisions after 2 69 SHA-I hash computations (see Wang, X, Lai, X, Yu, H.,: Collision Search Attacks on SHAl, http://theory.csail.mit.edu/yiqun/shanote.pdf February 2005). Although the approach of Wang at al. is much broader and more complicated, we can say that, again, the basic principle of finding the collisions is exploiting the invertible and linear properties of the functions used in MD4 family of hash functions.

Finally, in August 2005 it was announced that it is possible to find a collision in SHA-I in 2 63 operations. This is due to Professor Xiaoyun Wang of Tsinghua University in Beijing, together with Professors Andrew Yao and Frances Yao. It extends the work of Wang, Yin, and Yu, which demonstrated that a collision could be found in 2 69 operations. The new result was announced at the Crypto 2005 conference in Santa Barbara.

Appendix III

Quasi-groups and quasi-group operations

For completeness, the definitions of a quasi-group and quasi-group operation used in the main text are repeated here.

As used herein, a quasi-group is a groupoid (Q, *) satisfying the law:

(Vu, v G Q)(3!x, y E Q)(u * x = v, y * u = v).

Hence, a quasi-group satisfies the cancellation laws:

x*y = χ*z=>y = z,y*x = z*x=>y = z

and the equations a * x = b, y * a = b have unique solutions x, y for each a, b

G Q. If (Q, *) is a quasi-group, then * is called a quasi-group operation.

An example of a quasi-group of order 16 is shown below:

* 0123456789abcdef

0 a45960el2cdf38b7

1 5bc84e0732fal9d6

2 c52df8ael367b094 3 7d3e21bc59480f6a

4 124ab7890d3e6c5f

5 4a8bd2c6ef597310

6 0ed28365cb749afl

7 b6059d487a23flec 8 d861cafθb592473e

9 2fl 07c5b968dae43 a 6cb7af 1348e0d529 b 8 If63974aec52dθb c f394e62d8701cba5 d e973 lbdf60ac5482 e 30ec549aflb6827d f 97af0532d41be6c8

Here we consider only finite quasi-groups, i.e. Q is a finite set. Closely related combinatorial structures to finite quasi-groups, are the so-called Latin squares: a Latin square L on a finite set Q (with cardinality |Q| = s) is an s x s-matrix with elements from Q such that each row and each column of the matrix is a permutation of Q. To any finite quasi-group (Q, *) given by its multiplication table there is an associated Latin square L, consisting of the matrix formed by the main body of the table, and each Latin square L on a set Q defines a quasi-group (Q, *). Given a quasi-group (Q, *) five new operations, so called parastrophes or adjoint operations, can be derived from the operation *. We will need only the following two, denoted by \ and /, and defined by:

x * y = z <==> y = x \ z <==> x = z /y (1)

Then the algebra (Q, *, \, I) satisfies the identities

x \(x*y) = y, x*(x\y) = y, (x*y)/y = x, (x/y)*y = x (2)

and (Q, \), (Q, I) are quasi-groups too. The quasi-group (Q, \) is called left parastrophe and (Q, /) right parastrophe of (Q, *).

The corresponding (Q, \) and (Q, I) of the quasi-group defined above (1) are given below (left then right):

\ 0123456789abcdef

0 578cl24fd30e9a6b

1 6c9840f73dbl2e5a

2 d829flab5e6c0374

4 801a2ec56734d9bf

5 fe5dθa7c2bl36489

10 6 0f35b76a4cd9821 e

7 2dab63187490f5ec

8 73bec92dla5840f6

9 320fe694a8c75bdl a b6e78d039f421 ca5

15 b elc47b36058fad92 c ab613f5982edc740 d 94f3dc82ela5b607 e 19d054bec67a3f28 f 4a7695dlf02be8c3

20

/0123456789abcdef

0 6e79f0184dca32b5

25 1 4b98d3a02efcl756

2 942635cf0178bead

3 ec3db6fal2470589

4 504cle7baf368d92

5 1207ef96385bda4c

30 6 a78bθc65d92e4f31

7 3fda94bl7c6258e0

8 b8516247ca93e0df

9 fdc07be49385612a a 05f4a82eb7dl96c3

35 b 71a54d3986ef2c0b c 2ale895360bdc4f7 d 836257dcf409abl e e d6e3cl025ba4f978 f c9bf2a8de5107364

40