Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
A DATA PROCESSING SYSTEM AND METHOD FOR PERFORMING A MATHEMATICAL OPERATION ON MULTI BIT BINARY INTEGER NUMBERS USING FLOATING POINT ARITHMETIC
Document Type and Number:
WIPO Patent Application WO/2003/083642
Kind Code:
A2
Abstract:
The data processing system and method performs a mathematical operation on multi bit binary integer numbers using floating point arithmetic. The binary integer numbers are divided into corresponding segments and processed to determine at least one mathematical operation product for each segment. Corresponding segments comprise a corresponding group of w bits of the binary integer numbers. Floating point registers store the products. Each floating point register has m mantissa bits, where m > w. A product sum is determined for each segment in a floating point register. A w bit result of the mathematical operation is generated in a floating point register for each segment. Also, a carry product is generated in a floating point register to be carried over to a next segment for use in the determination of the product sum for the next segment. The result and the carry product for each segment is determined by performing floating point rounding to 2w or 2w(i+1) on the product sum in the floating point register, where i is the segment number.

Inventors:
ROBINSON DAVID (GB)
Application Number:
PCT/GB2003/001388
Publication Date:
October 09, 2003
Filing Date:
March 28, 2003
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
ROBINSON DAVID (GB)
International Classes:
G06F5/01; G06F7/00; G06F7/48; G06F7/483; G06F7/50; G06F7/52; G06F9/302; G06F7/72; (IPC1-7): G06F7/00
Foreign References:
EP0821303A21998-01-28
US5303174A1994-04-12
Other References:
"USING FLOATING POINT TO PACKAGE LARGER THAN INTEGER DATA FIELDS" IBM TECHNICAL DISCLOSURE BULLETIN, IBM CORP. NEW YORK, US, vol. 35, no. 6, 1 November 1992 (1992-11-01), pages 184-186, XP000314105 ISSN: 0018-8689
MUHAMMAD ALI MAZIDI & JANICE GILLISPIE MAZIDI: "THE 80x86 IBM PC AND COMPATIBLE COMPUTERS, 3d edition, section 23.2: Intel's Pentium" 2000 , PRENTICE HALL , UPPER SADDLE RIVER, NEW JERSEY XP002255947 page 698; figures 23-4 page 700; figures 23-6
Attorney, Agent or Firm:
Collins, John David (57-60 Lincolns Inn Fields, London WC2A 3LS, GB)
Download PDF:
Claims:
CLAIMS:
1. A data processing system for performing a mathematical operation on multi bit binary integer numbers using floating point arithmetic, the system comprising: floating point logic for processing corresponding segments of the binary integer numbers to determine at least one mathematical operation product for each segment, said corresponding segments comprising a corresponding group of w bits of the binary integer numbers; and floating point registers for storing said at least one product, each floating point register having m mantissa bits, where m > w; wherein said floating point logic is adapted to, for each segment, determine a product sum in a said floating point register, and generate a w bit result of the mathematical operation in a said floating point register and a carry product to be carried over to a next segment in a said floating point register for use in the determination of the product sum for the next segment by performing floating point rounding to 2w or 2W (i+l) on said product sum in said floating point register, where i is the segment number starting from zero.
2. A data processing system according to claim 1, wherein said floating point logic is further adapted to combine said results to form a final result of the mathematical operation on the multi bit binary integer numbers.
3. A data processing system according to claim 1 or claim 2, wherein said floating point logic is adapted to perform the floating point rounding by adding and subtracting a factor which includes a component based on the number of mantissa bits m of said floating point register containing the product sum for each segment.
4. A data processing system according to claim 3, wherein said floating point logic is adapted to perform said addition and subtraction using a number > 2 (m').
5. A data processing system according to claim 4, wherein said floating point logic is adapted to perform said addition and subtraction using a number > (2 (m') + C/2W), where O<C<2'.
6. A data processing system according to claim 3 or claim 4, wherein said floating point logic is adapted to perform said addition and subtraction using a number > 2w (m I).
7. A data processing system according to claim 6, wherein said floating point logic is adapted to perform said addition and subtraction using a number > 2w (2 ('') +C/2W), where 0<C<2m.
8. A data processing system according to any one of claims 5,6 or 7, wherein said floating point logic is adapted to perform said addition and subtraction using a number 2w(i+1) (2(m1)+C/2w), where 0#C#2m.
9. A data processing system according to any one of claims 4 to 6, wherein said floating point logic is adapted to perform said addition and subtraction using a number > 2w (i+l) 2 (mt).
10. A data processing system according to claim 4 or claim 5, wherein said floating point logic is adapted to generate said carry product by also multiplying by 2W.
11. A data processing system according to claim 6, wherein said floating point logic is adapted to generate said result by adding 2w 2(m1) to said product sum, subtracting 2w 2 (ml) from the result of the addition, and subtracting the result of the subtraction from said product sum, and to generate said carry product by multiplying the result of the subtraction by 2W.
12. A data processing system according to claim 4, wherein said floating point logic is adapted to generate said carry product by multiplying said product sum by 2W, adding 2 (ml) to the result of the multiplication, and subtracting 2 (m1) from the result of the addition, and to generate said result by multiplying said carry product by 2w and subtracting the result of the multiplication from said product sum or by multiplying said result by 2w and adding the result of the multiplication to said product sum.
13. A data processing system according to claim 7, wherein said floating point logic is adapted to generate said result by adding 2w (2'+ C/2) to said product sum, subtracting 2w (2(m+1) + C/2w) from the result of the addition, and subtracting the result of the subtraction from said product sum, and to generate said carry product by multiplying the result of the subtraction by 2W, where 0 < C < 2m.
14. A data processing system according to claim 6, wherein said floating point logic is adapted to generate said result by subtracting 2w (2(m1) + C/2W) from said product sum, adding 2w (2(m1) + C/2W) to the result of the subtraction, and subtracting the result of the addition from said product sum, and to generate said carry product by multiplying the result of the addition by 2w where 0 < C < 2m.
15. A data processing system according to claim 9, wherein said floating point logic is adapted to generate said carry product by adding 2w(i+1) 2(m1) to said product sum, and subtracting 2w(i+1) 2(m1) from the result of the addition, and to generate said result by subtracting said carry product from said sum product.
16. A data processing system according to claim 8 or claim 9, wherein said floating point logic is adapted to generate said carry product by adding 2W (2(m1) +C/2w), to said product sum, and subtracting 2w(i+1) (2(m1) +C/2w), from the result of the addition, and to generate said result by subtracting said carry product from said sum product.
17. A data processing system according to claim 10, wherein said floating point logic is adapted to generate said carry product by multiplying said product sum by 2W, adding 2(m1), and subtracting 2(m1), and to generate said result by multiplying said carry product by 2w and adding said product sum.
18. A data processing system according to claim 10, wherein said floating point logic is adapted to generate said carry product by multiplying said product sum by 2W, adding (2(m1) +C/2w), and subtracting (2 (ml) +C/2W), and to generate said result by multiplying said carry product by 2w and adding said product sum, where 0 < C < 2m.
19. A data processing apparatus according to claim, 8,15 or 16, wherein said floating point logic is adapted to generate said carry product by adding (2w(i+1) 2(m1) D) to said product sum, and subtracting (2' ('+) 2 ( ') D) from the result of the addition, and to generate said result by subtracting said carry product from said sum product, where D = 2d, d is a positive or negative integer, and 1 > D > 2".
20. A data processing system according to any one of claims 1 to 9,11 to 16 or 19, wherein said w bit result is positive and said floating point logic is adapted to round down.
21. A data processing system according to any one of claims 10,17 or 18, wherein said w bit result is negative and said floating point logic is adapted to round up.
22. A data processing system according to any preceding claim, wherein said floating point logic is adapted to multiply two said multi bit binary integer numbers or multiply two said multibit binary numbers and add or subtract another said multibit binary number to generate a plurality of said mathematical operation products for at least one said segment, and said floating point registers have m mantissa bits, where m>2w.
23. A data processing system according to claim 22, wherein said floating point registers have m mantissa bits, where m > 2w+lg (n) and n is the number of segments of a smaller of the multi bit binary integer number multiplicands.
24. A data processing system according to any one of claims 1 to 21, wherein said floating point logic is adapted to multiply pairs of multi bit binary numbers and to add or subtract the products of the multiplications to generate a plurality of said mathematical operation products for at least one said segment, and said floating point registers have m mantissa bits, where m > 2w + lg (n*r), n is the number of segments of said multi bit binary integer numbers, and r is the number of said multiplicative products being added or subtracted.
25. A data processing system according to any preceding claim, wherein said multi bit binary integer numbers comprise the same multi bit binary integer number, and said floating point logic is adapted to square said multi bit binary integer number.
26. A data processing system according to any one of claims 1 to 21, wherein said floating point logic is adapted to add or subtract said multi bit binary integer numbers, and to generate a single said mathematical operation product for each said segment, and said floating point registers have m mantissa bits, where m>w+round (lg (r) ), where round represents rounding up and r is the number of said multi bit binary integer numbers.
27. A data processing system for performing modular multiplication comprising the data processing system according to any one of claims 1 to 25.
28. An encryption system comprising the data processing system according to claim 27.
29. A decryption system comprising the data processing system according to claim 27.
30. A data processing method for performing a mathematical operation on multi bit binary integer numbers using floating point arithmetic, the method comprising: processing corresponding segments of the binary integer numbers to determine at least one mathematical operation product for each segment, said corresponding segments comprising a corresponding group of w bits of the binary integer numbers; and storing said at least one product in a corresponding floating point register, each floating point register having m mantissa bits, where m > w; determining, for each segment, a product sum in a said floating point register; generating, for each segment, a w bit result of the mathematical operation in a said floating point register and a carry product to be carried over to a next segment in a said floating point register for use in the determination of the product sum for the next segment by performing floating point rounding to 2w or 2W (i+l) on said product sum in said floating point register, where i is the segment number starting at zero.
31. A data processing method according to claim 30, including combining said results to form a final result of the mathematical operation on the multi bit binary integer numbers.
32. A data processing method according to claim 30 or claim 31, wherein the floating point rounding is performed by adding and subtracting a factor which includes a component based on the number of mantissa bits m of said floating point register containing the product sum for each segment.
33. A data processing method according to claim 32, wherein said addition and subtraction is performed using a number > 2 (m').
34. A data processing method according to claim 33, wherein said addition and subtraction is performed using a number > (2 ( ') + C/2W), where 0<C<2"'.
35. A data processing method according to claim 32 or claim 33, wherein said addition and subtraction is performed using a number > 2w 2 (m 1).
36. A data processing method according to claim 35, wherein said addition and subtraction is performed using a number # 2w (2(m1) + C/2w), where 0<C<2"'.
37. A data processing method according to any one of claims 34,35, or 36, wherein said addition and subtraction is performed using a number > 2w(i+1) (2(m1) + C/2w), where 0<C<2"'.
38. A data processing method according to any one of claims 33 to 35, wherein said addition and subtraction is performed using a number > 2w(i+1) 2(m1).
39. A data processing method according to claim33 or claim 34, wherein said carry product is generated by also multiplying by 2W.
40. A data processing method according to claim 35, wherein said result is generated by adding 2'2 ('') to said product sum, subtracting 2w ('') from the result of the addition, and subtracting the result of the subtraction from said product sum, and said carry product is generated by multiplying the result of the subtraction by 2W.
41. A data processing method according to claim 33, wherein said carry product is generated by multiplying said product sum by 2W, adding 2 ('') to the result of the multiplication, and subtracting 2 ('') from the result of the addition, and said result is generated by multiplying said carry product by 2w and subtracting the result of the multiplication from said product sum or by multiplying said result by 2w and adding the result of the multiplication to said product sum.
42. A data processing method according to claim 36, wherein said result is generated by adding 2w (2 (ml) + C/2W) to said product sum, subtracting 2w (2+ C/2") from the result of the addition, and subtracting the result of the subtraction from said product sum, and said carry product is generated by multiplying the result of the subtraction by 2W, where 0<C<2m.
43. A data processing method according to claim 35, wherein said result is generated by subtracting 2w 2 (m1) from said product sum, adding 2w 2(m1) to the result of the subtraction, and subtracting the result of the addition from said product sum, and said carry product is generated by multiplying the result of the addition by 2W.
44. A data processing method according to claim 38, wherein said carry product is generated by adding 2w(i+1) 2 to said product sum, and subtracting 2w(i+1) 2(m1) from the result of the addition, and said result is generated by subtracting said carry product from said sum product.
45. A data processing method according to claim 37 or claim 38, wherein said carry product is generated by adding 2w(i+1) (2(m1) + C/2W) to said product sum, and subtracting 2w(i+1) (2(m1) + C/2W) from the result of the addition, and said result is generated by subtracting said carry product from said sum product.
46. A data processing method according to claim 39, wherein said said carry product is generated by multiplying said product sum by 2w, adding 2 (m1), and subtracting 2 (ml), and said result is generated by multiplying said carry product by 2w and adding said product sum.
47. A data processing method according to claim 39, wherein carry product is generated by multiplying said product sum by 2W, adding (2'+C/2), and subtracting (2 (''+C/2W), and said result is generated by multiplying said carry product by 2w and adding said product sum, where 0 < C < 2m.
48. A data processing method according to claim 37,44 or 45, wherein said carry product is generated by adding (2w(i+1) 2(m1) D) to said product sum, and subtracting (2w (i+l) 2 (mu D) from the result of the addition, and said result is generated by subtracting said carry product from said sum product, where D = 2d, d is a positive or negative integer, and 1 > D > 2W.
49. A data processing method according to any one of claims 30 to 38,40 to45 or 48, wherein the w bit number is positive and said floating point rounding rounds down.
50. A data processing method according to any one of claims 39,46, or 47, wherein the w bit number is negative and said floating point rounding rounds up.
51. A data processing method according to any one of claims 30 to 50, wherein two said multi bit binary integer numbers are multiplied or two said multi bit binary numbers are multiplied and the result is added or subtracted with another said multi bit binary number, to generate a plurality of said mathematical operation products for at least one said segment, and said floating point registers have m mantissa bits, where m>2w.
52. A data processing method according to claim 51, wherein said floating point registers have m mantissa bits, where m > 2w+lg (n) and n is the number of segments of a smaller of the multi bit binary integer number multiplicands.
53. A data processing method according to any one of claims 30 to 50, wherein pairs of multi bit binary numbers are multiplied and the products of the multiplications are added or subtracted to generate a plurality of said mathematical operation products for at least one said segment and said floating point registers have m mantissa bits, where m > 2w + Ig (n * r), n is the number of segments of said multi bit binary integer numbers, and r is the number of said multiplicative products being added or subtracted.
54. A data processing method according to any one of claims 30 to 53, wherein said multi bit binary integer numbers comprise the same multi bit binary integer number, and said multi bit binary integer number is squared.
55. A data processing method according to any one of claims 30 to 50, wherein said multi bit binary integer numbers are added or subtracted, and a single said mathematical operation product is generated for each said segment, and said floating point registers have m mantissa bits, where m>w+round (lg (r) ), where round represents rounding up and r is the number of said multi bit binary integer numbers.
56. A data processing method of performing modular multiplication comprising the data processing method according to any one of claims 30 to 54.
57. An encryption method comprising the data processing method according to claim 56.
58. A decryption method comprising the data processing method according to claim 56.
59. A processing system for using floating point arithmetic, the system comprising: a program memory containing processor readable instructions; and a processor for reading and executing the instructions contained in the program memory; wherein said processor readable instructions comprise instructions for controlling the processor to carry out the method of any one of claims 30 to 58.
60. A carrier medium carrying computer readable instructions for controlling a computer to carry out the method of any one of claims 30 to 58.
61. A processor instruction set for controlling instructions carried out by a processor, the instruction set comprising at least one of : a round instruction to round or round and shift a product sum in a floatingpoint register to a nonzero power of 2 to yield a carry product stored in another floatingpoint register; a mask instruction to mask a product sum in a floatingpoint register to a specified number of bits to yield a result in another floatingpoint register; and a combined mask/round instruction to take a product sum in a floatingpoint register and a specified power of two to yield a carry product stored in another floating point register and a result stored in a further floatingpoint register.
62. A processor for processing binary numbers using floating point registers, the processor being adapted to operate in accordance with an instruction set comprising at least one of : a round instruction to round or round and shift a product sum in a floatingpoint register to a nonzero power of 2 to yield a carry product stored in another floatingpoint register; a mask instruction to mask a product sum in a floatingpoint register to a specified number of bits to yield a result in another floatingpoint register; and a combined mask/round instruction to take a product sum in a floatingpoint register and a specified power of two to yield a carry product stored in another floating point register and a result stored in a further floatingpoint register.
63. A carrier medium carrying an instruction set for a processor in accordance with claim 61.
64. A data processing system comprising a processor having floating point registers for implementing the instruction set of claim 61 and a memory for storing the instruction set.
65. A cryptographic system comprising a processor having floating point registers for implementing the instruction set of claim 61 and a memory for storing the instruction set.
Description:
A DATA PROCESSING SYSTEM AND METHOD FOR PERFORMING A MATHEMATICAL OPERATION ON MULTI BIT BINARY INTEGER NUMBERS USING FLOATING POINT ARITHMETIC The present invention generally relates to a data processing system and method for performing a mathematical operation on large integers using floating point operations.

There are many applications in which it is necessary to perform mathematical operations on large integer numbers in a data processing system. A primary example of this is encryption. Large integer numbers are used in encryption techniques for increased security.

For example, in the field of public key cryptography a number of different methods for encryption are known and these are based on modular arithmetic of large integer numbers. A widely used system is RSA as described in US patent no. 4,405, 829. In this system messages are represented as large integers M in the range 0 to N-l, where N is a composite number of the form p x q, where p and q are prime numbers. An encryption key e is chosen, relatively prime to p-1 and q-1. The encryption operation is: C = M'mod N and the decryption operation is: mi = Cd mod N where d is the decryption modulus. The usefulness of RSA comes from the fact that d cannot be deduced from knowledge of e and N except by factorization of the composition number N. If N is very large then that factorization is not feasible with currently available computer technology. The decryption key d can be found from the secret factors p and q, as the multiplicative inverse of e :

d = e~'mod X (p-1, q-1) where k (p-1, q-1) is the least common multiple of the numbers p-1 and q-1.

Considerable work in the prior art has been devoted to the efficient implementation of modular exponentiation. Modular exponentiation can be effected using the basic arithmetical operations on large numbers, namely addition, subtraction, multiplication, division and modular reduction. Large numbers are handled as arrays of smaller words; for examples many implementations would process a 512 bit number as an array of 16 words of 32 bits each. Multiplication of two 512 bit numbers would then entail 256 multiplications of 32 bit words.

Arithmetic operations carried out on such large numbers are performed in the prior art using integer operations since integer shifting operations are provided for in such integer operations.

It is an object of the present invention to provide for faster mathematical operations on large integer numbers in a data processing system.

In accordance with a first aspect of the present invention, a data processing system and method is provided for performing a mathematical operation on large multi bit binary integer numbers utilizing floating point arithmetic. The benefit of using floating point operations is that larger registers are available and the operations are faster. For example, on many CPUs one floating point multiplication and one addition can be executed every cycle, whereas an integer multiplication operation can be started only once every four or five cycles.

The disadvantage of using floating point operations is that floating point units lack the integer shifting operation that is used for propagating carries in large number arithmetic.

The present invention overcomes this limitation in order to exploit the advantages of utilizing floating point operations. This is achieved by operating on the binary numbers in segments, where each segment comprises w bits of a binary integer number.

Corresponding segments of the two binary integer numbers are processed to determine at least one mathematical operation product for each segment. The products are stored in floating point registers having m mantissa bits, where m > w. A product sum for each segment is determined. A w bit result of the mathematical operation is generated in a floating point register and a carry product to be carried over to a next segment is generated in a floating point register for use in the determination of the product sum for the next segment. These are determined by performing floating point rounding to 2w or 2' ('+') of the product sum in the floating point register, where i is the segment number.

The present invention is applicable to any mathematical operation such as multiplication, division, addition and subtraction. For example the present invention is applicable to the multiplication of two binary numbers, the addition or subtraction of any number of binary numbers, the multiplication of two binary numbers and the addition or subtraction of one or more binary numbers, or the multiplication of pairs of binary numbers and the addition or subtraction of the products of the pairs.

Thus in accordance with this aspect of the present invention, the mathematical product of the large integer numbers is generated on a column by column basis, i. e. on a segmented basis. The or each intermediate product generated as a result of the mathematical operation on the corresponding segments need not be generated on a segment sequential basis, i. e. from a first segment through to a last segment. The intermediate products can be determined in any order, thus allowing some degree of parallelization for the processing of these intermediate products. For example, a mathematical operation product for four segments can be generated simultaneously in a processing providing for four simultaneous floating point calculations. Alternatively, two mathematical operation products for one segment and two mathematical operation products for another segment can be calculated simultaneously in such a processor.

Thus the processing of intermediate products can be carried out on a block by block basis. This provides a further enhancement on the processing speed provided by the use of floating point operations.

In accordance with the present invention, in order to generate a carry and to output a result for a segment (column) since there is no shifting operation available using

floating point units, the present invention provides such an effect by using floating point rounding. By adding a large number and subtracting a large number or vice versa, it is possible to mimic the effect of shifting and masking in integer units.

In one embodiment of the present invention wherein the mathematical operation is multiplication, the number of bits for the output result is equal to the sum of the number of bits of the binary numbers. For example, if two binary numbers are of the same length, i. e. n x w where n is the number of segments of length w that the binary number is divided into, the output binary number as a result of the multiplication of the two binary numbers will be of length 2nw. The output result is formed by combining the generated results for each segment.

In one embodiment of the present invention, the floating point rounding is achieved by adding and subtracting a factor which includes a component based on the number of mantissa bits m of the floating point register containing the product sum for each segment.

The present invention encompasses any way in which the floating point rounding can be effected to mimic the shifting and masking operation provided in integer logic units.

The floating point rounding operation can be set to round up or round down as necessary dependent upon the manipulation of the product sum to bring the floating point rounding into effect. When the w bit result is positive rounding up is used. When the w bit is negative rounding down is used.

In one embodiment in which the mathematical operation comprises multiplication or combined multiplication and adding or subtracting, the floating point registers for storing intermediate products, and the product sums have m mantissa bits where m > 2w. In a preferred embodiment m > 2w + [lg (n) ] where n is the number of segments into which the smallest of the multi bit binary number multiplicands is divided.

In another embodiment in which the mathematical operation comprises addition or subtraction of multiplicative products of pairs of numbers the floating point register for storing the intermediate result for each segment and the product sum for each segment

has m mantissa bits where m > w + Ig (n * r), where n is the number of segments of the multi bit binary numbers is divided, and r is the number of products being added or subtracted.

In another embodiment of the present invention in which the mathematical operation comprises addition or subtraction, the floating point registers for storing the intermediate product for each segment and the product sum for each segment has m mantissa bits, where m > w+round (lg (r) ), where r is the number of binary numbers being added or subtracted and round indicates rounding up. Thus in this embodiment of the present invention the floating point register need only have one additional bit to hold a single carry bit when adding two binary numbers. The additional bits represent carry bits in addition and in subtraction these represent borrow bits.

The processing can be carried out in any processing means for floating point processing, such as floating point logic which can be in one or a number of floating point logic units.

The present invention is application to any data processing system and method which requires execution of mathematical operations on large integer numbers. The present invention is particularly suited for application in a data processing system performing modular multiplication. Such a data processing system is an encryption system or a decryption system.

The present invention is particularly suited for implementation on a computer capable of performing floating point operations using floating point registers. The present invention can thus be implemented as computer code provided to such a computer or processing system. Thus the present invention encompasses computer readable code for controlling a computer or processing unit to carry out the method of the invention. The computer readable code can be provided on any suitable carrier medium. The carrier medium can comprise a transient carrier medium such as an electrical, optical, microwave, acoustic or radio frequency signal. For example, the signal can comprise a signal carried over a network, e. g. an IP encoded signal carrying computer code over an IP network, such as the Internet. The carrier medium can also comprise a storage

medium such as a floppy disk, hard disk, CD-ROM, magnetic tape device, or solid state memory device.

Embodiments of the present invention will now be described with reference to the accompanying drawings, in which: Figure 1 is a schematic diagram of a multiplication device for multiplying two large numbers (xn l... x0) and (yn-i... yo) to give (Z2n-l zo) ; Figure 2 shows the combination of basic multiplication operations necessary to multiply two large numbers; Figure 3 illustrates in more detail the operations required to multiply two numbers segmented into four segments (x3... x0) and (y3... y0) to generate the result as an eight segment result (z7... z0) ; Figure 4 is a diagram illustrating the size of the floating point registers used, i. e. the number of mantissa bits; Figure 5 is a flow diagram of the method of generating results and carry products by masking and shifting; Figure 6 is a diagram to illustrate the content of floating point registers during the shifting and masking operation of a carry unit in accordance with a first embodiment of the present invention; Figure 7 is a diagram illustrating the content of shift registers during a shifting and masking operation by a carry unit in accordance with a second embodiment of the present invention; Figure 8 is a diagram of the carry unit of the first embodiment of the present invention;

Figure 9 is a diagram of a carry unit in accordance with the second embodiment of the present invention; and Figure 10 is a diagram illustrating the operations for performing addition or subtraction in accordance with an embodiment of the present invention.

Figure 1 is a schematic diagram showing a multiplication device for multiplying two large integer binary numbers x and y. Each number is divided into n segments (xn l... xo) and (y"_1... yo). These are processed to produce partial products or intermediate products. In this example which performs multiplication, a partial product array 1 is generated. This array is shown in more detail in Figure 2 and comprises an array of parallelepipoid shape having (2n-1) columns. The array 1 is generated by multiplying each segment in one number by one or more segments in the other number. The partial product array is stored in a number of floating point registers and as shown in Figure 2 each column corresponding to a segment is summed to produce a product sum in an accumulator array (acc a2n_2... acc ao). Each product sum in the accumulator is then input into a respective carry unit 2 to generate a corresponding output z and a carry to the next column to be input into the next accumulator, i. e. to be added to the product sum for the next segment. Thus the array of carry unit 2 carry out the masking and shifting function to generate a carry and output the result. The carry units inherently have to execute their functions sequentially from a least significant bit to a most significant bit. In order to do this they require the product sums in the accumulators.

However, in order to generate the partial product array 1, it is not necessary for the partial products to be generated in any order. Thus this enables a processor capable of performing parallel floating point operations to populate the partial product array 1 in any convenient and efficient manner. For example, the array can be calculated in blocks, rows, or columns. All of the array need not be calculated before a product sum is determined for accumulators containing the least significant bits. The only requirement is that the accumulators contain the product some for all of the partial products and the carry product from the preceding segment before the content of the accumulator can be used by the carry unit 2.

Figure 3 shows the process illustrated briefly in Figure 2 in more detail for the multiplication of two integer numbers which are segmented into four segments. The intermediate products are stored in floating point registers which have the size of at least twice the segment size. In the diagram illustrated in Figure 3 for simplicity the floating point registers are illustrated has having a size which is twice the segment size. In fact, as illustrated in Figure 4, the floating point register size is slightly greater than twice the size of the segment.

As illustrated in Figure 4, a segment is 29 bits in length (i. e. w = 29). The register has 64 mantissa bits (i. e. m = 64). This provides for 6 overflow bits. During multiplication of two segments an intermediate product of length 2w is generated. However, a product sum is required to be determined for each segment and this can comprise a sum of n intermediate products (where n = 4 in the example given in Figure 3). Thus, the summing of the intermediate products to provide a product sum can result in a number with more than 2w bits. The overflow bits provide for this overflow.

As described hereinabove, the intermediate products such as Y2 XO Yl X1 can be calculated at any time to generate the partial product array 1. The output of the segments Z7.... ZO is however required to be carried out sequentially.

The process will now be described with reference to the flow diagram of Figure 5.

The process starts at the first column, i. e. the first segment (step S1). The intermediate products for the column are generated (step S2) which in this example comprises the multiplication of segments XO and Y0. This generates a 58 bit number in the 64 mantissa bit representation held by the floating point register. Since this is the first column (step S3) the lower half, i. e. lower 29 bits of the floating point register are masked and output as a result (ZO). The product sum in the shift register is then shifted to the right as indicated by the arrow in Figure 3 so that the upper bits are shifted to the right by 29 bits as indicated by the shading in Figures 3 and 4. This comprises the shift or carry product (step S6). If the next column (segment) is not the last column (segment) (step S7) the process moves on to the next column (step S8) and the products for the column are generated (step S2). Thus, as can be seen in Figure 3, the next

products are Yl XO and YO X1. The intermediate products and the carry product are then added to produce a sum product which can comprise more than 58 bits, i. e. more than 2w bits due to the sum operation of 2w bit numbers (step S4).

The sum product is then masked and the lower 29 bits output as a result Z1 (step S5).

The sum product is then shifted to the right by w bits as illustrated by the arrow in Figure 3 and as illustrated by the change in the shaded parts in Figure 4 (step S6).

This process is repeated for each of the columns to produce sum products which are the sum of the intermediate products i. e. the sum of columns in the partial product array 1 plus the carry product provided by shifting the sum product obtained from the previous column.

In the last column the shift or carry product is output as a result Z7 (step S9).

Although in this embodiment, in step S2, the products for each column is generated cyclically, the partial product array column need not be generated sequentially and can be generated in any order so long as they are ready to be used in a generation of a sum product for a column.

So far, this embodiment of the present invention has been described without giving any details of how the shift and mask operation is performed. Because floating point registers are used, integer shift operations and integer mask operations cannot be performed. Thus although the terms shift and mask has been used, these describe the effect rather than the actual operation carried out on the floating point registers.

The shifting and masking is performed using floating point rounding. The floating point rounding is achieved by adding and subtracting a large number to the product sum. Floating point rounding can be achieved by setting the rounding mode to round up or to round down depending upon the operations performed on the product sum to bring about floating point rounding.

Before describing the various embodiments for implementing floating point rounding, it should be remembered that a floating point variable (or register) is a value in a program that is stored in floating point format. The floating point format comprises a sign bit s, mantissa bits m and an exponent e. Thus a floating point variable can be represented as s x m x 2e. The sign is either +1 or-1. The exponent is an integer and the mantissa is a fractional number in the range 1 < m < 2. The precision of m and the range of e depend upon the particular format. For example, -1 x 1.5 x 25 represents the number-48. In the present invention the mantissa bits are used to store bits of information representing the value of a group of bits of a multi bit integer number.

A first embodiment of the present invention will now be described with reference to Figure 6 and Figure 8.

In this embodiment the product sum input to the carry unit has a value t such that 0 t < float (2m). Thus, the input is a m bit number as illustrated in the first floating point register 1 in Figure 6. An intermediate product u is generated in a second floating point register 2 by adding 2''\ As can be seen in Figure 6, this effectively provides a shift to the right. Intermediate product u is then modified by subtracting off the same number 2W (m-i) to provide a mask. It can be seen that by adding and subtracting the large number 2W (m-i) floating point rounding brings about the removal of the least significant w bits. The mask can then be used by subtracting from the original product sum 1 to generate the output result. The mask 3 can also be used to generate the shift or carry product by multiplying the mask 3 by 2-W.

These operations can be given by the following equations: <BR> <BR> <BR> <BR> <BR> <BR> u=t+2W2 (m)<BR> <BR> <BR> <BR> <BR> <BR> u=U2W2 (m~1) result R = t-u carry product c = u * 2-w where u is an intermediate product, t is the product sum, w is the number of bits in a segment and m is the number of mantissa bits in the floating point register.

Figure 8 is a schematic diagram of a carry unit to implement this embodiment of the present invention. The sum product is input as a floating point value of size greater than 2w. An adder 10 adds a factor B (where B=2w(m-1)). A subtracter 11 then subtracts off the factor B. The output of the subtracter 11 is then subtracted from the input product sum by a subtracter 12 to generate the output result of w bits. The output of the subtracter 11 is also used to be multiplied with 1/W (where W = 2W) in a multiplier 13 to generate the carry product.

In this embodiment of the present invention, the rounding mode is set to be down or towards zero to round the number in the floating point register down.

This embodiment of the present invention operates for an input product sum having a value of 0 to 2m. Where it is possible that the input could be between-C and 2m-C, in order to ensure the number becomes positive to enable the correct rounding down operation, in such an embodiment the factor B becomes 2' ('-') + C. In other words, a factor of C is added to ensure that the range shifts to 0 to 2m. The factor C can be any number between 0 and 2m. Since this additional factor is added and then subtracted, it has no effect on the result except to ensure that negative numbers can be handled by this embodiment.

This modification to the first embodiment of the present invention can be represented by the following equation: u = t + 2w (2 (--') + C/2W) which reduces to u = u +2w 2(m-1) + C u = u-2" (2+ C/2") which reduces to u-u-2 2-C result R = t-u carry product c = u * 2-w In order to generate the final result, each output result in each column has to be combined. In this embodiment, since the value of the number in each segment only represents the value of the bits and not its true value i. e. there is no information on the position of the segment, in order to reconstruct the final result, each segment must be

multiplied by w2(i+1) where i is the segment number from 0 to I-1, where I is the total number of segments which is 2n in the embodiment of figure 3 in which the number of bits n of the two integer numbers is the same. Where the number of bits of the two integer numbers are different i. e. n1 and n2, then I = ni + n2.

A second embodiment of the present invention will now be described with reference to Figures 7 and 9. This embodiment utilises the capability of a CPU which has a"fused" multiply and add function i. e. one that can calculate A*B+C as one operation.

As can be seen from Figure 7, the input product sum in a 64 bit floating point register (A) is first multiplied by 2-W. This has the effect of placing a decimal point to the left of the wth mantissa bit. The simultaneous addition of 2 (m-1) shifts the highest 35 bits to the right by 29 bits. The subtraction of the same factor of 2 ('-lu removes the added function and brings about the completion of the shift operation. Floating point rounding has thus removed the w least significant bits. This is the shift or carry product. This shift or carry product is then multiplied by -2w to generate a mask. This multiplication by -2w shifts the w least significant mantissa bits up by w bits and the sign bit s indicates that the number is negative. The mask is then added to the product sum to provide the output result.

The equations below define the processing steps involved in this embodiment of the present invention: u = t * 2-w + 2(m-1) carry product c = u-2 (m-l) result R = c * (-2W) + t where u is an intermediate product, and p is the product sum.

Although in this embodiment the shift product is multiplied by -2w to generate a negative mask which can then be added to the product sum to generate an output, the equivalent operation is to multiply the shift product by 2w to generate a positive mask which must then be subtracted from the product sum to generate the output result.

Figure 9 is a diagram illustrating a carry unit for implementing this embodiment of the present invention. This embodiment can receive as an input the same input sum product having a value of 0 to 2m (i. e. the number of bits is greater than 2w). This input is input into a fused multiply and add unit 20 to multiply the product sum by 1/W (where W = 2W) and to add a factor B where B=2''\ The output of the fused multiply and add unit 20 is then input to a subtracter 21 to subtract the factor B. The output of the subtracter 21 is then output as the carry product from the carry unit 2. It is also input to a fused multiply and add unit 22 to multiply by-W and to add the product sum to generate the output result.

In this embodiment it is also possible to ensure that negative numbers can be handled where the input sum ranges from-C to 2m-C, by modifying the factor B to B+C/2W, where C is a number from 0 to 2m. This has the same effect as in the first embodiment of multiplication by 2w since the intermediate product has been divided by 2W.

This modified embodiment can be represented by the following equations: u = t * 2-w + (2(m-1) + C/2w) carry product c = u- (2 (--') + C/2W) result R=c* (-2") +t In this embodiment of the present invention, the rounding mode is set as in the first embodiment of the present invention to round down or towards zero.

The final result in this embodiment is generated in the same way as in the first embodiment.

A third embodiment of the present invention will now be described which is effectively the inverse of the first embodiment of the present invention and requires the rounding mode in the floating point arithmetic to be set to rounding up. This embodiment of the present invention takes negative input numbers into the carry unit 2. The product sum thus lies in the range-2m and 0.

The following equations define the steps in carrying out this embodiment of the present invention: <BR> <BR> <BR> <BR> <BR> <BR> u=t-2w2 (m-1)<BR> <BR> <BR> <BR> <BR> u = u + 2W 2 (m-1) result R = t-u carry product c = u * 2-w It can thus be seen from the above equation that in this embodiment of the present invention the factor 2w 2 ('-') is first subtracted and then added. Thus in this embodiment of the present invention, negative numbers are operated upon and the rounding mode is required to be set to round up in order to carry out the necessary rounding function.

This third embodiment of the present invention is similar to the first and second embodiments of the present invention in that the segments are taken without maintaining their absolute value. Thus in order to recombine the segments in order to generate the final output result, each segment must be multiplied by a factor 2w(i+1), where i is the segment number from 0 to 2n-1.

In this third embodiment it is also possible to ensure that positive numbers can be handled where the input sum ranges from-2m+C to C, by modifying the factor 2w to 2w +C/2W, where C is a number from 0 to 2m.

This modified embodiment can be represented by the following equations: u = t-2" (2 +C/2") which reduces to u = t - 2w 2(m-1) - C u = u + 2w (2 (--') + C/2w) which reduces to u = u + 2w 2(m-1) + C result R = t-u carry product c = u * 2-w In this embodiment of the present invention the rounding mode is set to round up. The modification ensures that the number is negative and rounding errors are avoided.

A fourth embodiment of the present invention will now be described. This embodiment of the present invention differs from the previous embodiments of the present invention in that the floating point variables for each segment maintain their absolutevalue i. e. each floating point variable increases by a factor of 2W. This has the advantage that when calculating the carry product there is no need to divide by 2W. Thus the floating point number input to the carry unit i. e. the product sum is a number between 0 and 2m2wi, where i is the segment number from 0 to 2n-1. Thus the output is of the same format except in the range 0 to 2'2', since only w mantissa bits are output. The carry product is in the range of 0 to 2m2wi.

The following equations define the operations to execute this embodiment of the present invention: u = t + 2w(i+1) 2(m-1) carry product c = u - 2w(i+1) 2(m-1) result R = t-c It can thus be seen that in this embodiment of the present invention in order to calculate the output result there is no need to divide by 2W. However, in order to provide this saving the number being added and subtracted increases in size for each segment by 2W.

This algorithm also has the advantage of not requiring any multiplications in order to reconstruct the final output result. Each segment is represented absolutely by the floating point value and thus the floating point values for each segment can be added to generate the final output result.

In this fourth embodiment it is also possible, where the input sum ranges from-C to 2m - C, to modify the factor 2 to 2 (m-1) +C/2w, where C is a number from 0 to 2m.

This modified embodiment can be represented by the following equations: u = t + 2W (i+1) (2(m-1) +C/2w) which reduces to u = t + (2' 2) +2'C carry product c = u - 2w(i+1) (2(m-1) + C/2W)

which reduces to c = u- (2w(i+1) 2(m-1)) - 2wiC result R = t-c Thus this modified fourth embodiment is similar to the modified first embodiment except that each segment maintains its absolute value.

A further modification to the fourth embodiment of the present invention is the multiplication of each segment by D, where D = 2d, d is a positive or negative integer, and I>D>2-w,. This shifts the segments to fractional values and makes use of the full range of mantissa bits. Some floating point processors provide more bits for fractions than for integers and hence the degree of shifting will depend on the processor design.

The following equations define the operations to execute this modified embodiment of the present invention: u = t + 2WO+1) 2 (m-1) D carry product c = u-2w ('+') 2 (m-1) D result R = t-c In this embodiment of the present invention the segments maintain their value relative to the first segment but do not maintain their absolute value. A final step is required to multiply each segment by 1/D. Although this method includes an additional step to recover the absolute value of the final result, it benefits from making better use of the full range of mantissa bits provided by the processor. Thus this method enables full use of the capability of the floating point registers for large integer number such as those used as keys in encryption.

The floating point operations described hereinabove apply to any sum product generated as the result of any form of mathematical operation. Although the embodiment illustrated and described with reference to Figures 1 to 4 refers to the multiplication of two large integer numbers, the present invention is applicable also to division, addition and subtraction.

Figure 10 illustrates the operations in performing addition or subtraction of two integer numbers segmented into four segments (x3... x0) and (y3... y0). In this embodiment of the present invention the floating point register required to store the result of the additional subtraction only requires an additional bit to store a carry or borrow. This is indicated by the shaded area in the diagram. Thus, if the segments are 29 bits in length, the 29 bits of the first column are output as z0 and shift and mask operation is used to provide the output and the carry product. In this embodiment, the carry product simply carries the single carry bit or borrow bit. Thus in this embodiment, the floating point register only needs m mantissa bits where m=w+1. The process of performing the masking and shifting operation described hereinabove with reference to the multiplication embodiment is equally applicable to this embodiment. The only difference is that the floating point register need not be as large since it only needs to accommodate additional mantissa bit to include the carry or borrow bit.

Although the embodiment of figure 10 illustrates the application of the present invention to addition of two binary numbers, the present invention is applicable to the addition and subtraction of more than two binary number. The number of mantissa bit required increases for the number of binary numbers being added or subtracted to allow for more carry or borrow bits. For example the addition or subtraction of 3 binary numbers requires 2 carry bits as does 4, whereas 5 requires at least 3,8 requires at least 3,9 requires at least 4, etc.

Although the present invention has been described with reference to the multiplication of 2 multi bit integer numbers, the present invention is of course applicable to a case where the number is the same i. e. to performing the squaring of a number. Further, although in the examples given the size of the 2 multi bit integer numbers is the same, i. e. they are of the same bit length, the present invention is equally applicable to the case where the numbers are of different bit lengths. In such a case, the final output result will not be of length 2n where n is the number of segments in each of the binary numbers, but will instead be a length of p+q where p is the number of segments in one binary number and q is the number of segments in the other binary number, when the numbers are multiplied. In that case the number of mantissa bits required is m> w + lg (min (p. q)).

The present invention can be applied to any processing arrangement which requires mathematical operations to be carried out on large integer numbers. The present invention is particularly suited to modular multiplication. Thus, the present invention has a utility in the field of encryption and decryption, particularly for RSA encryption and decryption.

Although the present invention has been described with reference to embodiments in which pairs of numbers are either multiplied or added, the present invention is also applicable to the operation on more than two multi bit binary numbers. For example, the present invention is applicable to multiply and accumulate operations, i. e. +X * Y + Z. The intermediate products generated for each segment by the multiplication operation can be added in segments to the segments of Z to provide product sums for each segment which then require masking and shifting in accordance with the present invention. The present invention is also applicable to the addition or subtraction of multiplicative pairs of multi bit binary numbers, i. e. X *Y + P * Q +... for r pairs.

In such an embodiment the results of each multiplicative pair comprise intermediate products which are provided for each segment and which must be added together to provide a product sum for each segment. In this embodiment, the number of mantissa bits required for each segment is m > 2w + Ig (n * r). Thus the present invention is applicable to any mathematical operation which can generate product sums for each segment which include carry or borrow bits for a next segment which thus require masking and shifting to generate an output of w bits plus carry or borrow bits.

Although in the embodiments of the present invention, the full partial product array is described, when the present invention is applied to squaring a number, the partial product array is of reduced form due to array symmetry. Thus this offers a further improvement in the speed of processing due to the reduced calculation of intermediate products.

The present invention can be implemented on any suitable processor implementing floating point arithmetic and using floating point registers. The segment size chosen for

segmenting the large integers can be chosen to suit the floating point registers available to the processor.

Although the present invention has been described by way of example to the mathematical operation on binary numbers which are of the same size i. e. the same number of bits and hence the same number of segments n, the present invention is applicable to the mathematical operation on numbers of any size. Thus the number of segments into which the numbers are segmented need not be the same. The process can operate so long as corresponding segments i. e. columns in the mathematical operations are used to determine the mathematical operation products. Where there is no corresponding segment in a second number, the mathematical operation product will simply be the value for the segment of the longer first binary number.

The present invention can be executed by a processor provided with a modified instruction set which includes the single instructions for carrying out the mask and shift operation. Specifically the instructions can comprise a round instruction to round or round and shift a product sum in a floating-point register to a non-zero power of 2 which can be specified in another register or specified in the instruction set (e. g. 2w or 2W '+l)) to yield a carry product stored in a further floating-point register, and a mask instruction to mask a product sum in a floating-point register to the number of bits specified in another floating-point register or in the instruction set to yield a result in a further floating-point register. Alternatively a single combined mask/round instruction can be provided to take a product sum in a floating-point register and a power of two specified in another floating-point register or in the instruction set to yield a carry product stored in a further floating-point register and a result stored in a still further floating-point register. The powers of two are preferably integer powers.

The basic instructions can be: 1. Round to a power of two (e. g. output = round (input, 2')).

2. Round to a power of two and shift/divide by the power of two (e. g. output = round shift (input, 2W)).

3. Mask to a power of two (e. g. output = mask (input, 2W)).

4. A combined round and mask instruction comprising a combination of 1 and 3 or 2and3.

Thus in this embodiment register hold the values to be used by the processor to perform the operations. The provision of specific instructions reduces the processing time in this modified processor.

Although the present invention has been described hereinabove with reference to specific embodiments, it will be apparent to a skilled person in the art that modifications lie within the spirit and scope of the present invention.