Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
METHOD AND APPARATUS FOR EFFICIENT ENCRYPTION
Document Type and Number:
WIPO Patent Application WO/2007/117277
Kind Code:
A2
Abstract:
Methods and apparatuses are provided for accelerating the throughput and or reducing the power consumption of symmetric cryptography algorithms. Certain computations of a symmetric encryption or decryption algorithm are performed during a first phase, the results are saved to memory, and the results are retrieved to encode data during a second phase. If the first phase is implemented while the battery (2) is being charged and the second phase is implemented while the system runs on battery power, the battery life is significantly extended compared to the battery life when all phases are implemented using solely battery power.

Inventors:
LEVENTHAL DAVID H (US)
BIRDSALL WILLIAM M (US)
CURRIE EDWARD (US)
Application Number:
PCT/US2006/041138
Publication Date:
October 18, 2007
Filing Date:
October 20, 2006
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
SUB CRYPTO SYSTEMS INC (US)
LEVENTHAL DAVID H (US)
BIRDSALL WILLIAM M (US)
CURRIE EDWARD (US)
International Classes:
H04K1/00; H04L9/28; H04K1/04; H04K1/06; H04L9/00; H04L9/30
Foreign References:
US20030103554A1
Attorney, Agent or Firm:
MACPHERSON, Alan, H. (2033 Gateway DriveSuite 40, San Jose CA, US)
Download PDF:
Claims:

1. A method for encrypting or decrypting data comprising: during a first phase, generating a result based on inputs comprising a key and a first input, and storing the result in a memory; and during a second phase, retrieving the result stored in memory, and generating an output based on inputs comprising the retrieved result and a data block.

2. The method of claim 1, wherein the steps of the first phase are executed prior to the second phase.

3. The method of claim 1, wherein the steps of the first phase are executed concurrently with the second phase.

4. The method of claim 1, wherein generating a result based on inputs comprises encrypting the first input using the key.

5. The method of claim 4, wherein: the key is a private key; the data block is a plaintext block; and the output is a ciphertext block.

6. The method of claim 5, wherein encrypting the first input comprises encrypting the first input using a symmetric cryptography algorithm.

7. The method of claim 4, wherein encrypting the first input comprises encrypting the first input using the Advanced Encryption Standard.

8. The method of claim 4, wherein encrypting the first input comprises encrypting the first input using the Data Encryption Standard.

9. The method of claim 4, wherein encrypting the first input comprises encrypting the first input using the Triple Data Encryption Standard.

iO. "" The " method of claim 1, wherein generating an output comprises performing an XOR operation between the bits of the result and the bits of the data block.

11. The method of claim 4, wherein the first input comprises an initialization value and a counter value.

12. A method for encrypting or decrypting data comprising: dividing the data into data blocks; applying the method of claim 11 to each data block; wherein the counter value is varied for each data block.

13. The method of claim 12, wherein the counter value is incremented for each data block.

14. The method of claim 1, wherein retrieving the result stored in memory comprises loading the result stored in memory into a memory buffer, and retrieving the result from the memory buffer.

15. An apparatus for encrypting or decrypting data comprising: an algorithm for, during a first phase, generating a result based on inputs comprising a key and a first input; and a memory for, during the first phase, storing the result generated during the first phase; wherein the result stored during the first phase can be retrieved during a second phase.

16. The apparatus of claim 15, wherein the input comprises an initialization value and a counter value.

17. The apparatus of claim 15, wherein an output is generated based on a data block and the result retrieved during the second phase.

18. "" Tle " apparatus of claim 15, further comprising a battery for supplying power to the apparatus, wherein the first phase comprises a period of time when the battery is being recharged.

19. The apparatus of claim 15, wherein the second phase comprises a period of time when the battery supplies power to the apparatus.

20. The apparatus of claim 15, wherein the algorithm comprises an encryption algorithm according to the Advanced Encryption Standard.

21. The apparatus of claim 20, wherein the algorithm is executed in counter (CTR) mode.

22. The apparatus of claim 20, wherein the algorithm is executed in output feedback (OFB) mode.

23. The apparatus of claim 15, further comprising a sensor for sensing a first period of time during which the apparatus has more computational resources than during a second period of time.

24. The apparatus of claim 23, wherein the apparatus operates in the first phase in response to the sensor sensing the first period.

25. The apparatus of claim 23, wherein the apparatus operates in the second phase in response to the sensor sensing the second period.

26. The apparatus of claim 23, wherein the computational resources comprise electrical power.

27. The apparatus of claim 23, wherein the computational resources comprise computational bandwidth.

28. An apparatus for encrypting or decrypting data comprising: an algorithm for, during a first phase, generating a result based on inputs comprising a key and a first input; and

a memory for, during the first phase, storing the result generated during the first phase; and a means for retrieving during a second phase the result stored during the first phase.

Description:

METHOD AND APPARATUS FOR EFFICIENT ENCRYPTION

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority of U.S. patent application no. 11/281,138, filed November 16, 2005, incorporated herein by reference.

FIELD OF THE INVENTION

[0002] This invention relates to the encryption and decryption of digital data.

BACKGROUND OF THE INVENTION

[0003] Today's information systems feature large storage capacity and network bandwidth. This has increased the need for secured transmission and storage of digital data. Cryptographic techniques including the use of symmetric algorithms have been developed for this purpose, hi a symmetric algorithm, two or more parties of a secure channel use a shared private key to encrypt and decrypt data sent and received over the channel. There are many symmetric encryption algorithms in use today, including Advanced Encryption Standard (AES), and its predecessors Data Encryption Standard (DES) and Triple-DES. For the specification of AES, see Federal Information Processing Standards (FIPS) Publication 197, "Advanced Encryption Standard," the contents of which are herein incorporated by reference.

[0004] One challenge in implementing such cryptographic techniques in general is the great computational power needed to perform encryption and decryption, as measured by the number of required clock cycles. Using conventional microprocessor technology, AES requires approximately 320 clock cycles to encode

a 16-byte block, whereas DES requires as many as 668 clock cycles (1728 clock cycles in Triple-DES). The number of clock cycles directly affects the power consumption and available processor speed of a device. Power consumption is critical for mobile devices, such as cell phones or personal digital assistants (PDA's), that operate with a limited power source. Available processor speed is important in high-speed applications such as high-end servers and high-speed routers, which typically have limited processing resources due to demanding application requirements.

[0005] Speed and power consumption of encryption algorithms are also important in military applications. High bandwidth sensor networks, or video surveillance systems, have a typical transmission rate of 35-40 Mbps due to the combination of high definition video, audio, and control signals. When capturing, transmitting, or recording video data in a military intelligence environment, quality of service is of utmost importance. The signals from surveillance videos receive only a cursory examination in the field. The real work is done back in a lab where the video signals are carefully reviewed with the aid of computer enhancement. It is not reasonable for an embedded class processor to perform the computation necessary to provide secure communication while operating in the field under severe power constraints. Another concern in the military is the total thermal signature of a device in low light conditions. Thermal night vision is often able to detect mobile devices with a large thermal signature, so minimizing the thermal signature is of great importance.

[0006] As described, the issue of computational resources affects all classes of computing systems. Because of their demanding computational requirements,

encryption algorithms for high-performance or low-power environments have traditionally been implemented in dedicated hardware. Changing such algorithms ' thus entails modifying hardware, which is relatively costly compared to modifying software code.

[0007] Much of the effort to reduce the number of clock cycles in video recording and transmission has been focused on selective frame encryption. In this scheme, certain frames or other units of a transmission are selected for encryption, while other frames or units are not encrypted, thus decreasing the amount of actual data to be encrypted. While selective frame encryption saves clock cycles, the scheme is problematic in that it may not be truly secure, as significant portions of data may be left unencrypted.

[0008] It is thus desirable to have a method of robustly encrypting and decrypting data that efficiently utilizes the scarce power and bandwidth resources in today' s mobile, high-performance, and military systems.

[0009] Another problem in networks using symmetric encryption algorithms is that system security may be compromised if one of the nodes is breached, since all nodes in the network share the same key. In a conventional system, the only way to address such a breach would be to change the keys of all the devices. It would thus also be desirable to have a symmetric encryption system wherein breaching the security of one node would not necessarily compromise the security of all nodes.

SUMMARY OF THE INVENTION

[0010] The invention provides methods and apparatuses for accelerating the throughput of and or reducing the power consumption of cryptography algorithms by reducing the number of processor clock cycles required during critical operation. This is accomplished by performing computation-intensive cryptographic operations during periods of time when computational resources are more abundant, while minimizing the computations performed during periods of time when resources are less abundant. In a mobile device, these periods may correspond respectively to when the device is being powered by a battery charger, and when the device is being powered by a battery.

[0011] The theoretical basis for the invention rests on three observations:

1) With some cryptographic algorithms, it is possible to perform certain demanding computations independently of the data to be encrypted or decrypted, and store the results of those computations for retrieval at a later time.

2) The cost of storing computation for later retrieval is relatively low due to the availability of relatively power-efficient and inexpensive

( memory components, such as FLASH memory. 3) The marginal cost of power and computation is lower when there are more abundant resources for power and computation.

[0012] For example, a mobile device according to the present invention operates according to two distinct phases. During Phase I, the device is powered by a battery charger. During this phase, the CPU is operated using the power available from the

battery charger, which is relatively inexpensive, to perform the computationally intensive cryptographic operations. The results of these computations are written to secondary storage. During Phase II, the device can be operating on battery power. During this phase, the results of Phase I are retrieved from secondary storage and used to encrypt or decrypt the data, typically by applying a simple XOR operation.

[0013] By partitioning the computation into these two phases, the algorithm utilizes the charger power to perform the more intensive computations, thus sparing the battery from having to power those same computations later when the charger is unplugged. Pre-computing and storing the results in memory trades hardware

(available memory) for time, since fewer computations are performed during battery- powered operation. As a result, the present invention can perform AES encryption with 64 times fewer clock cycles and Triple DES with 345 times fewer clock cycles during battery-powered operation than a conventional system. This increases battery life by drawing less power during battery-powered operation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] The accompanying drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. The applications disclosed are for illustrative purposes only, and are not meant to restrict the scope of the present invention. Embodiments in accordance with the present invention are relevant to all types of data.

[0015] FIG. IA is a schematic of the apparatus in charging phase (Phase I).

[0016] FIG. IB is a schematic of the apparatus in operating phase (Phase II).

[0017] FIG.2 is a schematic of the apparatus demonstrating the method of reduced processor utilization and the consequent reduction in power consumption and size of thermal signature.

[0018] FIG.3 is a schematic of the apparatus demonstrating the method of accelerating of data throughput when encrypting and decrypting.

[0019] FIG.4 is a schematic of the apparatus demonstrating the method of reduced memory requirements.

[0020] FIG.5 is a schematic of the apparatus demonstrating the method of not storing a private key as a result of using the present invention.

[0021] FIG.6 is a schematic of the apparatus demonstrating the method of handling a data burst.

[0022] FIG.7 is a schematic of the apparatus demonstrating the method of video transmission over wireless.

[0023] FIG.8 is a schematic of the apparatus demonstrating the method of video recording or playback.

[0024] FIG.9 is a schematic of the apparatus demonstrating the method of a mobile video data server.

[0025] FIG.10 shows the methods and apparatus being used in a sensor network configuration.

[0026] FIG.1 1 shows the methods and apparatus being used in a mobile network configuration.

[0027] FIG.12 shows the methods and apparatus being used in a mobile storage configuration.

[0028] FIG.13 shows the methods and apparatus being used in a high-end server configuration.

[0029] FIG.14 shows the methods and apparatus being used in a high-speed router configuration.

[0030] FIG. 14A shows the methods and apparatus being used in a high-speed router and high-end database server configuration.

[0031] FIG.15 shows the methods and apparatus being used to support high-end server logging.

[0032] FIG. 16 shows a 128-bit block cipher counter-mode encryption scheme for encrypting a single 128-bit block of plaintext.

[0033] FIG. 17A shows the operations that are performed in Phase I according to the present invention.

[0034] FIG. 17B shows the operations that are performed in Phase II according to the present invention.

DETAILED DESCRIPTION OF THE PREFERED EMBODIMENTS

[0035] FIG. 16 shows a 128-bit block cipher counter-mode (CTR) encryption scheme for encrypting a single 128-bit block of plaintext. For a description of CTR as well as other modes of encryption, see National Institute of Standards and Technology (NIST) Special Publication 800-38A, "Recommendation for Block Cipher Modes of Operation," the contents of which are herein incorporated by reference. Note that the CTR-mode implementation is described herein for illustrative purposes only. In general, the invention will work with any cipher mode of operation that transforms a block cipher into a stream cipher, including, but not limited to, CTR (counter) mode, OFB (output feedback) mode, and CFB (cipher feedback), as well as with asymmetric encryption algorithms. Also, the present invention can be implemented with keys comprising any number of bits, and is thus not limited to either 128-bit block size or 128-bit key encryption.

[0036] In an embodiment of the invention, the N bits of an arbitrary initialization sequence 11 are appended to the (128 - N) bits of a counter 12 to form a 128-bit

input 13 to the encryption algorithm 14. The initialization sequence 11 may be a fixed sequence, whereas the counter 12 has a value that can be incremented each time the encryption algorithm is performed on a 128-bit block of data. A 128-bit private key 15 is also supplied to the encryption algorithm 14. The encryption algorithm 14 may be, for example, the Advanced Encryption Standard (AES) algorithm adopted by the National Institute of Standards and Technology. Note that a 128-bit key is used only to illustrate a specific embodiment of the invention. Other embodiments may employ keys of arbitrary length, for example, 192 -bit or 256-bit keys.

[0037] From the key 15 and the input 13, the encryption algorithm 14 generates a 128-bit result 16. According to the present invention, the 128-bit result is stored in secondary storage 20. At a later time, the result 16 is retrieved from the secondary storage 20 and XOR' ed bit-by-bit with a 128-bit block of plaintext 17 using the XOR operation 19. The result of the XOR operation is known as ciphertext 18, and is the encrypted version of the 128-bit block of plaintext. Note that while FIG 16 shows an embodiment wherein the key length and the block length are both 128 bits, in general they do not have to be the same number of bits. For example, a 256-bit key can be used to encrypt 128-bit blocks of plaintext.

[0038] To encrypt multiple plaintext blocks of 128 bits, the algorithm shown in Figure 16 may be performed multiple times, with the counter value 12 being incremented for each plaintext block to be encrypted. Incrementing the counter assigns each successive 128-bit plaintext block a distinct counter value 12, thereby also generating a distinct 128-bit stored result 21 to be XOR' ed with each successive block of plaintext.

[0039] Because the successive computations of results 16 do not depend on the content of the successive plaintext blocks 17, the algorithm up to the stored results 21 can be precomputed independently of the plaintext blocks 17 for an arbitrary number of blocks, and stored in secondary storage 20. The computations can thus be cleanly divided into two phases: Phase I and Phase II, with the operation just described up to placing the stored result in secondary storage 20 performed in Phase I, and the subsequent operation to yield ciphertext 18 performed in Phase II.

[0040] Fig 17A illustrates the operations that can be performed in Phase I. As in Fig 16, the value of a counter is input to an encryption algorithm. (Note that the other inputs to the encryption algorithm, including the initialization value and the private key are not shown in Fig 17A, as they can remain constant throughout Phase I.) The computations for the encryption algorithm 14 are performed, and the result 16 is written to a location in the secondary storage addressed by a secondary storage pointer. After the result is written, the secondary storage pointer is incremented, as shown in 22. The counter 12 is also incremented. The operations are then repeated, with successive results 16 being written to successive locations in memory. After a certain number of results have been written, Phase II is ready to commence. Note that the number of results written to memory in Phase I depends on several factors, including how long the device remains connected to a charger, how much the memory can store, and how much data is later to be encrypted.

[0041] Fig 17B illustrates the operations that are performed in Phase II. First, the secondary storage pointer 23 addresses the location of the first result in memory

written during Phase I. The stored result 21 at that location is read from memory, and the value is XOR' ed bit-by-bit with a first block of plaintext 17 to produce a first block of ciphertext 18. Then, the secondary storage pointer is incremented, and the next stored result is read from memory. The operations are repeated for each block of plaintext 17, thus continuously generating blocks of ciphertext 18. hi this way, the only computations performed during Phase II are memory reads and XOR operations. This substantially lessens the computational burden on the processor during Phase II.

[0042] Note that the embodiment described in Fig 16 can also be used in a decryption system. In a decryption system, the plaintext 16 in Fig 16 is replaced by ciphertext, and the ciphertext 18 is replaced by plaintext. In other words, when ciphertext is XOR' ed with the stored result 21, the plaintext is generated.

[0043] FIGS. IA and IB further show how Phase I and Phase II can be divided according to an embodiment of the present invention in a mobile device. FIG. IA shows Phase I, during which the apparatus is connected to a charger. During this phase, power supply 1 is charging battery 2 and powering the processing unit simultaneously. Battery 2 has a quantifiable charging period defined by the physical properties of Battery 2 and the processing unit is able to operate for that time period. Power is supplied to the CPU 3, Physical Memory 4, and Secondary Storage 9.

During this time period the apparatus operates a standard symmetric cryptography algorithm in keyed stream mode, as described in conjunction with FIG. 16, and stores the result in Secondary Storage 9. The code for the symmetric algorithm 5 and the Private Key 8 are stored and accessed in Physical Memory 4.

[0044] FIG. IB shows the apparatus in Phase II, or operating phase. Battery 2 is supplying power to the Processing Unit. Power is supplied to the CPU 3, Physical Memory 4, and Secondary Storage 9. CPU 3 is able to encrypt and decrypt data by reading pre-computation data from Secondary Storage 9 and XOR' ing it with a plaintext or cipher text stream, as previously described in FIG 16. There is no resident symmetric cryptography code 5 or Private Key 8 in Physical Memory 4, since all the operations requiring that information have already been performed in Phase I.

[0045] From the above description of the preferred embodiments, it can be seen that Phase I uses power from the charger to pre-compute results that are written to secondary storage. While the unit is charging, a secondary storage device such as a hard drive or flash memory drive is written with the pre-computation results. In this manner the processor performs the majority of the computation involved in the algorithm while being connected to a power source.

[0046] hi contrast, phase 2 performs only relatively simple computations. When operating on battery power, data is combined with the retrieved results using a simple XOR instruction. Typically, a 32-bit microprocessor can perform an XOR operation on 4 bytes in 1 clock cycle. In one embodiment, a 16-byte block can be encrypted or decrypted using the apparatus by executing 1 clock cycle to retrieve the stored results, and 4 clock cycles for the XOR operation, thus requiring only 5 clock cycles during battery-powered operation to encrypt a 128-bit data block. In contrast, as noted earlier, typical conventional algorithms operating on battery power would require 320 clock cycles to encrypt a same-size data block using AES, 668 clock

cycles using DES, and 1728 clock cycles using Triple-DES. Note the numbers of clock cycles cited here are merely to illustrate the potential benefits of the invention, and are not meant to limit the invention. Other processor configurations may require fewer or more clock cycles.

[0047] As an example of the operations performed, Appendix A of this patent application provides psuedocode for an implementation of Phase I and Phase II according to an embodiment of the present invention. Appendix B provides source code in C-language for an implementation of Phase I and Phase II according to an embodiment of the present invention. These examples are intended only to serve as illustrations of embodiments of the present invention, and are not meant to limit the scope of the present invention.

[0048] Since the invention executes fewer cycles during Phase II, it also consumes less precious battery power. The use of secondary storage in the apparatus draws a minimal amount of power, especially, with the use of FLASH memory technology.

[0049] Note the throughput of the invention is directly related to the transfer rate of the secondary storage. With present-day storage system technology having sustained transfer rates of up to lOOMB/Sec, the apparatus can accelerate conventional symmetric algorithms by up to 80 times their normal operational speed.

[0050] The apparatus can be configured depending on the data communication needs as well as the amount of available storage. For example, if the device does not remain on a charger long enough to perform and save pre-computations for all the

data to be encrypted or decrypted, the device can always switch to a conventional mode of operation during Phase II once the stored pre-computation results have been exhausted. This can be accomplished by providing software code on the apparatus that notes how many pre-computation results have been stored in memory during Phase I, detecting when the results have been exhausted in Phase II, and switching the apparatus to conventional operation thereafter. The same applies if there is not enough memory available to store all the pre-computation results corresponding to a large number of plaintext blocks.

[0051] To further minimize the power consumption, one can buffer the data and switch the secondary storage device into an idle, or low-power, state. Specifically, some number of pre-computed results from the secondary storage device can be first transferred to an arbitrarily sized RAM buffer, whereupon the storage device can enter low power states when not being used. This saves power because operating a RAM buffer consumes less power than operating a secondary storage device. Note the rate at which the key stream is buffered by the apparatus is variable, and can be adjusted based on the amount of physical memory.

[0052] Note that when the apparatus is capturing data (such as real time video) and storing it to memory, then little additional power is required to implement the present invention. This is because the ' memory needs to be powered on anyway for the data to be recorded to memory, and thus there is no overhead power expended in keeping the memory on for retrieving the stored results. Furthermore, the same secondary storage that stores the results of pre-computations can be used to store the captured data. This is because, once a stored result has been retrieved from secondary storage

and used to encrypt the captured data, then the captured data can be written to secondary storage at the location occupied by the stored result already retrieved. Thus in this configuration, no extra memory is required.

[0053] Note also that the software code required for the encryption algorithm is only used during Phase I, so that it can be stored in secondary storage the rest of the time. Since there is no need to load the software code for encryption into physical memory during Phase II, the invention can operate during Phase II using just a few bytes of software code in physical memory, namely, the software code required to implement the XOR operation. Standard symmetric algorithms have a predefined physical memory requirement of 50 to 100 times this size.

[0054] Another advantage of the present invention is that, like the encryption code, the private key need not be loaded into physical memory during Phase II. Since all computations using the key are performed during Phase I, the key can be discarded after the completion of Phase I. Thus if the memory contents of the device were somehow breached when the device is operating in Phase II, the intruder would not have direct access to the private key. Rather, the only way to ascertain the private key would be to examine the stored results in the secondary storage, and this task is tantamount to breaking the encryption algorithm. This makes the present invention inherently secure, as it is based entirely on the security of whatever encryption algorithm is used to generate the key stream. For additional security, a new private key can be used every time the device re-enters Phase I.

[0055] Note that Phase I and Phase II need not be restricted to the times when a device is charging and operating from a battery, respectively. The invention can be applied whenever computational or power resources are known to be more abundant during one time period than during another. For example, software code may detect when a processor is relatively idle, for example by examining when the processor is consuming a small portion of its computational bandwidth, and initiate the execution of the Phase I operations during that period. The results can be stored for later retrieval during Phase II, when processor resources are less abundant. Thus, rather than saving power, the invention would allow the processor to run faster during Phase II.

[0056J Furthermore, note that Phase I and Phase II need not be mutually exclusive in time. In one embodiment of the invention, a stand-alone or dedicated processor can be dedicated to continually performing Phase I operations, and save the results of those operations to secondary storage. Meanwhile, another system component with more limited power or computational resources can be continuously or intermittently operating in Phase II, by retrieving the results saved to secondary storage by the dedicated processor. FIG 14A describes an embodiment in a high-speed router that utilizes this aspect of the invention.

[0057] Further illustrations of the preferred embodiment and the associated advantages will be described herein. FIG.2 shows the apparatus in operating phase (Phase II) and the method of reduced power consumption. Battery 2 is supplying low power to the Processing Unit. Power is supplied to the CPU 3, Physical Memory 4, and Secondary Storage 9. CPU 3 is able to encrypt and decrypt data by reading pre-

computation data from Secondary Storage 9 and executing 64 to 345 times fewer clock cycles. The result of this is a reduction in processor utilization, power consumption, and size of thermal signature.

[0058] FIG.3 shows the apparatus in operating phase (Phase II) and demonstrating the method of acceleration of data throughput rate. Battery 2 is supplying high power to the Processing Unit. Power is supplied to the CPU 3, Physical Memory 4, and Secondary Storage 9. CPU 3 is able to encrypt and decrypt data at a higher throughput by reading pre-computation data from Secondary Storage 9 and using 50 to 100 times less clock cycles during Phase II to encode the same amount of data, therefore achieving a higher throughput rate.

[0059] FIG.4 shows the apparatus in operating phase (Phase II) and demonstrates the method of reduced memory requirements. Physical Memory 4 contains resident code that can be as small as a few bytes due to the fact that it is just executing XOR instructions in a loop.

[0060] FIG.5 shows the apparatus in operating phase (Phase II) and demonstrates the method of not storing private key. There is no Private Key 8 in Physical Memory 4 because the Private Key 8 is not necessary for the computations performed in Phase II.

[0061] FIG.6 shows the apparatus in operating phase (Phase II) and demonstrates the method of handling data burst rates. Since CPU 3 is underutilized it is capable of

handling burst data rates. The maximum data rate is determined by the maximum data rate of Secondary Storage 9.

[0062] FIG.7 shows the apparatus in operating phase (Phase II) and demonstrates the method of video transmission over wireless. Here Video CODEC 12 captures video, the Processing Unit encrypts the digital data, and it is transmitted from Network Adapter 13.

[0063] FIG.8 shows the apparatus in operating phase (Phase II) and demonstrates the method of video recording. Here Video CODEC 12 captures video, the Processing Unit encrypts the digital data, and it is recorded to Secondary Storage 9. Here the same secondary storage device can be used to both store the pre-computation data, and the encrypted video, due to the fact that once the pre-computation data is read, it can be replaced with the video data.

[0064] FIG.9 shows the apparatus in operating phase (Phase II) and demonstrates the method of a mobile video data server. Here data can be read or written over Network Adapter 13 and use minimal power or achieve a higher throughput rate than would normally be possible.

[0065] FIG.10 shows the apparatus being used in a sensor network configuration. Each Video Surveillance Sensor 20 transmits encrypted video to one or more Base Station Devices (30, 31, 32, 33). These devices range from notebook computers down to handhelds. High definition video is possible due to 54Mbps bandwidth over wireless.

[0066] FIG.l 1 shows the apparatus being used in a mobile network configuration. Each device capable of capturing video (21, 22, 23, 24, 25, 26) transmits encrypted video to one or more Mobile Devices (30, 31, 32, 33). High definition video is possible due to 54Mbps bandwidth over wireless.

[0067] FIG.12 shows the apparatus being used in a mobile storage configuration. Each Mobile Device (30, 31, 32, 33) is capable of downloading data to the Mobile Storage Device 40. Here the Mobile Storage Device 40 is able to operate at a low level of power consumption due to the invention.

[0068] FIG.13 shows the apparatus being used in a high-end server configuration. Each High-end Server Device (50, 51, 52) is capable of writing data to the Mobile Storage Device Pairs 41 and 42. Data burst rates (higher than possible with standard symmetric cryptography) as well as potentially lower power consumption are possible due to the invention.

[0069] FIG.14 shows the apparatus being used in a high-speed router configuration. The high-speed router 70 sends, and receives, data over high-speed network links 61. The router uses pre-computed data in secondary storage 41 (which is calculated during idle time) to handle data burst rates and reduce processor utilization during Phase II.

[0070] FIG.15 shows the apparatus being used in a high-end server configuration to support logging of massive amounts of data. The high-end server 50 processes data

over high-speed network links 61. The server uses pre-computed data in secondary storage 42 (which is calculated during idle time) to enable high-speed logging in secondary storage 41 and reduce processor utilization during Phase II.

[0071] Reference will now be made in detail to various applications of the invention, examples of which are illustrated in the accompanying drawings. The applications disclosed are for illustrative purposes only, and are not meant to restrict the scope of the present invention. Embodiments in accordance with the present invention are relevant to all types of data.

[0072] One application of the invention is to a sensor network (FIG 10). In this configuration many small sensors capture digital video and transmit back to one or more base stations. These base stations range in size, and power, from notebooks to handheld devices.

[0073] Another application of the invention is to a mobile network (FIG 11). In this configuration, one or more mobile devices capture digital video and transmit encrypted video to one, or more, other mobile device or a mobile data server (FIG 12). These computers range in size and power from notebooks to handheld devices.

[0074] Another application of the invention is to a video recorder or player. (FIG 8). In this configuration the invention can be used to encrypt and decrypt video written or read from storage.

[0075] Yet another application of the invention is to a mobile data server (FIG 12). In this configuration one or more mobile devices connect to a mobile data storage unit and upload or download digital video at a high throughput rate.

[0076] Another application of the invention is to a high-end server (FIG 13). In this configuration the invention can be used to increase throughput rates in order to handle burst data. It can also be used to provide high-speed encryption for logging allowing the server to log everything (FIG 15).

[0077] Another application of the invention is to a high-speed router (FIG 14). In this configuration the invention can be used to increase throughput rates in order to handle burst data traffic. It can also be used to provide high-speed encryption for logging allowing the router to log everything.

[0078] FIG 14A shows an application of the invention to a high-speed router. In this embodiment, Stream Servers 1 through N compute the Phase I operations, and save the results to secondary storage (not shown). Meanwhile, the router 141 and the database server 142 perform the Phase II operations by retrieving the saved results from secondary storage as needed. For easy access, the results can be retrieved from secondary storage using the Internet Protocol (IP) via a switch 143.

[0079] While certain embodiments have been described above, other embodiments will be obvious in view of the above description to those skilled in the art. For example, the invention will work with encryption methods such as AES, DES, or Triple-DES, in which a block cipher can be transformed into a stream cipher

in certain modes of operation such as CTR (counter) mode, OFB (output feedback) mode, and CFB (cipher feedback) mode. The invention will also apply to asymmetric encryption algorithms. Also, the present invention can be implemented with keys or blocks comprising any number of bits. Therefore, it should be understood that the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description above is not intended to be exhaustive or to limit the invention to the precise form disclosed. It should be understood that the invention can be practiced with modification and alteration and that the invention be limited only by the claims and the equivalents thereof.

Appendix A. Pseudocode for an embodiment of the present invention SetKeyAES-CTR ( )

Key setup for the Advanced Encryption Standard (AES) in Counter Mode (AES- CTR) according to Federal Information Processing Standards (FIPS) Publication 197. For an implementation of this function in C programming language, see function aes_set_key in Appendix B.

EncryptAES-CTR ( )

Encrypts given buffer using the Advanced Encryption Standard (AES) in Counter Mode (AES-CTR) according to FIPS Publication 197. For an implementation of this function in C programming language, see function AES-CTR in Appendix B.

Pseudo-Code for Phase I (while running on AC) Function PhaseOne

Variable Buffer

SetKeyAES-CTR O

While ( FreeSpaceOnSecondaryStorage ( ) AND BatteryConnected ( ) )

ZeroBuffer ( Buffer ) EncryptAES-CTR ( Buffer ) WriteToSecondaryStorage ( Buffer )

EndWhile EndFimction

Pseudo-Code for Phase II (while running on battery) Function PBCEncode ( Bufferl, Buffer2 )

Variable BlockNumber

Variable PlaintextBlock, CiphertextBlock, PrecomputedBlock

BlockNumber = O

While ( BlockNumber < NumberOfBlocks ( Bufferl ) )

PlaintextBlock = Bufferl [BlockNumber] PrecomputedBlock = Buffer2 [BlockNumber]

CiphertextBlock = PlaintextBlock XOR PrecomputedBlock Bufferl [BlockNumber] = CiphertextBlock BlockNumber = BlockNumber + 1 EndWhile EndFunctioπ

Function PBCDecode ( Bufferl, Buffer2 )

Variable BlockNumber

Variable PlaintextBlock, CiphertextBlock, PrecomputedBlock

BlockNumber = 0

While ( BlockNumber < NumberOfBlocks ( Bufferl ) )

CiphertextBlock = Bufferl [BlockNumber] PrecomputedBlock = Buffer2 [BlockNumber]

PlaintextBlock = CiphertextBlock XOR PrecomputedBlock Bufferl [BlockNumber] = PlaintextBlock BlockNumber = BlockNumber + 1 EndWhile EndFunction

Function PhaseTwoEncrypt

Variable Bufferl, Buffer2 While ( PlaintextAvailable ( ) )

ReadPlaintext ( Bufferl ) ReadFromSecondaryStorage ( Buffer2 ) PBCEncode ( Bufferl , Buffer2 ) WriteCiphertext ( Bufferl )

EndWhile EndFunction

Function PhaseTwoDecrypt

Variable Bufferl, Buffer2 While (CiphertextAvailable ( ) )

ReadCiphertext ( Bufferl ) ReadFromSecondaryStorage ( Buffer2 ) PBCEncode ( Bufferl , Buffer2 ) WritePlaintext ( Bufferl )

EndWhile EndFunction

Appendix B. Source code in C programming language for an embodiment of the present invention

/* * * Copyright (C) 2005, Sub-Crypto Systems, LLC, All Rights Reserved. * * */

****%:* 7

#indude "aes.h"

#define BUFFER_SIZE 1024

void PBC(unsigned char *PlaintextBuffer, unsigned char *PrecompBuffer) { unsigned int i, *PlaintextPtr, *PrecompPtr, NumBlocks;

PlaintextPtr = (unsigned int *) PlaintextBuffer; PrecompPtr = (unsigned int *) PrecompBuffer;

// each block is 4 bytes

NumBlocks = BUFFER_SIZE / sizeof(unsigned int);

// XOR plaintext and pre-computed data for(i = 0; i < NumBlocks; ++i) {

*PlaintextPtr++ λ = *PrecomρPtr++;

void AES-CTR(unsigned char *PrecomρBuffer) { unsigned int i, NumBlocks; unsigned char *PrecompPtr;

hugeint Counter; PrecompPtr = PrecompBuffer;

// each block is 16 bytes NumBlocks = BUFFER__SIZE / 16;

// start counter at nonce (number used once) value Counter = NONCE; for(i = 0; i < NumBlocks; ++i) {

CounterPtr = (unsigned char *)Counter;

// encrypt counter block aes_encrypt(&aes, CounterPtr, CounterPtr);

// copy resulting block into buffer memcpy(ProcompPtr, CounterPtr, 16);

// increment counter Counter++;

// move pointer to next block PrecompPtr += 16;

}

void PhaseOne(FILE *PrecompFile, unsigned char *Key) { unsigned char PrecompBuffer[BUFFER_SIZE];

// set key for aes aes_set_key(&aes, Key, 256);

while(InPhaseOneO) {

// zero the buffer memset(PrecompBuffer, OxOO, BUFFER_SIZE);

// fill buffer with AES-CTR output AES-CTR(PrecompBuffer);

// write buffer to secondary storage _write(PrecomρFile, PrecompBuffer, BUFFER_SIZE);

void PhaseTwo(FILE *PrecompFile, unsigned char *PlaintextBuffer) { unsigned char PrecompBuffer[BUFFER_SIZE] ;

// read pre-computed data from secondary storage _read(File, PrecompBuffer, BUFFER_SIZE);

// perform XOR between plaintext and pre-computed data PBC(PlaintextBuffer, PrecompBuffer);

}

File: aes.c

/*

* FIPS-197 compliant AES implementation

*

* Copyright (C) 2001-2004 Christophe Devine

*

* This program is free software; you can redistribute it and/or modify

* it under the terms of the GNU General Public License as published by

* the Free Software Foundation; either version 2 of the License, or

* (at your option) any later version. *

* This program is distributed in the hope that it will be useful,

* but WITHOUT ANY WARRANTY; without even the implied warranty of

* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

* GNU General Public License for more details. *

* You should have received a copy of the GNU General Public License

* along with this program; if not, write to the Free Software

* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

*/

#include "aes.h"

/* uncomment the following line to run the test suite */

/* #define TEST */

/* uncomment the following line to use pre-computed tables */ /* otherwise the tables will be generated at the first run */

/* #define FIXED TABLES */

#ifndef FIXED_TABLES /* forward S-box & tables */ uint32 FSb[256]; uint32 FT0[256]; uint32 FTl[256]; uint32 FT2[256]; uint32 FT3[256];

/* reverse S-box & tables */ uint32 RSb[256]; uint32 RT0[256]; uint32 RT1[256]; uint32 RT2[256]; uint32 RT3[256];

/* round constants */ uint32 RCON[IO];

/* tables generation flag */ int do_init = 1;

/* tables generation routine */

#define ROTR8(x) ( ( ( x « 24 ) & OxFFFFFFFF ) | \ ( ( x & OxFFFFFFFF ) » 8 ) )

#define XTIME(x) ( ( x « 1 ) λ ( ( x & 0x80 ) ? OxIB : OxOO ) ) #define MUL(x,y) ( ( x && y ) ? pow[(log[x] + log[y]) % 255] : 0 ) void aes_gen_tables( void )

{ int i; uint8 x, y; uint8 pow[256]; uint8 log[256];

/* compute pow and log tables over GF(2 λ 8) */ for( i = 0, x = 1; i < 256; i++, x λ = XTIME( x ) )

{

ρow[i] = x; log[x] = i;

}

/* calculate the round constants */ for( i = 0, x = 1; i < 10; i++, x = XTIME( x ) )

{ RCONp] = (uint32) x « 24;

}

/* generate the forward and reverse S-boxes */

FSb[OxOO] = 0x63; RSb[0x63] = OxOO; for( i = 1; i < 256; i++ )

{ x = pow[255 - log[i]];

FSb[i] = x; RSb[X] = i;

}

/* generate the forward and reverse tables */ for( i = 0; i < 256; i++ )

{ x = (unsigned char) FSb[i]; y = XTIME( x );

FT0[i] = (uint32) ( x λ y ) ( (uint32) x « 8 ) ( (uint32) x « 16 ) ( (uint32) y « 24 );

FTOfi] &= OXFFFFFFFF;

FTl [i] = ROTR8( FT0[i] ); FT2[i] = ROTR8( FTl[i] ); FT3[i] = ROTR8( FT2[i] );

y = (unsigned char) RSb[i];

RT0[i] = ( (uint32) MUL( OxOB, y ) ) λ ( (uint32) MUL( OxOD, y ) « 8 ) λ ( (uint32) MUL( 0x09, y ) « 16 ) λ ( (uint32) MUL( OxOE, y ) « 24 );

RT0[i] &= OXFFFFFFFF;

RTl[i] = ROTR8( RT0[i] ); RT2[i] = ROTR8( RTlfi] ); RT3[i] = ROTR8( RT2[i] );

}

}

#else

/* forward S-box */ static const uint32 FSb[256] =

{

0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, OxOl, 0x67, 0x2B, OxFE, 0xD7, OxAB, 0x76, OxCA, 0x82, 0xC9, 0x7D, OxFA, 0x59, 0x47, OxFO, OxAD, 0xD4, 0xA2, OxAF, 0x9C, 0xA4, 0x72, OxCO, 0xB7, OxFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, OxCC, 0x34, 0xA5, 0xE5, OxFl 5 0x71, 0xD8, 0x31, 0x15, 0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, OxEB, 0x27, 0xB2, 0x75, 0x09, 0x83, 0x2C, OxIA, OxIB, 0x6E, 0x5A, OxAO, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84, 0x53, OxDl, OxOO, OxED, 0x20, OxFC, OxBl, 0x5B, 0x6A, OxCB, OxBE, 0x39, 0x4A, 0x4C, 0x58, OxCF, OxDO, OxEF, OxAA, OxFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8, 0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, OxBC, 0xB6, OxDA, 0x21, 0x10, OxFF, 0xF3, 0xD2, OxCD, OxOC, 0x13, OxEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73, 0x60, 0x81, 0x4F, OxDC, 0x22, 0x2A, 0x90, 0x88, 0x46, OxEE, 0xB8, 0x14, OxDE, 0x5E, OxOB, OxDB, OxEO, 0x32, 0x3A, OxOA, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, OxAC, 0x62, 0x91, 0x95, 0xE4, 0x79, 0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, OxEA, 0x65, 0x7A, OxAE, 0x08,

OxBA, 0x78, 0x25, 0x2E, OxIC, 0xA6, 0xB4, 0xC6, 0xE8, OxDD, 0x74, OxIF, 0x4B, OxBD, 0x8B, 0x8A, 0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, OxOE, 0x61, 0x35, 0x57, 0xB9, 0x86, OxCl, OxID, 0x9E, OxEl, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, OxIE, 0x87, 0xE9, OxCE, 0x55, 0x28, OxDF, 0x8C, OxAl, 0x89, OxOD, OxBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, OxOF, OxBO, 0x54, OxBB, 0x16

};

/* forward tables */

#define FT \

\

V(C6,63,63,A5), V(F8,7C,7C,84), V(EE,77,77,99), V(F6,7B,7B,8D), \ V(FF,F2,F2,0D), V(D6,6B,6B,BD), V(DE,6F,6F,B1), V(91,C5,C5,54), \ V(60,30,30,5O), V(02,01,01,03), V(CE,67,67,A9), V(56,2B,2B,7D), \ V(E7,FE,FE,19), V(B5,D7,D7,62), V(4D,AB,AB,E6), V(EC,76,76,9A), \ V(8F,CA,CA,45), V(1F,82,82,9D), V(89,C9,C9,40), V(FA,7D,7D,87), \ V(EF,FA,FA,15), V(B2,59,59,EB), V(8E,47,47,C9), V(FB,F0,F0,0B), \ V(41,AD,AD,EC), V(B3,D4,D4,67), V(5F,A2,A2,FD), V(45,AF,AF,EA), \ V(23,9C,9C,BF), V(53,A4,A4,F7), V(E4,72,72,96), V(9B,C0,C0,5B), \ V(75,B7,B7,C2), V(El 5 FD 5 FD, 1C), V(3D,93,93 5 AE), V(4C,26,26,6A), \ V(6C,36,36,5A), V(7E,3F,3F,41), V(F5,F7,F7,02), V(83,CC,CC,4F), \ V(68,34,34,5C), V(51,A5,A5,F4), V(D1,E5,E5,34), V(F9,F1,F1,O8), \ V(E2,71,71,93), V(AB,D8,D8,73), V(62,31,31,53), V(2A,15,15,3F), \ V(08,04,04,0C), V(95,C7,C7,52) 5 V(46,23,23,65), V(9D,C3,C3,5E), \ V(30,18,18,28), V(37,96,96,A1), V(0A,05,05,0F), V(2F,9A,9A,B5), \ V(0E,07,07,09), V(24,12,12,36), V(1B,8O,8O,9B), V(DF,E2,E2,3D), \ V(CD,EB,EB,26), V(4E,27,27,69), V(7F,B2,B2,CD), V(EA,75,75,9F), \ V(12,O9,O9,1B), V(1D,83,83,9E), V(58,2C,2C,74), V(34,1A,1A,2E), \ V(36,1B,1B,2D), V(DC,6E,6E,B2), V(B4,5A 5 5A,EE), V(5B,A0,A0,FB), \ V(A4,52,52,F6), V(76,3B,3B,4D), V(B7,D6,D6,61), V(7D,B3,B3,CE), \ V(52,29,29,7B), V(DD,E3,E3,3E), V(5E,2F,2F,71), V(13,84,84,97), \ V(A6,53,53,F5), V(B9,D1,D1,68), V(00,00,00,00), V(C1,ED,ED,2C), \ V(40,20,20,60), V(E3,FC,FC,1F), V(79,B1,B1,C8), V(B6,5B,5B,ED), \ V(D4,6A,6A,BE), V(8D,CB,CB,46), V(67,BE,BE,D9), V(72,39,39,4B), \ V(94,4A,4A,DE), V(98,4C,4C,D4), V(B0,58,58,E8), V(85,CF,CF,4A), \ V(BB,D0,D0,6B), V(C5,EF,EF,2A), V(4F,AA,AA,E5), V^D,FB,FB,16), \ V(86,43,43,C5), V(9A,4D,4D,D7), V(66,33,33,55), V(11,85,85,94), \ V(8A,45,45,CF), V(E9,F9,F9,10), V(04,02,02,06), V(FE,7F,7F,81), \ V(A0,50,50,F0), V(78,3C,3C,44), V(25,9F,9F,BA), V(4B,A8,A8,E3), \ V(A2,51,51,F3), V(5D,A3,A3,FE), V(80,40,40,C0), V(05,8F,8F,8A), \ V(3F,92,92,AD), V(21,9D,9D,BC), V(70,38,38,48), V(Fl,F5,F5,04), \ V(63,BC,BC,DF) 5 V(77,B6,B6,C1), V(AF,DA,DA,75), V(42,21,21,63), \ V(20,10,10,30), V(E5,FF,FF,1A), V(FD,F3,F3,0E), V(BF,D2,D2,6D), \

V(81,CD,CD,4C), V(18,0C,0C,14), V(26,13,13,35), V(C3,EC,EC,2F), \ V(BE,5F,5F,E1), V(35,97,97,A2), V(88,44,44,CC), V(2E,17,17,39), \ V(93,C4,C4,57), V(55 A7,A7,F2), V(FC,7E,7E,82), V(7A,3D,3D,47), \ V(C8,64,64,AC), V(BA,5D,5D,E7), V(32,19,19,2B), V(E6,73,73,95), \ V(C0,60,60,A0), V(19,81,81,98), V(9E,4F,4F,D1), V(A3,DC,DC,7F), \ V(44,22,22,66), V(54,2A,2A,7E), V(3B,90,90,AB), V(0B,88,88,83), \ V(8C,46,46,CA), V(C7,EE,EE,29), V(6B,B8,B8,D3), V(28,14,14,3C), \ V(A7,DE,DE,79), V(BC,5E,5E,E2), V(16,OB,OB,1D), V(AD,DB,DB,76), \ V(DB 5 E0,E0,3B), V(64,32,32,56) 5 V(74,3A,3A,4E), V(14,OA,OA J 1E), \ V(92,49,49,DB), V(0C,06,06,0A), V(48,24,24,6C), V(B8,5C,5C,E4), \ V(9F,C2,C2,5D), V(BD,D3,D3,6E), V(43 ,AC, AQEF), V(C4,62,62,A6), \ V(39,91,91,A8), V(31,95,95, A4), V(D3,E4,E4,37), V(F2,79,79,8B), \ V(D5,E7,E7,32), V(8B,C8,C8,43), V(6E,37,37,59), V(DA,6D,6D,B7), \ V(01,8D,8D,8C), V(B1,D5,D5,64), V(9C,4E,4E,D2), V(49,A9,A9,E0), \ V(D8,6C,6C,B4), V(AC,56,56,FA), V(F3,F4,F4,07), V(CF,EA,EA,25), \ V(CA,65,65,AF), V(F4,7A,7A,8E), V(47,AE,AE,E9), V(10,08,08,18), \ V(6F,BA,BA,D5), V(F0,78,78,88), V(4A,25,25,6F), V(5C,2E,2E,72), \ V(38,1C,1C,24), V(57,A6,A6,F1), V(73,B4,B4,C7), V(97,C6,C6,51), \ V(CB,E8,E8,23), V(A1,DD,DD,7C), V(E8,74,74,9C), V(3E,1F,1F,21), \ V(96,4B,4B,DD), V(61,BD,BD,DC), V(0D,8B,8B,86), V(0F,8A,8A,85), \ V(E0,70,70,90), V(7C,3E,3E,42), V(71,B5,B5,C4), V(CC,66,66,AA), \ V(90,48,48,D8), V(06,03,03,05), V(F7,F6,F6,01), V(1C,OE,OE,12), \ V(C2,61,61,A3), V(6A,35,35,5F), V(AE,57,57,F9), V(69,B9,B9,D0), \ V(17,86,86,91), V(99,C1,C1,58), V(3A,1D,1D,27), V(27,9E,9E,B9), \ V(D9,E1,E1,38), V(EB,F8,F8,13), V(2B,98,98,B3), V(22,ll,ll,33), \ V(D2,69,69,BB), V(A9,D9,D9,70), V(07,8E,8E,89), V(33,94,94,A7), \ V(2D,9B,9B,B6), V(3C,1E,1E,22), V(15 ,87,87,92), V(C9,E9,E9,20), \ V(87,CE,CE,49), V(AA,55,55,FF), V(50,28,28,78), V(A5,DF,DF,7A), \ V(03,8C,8C,8F), V(59,A1,A1,F8), V(09,89,89,80), V(1A,OD,OD,17), \ V(65,BF,BF,DA), V(D7,E6,E6,31), V(84,42,42,C6), V(D0,68,68,B8), \ V(82,41,41,C3), V(29,99,99,B0), V(5A,2D,2D,77), V(1E,OF,OF,11), \ V(7B,B0,B0,CB), V(A8,54,54,FC), V(6D,BB,BB,D6), V(2C,16,16,3A)

#define V(a,b,c,d) Ox##a##b##c##d static const uint32 FT0[256] = { FT }; #undef V

#define V(a,b,c,d) Ox##d##a##b##c static const uint32 FT1[256] = { FT }; #undef V

#define V(a,b,c,d) Ox##c##d##a##b static const uint32 FT2[256] = { FT }; #undef V

#define V(a,b,c,d) Ox##b##c##d##a

static const uint32 FT3[256] = { FT }; #undef V

#undef FT

/* reverse S-box */ static const uint32 RSb[256] =

{

0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38, OxBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, OxFB, 0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, OxFF, 0x87, 0x34, 0x8E, 0x43, 0x44, 0xC4, OxDE, 0xE9, OxCB, 0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D, OxEE, 0x4C, 0x95, OxOB, 0x42, OxFA, 0xC3, 0x4E, 0x08, 0x2E, OxAl, 0x66, 0x28, 0xD9, 0x24, 0xB2, 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, OxDl, 0x25, 0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xD4, 0xA4, 0x5C, OxCC, 0x5D, 0x65, 0xB6, 0x92, 0x6C, 0x70, 0x48, 0x50, OxFD, OxED, 0xB9, OxDA, 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84, 0x90, 0xD8, OxAB, 0x00, 0x8C, OxBC, 0xD3, OxOA, 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06, OxDO, 0x2C, OxIE, 0x8F, OxCA, 0x3F, OxOF, 0x02, OxCl, OxAF, OxBD, 0x03, 0x01, 0x13, 0x8A, 0x6B, 0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, OxDC, OxEA, 0x97, 0xF2, OxCF, OxCE, OxFO, 0xB4, 0xE6, 0x73, 0x96, OxAC, 0x74, 0x22, 0xE7, OxAD, 0x35, 0x85, 0xE2, 0xF9, 0x37, 0xE8, OxIC, 0x75, OxDF, 0x6E, 0x47, OxFl, OxIA, 0x71, OxID, 0x29, 0xC5, 0x89, 0x6F, 0xB7, 0x62, OxOE, OxAA, 0x18, OxBE, OxIB, OxFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20, 0x9A, OxDB, OxCO, OxFE, 0x78, OxCD, 0x5A, 0xF4, OxIF, OxDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31, OxBl, 0x12, 0x10, 0x59, 0x27, 0x80, OxEC, 0x5F, 0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, OxOD, 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, OxEF, OxAO, OxEO, 0x3B, 0x4D, OxAE, 0x2A, 0xF5, OxBO, 0xC8, OxEB, OxBB, 0x3C, 0x83, 0x53, 0x99, 0x61, 0x17, 0x2B, 0x04, 0x7E, OxBA, 0x77, 0xD6, 0x26, OxEl, 0x69, 0x14, 0x63, 0x55, 0x21, OxOC, 0x7D

};

/* reverse tables */ #define RT \

\

V(51,F4,A7,50), V(7E,41,65,53), V(1A,17,A4,C3), V(3A,27,5E,96), \ V(3B,AB,6B,CB), V(1F,9D,45,F1), V(AC,FA,58,AB), V(4B,E3,03,93), \ V(20,30,FA,55), V(AD,76,6D,F6), V(88,CC,76,91), V(F5,02,4C,25), \ V(4F,E5,D7,FC), V(C5,2A,CB,D7), V(26,35,44,80), V(B5,62,A3,8F), \ V(DE,B1,5A,49), V(25,BA 5 1B,67), V(45,EA,0E,98), V(5D,FE,CO,E1), \ V(C3,2F,75,02), V(81,4C,F0,12), V(8D,46,97,A3), V(6B,D3,F9,C6), \ V(03,8F,5F,E7), V(15,92,9C,95), V(BF,6D,7A,EB), V(95,52,59,DA), \ V(D4,BE,83,2D), V(58,74,21,D3), V(49,E0,69,29), V(8E,C9,C8,44), \ V(75,C2,89,6A), V(F4,8E,79,78), V(99,58,3E,6B), V(27,B9,71,DD), \ V(BE,E1,4F,B6), V(F0,88,AD,17), V(C9,20,AC,66), V(7D,CE,3A,B4), \ V(63,DF,4A,18), V(E5,1A,31,82), V(97 5 51,33,60), V(62,53,7F,45), \ V(B1,64,77,EO), V(BB,6B,AE,84), V(FE,81,AO,1C), V(F9,08,2B,94), \ V(70,48,68,58), V(8F,45,FD,19), V(94,DE,6C,87), V(52,7B,F8,B7), \ V(AB,73,D3,23), V(72,4B,02,E2), V(E3,1F,8F,57), V(66,55,AB,2A), \ V(B2,EB,28,07), V(2F,B5,C2,03), V(86,C5,7B,9A), V(D3,37,08,A5), \ V(30,28,87,F2), V(23,BF,A5,B2), V(02,03,6A,BA), V(ED,16,82,5C), \ V(8A,CF,1C,2B), V(A7,79,B4,92), V(F3,07,F2,F0), V(4E,69,E2,A1), \ V(65,DA,F4,CD), V(06,05,BE,D5), V(D1,34,62,1F), V(C4,A6,FE,8A), \ V(34,2E,53,9D), V(A2,F3,55,A0), V(O5,8A,E1,32), V(A4,F6,EB,75), \ V(0B,83,EC,39), V(40,60,EF,AA), V(5E,71,9F,06), V(BD,6E,10,51), \ V(3E,21,8A,F9), V(96,DD,06,3D), V(DD,3E,05,AE), V(4D,E6,BD,46), \ V(91,54,8D,B5), V(71,C4,5D,05), V(04,06,D4,6F), V(60,50,15,FF), \ V(19,98,FB,24), V(D6,BD,E9,97), Y(89,40,43,CC), V(67,D9,9E,77), \ V(B0,E8,42,BD), V(07,89,8B,88), V(E7,19,5B,38), V(79,C8,EE,DB), \ V(A1,7C,OA,47), V(7C,42,0F,E9), V(F8,84,1E,C9), V(00,00,00,00), \ V(09,80,86,83), V(32,2B,ED,48), V(IE, 11,70,AC), V(6C,5A,72,4E), \ V(FD,0E,FF,FB), V(0F,85,38,56), V(3D,AE,D5,1E), V(36,2D,39,27), \ V(0A,0F,D9,64), V(68,5C,A6,21), V(9B,5B,54,D1), V(24,36,2E,3A), \ V(0C,0A,67,Bl), V(93,57,E7,0F), V(B4,EE,96,D2), V(1B,9B,91,9E), \ V(80,C0,C5,4F), V(61,DC,20,A2), V(5A,77,4B,69), V(1C,12,1A,16), \ V(E2,93,BA,0A), V(C0,A0,2A,E5), V(3C,22,E0,43), V(12,1B,17,1D), \ V(0E,09,0D,0B), V(F2,8B,C7,AD), V(2D,B6,A8,B9), V(14,1E,A9,C8), \ V(57,F1,19,85), V(AF,75,07,4C), V(EE,99,DD,BB), V(A3,7F,60,FD), \ V(F7,01,26,9F), V(5C,72,F5,BC), V(44,66,3B,C5), V(5B,FB,7E,34), \ V(8B,43,29,76), V(CB,23,C6,DC), V(B6,ED,FC,68), V(B8,E4,F1,63), \ V(D7,31,DC,CA), V(42,63,85,10), V(13,97,22,40), V(84,C6,ll,20), \ V(85,4A,24,7D), V(D2,BB,3D,F8), V(AE,F9,32,11), V(C7,29,A1,6D), \ V(1D,9E,2F,4B), V(DC,B2,30,F3), V(0D,86,52,EC), V(77,C1,E3,DO), \ V(2B,B3,16,6C), V(A9,70,B9,99), V(11,94,48,FA), V(47,E9,64,22), \ V(A8,FC,8C,C4), V(AO,FO,3F,1A), V(56,7D,2C,D8), V(22,33,90,EF), \ V(87,49,4E,C7), V(D9,38,D1,C1), V(8C,CA,A2,FE), V(98,D4,0B,36), \ V(A6 5 F5,81,CF), V(A5,7A,DE,28), V(DA,B7,8E,26), V(3F,AD,BF,A4), \ V(2C,3A,9D,E4), V(50,78,92,0D), V(6A,5F,CC,9B), V(54,7E,46,62), \ V(F6,8D,13,C2), V(90,D8,B8,E8), V(2E,39,F7,5E), V(82,C3,AF,F5), \ V(9F,5D,80,BE), V(69,D0,93,7C), V(6F,D5,2D,A9), V(CF,25,12,B3), \

V(C8,AC,99,3B), V(10,18,7D,A7), V(E8,9C,63,6E), V(DB,3B,BB,7B), \ V(CD,26,78,09), V(6E,59,18,F4), V(EC,9A,B7,01), V(83,4F,9A,A8), \ V(E6,95,6E,65), V(AA,FF,E6,7E), V(21,BC,CF,08), V(EF,15,E8,E6), \ V(BA,E7,9B,D9), V(4A,6F,36,CE), V(EA,9F,09,D4), V(29,B0,7C,D6), \ V(31,A4,B2,AF), V(2A,3F,23,31), V(C6,A5,94,30), V(35,A2,66,C0), \ V(74,4E,BC,37), V(FC,82,CA,A6), V(EO,90,D0,BO), V(33,A7,D8,15), \ V(F1,O4,98,4A), V(41,EC,DA,F7), V(7F,CD,50,0E), V(17,91,F6,2F), \ V(76,4D,D6,8D), V(43,EF,B0,4D), V(CC,AA,4D,54), V(E4,96,04,DF), \ V(9E,D1,B5,E3), V(4C,6A,88,1B), V(C1,2C,1F,B8), V(46,65,51,7F), \ V(9D,5E,EA,04), V(01,8C,35,5D), V(FA,87,74,73), V(FB,0B,41,2E), \ V(B3,67,1D,5A), V(92,DB,D2,52), V(E9,10,56,33), V(6D,D6,47,13), \ V(9A,D7,61,8C), V(37λ1,OC,7A), V(59,F8,14,8E), V(EB,13,3C,89), \ V(CE,A9,27,EE), V(B7,61,C9,35), V(E1,1C,E5,ED), V(7A,47,B1,3C), \ V(9C,D2,DF,59), V(55,F2,73,3F), V(18,14,CE,79), V(73,C7,37,BF), \ V(53,F7,CD,EA), V(5F,FD,AA,5B), V(DF,3D,6F,14), V(78,44,DB,86), \ V(CA,AF,F3,81), V(B9,68,C4,3E), V(38,24,34,2C), V(C2,A3,40,5F), \ V(16,1D,C3,72), V(BC,E2,25,0C), V(28,3C,49,8B), V(FF,0D,95,41), \ V(39,A8,01,71), V(08,0C,B3,DE), V(D8,B4,E4,9C), V(64,56,C1,9O), \ V(7B,CB,84,61), V(D5,32,B6,70), V(48,6C,5C,74), V(D0,B8,57,42)

#define V(a,b,c,d) Ox##a##b##c##d static const uint32 RT0[256] = { RT }; #undef V

#define V(a,b,c,d) Ox##d##a##b##c static const uint32 RTl [256] = { RT }; #undef V

#define V(a,b,c,d) Ox##c##d##a##b static const uint32 RT2[256] = { RT }; #undef V

^define V(a,b 5 c,d) Ox##b##c##d##a static const uint32 RT3[256] = { RT }; #undef V

#undef RT

/* round constants */ static const uint32 RCON[IO] =

{ 0x01000000, 0x02000000,0x04000000,0x08000000,

0x10000000, 0x20000000, 0x40000000, 0x80000000, OxlBOOOOOO, 0x36000000

};

int do_init = 0; void aes_gen_tables( void )

{ }

#endif

/* platform-independant 32-bit integer manipulation macros */

#define GET_UINT32(n,b,i) \

{ \

(n) = ((uint32)(b)[(i) ]«24) \

|((uint32)(b)[(i) + l]«16) \ |((uint32)(b)[(i) + 2]« 8) \ I ( (uint32) (b)[(i) + 3] ); \

}

#define PUT_UINT32(n,b,i) \

{ \

(b)[(i) ] = (uint8)((n)»24); \

(b)[(i) + l] = (uint8)((n)»16); \ (b)[(i) + 2] = (uint8)((n)» 8); \ (b)[(i) + 3] = (uint8)((n) ); \

}

/* decryption key schedule tables */ int KT_init = 1; uint32 KT0[256]; uint32 KT1[256]; uint32 KT2[256]; uint32 KT3[256];

/* AES key scheduling routine */ int aes_set_key( aes_context *ctx, uint8 *key, int nbits )

{ inti; uint32 *RK, *SK; if( do_init )

{ aes_gen_tables();

do_init = 0; } switch( nbits )

{ case 128: ctx->nr = 10; break; case 192: ctx->nr = 12; break; case 256: ctx->nr = 14; break; default : return( 1 );

}

RK = ctx->erk; for( i = 0; ϊ < (nbits » 5); i++ )

{ GET_UINT32( RK[i], key, i * 4 );

}

/* setup encryption round keys */ switch( nbits )

{ case 128: for( i = 0; i < 10; i++, RK += 4 )

{ RK[4] = RK[O] λ RCON[i] λ

( FSb[ (uintδ) ( RK[3] » 16 ) ] « 24 ) ( FSb[ (uint8) ( RK[3] » 8 ) ] « 16 ) " ( FSb[ (uintβ) ( RK[3] ) ] « 8 ) λ ( FSb[ (uint8) ( RK[3] » 24 ) ] );

RK[5] = RK[I] λ RK[4]; RK[6] = RK[2] λ RK[5]; RK[7] = RK[3] λ RK[6];

} break; case 192: for( i = 0; i < 8; i++, RK += 6 )

{ RK[6] = RK[O] λ RCON[i] λ

( FSb[ (uintδ) ( RK[5] » 16 ) ] « 24 ) λ ( FSb[ (uintδ) ( RK[5] » 8 ) ] « 16 ) λ

( FSb[ (uintδ) ( RK[5] ) ] « 8 ) λ ( FSb[ (uintδ) ( RK[5] » 24 ) ] );

RK[7] = RK[I] λ RK[6]; RK[8] = RK[2] λ RK[7]; RK[9] = RK[3] λ RK[S]; RK[IO] = RK[4] λ RK[9]; RK[Il] = RK[5] λ RK[IO];

} break; case 256: for( i = 0; i < 7; i++, RK += 8 )

{ RK[8] = RK[O] λ RCON[i] λ

( FSb[ (uint8) ( RK[7] » 16 ) ] « 24 ) λ ( FSb[ (uintδ) ( RK[7] » 8 ) ] « 16 ) λ ( FSb[ (uintδ) ( RK[7] ) ] « 8 ) λ ( FSb[ (uintδ) ( RK[7] » 24 ) ] );

RK[9] = RK[I] λ RK[8]; RK[IO] = RK[2] λ RK[9]; RK[Il] = RK[3] λ RK[IO];

RK[12] = RK[4] λ

( FSb[ (uint8) ( RK[Il] » 24 ) ] « 24 ) ' ( FSb[ (uint8) ( RK[Il] » 16 ) ] « 16 ) ' ( FSb[ (uint8) ( RK[H] » 8 ) ] « 8 ) λ ( FSb[ (uint8) ( RK[Il] ) ] );

RK[13] = RK[5] λ RK[12]; RK[14] = RK[6] λ RK[13]; RK[15] = RK[7] λ RK[14];

} break;

}

/* setup decryption round keys */ if( KT_init )

{ for( i = 0; i < 256; i++ )

{ KT0[i] = RT0[ FSb[i] ];

KTl[i] = RT1[ FSb[i] ];

KT2[i] = RT2[ FSb[i] ]; KT3[i] = RT3[ FSb[i] ];

}

KTJnit = 0; }

SK = ctx->drk;

*SK++ = *RK++; *SK++ = *RK++; *SK++ = *RK++; *SK++ = *RK++; for( i = 1; i < ctx->nr; i++ )

{ RK -= 8;

*SK++ = KTOf (uintδ) ( *RK » 24 ) ] ' KT1[ (uintδ) ( *RK » 16 ) ] λ KT2[ (uint8) ( *RK » 8 ) ] λ KT3[ (uint8) ( *RK ) ]; RK++;

*SK++ = KT0[ (uintδ) ( *RK » 24 ) ] ' KT1[ (uintδ) ( *RK » 16 ) ] λ KT2[ (uintδ) ( *RK » 8 ) ] λ KT3[ (uintδ) ( *RK ) ]; RK++;

*SK++ = KT0[ (uint8) ( *RK » 24 ) ] ' KTl [ (uintδ) ( *RK » 16 ) ] λ KT2[ (uint8) ( *RK » 8 ) ] λ KT3[ (uint8) ( *RK ) ]; RK++;

*SK++ = KT0[ (uintδ) ( *RK » 24 ) ] λ KT1[ (uintδ) ( *RK » 16 ) ] λ KT2[ (uintδ) ( *RK » δ ) ] λ KT3[ (uintδ) ( *RK ) ]; RK++;

}

RK -= δ;

*SK++ = *RK++; *SK++ = *RK++; *SK++ = *RK++; *SK++ = *RK++;

return( 0 );

}

/* AES 128-bit block encryption routine */ void aes_encrypt( aes_context *ctx, uint8 input[16], uintδ output[16] )

{ uint32 *RK, XO, Xl, X2, X3, YO, Yl, Y2, Y3;

RK = ctx->erk;

GET_UINT32( XO, input, 0 ); XO λ = RK[O]; GET_UINT32( Xl, input, 4 ); Xl λ = RK[I]; GET_UINT32( X2, input, 8 ); X2 λ = RK[2]; GET_UINT32( X3, input, 12 ); X3 λ = RK[3];

#define AES_FROUND(XO,X1,X2,X3,YO,Y1,Y2,Y3) \

{ \

RK += 4; \

\ XO = RK[O] λ FT0[ (uint8) ( YO » 24 ) ] λ \

FT1[ (uintδ) ( Yl » 16 ) ] λ \

FT2[(uint8)(Y2» 8)] λ \

FT3[(uint8)(Y3 )]; \

\ Xl = RK[I] λ FT0[ (uintδ) ( Yl » 24 ) ] λ \

FT1[ (uint8) ( Y2 » 16 ) ] λ \

FT2[ (uintδ) ( Y3 » 8 ) ] λ \

FT3[ (uintδ) (YO )]; \

\ X2 = RK[2] λ FT0[ (uintδ) ( Y2 » 24 ) ] λ \

FTl [ (uintδ) (Y3» 16) ] λ \

FT2[ (uintδ) ( YO » δ)] λ \

FT3[ (uintδ) (Yl )]; \

\ X3 = RK[3] λ FT0[ (uintδ) ( Y3 » 24 ) ] λ \

FTl[(uintδ)(Y0»16)] λ \

FT2[ (uintδ) ( Yl » δ)] λ \

FT3[ (uintδ) (Y2 )]; \

}

AES_FROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 1 */ AES_FROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 2 */ AES_FROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 3 */ AES_FROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 4 */ AES_FROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 5 */

AES_FROUND( XO, Xl 5 X2, X3, YO, Yl, Y2, Y3 ); /* round 6 •/

AES_FROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 7 */

AES_FROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 8 */

AES_FROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 9 */ if( ctx->nr > 10 )

{ AES_FROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 10 */

AES_FROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 11 */ } if( ctx->nr > 12 )

{ AES_FROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 12 */

AES_FROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 13 */ }

/* last round */ RK += 4;

XO = RK[O] λ ( FSb[ (uintδ) ( YO » 24 ) ] « 24 ) λ ( FSb[ (uintδ) ( Yl » 16 ) ] « 16 ) λ ( FSb[ (uintδ) ( Y2 » 8 ) ] « 8 ) λ ( FSb[ (uint8) ( Y3 ) ] );

Xl = RK[I] λ ( FSb[ (uint8) ( Yl » 24 ) ] « 24 ) λ ( FSb[ (uint8) ( Y2 » 16 ) ] « 16 ) λ ( FSb[ (uintδ) ( Y3 » 8 ) ] « 8 ) λ ( FSb[ (uint8) ( Y0 ) ] );

X2 = RK[2] λ ( FSb[ (uint8) ( Y2 » 24 ) ] « 24 ) λ ( FSb[ (uint8) ( Y3 » 16 ) ] « 16 ) λ ( FSb[ (uint8) ( YO » 8 ) ] « 8 ) λ ( FSb[ (uint8) ( Yl ) ] );

X3 = RK[3] λ ( FSb[ (uint8) ( Y3 » 24 ) ] « 24 ) λ ( FSb[ (uint8) ( YO » 16 ) ] « 16 ) λ ( FSb[ (uint8) ( Yl » 8 ) ] « 8 ) λ ( FSb[ (uint8) ( Y2 ) ] );

PUT_UINT32( XO, output, 0 ); PUT_UINT32( Xl, output, 4 ); PUT_UINT32( X2, output, 8 ); PUT_UINT32( X3, output, 12 );

}

/* AES 128-bit block decryption routine */ void aes_decrypt( aes_context *ctx, uintδ input[16], uint8 output[16] )

{ uint32 *RK, XO, Xl, X2, X3, YO, Yl, Y2, Y3;

RK = ctx->drk;

GET_UINT32( XO, input, 0 ); XO λ = RK[O]; GET_UINT32( Xl, input, 4 ); Xl λ = RK[I]; GET_UINT32( X2, input, 8 ); X2 λ = RK[2]; GET_UINT32( X3, input, 12 ); X3 λ = RK[3];

#define AES_RROUND(XO,X1,X2,X3,YO,Y1,Y2,Y3) \

{ \

RK += 4; \

\ XO = RK[O] λ RT0[ (uintδ) ( YO » 24 ) ] \

RT1[ (uint8) ( Y3 » 16 ) ] λ \

RT2[(uint8)(Y2» 8)] λ \

RT3[(uint8)(Yl )]; \

\ Xl = RK[I] λ RT0[ (uint8) ( Yl » 24 ) ]

RT1[ (uint8) ( YO » 16 ) ] λ \

RT2[(uint8)(Y3» 8)] λ \

RT3[(uint8)(Y2 )]; \

\ X2 = RK[2] λ RT0[ (uint8) ( Y2 » 24 ) ]

RTl[(uint8)(Yl»16)] λ \

RT2[(uint8)(Y0» 8)] λ \

RT3[(uint8)(Y3 )]; \

\ X3 = RK[3] λ RT0[ (uint8) ( Y3 » 24 ) ] '

RT1[ (uint8) ( Y2 » 16 ) ] λ \

RT2[(uint8)(Yl» 8)] λ \

RT3[(uint8)(Y0 )]; \

}

AES_RROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 1 */ AES_RROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 2 */ AES_RROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 3 */ AES_RROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); AES_RROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 5 */ AES_RROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 6 */ AES_RROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 7 */

AES_RROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 8 */ AES_RROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 9 */ if( ctx->nr > 10 )

{ AES_RROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /• round 10 •/

AES_RROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 11 */ } if( ctx->nr > 12 )

{ AES_RROUND( XO, Xl, X2, X3, YO, Yl, Y2, Y3 ); /* round 12 */

AES_RROUND( YO, Yl, Y2, Y3, XO, Xl, X2, X3 ); /* round 13 */ }

/* last round */ RK += 4;

XO = RK[O] λ ( RSb[ (uintδ) ( YO » 24 ) ] « 24 ) λ ( RSb[ (uintδ) ( Y3 » 16 ) ] « 16 ) λ ( RSb[ (uintδ) ( Y2 » 8 ) ] « 8 ) λ ( RSb[ (uint8) (Yl ) ] );

Xl = RK[I] λ ( RSb[ (uint8) ( Yl » 24 ) ] « 24 ) λ ( RSb[ (uintδ) ( YO » 16 ) ] « 16 ) λ ( RSb[ (uintδ) ( Y3 » 8 ) ] « 8 ) λ ( RSb[ (uint8) ( Y2 ) ] );

X2 = RK[2] λ ( RSb[ (uint8) ( Y2 » 24 ) ] « 24 ) λ ( RSb[ (uint8) ( Yl » 16 ) ] « 16 ) λ ( RSb[ (uintδ) ( YO » 8 ) ] « δ ) λ ( RSb[ (uintδ) ( Y3 ) ] );

X3 = RK[3] λ ( RSb[ (uintδ) ( Y3 » 24 ) ] « 24 ) λ ( RSb[ (uintδ) ( Y2 » 16 ) ] « 16 ) λ ( RSb[ (uintδ) ( Yl » δ ) ] « 8 ) λ ( RSb[ (uintδ) ( YO ) ] );

PUT_UINT32( XO, output, 0 ); PUT_UINT32( Xl, output, 4 ); PUT_UINT32( X2, output, δ ); PUT_UINT32( X3, output, 12 ); }

#ifdefTEST

#include <string.h> #iriclude <stdio.h>

/*

* Rijndael Monte Carlo Test: ECB mode

* source: NIST - rijndael-vals.zip

*/ static unsigned char AES_enc_test[3][16] =

{

{ OxAO, 0x43, 0x77, OxAB, 0xE2, 0x59, OxBO, OxDO,

0xB5, OxBA, 0x2D, 0x40, 0xA5, OxOl, 0x97, OxIB }, { 0x4E, 0x46, 0xF8, 0xC5, 0x09, 0x2B, 0x29, 0xE2,

0x9A, 0x97, OxIA, OxOC, OxDl, 0xF6, 0x10, OxFB }, { OxIF, 0x67, 0x63, OxDF, 0x80, 0x7A, 0x7E, 0x70,

0x96, OxOD, 0x4C, 0xD3, OxIl, 0x8E, 0x60, OxIA }

>; static unsigned char AES_dec_test[3][16] =

{

{ 0xF5, OxBF, 0x8B, 0x37, 0x13, 0x6F, 0x2E, OxIF,

0x6B, OxEC, 0x6F, 0x57, 0x20, 0x21, 0xE3, OxBA }, { OxFl, 0xA8, OxIB, 0x68, 0xF6, 0xE5, 0xA6, 0x27,

OxIA, 0x8C, 0xB2, 0x4E, 0x7D, 0x94, 0x91, OxEF }, { 0x4D, OxEO, 0xC6, OxDF, 0x7C, OxBl, 0x69, 0x72,

0x84, 0x60, 0x4D, 0x60, 0x27, OxIB, 0xC5, 0x9A }

}; int main( void )

{ int m, n, i, j; aes_context ctx; unsigned char buf[16]; unsigned char key[32]; for( m = 0; m < 2; m++ )

{ printf( "\n Rijndael Monte Carlo Test (ECB mode) - " ); if( m == 0 ) printf( "encryption\n\n" ); if( m == 1 ) printf( "decryption\n\n" ); for( n = 0; n < 3; n++ )

{ printf( " Test %d, key size = %3d bits: ",

n + 1, 128 + n * 64 ); fflush( stdout ); memset( buf, 0, 16 ); memset( key, 0, 16 + n * 8 ); for( i = 0; i < 400; i++ )

{ aes_set_key( &ctx, key, 128 + n * 64 ); for(j = 0;j<9999;j++)

{ if( m == 0 ) aes_encrypt( &ctx, buf, buf ); if( m == 1 ) aes_decrypt( &ctx, buf, buf);

} if(n>0)

{ for(j = 0;j<(n«3);j++)

{ key[j] λ = buf[j + 16 - (n « 3)];

} } if( m == 0 ) aes_encrypt( &ctx, buf, buf ); if( m == 1 ) aes_decrypt( &ctx, buf, buf ); for(j = 0;j<16;j++)

{ key[j+(n«3)] λ =buf[j];

} } if( ( m == 0 && memcmp( buf, AES_enc_test[n], 16 ) != 0 ) || ( m == 1 && memcmp( buf, AES_dec_test[n], 16 ) != 0 ) )

{ printf( "failed!\n" ); return( 1 ); } printf( "passed.\n" );

}

} printf( "\n" );

return( 0 ); }

#endif

File: aes.h

#ifndef_AES_H

#define _AES_H

#ifndefuint8

#define uint8 unsigned char

#endif

#ifndef uint32

#define uint32 unsigned long int

#endif typedef struct

{ uint32 erk[64]; /* encryption round keys */ uint32 drk[64]; /* decryption round keys */ int nr; /* number of rounds */

} aes_context; int aes_set_key( aes_context *ctx, uintδ *key, int nbits ); void aes_encrypt( aes_context *ctx, uintδ input[16], uintδ output[16] ); void aes_decrypt( aes_context *ctx, uint8 input[16], uintδ output[16] );

#endif /* aes.h */