Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
A METHOD AND SYSTEM FOR STORING DATA AND ACCESSING DATA
Document Type and Number:
WIPO Patent Application WO/2019/123346
Kind Code:
A1
Abstract:
A method of storing one or more files on a storage system in an untrusted environment. The storage system has at least two servers in electronic communication with each other and at least one electronic user device capable of data communication with the at least two servers over a data network. The method is executed by a one or more processors associated with the at least one electronic user device. The method involves dividing each file of the one or more files into a plurality of data blocks, obfuscating each file such that each file comprises a plurality of data blocks and dummy blocks, assigning each data block and each dummy block an identifier, creating an index of the identifiers associated with the obfuscated file, and transmitting the obfuscated file and the index for storage in the storage system.

Inventors:
RUSSELLO GIOVANNI (NZ)
ASGHAR MUHAMMAD RIZWAN (NZ)
CUI SHUJIE (NZ)
GALBRAITH STEVEN (NZ)
Application Number:
PCT/IB2018/060392
Publication Date:
June 27, 2019
Filing Date:
December 20, 2018
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
AUCKLAND UNISERVICES LTD (NZ)
International Classes:
G06F21/60; G06F12/00
Foreign References:
US8649521B22014-02-11
US20160277185A12016-09-22
US20140013112A12014-01-09
US20140150120A12014-05-29
Other References:
CUI, S. ET AL.: "Secure and Practical Searchable Encryption: A Position Paper", AUSTRALASIAN CONFERENCE ON INFORMATION SECURITY AND PRIVACY ACISP 2017, vol. 10342, no. 558, Chapt 14, 31 May 2017 (2017-05-31), pages 266 - 281, XP047415575
Attorney, Agent or Firm:
AJ PARK (NZ)
Download PDF:
Claims:
CLAIMS

1. A method of storing one or more files on a storage system in an untrusted environment, the storage system comprising at least two servers in electronic communication with each other, at least one electronic user device capable of data communication with the at least two servers over a data network, the method executed by one or more processors associated with the at least one electronic user device comprising the steps of:

dividing each file of the one or more files into a plurality of data blocks, obfuscating each file such that each file comprises a plurality of data blocks and dummy blocks,

assigning each data block and each dummy block an identifier, creating an index of the identifiers associated with the obfuscated file, transmitting the obfuscated file and the index for storage in the storage system.

2. A method of storing one or more files according to claim 1 wherein the step of obfuscating a file comprises the additional steps of:

generating one or more dummy blocks, and

interspersing the dummy blocks with the data blocks.

3. A method of storing one or more files according to claim 1 wherein the or each data block and each dummy block is randomly assigned an identifier.

4. A method of storing one or more files according to any one of claims 1 to 3 wherein the index comprises information associated with file including the identifier of the first data block associated with the file.

5. A method of storing one or more files according to claim 4 wherein the index for a file comprises the identifiers for the data blocks associated with the data blocks corresponding to the file and at least some dummy blocks in order to conceal the size of the file.

6. A method of storing one or more files according any one of the preceding claims wherein the step of transmitting comprises transmitting the obfuscated file as a series of data blocks and dummy blocks.

7. A method of storing one or more files according to any one of the preceding claims wherein the method comprises the step of a rranging the data blocks and dummy blocks based on their identifier.

8. A method of storing one or more files according to claim 7 wherein the identifier is a numeral and, the data blocks and dummy blocks are arranged in ascending on descending order based on the identifier of the data block and dummy block.

9. A method of storing one or more files according to any one of the preceding claims wherein the method comprises the step of encrypting each data block and each dummy block prior to the step of transmitting the obfuscated file and the index.

10. A method of storing one or more files according to any one of the preceding claims wherein the method comprises the step of encrypting the index prior to the step of transmitting the obfuscated file and the index.

11. A method of storing one or more files according to claim 9 and 10 wherein the step of transmitting the obfuscated file and index comprises transmitting the encrypted block and the encrypted index.

12. A method of storing one or more files according to claim 9 or claim 10 wherein each data block, dummy block and/or index are encrypted using one or more nonces, wherein at least the index is encrypted using a nonce that is unique and separate to a nonce used to encrypt each data block and/or dummy block.

13. A method of storing one or more files according to claim 9 wherein at least each data block is encrypted using a two-layer encryption.

14. A method of storing one or more files according to claim 13 wherein the first layer of encryption is such that can be decrypted by an authorised user and the second layer of encryption can be decrypted by at least one server.

15. A method of accessing one or more files from a storage system in an untrusted environment, the storage system comprising at least one storage server, at least one shuffling server for shuffling data, wherein the storage server and shuffling server are arranged in communication with each other, at least one electronic user device capable of data communication with the storage server over a data network, the method executed by a one or more processors associated with the at least one storage server comprising the steps of:

receiving a request to access the one or more files, the request comprising a file name,

identifying one or more data blocks based associated with the file name, transmitting the one or more data blocks in combination with dummy blocks and/or data blocks associated with other files, such that the requested file can be recreated based on the transmitted one or more data blocks.

16. A method of accessing one or more files according to claim 15 wherein the step of identifying one or more data blocks comprises the steps of:

determining an identifier from an index of identifiers corresponding to the file provided in the request, and

the one or more data blocks a re identified based on the identifier.

17. A method of accessing one or more files according to claim 15 or claim 16 comprising the additional step of ra ndomly adding dummy blocks and/or data blocks associated with other files prior to the step of transmitting .

18. A method of accessing one or more files according to claim 17 wherein the step of adding dummy blocks and/or data blocks associated with other files obfuscates the size of the data or identity of blocks or number of blocks associated with a file defined in the request.

19. A method of accessing one or more files according to any one of cla ims 15 to 18 wherein the received request comprises an encrypted file name or the request is encrypted, and the method comprises the additional step of decrypting the file name or decrypting the request to recognise the file name.

20. A method of accessing one or more files according to any one of cla ims 15 to 19 comprising the additional step of transmitting a set of blocks and a plurality of identifiers to a shuffling server for shuffling by the shuffling server, and wherein the blocks in the set of blocks are each associated with an identifier of the plurality of identifiers.

21. A method of accessing one or more files according to claim 20 wherein the blocks and/or the identifiers are randomly shuffled.

22. A method of accessing one or more files according to claim 20 or claim 21 wherein shuffling the blocks comprises adding additional dummy blocks such that a different number of blocks is received by the storage server as compared to the number of blocks transmitted to the shuffling server.

23. A method of accessing one or more files according to claim 20 or claim 21 comprising the steps of:

receiving the shuffled blocks and identifiers, and

storing the received blocks and identifiers.

24. A method of accessing one or more files according to claim 16 comprising the step of updating an index of blocks within the server and an index of identifiers within the storage server.

25. A method of accessing one or more files according to claim 15 wherein the request to access the one or more files comprises a nonce in addition to the file name.

26. A method accessing one or more files according to any one of claims 15 to 25 wherein the storage server and shuffling server are separate to each other.

27. A method of accessing one or more files according to any one of claims 15 to 26 comprising the additional steps of: receiving one or more certificates comprising information corresponding to the one or more files defined in the request and wherein the identifier is determined based on the one or more certificates.

28. A method of accessing one or more files from a storage system in an untrusted environment, the storage system comprising at least one storage server, at least one shuffling server configured to shuffle blocks, at least one electronic user device capable of data communication with the storage server and/or the shuffling server over a data network, the method executed by one or more processors associated with the at least one user device comprising the steps of:

transmitting a request to access the one or more files, the request comprising a file name and a nonce, wherein the nonce is provided to the shuffling server and the file name is provided to the storage server,

receiving one or more data blocks in combination with dummy blocks and/or data blocks associated with other files not defined in the request from the storage server, and recreating the one or more files defined in the request based on the received data blocks.

29. A method of accessing one or more files according to claim 28 wherein request comprises one or more encrypted file names corresponding to the one or more files defined in the request and at least one associated nonces.

30. A method of accessing one or more files according to claim 29 wherein the at least one nonce is used to encrypt the one or more file names prior to transmitting the request to access one or more files.

31. A method of accessing one or more files according to any one of claims 28 to 30 wherein the step of recreating the one or more files comprises decrypting only the data blocks associated with the one or more files and not decrypting the dummy blocks or blocks associated with other files not defined in the request.

32. A method of accessing one or more files according to claim 31 wherein the step of recreating the one or more files comprises the additional step of arranging the blocks in an order defined within the blocks, wherein arranging the blocks results in recreation of the one or more files.

33. A method of obfuscating data in a storage system within an untrusted environment, the storage system comprising at least one storage server and at least one shuffling server configured for communication with each other over a data network, the method executed by one or more processors associated with the at least one shuffling server comprising the steps of:

receiving a set of blocks comprising one or more blocks,

rearranging the one or more blocks within the set of blocks, and transmitting the rearranged set of blocks comprising the one or more rearranged blocks.

34. A method of obfuscating data in a storage system according to claim 33 comprising the additional steps of:

receiving an index comprising one or more identifiers associated with the one or more blocks,

rearranging the one or more identifiers within the index, and transmitting a rearranged index comprising the one or more rearranged identifiers.

35. A method of obfuscating data in a storage system according to claim 33 or claim 34 comprising the additional steps of: processing the received set of blocks to discern blocks that are related to each other or associated with a file, and the step of rearranging comprises rearranging the location or identity of one or more related blocks.

36. A method of obfuscating data in a storage system according to any one of claims 33 to 35 wherein a relationship between one or more blocks associated with a file remains unchanged through the obfuscation process.

37. A method of obfuscating data in a storage system according to any one of claims 33 to 36 wherein the step of rearranging comprising the additional step of: adding one or more additional dummy blocks within the set of blocks prior to transmitting the rearranged set of blocks.

38. A method of obfuscating data in a storage system according to any one of claims

33 to 37 comprising the additional step of re-encrypting the rearranged set of blocks and/or rearranged index prior to the step of transmitting.

Description:
A METHOD AND SYSTEM FOR STORING DATA AND ACCESSING DATA

FIELD OF THE INVENTION

The invention relates to a method and system for storing data and accessing data, in particular but not exclusively to a method and system for storing data and accessing one or more files from an untrusted or unsecured environment.

BACKGROUND

Outsourcing data to cloud services such as for example Dropbox or Box is becoming very popular and quite commonly used for transferring data e.g. files or sets of files. These tools are used to store data and access data. However, there is often a possibility that the data owner can lose control over the data, since these cloud service providers are generally exposed to the public i.e. may be located in untrusted environments.

If the cloud service provider is hacked, the attacker may fairly easily access the data that is stored. Typical encryption solutions can only protect data as long as cloud service providers does not have access to the encryption key. With traditional encryption the data is "useless" i.e. no meaningful computation can be performed on the data while encrypted with traditional encryption scheme.

In both commercial and academic areas, there exist a large number of searchable encryption solutions that support encrypted search over encrypted file storage. These proposed approaches can often still leak sensitive information, including search access and size patterns, file size patterns. This leakage of information can make these approaches vulnerable to leakage based attacks such as for example, count, file injection or communication attacks, where attackers may be able to recover content of queries and files from the patterns of leakage.

SUMMARY OF THE INVENTION

It is an object of at least some embodiments of the invention to provide a method and system for storing and accessing data, or at least provide the public with a useful alternative. Other objects of the invention may become apparent from the following description.

In one aspect, the invention broadly comprises a data storage system, for example a cloud based data storage system, that reduces leakage of data and is substantially resistant against leakage based attacks. In some configurations, the data storage system may also support file sharing and/or multi user access and/or search capabilities.

In one aspect, the invention broadly comprises a method of storing one or more files on a storage system in an untrusted environment, the storage system comprising at least two servers in electronic communication with each other, at least one electronic user device capable of data communication with the at least two servers over a data network, the method executed by a one or more processors associated with the at least one electronic user device comprising the steps of:

dividing each file of the one or more files into a plurality of data blocks, obfuscating each file such that each file comprises a plurality of data blocks and dummy blocks,

assigning each data block and each dummy block an identifier,

creating an index of the identifiers associated with the obfuscated file, a nd transmitting the obfuscated file and the index for storage in the storage system.

In an embodiment, the step of obfuscating a file comprises the additional steps of: generating one or more dummy blocks, and

interspersing the dummy blocks with the data blocks.

In an embodiment, each data block and each dummy block is randomly assigned an identifier.

In an embodiment, the index comprises information associated with file including the identifier of the first data block associated with the file.

In an embodiment, the index for a file comprises the identifiers for the data blocks associated with the data blocks corresponding to the file and at least some dummy blocks in order to conceal the size of the file.

In an embodiment, the step of transmitting comprises transmitting the obfuscated file as a series of data blocks and dummy blocks.

In an embodiment, the method comprises the step of arranging the data blocks and dummy blocks based on their identifier.

In an embodiment, the identifier is a numeral and, the data blocks a nd dummy blocks are arranged in ascending on descending order based on the identifier of the data block a nd dummy block.

In an embodiment, the method comprises the step of encrypting each data block and each dummy block prior to the step of transmitting the obfuscated file and the index.

In an embodiment, the method comprises the step of encrypting the index prior to the step of transmitting the obfuscated file and the index.

In an embodiment, the step of transmitting the obfuscated file and index comprises transmitting the encrypted block and the encrypted index.

In an embodiment, each data block, dummy block and/or index are encrypted using one or more nonces, wherein at least the index is encrypted using a nonce that is unique and sepa rate to a nonce used to encrypt each data block and/or dummy block.

In an embodiment, at least each data block is encrypted using a two layer encryption. In an embodiment, the first layer of encryption is such that can be decrypted by an authorised user and the second layer of encryption can be decrypted by at least one server. The encryption may be performed by the shuffling server.

In another aspect, the invention broadly comprises in a non-transitory computer readable medium, storing instructions, wherein the instructions when read by a programmable processor cause the processor to execute the method of storing one or more files according to any one or more of the above methods or steps.

In another aspect, the invention broadly comprises a method of accessing one or more files from a storage system in an untrusted environment, the storage system comprising at least one storage server, at least one shuffling server for shuffling data, wherein the storage server and shuffling server are arranged in communication with each other, at least one electronic user device capable of data communication with the storage server over a data network, the method executed by a one or more processors associated with the at least one storage server comprising the steps of:

receiving a request to access the one or more files, the request comprising a file name,

identifying one or more data blocks based associated with the file name, and transmitting the one or more data blocks in combination with dummy blocks and/or data blocks associated with other files, such that the requested file can be recreated based on the transmitted one or more data blocks.

In an embodiment, the step of identifying one or more data blocks comprises the steps of:

determining an identifier from an index of identifiers corresponding to the file provided in the request, the one or more data blocks being identified based on the identifier.

In an embodiment, the method of accessing one or more files comprises the additional step of randomly adding dummy blocks and/or data blocks associated with other files prior to the step of transmitting .

In an embodiment, the step of adding dummy blocks and/or data blocks associated with other files obfuscates the size of the data or identity of blocks or number of blocks associated with a file defined in the request.

In an embodiment, the received request comprises an encrypted file name or the request is encrypted, a nd the method comprises the additional step of decrypting the file name or decrypting the request to recognise the file name.

In an embodiment, the method of accessing one or more files comprises the additional step of transmitting a set of blocks and a plurality of identifiers to a shuffling server for shuffling by the shuffling server, and wherein the blocks in the set of blocks are each associated with an identifier of the plurality of identifiers.

In an embodiment, the blocks and/or the identifiers are randomly shuffled . The shuffling server may be configured to shuffle the blocks using any suitable shuffling protocol .

In an embodiment, shuffling the blocks comprises adding additional dummy blocks such that a different number of blocks is received by the storage server as compared to the number of blocks transmitted to the shuffling server.

In an embodiment, the method of accessing one or more files comprises the steps of:

receiving the shuffled blocks and identifiers, and

storing the received blocks and identifiers.

In an embodiment, the method of accessing one or more files comprises the step of updating an index of blocks within the server and an index of identifiers within the storage server.

In an embodiment, the request to access the one or more files comprises a nonce in addition to the file name.

In an embodiment, the storage server and shuffling server are separate to each other.

In an embodiment, the method of accessing one or more files from a storage system comprises the additional steps of: receiving one or more certificates comprising information corresponding to the one or more files defined in the request and wherein the identifier is determined based on the one or more certificates.

In another aspect, the invention broadly comprises a non-transitory computer readable medium, storing instructions, wherein the instructions when read by a programmable processor cause the processor to execute the method of accessing one or more files according to any one or more of the above methods or steps. The non-transitory computer readable medium may be associated with or may form part of the storage server.

In another aspect, the invention broadly comprises a method of accessing one or more files from a storage system in an untrusted environment, the storage system comprising at least one storage server, at least one shuffling server configured to shuffle blocks, at least one electronic user device capable of data communication with the storage server and/or the shuffling server over a data network, the method executed by a one or more processors associated with the at least one user device comprising the steps of: transmitting a request to access the one or more files, the request comprising a file name and a nonce, wherein the nonce is provided to the shuffling server and the file name is provided to the storage server, receiving one or more data blocks in combination with dummy blocks and/or data blocks associated with other files not defined in the request from the storage server, and recreating the one or more files defined in the request based on the received data blocks.

In an embodiment, the request comprises one or more encrypted file names corresponding to the one or more files defined in the request and at least one associated nonces.

In an embodiment, the at least one nonce is used to encrypt the one or more file names prior to transmitting the request to access one or more files.

In an embodiment, the step of recreating the one or more files comprises decrypting only the data blocks associated with the one or more files and not decrypting the dummy blocks or blocks associated with other files not defined in the request.

In an embodiment, the step of recreating the one or more files comprises the additional step of a rranging the blocks in an order defined within the blocks, wherein arranging the blocks results in recreation of the one or more files.

In another aspect, the invention broadly comprises a non-transitory computer readable medium, storing instructions, wherein the instructions when read by a programmable processor cause the processor to execute the method of accessing one or more files according to any one or more of the above methods or steps. The non-transitory computer readable medium may be associated with or may form part of the user device.

In a further aspect the invention broadly comprises a method of obfuscating data in a storage system within an untrusted environment, the storage system comprising at least one storage server and at least one shuffling server configured for communication with each other over a data network, the method executed by a one or more processors associated with the at least one shuffling server comprising the steps of:

receiving a set of blocks comprising one or more blocks,

rearranging (shuffling) the one or more blocks within the set of blocks, and transmitting the rearra nged set of blocks comprising the one or more rearranged (shuffled) blocks.

In an embodiment, the method of obfuscating data in a storage system comprises the additional steps of:

receiving an index comprising one or more identifiers associated with the one or more blocks,

rearranging (shuffling) the one or more identifiers within the index, and transmitting a rearranged index comprising the one or more rearranged (shuffled) identifiers. In an embodiment, the method of obfuscating data in a storage system comprises the additional steps of: processing the received set of blocks to discern blocks that are related to each other or associated with a file, and the step of rearranging (shuffling) comprises rearranging the location or identity of one or more related blocks.

In an embodiment, a relationship between one or more blocks associated with a file remains unchanged through the obfuscation process.

In an embodiment, the step of rearranging (shuffling) comprises the additional step of: adding one or more additional dummy blocks within the set of blocks prior to transmitting the rearranged set of blocks.

In an embodiment, the method of obfuscating data comprises the additional step of re-encrypting the rearranged set of blocks and/or rearranged index prior to the step of transmitting.

In another aspect, the invention broadly comprises a non-transitory computer readable medium, storing instructions, wherein the instructions when read by a programmable processor cause the processor to execute the method of obfuscating data i .e. files according to any one or more of the above methods or steps. The non- transitory computer readable medium may be associated with or may form part of shuffling server of the storage system.

In a further aspect, the invention broadly comprises a file transfer system comprising :

a storage system, the storage system comprising at least a storage server and a shuffling server, the servers configured to electronically communicate with each other and one or more user devices,

the storage server configured to store one or more files or one or more data elements, the shuffling server configured to shuffle one or more data elements,

wherein the storage server is configured to facilitate access of one or more files by a user device, wherein the method of accessing one or more files comprises the steps of:

receiving a request to access the one or more files, the request comprising a file name,

identifying one or more data blocks based associated with the file name, and

transmitting the one or more data blocks in combination with dummy blocks and/or data blocks associated with other files, such that the requested file can be recreated based on the transmitted one or more data blocks.

In an embodiment, the step of identifying one or more data blocks comprises the steps of: determining an identifier from an index of identifiers corresponding to the file provided in the request, the one or more data blocks being identified based on the identifier.

In a further aspect, the invention broadly comprises a file transfer system comprising :

a storage system, the storage system comprising at least a storage server and a shuffling server, the servers configured to electronically communicate with each other and one or more user devices,

the storage server configured to store one or more files or one or more data elements, the shuffling server configured to shuffle one or more data elements,

wherein the shuffling server configured to obfuscate data elements, the method of obfuscating data comprising the steps of:

receiving a set of blocks comprising one or more blocks,

rea rranging (shuffling) the one or more blocks within the set of blocks, and transmitting the rearra nged set of blocks comprising the one or more rearranged (shuffled) blocks.

In an embodiment, the method of obfuscating data in a storage system comprises the additional steps of:

receiving an index comprising one or more identifiers associated with the one or more blocks,

rearranging (shuffling) the one or more identifiers within the index, and transmitting a rearranged index comprising the one or more rearranged (shuffled) identifiers.

In an embodiment, the shuffling server may execute one or more method steps according to the method of obfuscating data as described earlier.

In a further aspect, the invention broadly comprises a method for file transfer using a file transfer system comprising at least one storage server, at least one shuffling server for shuffling data, wherein the storage server and shuffling server are arranged in communication with each other, at least one electronic user device capable of data communication with the storage server over a data network, the method comprising the steps of:

receiving a request to access the one or more files, the request comprising a file name,

identifying one or more data blocks based associated with the file name, transmitting the one or more data blocks in combination with dummy blocks and/or data blocks associated with other files, such that the requested file can be recreated based on the transmitted one or more data blocks, obfuscating data following transmission of one or more data blocks or following receipt of a request, wherein the obfuscating data comprises the steps of:

receiving a set of blocks comprising one or more blocks from the storage server by the shuffling server,

rearranging the one or more blocks within the set of blocks by the shuffling server, and

transmitting the rearranged set of blocks comprising the one or more rearranged blocks to the storage server.

The storage server is configured to implement any one or more of the method steps according to the method of accessing files as executed by one or more processors of the storage server, as defined in relation to the earlier aspects of the invention.

Any aspect of the invention above may have any one or more features described in respect of any one or more of the other aspects of the invention described above.

The term "file" as used in this specification and claims, unless the context suggests otherwise, relates to a collection of data or information that is defined by a name i.e. a file name. The term "file" encompasses any type of file e.g. a data file, text file, program file, directory file and so on. Each different type of file can store different types of information.

The term "untrusted" as used in this specification and claims, unless the context suggests otherwise, is intended to mean publicly accessible and unsecured from an access perspective.

The term "trusted" as used in this specification and claims, unless the context suggests otherwise, is intended to mean not accessible by the public i.e. a secured for access and can only be accessed by authorized persons.

The phrase "user device" as used in this specification and claims, unless the context suggests otherwise, refers to a computing device associated with a user. User device can encompass a data owner or a data receiver or any other user that may use a system described herein.

The term "identifier" as used in this specification and claims, unless the context suggests otherwise, means a numeral or letter or symbol or any other suitable facet that is used to uniquely identify a block or data element. The terms, "ID", "id", "identifier" mean the same and are interchangeably used in this specification.

The term "WSS" as used in this specification and claims, unless the context suggests otherwise, is intended to mean Witness and Shuffle service. The term WSS, shuffling server, shuffling service all mean the same and can be interchangeably used to refer to the same component.

The term "SSS" as used in this specification and claims, unless the context suggests otherwise, is intended to mean Storage and Search service. The term SSS, storage server, storage service all mean the same and can be interchangeably used to refer to the same component.

The term "block" as used in this specification and claims, unless the context suggests otherwise, refers to a unit or element that comprises data or defines data.

The term "comprising" as used in this specification and claims means 'consisting at least in part of'. When interpreting statements in this specification and claims which include the term 'comprising', other features besides the features prefaced by this term in each statement can also be present. Related terms such as 'comprise' and 'comprised' are to be interpreted in a similar manner.

It is intended that reference to a range of numbers disclosed herein (for example, 1 to 10) also incorporates reference to all rational numbers within that range (for example, 1, 1.1, 2, 3, 3.9, 4, 5, 6, 6.5, 7, 8, 9 and 10) and also any range of rational numbers within that range (for example, 2 to 8, 1.5 to 5.5 and 3.1 to 4.7) and, therefore, all sub- ranges of all ranges expressly disclosed herein are hereby expressly disclosed. These are only examples of what is specifically intended and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered to be expressly stated in this application in a similar manner.

As used herein the term '(s)' following a noun means the plural and/or singular form of that noun.

As used herein the term 'and/or' means 'and' or 'or', or where the context allows both.

The invention consists in the foregoing and also envisages constructions of which the following gives examples only.

As used herein "(s)" following a noun means the plural and/or singular forms of the noun.

In the following description, specific details are given to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, software modules, functions, circuits, etc., may be shown in block diagrams in order not to obscure the embodiments in unnecessary detail. In other instances, well- known modules, structures and techniques may not be shown in detail in order not to obscure the embodiments.

Also, it is noted that the embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc., in a computer program. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or a main function.

Aspects of the systems and methods described below may be operable on any type of general purpose computer system or computing device, including, but not limited to, a desktop, laptop, notebook, tablet or mobile device. The term "mobile device" includes, but is not limited to, a wireless device, a mobile phone, a smart phone, a mobile communication device, a user communication device, personal digital assistant, mobile hand-held computer, a laptop computer, an electronic book reader and reading devices capable of reading electronic contents and/or other types of mobile devices typically carried by individuals and/or having some form of communication capabilities (e.g., wireless, infrared, short-range radio, etc.), including wearable computing devices such as for example smart watches, smart glasses or headsets or the like.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention will now be described by way of example only and with reference to the drawings, in which:

Figure 1 shows an exemplary embodiment of a system for storing data and allowing access of the stored data.

Figure la shows an exemplary embodiment of a generalised server and its components that can be used as part of the storage service and/or shuffling service of the system illustrated in figure 1.

Figure 2 shows an exemplary method for storing one or more files on or within the system of figure 1 to store data in accordance with an embodiment.

Figure 3 shows a schematic diagram of the operations on the user side (i.e. data owner side) to upload a single file in accordance with an embodiment.

Figure 3a shows a schematic diagram of the operations on the user side for uploading a set of files in accordance with an embodiment.

Figure 4 shows the interactions between the various system components, of the system of figure 1, in order to upload a file in accordance with an embodiment.

Figure 5 shows an exemplary table storing the encrypted blocks in accordance with an embodiment.

Figure 6 shows an exemplary table storing the encrypted file indexes in accordance with an embodiment.

Figure 7 shows an exemplary table storing the nonces in accordance with an embodiment.

Figure 8 shows a flow chart of an exemplary method of accessing one or more files from the system of figure 1 in accordance with an embodiment. Figure 9 shows an illustration of the system interactions between the various system components of the system of figure 1 in accordance with an embodiment.

Figure 10 illustrates a flow chart of an exemplary method of performing a keyword search using the system of figure 1 in accordance with an embodiment.

Figure 11 shows a flow chart of an exemplary method of obfuscating data i.e. shuffling data in accordance with an embodiment.

Figure 12 shows an illustration of the system interactions between the shuffling service and storage service as part of the method of obfuscating data in accordance with an embodiment.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

This disclosure is broadly related to a method and system for securely uploading, storing and accessing one or more files in an untrusted (i.e. public) storage system. This disclosure also relates to a public storage system with search capabilities over encrypted data that allows a user to perform search operations on the storage system. The system and method in at least some embodiments provides a more secure approach to storing data in an untrusted i.e. publicly accessible storage system (i.e. storage environment). In at least some embodiments, the system and method has one or more of the following characteristics i) hides the operation pattern executed by clients, ii) hides any specific files being accessed from the storage system, iii) the physical location of the data being accessed is hidden.

In at least some embodiments, the system and method is advantageous because protects data against leakage based attacks from an attacker. In at least some embodiments, the system and method is advantageous because it provides a system and method for uploading data and/or searching and/or accessing data from a public storage system while protecting against attackers. In at least some embodiments, the system and method is also advantageous because the system as described herein provides one or more of the following functions; i) resists against leakage based attacks by concealing search patterns, size patterns and access patterns of data, ii) supports sub linear keyword searching and iii) supports multi-user access and file sharing.

Broadly speaking the invention of this disclosure provides a system comprising at least two servers, wherein one server functions as a storage server and one server functions as a shuffling server. The storage server and shuffling server are in electronic communication with each other and are also configured to electronically communicate with one or more user devices. The electronic communications may be wired or wireless communications and take place over suitable communication protocols. The system also includes a plurality of user computing devices that are configured to electronically communicate with the storage server and/or the shuffling server. The system allows a data owner to transfer files or other data to one or more data receivers. The storage server is configured to receive and store the one or more files in an obfuscated format. The one or more other data receivers can search and/or access a specific file from the storage server. The shuffling server is configured to shuffle elements of the obfuscated file as well as add and/or shuffle dummy elements. The shuffling server is further configured to shuffle identifiers associated with each file element and each dummy element. The data receivers are provided a plurality of elements including elements associated with a file and dummy elements in order to obfuscate search queries, physical locations of the data, specific files being accessed by the data receivers and prevent leakage based attacks. The system provides a secure way of file transfer over public networks and in untrusted environments. The system provides a secure file transfer system and method, e.g . like dropbox or box for file (i .e. data) transfer across public networks e.g . cloud networks that may be unsecured while minimising risks of cyber-attacks and data leakage. The storage server and shuffling server a re preferably independent of each other. The storage server and shuffling server prefera bly operate independently of each other but interact with each other.

Figure 1 shows an exemplary embodiment of a system 100 for data transfer, i .e. a system for uploading data and accessing data . Referring to figure 1, the system 100 comprises a data owner device 110 associated with a data owner and one or more data receiver devices 120, 122. Collectively or individually these devices may be referred to as user devices. The user devices 110, 120, 122 are any suitable computing devices e.g . a desktop computer or laptop computer. Alternatively, the user devices 110, 120, 122 may be mobile devices such as smartphones or tablets etc. The system 100 is configured to allow communication between computing devices and/or mobile devices.

The system 100 comprises a storage system 190. The storage system 190 comprises a storage service 200 and a shuffle service 300. The storage service 200 and shuffle service 300 are arranged in electrical communication with each other via any suitable communication network. The storage service 200 and the shuffle service 300 may be a rranged in two-way communication with each other using any suitable communication protocol . The user devices 110, 120, 122 a re arranged in communication with either one or both of the storage service 200 and the shuffle service 300. The user devices 110, 120, 122 may also be configured to communicate with each other.

The storage service 200 comprises at least one server a storage server 200. The term storage service and storage server refer to the same feature and may be interchangeably used . The shuffling service 300 may comprise at least server, a shuffling server 300. The term shuffling service and shuffling server refer to the same feature and may be interchangeably used . The storage server 200 and shuffling server 300 are preferably computing devices that are configured to perform various functions described herein . The storage service 200 and shuffling service 300 may be implemented on a cloud service that is offered by a cloud service provider. The cloud service comprises at least one server, but may comprise a plurality of servers e.g. a server farm.

The user devices 110, 120, 122 are arranged in a trusted environment 140. The storage server 200 and the shuffling server are arranged in an untrusted environment 150. The trusted environment 140 and untrusted environment 150 are separated virtually by a virtual separation 160. The separation 160 can denote that the trusted environment 140 may be an encrypted environment or a computing environment that is isolated from the general public i.e. the public cannot readily access the user devices.

The system 100 can be used by a user e.g. a data owner to transfer files from the data owner 110 to one or more of the users 120, 122. The data owner 110 can upload files 130 (e.g . a file or a set of files) to the cloud service providers (i .e. storage service 200 and shuffling service 300) . Preferably the files 130 are encrypted a nd uploaded . The files 130 are preferably uploaded to the storage service 200 (i .e. storage server). The file or set of files 130 can be accessed by the data receivers 120, 122 by providing trap doors to the cloud service providers. The trap doors may be generated from file names or keywords. The blocks a nd index (or indices) can be received from the storage service 200 by the users 120, 122.

The storage service 200 (i .e. storage server or servers) and the shuffling service 300 (i .e. shuffling server or servers) represent independent cloud service providers. The storage server 200 and the shuffling server 300 are preferably independent servers that are separated or operate indecently. The storage server 200 and the shuffling server 300 may be geographically sepa rated from each other. The storage service 200 (i .e. storage server 200) stores files, preferably encrypted files. The storage service 200 is also configured to execute search queries within the stored files based on a search request that may be received from any one of the users 110, 120, 122. The shuffling service 300 is configured to provide a witness (i.e. a certificate) to the storage service 200 for sea rching during a search operation e.g . to access a stored file. After each query, the storage service 200 is configured to send a set of data to the shuffling service 300 for shuffling and re- randomising to protect privacy and obfuscate search related information such as for example search patterns, file sizes, file blocks a nd other information.

The servers are preferably implemented on separate and independent computing devices. However, in some alternative configurations the storage server and shuffling server functionalities may be implemented on a single computing device that may include partitioned electronics or independent electronic modules and independent software to allow each the SSS, WSS to function independently while residing on the same hardware computing device.

Figure 1A illustrates a schematic diagram a generalised server 400 that can form part of one of the cloud service providers 200, 300. The server 400 is an exemplary server. The cloud service providers 200, 300 may each comprise one or more servers. Each of the servers in the cloud service providers 200, 300 may

The server 400 comprises suitable components necessary to receive, store and execute appropriate computer instructions. The components may include a processing unit 412, read-only memory (ROM) 414, random access memory (RAM) 416, and input/output devices such as disk drives 418, input devices 410 such as an Ethernet port, a USB port, etc. and communications links 420. The server 400 includes instructions that may be included in ROM 414, RAM 416 or disk drives 418 and may be executed by the processing unit 412. There may be provided a plurality of communication links 420 which may variously connect to one or more computing devices such as a server, personal computers, terminals, wireless or handheld computing devices, such as for example the user devices 110, 120, 122. At least one of a plurality of communications link may be connected to an external computing network through a telephone line or other type of communications link.

The server 400 may include storage devices such as a disk drive 418 which may encompass solid state drives, hard disk drives, optical drives or magnetic tape drives. The server 400 may use a single disk drive or multiple disk drives. The server 400 may also have a suitable operating system 422 which resides on the disk drive or in the ROM of the server 400. The operating system 422 may be stored as computer readable instructions (i .e. as software).

The server 400 may comprise a database 430 residing on a disk or other storage device which is arranged to store at least one record which may comprise one or more user credentials that correspond to the users that can use the system 100. The database 430 may be a relational database or any other suitable database structure. The database 430 may also store other information such as an index of identifiers that correspond to one or more data blocks or any other suitable information. The various files for transfer may be stored in the memory units or disk drives 418. The server 400 may comprise additional memory units that may be incorporated within the server 400. Alternatively, the additional memory units may be remote memory units e.g . across a cloud service.

The server 400 may comprise software that defines computer readable instructions that define the server 400 operations. For example, the server 400 comprises software e.g . a software application that can be executed by the processor 412 to perform any of the functions described herein, e.g . functions of the storage server 200 or shuffling server 300. The storage server 200 and the shuffling server 300 may each comprise the components of the generalised server 400 as described herein. Functions of the servers 200, 300 will be described herein. The functions may be controlled by suitable software stored in the servers 200, 300, wherein the software comprises computer readable a nd executable instructions that define the functions.

The user devices 100, 120, 122 may also comprise a similar structure as the generalised server 400. The user devices 100, 120, 122 are computing devices that may include the hardware components as the generalised server 400. Each of the user devices 100, 120, 122 comprise at least a processor and an associated memory unit. The user devices 100, 120, 122 may also include a user input unit e.g . a keyboard or a touch screen and a user display e.g . an LCD, LED or any other screen to display information.

Generally speaking a method of storing one or more files on a storage system in an untrusted environment, the storage system comprising at least two servers (a storage server and a shuffling server) in electronic communication with each other, at least one electronic user device capable of data communication with the at least two servers over a data network, the method executed by a one or more processors associated with the at least one electronic user device (e.g . the data owner device 110) comprises the steps of: dividing each file of the one or more files into a plurality of data blocks, obfuscating each file such that each file comprises a plurality of data blocks and dummy blocks, assigning each data block and each dummy block an identifier, creating an index of the identifiers associated with the obfuscated file, transmitting the obfuscated file and the index for storage in the storage system. The step of obfuscating a file comprises the additional steps of: generating one or more dummy blocks, and interspersing the dummy blocks with the data blocks.

Each data block and each dummy block is randomly assigned an identifier. The index comprises information associated with file including the identifier of the first data block associated with the file. The index for a file comprises the identifiers for the data blocks associated with the data blocks corresponding to the file and at least some dummy blocks in order to conceal the size of the file. The step of transmitting comprises transmitting the obfuscated file as a series of data blocks and dummy blocks. The blocks may be encrypted prior to transmitting to the system for storing.

Referring to figures 2 to 4 there is shown an exemplary method of uploading a file to the storage service 200 (i .e. storage server 200) utilising the system 100. Figure 2 shows a flow chart of a method of uploading a file using the system 100. Figure 3 shows a schematic diagram of the operations on the user side (i .e. data owner side) to upload a file. Figure 3a shows a schematic diagram of the operations on the user side for uploading a set of files. Figure 4 shows the interactions between the various system components in order to upload a file.

Referring to figure 2, the method for uploading a file 1000 begins at step 1002. Step 1002 comprises encrypting each file name and each key word for a key word searching database or record. Each file name may be encrypted by computing

EN - H k (filename ) ® n wherein n is a nonce and H is a hash function. By XORing the hash function with the nonce, the encrypted file name is semantically secure. The nonce may be a cryptographic nonce that can be used for encrypting file names. Each file may be encrypted by a unique nonce that may be generated by the data owner device.

The method 1000 proceeds to step 1004. Step 1004 comprises dividing each file into a plurality of data blocks. The block size may be predefined within software instructions. The block size may be defined by at the time of set up. In one example the block size may be 1KB or bigger. Preferably all the blocks are the same size in order to help with obfuscation of the file and help to prevent leakage based attacks.

Following step 1004, the method proceeds to step 1006. Step 1006 comprises generating a set of dummy blocks. The dummy blocks are random blocks that are not related to any data contained within a file or related to any file. Each file is obfuscated such that each file comprises a plurality of data blocks and dummy blocks. Each file is defined by dummy blocks and data blocks in order to obfuscate the size, structure and information contained within the file. The file is preferably obfuscated by generating one or more dummy blocks and interspersing the one or more dummy blocks with the data blocks of the file such that the data of the file and other file parameters e.g. size etc are obfuscated. Alternatively step 1006 may be performed simultaneously with step 1004 in order to obfuscate a file.

Step 1008 comprises assigning each block (including data blocks and dummy blocks) an identifier. The identifier may be a numerical identifier or some other suitable identifier. Step 1008 may also comprise requesting and receiving one or more identifiers from the storage server 200. The number of identifiers may correspond to the number of blocks that are generated. Preferably the data blocks and dummy blocks are randomly assigned an identifier.

Step 1010 comprises the step of encrypting each block once each block has been assigned an identifier. Each block may comprise a two-layer encryption. In the first layer an AES based encryption is used and in the second layer a XOR based encryption may be used. The first layer of encryption is such that can be decrypted by an authorised user e.g. an authorised data receiver and the second layer of encryption can be decrypted by one of the servers of the storage system e.g. the storage server 200.

In one example each block may be encrypted by computing EB - (bid, seed, (nid,Enc k (block)) ® m) wherein bid is the identifier of each block e.g. the numbers shown in figure 4. seed is a short random value for generating a long nonce m. More specifically, m = T k {seed) r where T is a pseudo random function. T may be a pseudo random function like HMAC (hash based message authentication code), nid is the bid of the next block. For example, referring to figure 4, 75 is the nid of the first block, nid = 0 for the last block. The bids are assigned randomly to the blocks. The order of bids is not the correct order for recovering a file. The correct order of blocks can only be unknown to a user and is generally provided to the user as part of the access process described later. Blocks belonging to each file are preferably linked by padding each block with bid of its next block before encrypting it. In this case, the use e.g. data receiver only needs information regarding which block is the head (i.e. first block of the file), and the next block can be easily identified after decrypting the head block. The encryption function Enc can be implemented with AES-CBC or AES-GCM mode, for example, m is a nonce generated from the seed. By XORing m, the encrypted block can be efficiently re randomized during the shuffling process, executed by the shuffling server 300, by updating the nonce m. A file may be considered obfuscated once it has been split into blocks and optionally once each block has been encrypted as per above.

In some configurations each block may be encrypted using a two-layer encryption in step 1010. For example, an AES based encryption may be used as a first layer and a XOR based encryption may be used as a second encryption layer. Other encryption methods or protocols may be utilised.

Step 1012 comprises creating an index for each file. The index comprises the identifiers associated with the obfuscated file. The index may comprise a list to hold all the identifiers for data blocks and identifiers of dummy blocks associated with each file. The index may be defined by (EN, hid, bidlist) for each file. Hid is the bid of the head block, e.g. hid = 100 as per figure 3 or hid = 10 as per figure 3a. bidlist is a list of the bids indexing the blocks for a file, e.g. the file shown in figure 3 that is defined by a plurality of blocks. In order to hide the size of each file, bidlist also includes some bids of dummy blocks and data blocks of other files. For example, in figure 3a, for "file 1" (i.e. file 1 of a set of files) bidlist = {3, 5, 9 ....}, where 3 points to a dummy block and 9 is the bid of other file blocks. This information is unknown to the storage server 200.

When uploading one single file, only the bids of dummy blocks and the data blocks of that file are included in the bidlist e.g. as per figure 3. When accessing a file, it is unnecessary to decrypt the dummy blocks and the blocks of other files within the set. The number of data blocks, n is included in the head block in order to define the number of data blocks that define a file. For example, with reference to figure 3 n = 2 since two blocks 75, 100 define the file. With reference to figure 3a, file 1 is defined by blocks 10 and 5, and hence n = 2, to define that 2 blocks define file 1. Since the blocks are linked in order by nid, the user just need to decrypt the first n blocks for recovering the file. In one example the encrypted head blocks is defined as:

EH = (bid, seed, (nid, Enc k (blockdata, n)) ® m) .

Step 1014 comprises sorting out the blocks based on their identifier i.e. based on the id of each block i .e. the blocks are sorted based on their bids. The blocks (including data blocks and dummy blocks) may be sorted in ascending or descending order.

Step 1016 comprises transmitting information to the storage system. In particular step 1016 comprises transmitting the encrypted index (or indices) and encrypted blocks to the storage server 200. The nonces used to encrypt file names are provided to the shuffling server 300. As shown in figure 4, the data owner 110 transmits the encrypted index and encrypted blocks to the storage server 200 and the nonces are transmitted to the shuffling server 300. The transmission can be performed using any suitable communication protocol and using any suitable communication network. The transmission of the blocks and index and nonces are transmitted from a trusted environment to an untrusted environment (where the storage system 190 reside). Figure 4 shows the step 1016 being executed once steps 1002 to 1014 are executed . Steps 1002 to 1016 may be executed by the data owner device 110. The steps 1002 to 1016 may be defined as computer readable and executable instructions e.g . as a softwa re application.

Figure 3 shows a schematic diagram of exemplary operations on the user side to upload a single file. Figure 3 shows an exemplary implementation of the method 1000. Referring to figure 3 a file name is encrypted as per step 1022. The encrypted file name is labelled {File}Enc. Step 1022 corresponds to step 1002 as per figure 2, and the user device 110 is configured to encrypt the file name as per step 1002. Step 1024 comprises dividing the file into a plurality of data blocks. The file may be divided into the blocks according to step 1004 of figure 2. Step 1026 comprises generating a plurality of dummy blocks to mix with the data blocks. Each of the data blocks and dummy blocks a re coloured differently. The dummy blocks may be generated in accordance with step 1006 described earlier with reference to figure 2. Step 1028 comprises assigning each block with identifiers randomly. Step 1028 corresponds to step 1008 as per figure 2. As shown in figure 3, each block is assigned a numerical identifier. For example, the data blocks a re assigned number 100 and 75. The dummy blocks a re assigned 90 and 90 as identifiers. Step 1030 comprises encrypting each block. Each block may be encrypted using a two-layer encryption or encrypted as described in step 1010. Step 1030 corresponds to step 1010 of figure 2. Step 1032 comprises building an index that comprises the identifiers associated with each block. Step 1032 corresponds to step 1012. An example index is illustrated where hid is defined as block 100 and the block id list (i.e. bidlist) that includes block identifiers 75, 80 etc. The list of blocks include data blocks as well as dummy blocks. Each file may include a specific index associated with that particular file. Step 1034 comprises sorting the blocks based on their identifier. As shown in figure 3, the blocks are arranged in ascending order based on the value of their identifier. Step 1034 corresponds to step 1014 of figure 2.

Steps 1022 to 1034 are implemented by the user device for example the data owner device 110. Steps 1022 to 1034 are implemented by the user device prior to transmitting the index and encrypted blocks to the storage server 200 (i .e. SSS) .

Figure 3a illustrates a schematic diagram of an exemplary operations on the user side to upload a set of files. Figure 3a shows an exemplary implementation of the method 1000. Referring to figure 3a each file name from a set of files is encrypted at step 1042. The encrypted file names are represented as {(Filel)Enc (File2)Enc.... >. File 1 corresponds to a first file in the set of files. File 2 corresponds to a second file in the set of files. Step 1042 can correspond to step 1002 in method 1000. Step 1044 comprises dividing each file from the file set into a plurality of data blocks. Step 1044 corresponds to step 1004. Each data block comprises data from the file. The data blocks from each file are represented by unique rendering applied to the blocks. In the illustrated exa mple of figure 3a, each file is split into two data blocks. Step 1046 comprises generating a plurality of dummy blocks and the dummy blocks are represented as the two blocks on the far right of "file 3". The dummy blocks are also rendered differently to the data blocks. Step 1048 comprises assigning each block an identifier. The identifier in the illustrated example of figure 3a is a numeral . As shown in figure 3a, the data blocks of file 1 are assigned identifier 10 and 5, the data blocks of file 2 are assigned identifier 1 and 7 respectively and so on. The dummy blocks are also assigned identifiers. Step 1048 corresponds to step 1008 in figure 2, wherein the blocks are ra ndomly assigned identifiers. Step 1050 comprises encrypting each block including data blocks and dummy blocks. Step 1050 corresponds to step 1010 of figure 2. Step 1052 comprises building an index for each file. A unique index may be developed corresponding to each file within the set of files. Step 1052 corresponds to step 1012 of figure 2. Step 1054 comprises sorting out the blocks based on the identifier. In the illustrated example of figure 3a the blocks are sorted and arranged in an ascending order prior to the step of transmitting the blocks and index.

The encrypted blocks are stored in the storage server 200 (SSS) . Figure 5 shows an example of the encrypted blocks. Referring to the table 2000 of figure 5, each block is identified by a unique bid . The seed is stored in a separate field of ta ble 2000. The 5th, 10th and 100th blocks correspond to data blocks of "file 1". The 10th block is the head block, the 100th block is the second block and the 5th block is the last block of "file 1". The encrypted blocks may be stored in a memory unit associated with the storage server 200, and may be stored in table form or any other suitable format. In some configurations the encrypted blocks may be stored in a block database that resides in the storage server 200 e.g. database 430.

Figure 6 illustrates an exemplary table 2010 storing the encrypted indexes (i .e. indices) stored in the storage server 200 (SSS) . The first record in this table corresponds to the example blocks shown in table 2010. In the table 2010, 10 is the bid of "file l"s head block, and {3, 5, 9, 100} correspond to the bids of the rest blocks. The table 2010 may be stored in a memory unit or may be stored in an index database that resides in the storage server 200.

Figure 7 shows a table comprising a plurality of nonces that were used during encryption in method 1000. The table 2020 comprises nonces included in the index to support an encrypted search. The id-th nonce in this table corresponds to the id-th index in table 2010. Each nonce corresponds to each encrypted index and was used to encrypt the index and/or blocks.

In at least one configuration the tables 2000 and 2010 may be stored in any suitable format, e.g. the table format as illustrated . For example, these tables may be stored in a memory unit associated with the storage server 200. The table 2020 may be stored in a memory unit associated with the shuffling server 300. In some configurations the tables may be stored in or as part of a database associated with one or more of the servers 200, 300.

The method 1000 (and methods shown in figure 3 or figure 3a) is preferably implemented by the user device, in particular the data owner device. In an alternative embodiment the method 1000 (and methods shown in figures 3 or figure 3a) may be implemented by the storage server 200. In this alternative embodiment the storage server 200 may receive encrypted files and the storage server 200 may be configured to execute the method 1000. The method 1000 may comprise the additional step of rearranging i .e. shuffling the data blocks and dummy blocks. The index of identifiers associated with the obfuscated file may be rearranged or shuffled to correspond to the rearrangement of the obfuscated file (i .e. the order of the blocks) . The shuffling server 300 may be configured to perform the shuffling (i .e. rearranging) blocks.

One or more files can be accessed from the storage system 190 by one or more users e.g . the data receivers 120, 122. Generally speaking a method of accessing one or more files from a storage system in an untrusted environment, the storage system comprising at least one storage server, at least one electronic user device capable of data communication with the storage server over a data network, the method executed by a one or more processors associated with the at least one storage server comprises the steps of: receiving a req uest to access the one or more files, the request comprising a file name, identifying one or more data blocks based associated with the file name, transmitting the one or more data blocks in combination with dummy blocks and/or data blocks associated with other files, such that the requested file can be recreated based on the transmitted one or more data blocks.

The step of identifying one or more data blocks comprises the steps of: determining an identifier from an index of identifiers corresponding to the file provided in the request, and the one or more data blocks are identified based on the identifier. The method of accessing one or more files comprises the additional step of randomly adding dummy blocks a nd/or data blocks associated with other files prior to the step of transmitting. The step of adding dummy blocks and/or data blocks associated with other files obfuscates the size of the data or identity of blocks or number of blocks associated with a file defined in the request.

The method of accessing one or more files may comprise the additional step of transmitting a set of blocks and a plurality of identifiers to a shuffling server for shuffling by the shuffling server, and wherein the blocks in the set of blocks are each associated with an identifier of the plurality of identifiers. The blocks and/or the identifiers are randomly shuffled. Shuffling the blocks comprises adding additional dummy blocks such that a different number of blocks is received by the storage server as compared to the number of blocks transmitted to the shuffling server.

Figure 8 shows a flow chart of an exemplary method of accessing one or more files from the system 100. Figure 9 shows an illustration of the system interactions between the various system components of the system 100.

Figure 8 shows an exemplary method 3000 of accessing one or more files from the system 100. The one or more files may be accessed by a user e.g. a data receiver 120, 122. The method 3000 commences at step 3002 that comprises receiving a request to access a file. The request comprises an encrypted file name matching the file name that a user want's to access. The file name is encrypted by a nonce (e.g. a cryptographic nonce) . The nonce is provided to the WSS 300 and the encrypted file name is provided to the SSS 200. The file name is encrypted by the user device (i.e. data receiver device) . The user simply requires the file name. The user encrypts the file name by computing the following expression : EQ H k (filename) f h. EQ and h are transmitted to the storage server 200 and the shuffling server 300 respectively.

Step 3004 comprises the step of receiving a set of witnesses i .e. certificates at the storage service 200 (i.e. storage server) from the shuffling service 300 (i .e. shuffling server) . The shuffling server 300 generates the set of witnesses i .e. certificates with the nonce. Specifically for each nonce stored within the table shown in figure 7, the storage server 300 computes w ld H (n ld ® h) A set of witnesses e.g . W = {wl, w2 ....} is transferred from the shuffling server 300 to the storage server 200 for searching. Step 3006 comprises the storage server 200 determines the specific file index that matches the inserted file name and determine the specific block list corresponding to the specific file index. The search server 200 uses the witnesses W and EQ, to search the table shown by checking if the following expression is satisfied; H(EQ ® EN ld ) == w ld .

If the expression is satisfied then the storage server 200 stops searching and returns the blocks indexed by hidid a list of block identifiers i.e. bidlistid to the user device (i.e. device 120 or 122) . For example, if the user wants to access "file 1", the first index in table 2010 will match the query since:

= w ±

The first index in table 2010 of the indexes corresponds to "file 1". The 10th, 3rd, 5th, 9th and 100th records including the bids, seeds and encrypted blocks EBs from table 2000 will be sent to the user in order.

Step 3008 comprises the storage server sending all blocks indexed by the index corresponding to the requested file name including all data blocks, dummy blocks and/or blocks associated with other files, other than the requested file, to the user device. This is advantageous because the search queries, the size of the file and the data blocks of the file are hidden i.e. obfuscated from any attacker since a series of encrypted blocks and block identifiers are provided.

Step 3010 comprises the user device decrypting the received encrypted blocks. The user device 120, 122 only decrypts the blocks associated with the requested file to recover the file. The file may be recovered in plaintext format or any other suitable format. For example in the above context the user decrypts the 10th block. In this example the user decrypts the 10th block by performing the following function :

1) T k (seed 10 ) ® m

2) EB 10 f m ® (100, Enc k (block 0 , 3))

3) nid = 100

4) Dec k (Enc k (block 0 , 3)) ® (block 0 , 3)

Next the user decrypts the 100th block as above to get block 1 and the bid of the next block, which is 5. Finally, the user decrypts the 5th block in the same way as above to get block2 and then the whole file is recovered.

Step 3012 further comprises transmitting a set of blocks and associated file indices i.e. file indexes to the shuffling server 300 for shuffling i.e. rearranging. This helps to obfuscate the exact blocks or data elements that are searched and identified in any search. The shuffling helps to obfuscate the searching queries and also helps to reduce or prevent leakage based attacks. The shuffling process will be described in more detail later.

Step 3014 comprises receiving re-encrypted and shuffled data blocks and associated file indices (i.e. file indexes). The shuffling server 300 is configured to shuffle the data blocks and/or the file indexes, as well as re-encrypt the shuffled blocks and/or indexes. The re-encryption method can be any suitable re-encryption method. The storage server 200 is configured to receive these rearranged i.e. shuffled blocks and indexes.

Step 3016 comprises the storage server 200 updating the storage i.e. storing the received shuffled data blocks and associated indexes. The storage server 200 may be configured to update the tables 2000, 2010, 2020 with the shuffled data blocks and indexes. The storage server 200 may also store updated nonces that may be received from the shuffling server 300.

The method 3000 is preferably implemented by the storage server 200. The storage server 200 interacts with the shuffling server 300 and the data receiver (i.e. user) device 120, 122 in order to allow searching and accessing a file.

As shown in figure 9, the step 3002 comprises delivering a nonce to the WSS 300 and the encrypted file name is provided to the SSS 200. As shown in figure 9, the WSS 300 provides the witnesses to the SSS 200. Figure 9 shows step 3004 where the witnesses or a set of witnesses are provided to the storage server 200 from the shuffling server. The witnesses are generated by the WSS 300 from the nonces received by the WSS 300. The identified blocks are provided to the user devices from the SSS 200, according to step 3006. A set of blocks and indexes are transmitted from the SSS 200 to the WSS 300 for shuffling as per step 3012, shown in figure 9. Finally, once the shuffling process has been executed by the WSS, the WSS transmits the shuffled blocks and indexes to the SSS 200. The storage server 200 is configured to update its storage with the shuffled blocks and indexes.

Figure 10 illustrates an exemplary method of performing a keyword search 4000. The method 4000 is preferably implemented by the system 100 and its components. The method 4000 may be implemented by some components of the system 100, e.g. storage server 200 and shuffling server 300. The operations of performing keyword searches on the system 100 are similar to the search for file names. The difference is the keyword search is performed over the encrypted inverted keyword index, and the file access by name is performed over the encrypted file names index. Both indexes may be created, stored and updated in the storage server 200 and/or shuffling server 300, each time a new file is stored. Referring to figure 1, the method 4000 begins at step 4002. Step 4002 comprises receiving an encrypted keyword from the user device at the storage server 200. The keyword is encrypted using a nonce and the nonce is transmitted to shuffling server 300.

Step 4004 comprises receiving one or more witnesses at the storage server 200 from the shuffling server 300. The shuffling server 300 is configured to generate one or more witnesses i.e. a set of witnesses with the received nonce. The shuffling server 300 transmits the set of witnesses to the storage server 200. In one example the set of witnesses may be generated in a similar manner as described earlier with reference to figures 8 and 9.

Step 4006 comprises checking and identifying the inverted index that matches the interested keyword at the storage server 200. The identified inverted index is used to identify a list of encrypted file names within the storage server 200.

Step 4008 comprises transmitting the blocks of the files that correspond to the identified index. The blocks may be transmitted in two different options. In the first option the storage server 200 is configured to sea rch the file names over the file indexes (i .e. indices) one by one, such as for example using a brute force approach . Other systematic approaches may be used . The storage server 200 is configured to identify and send the matched blocks to the user. The second option is the storage server 200 is configured to first forward the matched file names to the user device. The user device is configured to decrypt the received matched file names and transmit the name of the required files to at least the storage server 200 (i .e. storage service 200) for searching . The storage server 200 is configured to perform a search within each encrypted file name and send the blocks of the required files to the user. The user device can assemble the blocks to recreate the file.

Step 4010 comprises transmitting the inverted indexes to the shuffling server 300 from the storage server 200, for shuffling . The shuffling process hides or obfuscates the keyword search pattern.

Step 4012 comprises receiving re-encrypted and shuffled indexes by the storage server 200, from the shuffling server 300. The shuffling server 300 is configured to perform a shuffling i .e. rearranging process that will be described later.

Step 4014 comprises storing the inverted indexes within the storage server 200. The inverted indexes may be similar to tables 2000, 2010 as described earlier. The inverted indexes correspond to keywords within files across the storage server 200. The inverted index may be created by the data owner and transmitted to the storage server 200, or the inverted index may be created by the storage server 200 as part of the data upload process. The method 4000 is preferably implemented using the components of the system 100. The method 4000 as described herein is implemented by the storage server 200. The method 4000 may be defined in computer readable and executable instructions e.g. defined in a software application.

In order to hide the access pattern, the record of the blocks and/or indexes should be obfuscated i.e. shuffled or rearranged. This obfuscates the data and search patterns. The shuffle operation may be performed after each file access, after a period of predetermined time or after each session or after any specified event. The number of records that are shuffled for each shuffling process may be predefined. For example, the number of records that may be included in the data set for shuffling may be based on the security requirements. In this example for higher security requirements, the shuffle frequency and size of the set of records (i.e. blocks and/or indexes) for shuffling may be greater as compared to lower security requirements. The use of the system and the type of information e.g. the sensitivity of the information being stored and accessed from the system 100 can determine the frequency of shuffling operations and the size of the records set for shuffling.

Preferably the entire record i.e. both the blocks and indexes are shuffled. For example, at least some elements from table 2000 and table 2010 are shuffled. If shuffling occurs after each file access, the storage server 200 does not learn anything useful and hence attackers would not learn or access any useful information. In order to ensure the performance of the system, the blocks and indexes are shuffled at the end of each day and a set parameter L to control the number of indexes and blocks for shuffling. Specifically, if M files were accessed in a day, the l * M indexes (e.g. from table 2010) and the associated blocks (e.g. blocks from table 2000) will be shuffled at the end of the day or alternatively at the start of the next day. In the example below the l = 3.

In this example a first record of table 2010 is accessed by the user device. After sending the associated blocks to the user, the storage server 200 randomly picks another 2 records from 2010 say for example the 2nd and 53rd records are selected for the shuffling.

Generally q method of obfuscating data in a storage system within an untrusted environment, the storage system comprising at least one storage server and at least one shuffling server configured for communication with each other over a data network, the method executed by a one or more processors associated with the at least one shuffling server comprising the steps of: receiving a set of blocks comprising one or more blocks, rearranging (shuffling) the one or more blocks within the set of blocks, transmitting the rearranged set of blocks comprising the one or more rearranged (shuffled) blocks. The method of obfuscating data in a storage system comprising the additional steps of: receiving an index comprising one or more identifiers associated with the one or more blocks, rea rranging (shuffling) the one or more identifiers within the index, transmitting a rea rranged index comprising the one or more rearranged (shuffled) identifiers.

Figure 11 shows an exemplary flow cha rt of a method of obfuscating data 5000 i .e. shuffling data . The method 5000 commences at step 5002.

Step 5002 comprises receiving a set of blocks from the storage server 200. Step 5002 also comprises receiving an index set from the storage server 200. As pa rt of step 5002 indexes matching data blocks and random indexes are received in addition to the associated blocks, from the storage server 200. For example, the shuffling server 300 receives an index set e.g . 10, (3, 5, 9, 100}),

1, {5, 7, 9}),

3, 30, {2, 3, 23})}

and associated block set EB 3 , EB 5 , EB 7 , EB 9 , EB 23 , EB 30 , EB 100 }.

Step 5004 comprises removing the nonce from each encrypted block EB by computing

As a result, the nid of each block is exposed to the shuffling server 300. The shuffling server 300 can determine which blocks belong to the same file and their order. To ensure the user can recover the files, the logical order among the linked blocks is maintained in this process.

Step 5006 comprises shuffling i .e. rea rranging the blocks in the Blockset. The shuffling server 300 is configured to randomly shuffle the blocks to obfuscate the block set. In order to maintain the logical order among linked blocks, the shuffling server only shuffles those blocks whose previous blocks and head blocks are also provided in the set of blocks i .e. Blockset. For example, the previous block of EB100 i.e. EB1 is in the Blockset, and the previous block of EB7 i.e. EB1 is also provided within the Blockset. The shuffling server 300, in this example, is configured to exchange the locations of EB7 and EB100. Following this, the bid of EB7 will be 100 after shuffling i.e. the block identifier of block 7 will be 100. The nid in EB1 should be updated to 100 and the nid in EB10 should be updated to 7. The bidlist in the index is also updated accordingly.

However, those blocks whose previous blocks are not in the Blockset e.g . EB9, will not be shuffled i .e. will be maintained as is. The shuffling server may change the size of each bidlist (i.e. list of block ids) by adding or removing bids (i.e. block ids) of other unrelated files to further obfuscate data in the storage system.

Step 5008 comprises generating a new random seed seed' a nd re-randomizing each block in the Blockset by computing : ( nid, Enc k (block )) ® T k (seed'). Step 5010 comprises removing the old nonces from each index in the Indexset by computing : EN ld 0 n id H k (filename). The index set is a set of indexes that corresponds to the blocks received at the shuffling server. n ld is determined from the table 2020 that is stored in the shuffling server 300.

Step 5012 comprises shuffling the indexes in the Indexset in order to obfuscate the indexes thereby reducing the vulnerability of data within the storage system. The shuffling is preferably a randomized shuffling.

Step 5014 comprises re-randomizing each index in the set of indexes i.e. Indexset by performing the function EN[ a H k (filename) 0 h' . The idth nonce in table 2020 will be updated into n'.

Step 5016 comprises the step of transmitting the updated set of indexes and the set of corresponding blocks to the storage server 200. For example, the Indexset and the Blockset are transmitted to the storage server 200 from the shuffling server 300. The storage server 200 is configured to update the table of encrypted blocks and table of indexes. In one example the storage server 200 is configured to update table 2000 and the table 2010 with the new values.

The method 5000 is preferably implemented by the shuffling server 300. The steps of method 5000 may be defined as computer readable and executable instructions e.g. as a software application. The shuffling server 300 can interact with other components of the system 100.

Figure 12 shows a schematic of the shuffling process as implemented by the shuffling server 300. Figure 12 also shows interactions between the shuffling server 300 and the storage server 200. Referring to figure 12, "file2" is accessed by the user. Step 5002 comprises receiving an index set and a block set at the shuffling server 300 (i.e. WSS) from the storage server 200 (SSS). The index set may in the form similar to table 2010 and the block set may be in a form similar to table 2000. Step 5004 shows the step of removing the old nonce from each encrypted block. Step 5006, shown in figure 12 comprises shuffling the blocks randomly. As shown in figure 12, block 1 and block 10 are headers. Blocks 9's, 3's, 5's, 7's and 100's previous are block 1, 5, 100, 7 and 10, respectively, which are in the block set. Except block 7, all the other blocks can be shuffled randomly. Step 5008 comprises re-randomizing each block by XORing a nonce. The nonce is preferably derived from a new seed. Step 5010 is not illustrated on figure 12. Step 5012 comprises updating the list of block ids i.e. bidlist. Certain bids may be removed or added for other files. Step 5014 comprises shuffling and re-randomising the indexes by XORing new nonces. The nonces are updated with the new nonces. Step 5014 comprises transmitting the indexes and blocks back to the storage server 200 (i.e. SSS 200). The storage server 200 (SSS) is configured to update the tables (e.g. tables 2000, 2010) with the updated values.

In an alternative configuration, at least one of the servers may be positioned within the trusted environment or at least partially within the trusted environment. For example, the shuffling server 300 (WSS) may be positioned or configured to be within the trusted environment or partially rest within the trusted environment. The shuffling server 300 may include appropriate security layers to allow the shuffling server 300 to be used in a trusted environment or partially within a trusted environment.

The presently described system for data transfer 100, allows uploading and storing data from a data owner as well as allowing a data receiver to access data. Further the system 100 is configured to allow keyword searching or searching of one or more files upon receipt of an appropriate request. The system for data transfer 100, and the associated methods in particular obfuscating data by using data blocks and shuffling the blocks and indexes associated with a file protects files in that are stored. The system is specifically useful for file transfer and protecting file related information from being leaked or accessed by attackers. The system is advantageous because the search operation is performed over indexes and the file is recovered by decrypting blocks. The blocks can include data blocks and dummy blocks, and may also include blocks unrelated to the requested file. The blocks are encrypted using a unique two-layer technique as described earlier herein.

The first layer encryption is used to protect the data from the cloud service providers i.e. from the storage service 200 and shuffling service 300 and can only be decrypted by users. The second layer of encryption is used to re-randomise blocks during the shuffling process and can be decrypted by both the WSS and the users. Each block is encrypted with a file-specific key (i.e. the first layer of encryption), i.e. the blocks of different files are encrypted with different keys. Second, the blocks of one file are linked together in order by putting in each block the identifier of its next block. A digest i.e. a record for each block is generated to ensure its integrity and appended to the encrypted block. The encrypted block may be XORed with a nonce such that the link between blocks is hidden from the storage service 200. This improves security of the files as the links between blocks are also obfuscated from a public attacker. The system improves security and reduces vulnerability of the files, file names, file sizes, search queries and results returned based on search queries when the files are stored and accessed from a storage system in an untrusted e.g. public environment.

In at least some embodiments, the system also helps to hide the size of each file by adding dummy blocks and by adding blocks of unrelated files. As described earlier, when inserting files, a small set of dummy blocks are generated to hide their sizes. Because of the encryption the SSS 200 (i.e. storage server) cannot distinguish whether the shuffled blocks belong to the same file or not. Thus, in each index, the specific identity of the blocks and the number of block identifiers are obfuscated i.e. hidden from the storage service 200.

The system 100 as described herein and the methods described herein can be used to create public file transfer systems to allow files to be transferred between various users. The file transfer system may be a cloud based system wherein the system components may be operated by one or more cloud service providers. The system may provide a search encryption solution that can be used over a data storage system in an untrusted environment e.g. a public environment. The system provides an outsourced cloud service for file storage and transfer that is less likely to get attacked and suffer from leakage based attacks. The system 100 provides an encrypted storage system with search capabilities over encrypted data that allows a user to perform search operations on the outsourced data i.e. data stored in the untrusted environment without the need for the cloud service provider to access the decrypted data or keying material. The decrypted data and keying material is isolated from the storage service 200 and the shuffling service 300. The storage service 200 and shuffling service 300 may be separate and remote and never transmit enough information between each other to allow leakage of data. The proposed system and methods are particularly suitable for storing, transferring and searching files that are encrypted and stored. The system is also advantageous because it provides a secure manner for two separate cloud service providers to operate together to provide data storing, search and data transfer functions, without compromising data, search strings, search patterns or other data parameters.

The system of data transfer and its associated functionality is advantageous because the access, search and size patterns are hidden. The shuffling and re-randomising operations by at least the WSS 300, reduces the chance of the servers 200, 300 from inferring the search patterns from the access pattern. The system is arranged and functions to resist against leakage based attacks. The shuffling process and splitting each file into blocks helps to prevent leakage based attacks. The data transfer system supports multi-user access and the user rights can be revoked without re-generating keys or re- encrypting data. The data transfer system as described herein is also advantageous because it can process all types of data efficiently including files.

Although not required, the embodiments described with reference to the Figures can be implemented to file an application programming interface (API) or as a series of libraries for use by a developer or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files the skilled person assisting in the performance of particular functions, will understand that the functionality of the software application may be distributed across a number of routines, objects or components to achieve the same functionality.

It will also be appreciated that the methods and systems of the invention are implemented by computing system or partly implemented by computing systems than any appropriate computing system architecture may be utilised. This will include stand-alone computers, network computers and dedicated computing devices. Where the terms "computing system" and "computing device" are used, these terms are intended to cover any appropriate arrangement of computer hardware for implementing the function described.

It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.