Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
METHOD AND SYSTEM FOR MAKING A FILE-LEVEL BACKUP FROM A FIRST STORAGE TO A SECOND STORAGE
Document Type and Number:
WIPO Patent Application WO/2024/056182
Kind Code:
A1
Abstract:
Disclosed are a method (1) and a system (2) for making a file-level backup from a first storage(3) to a second storage (4). The method (1) comprises intercepting (11) a write operation to a file of a file system of the first storage (3); storing (12) a descriptor (5) of the write operation in a data structure (21) being preventive of false negative testing of an occurrence of the descriptor (5) in the data structure (21); and copying (13) a logical block of the file to the file-level backup maintained at the second storage (4) in accordance with the occurrence of the descriptor (5) in the data structure (21). This avoids an excessive memory consumption in connection with the (potentially huge) number of tracked changes in the file system.

Inventors:
NATANZON ASSAF (DE)
MOR YARON (DE)
LEV BINYAMIN (DE)
ZACH IDAN (DE)
ABUHASIRA ARIK (DE)
DUER EDDY (DE)
STERNBERG MICHAEL (DE)
Application Number:
PCT/EP2022/075724
Publication Date:
March 21, 2024
Filing Date:
September 16, 2022
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HUAWEI TECH CO LTD (CN)
NATANZON ASSAF (DE)
International Classes:
G06F11/14
Foreign References:
US8055613B12011-11-08
US20110066819A12011-03-17
US20190294508A12019-09-26
CN107678892A2018-02-09
US20130254479A12013-09-26
Attorney, Agent or Firm:
KREUZ, Georg M. (DE)
Download PDF:
Claims:
CLAIMS

1. A method (1) for making a file-level backup from a first storage (3) to a second storage (4), the method (1) comprising intercepting (11) a write operation to a file of a file system of the first storage (3); storing (12) a descriptor (5) of the write operation in a data structure (21) being preventive of false negative testing of an occurrence of the descriptor (5) in the data structure (21); copying (13) a logical block of the file to the file-level backup maintained at the second storage (4) in accordance with the occurrence of the descriptor (5) in the data structure (21).

2. The method (1) of claim 1, further comprising adding (14), to the file-level backup, a file of the file system of the first storage (3) that is not yet included in the file-level backup maintained at the second storage (4).

3. The method (1) of claim 1 or claim 2, further comprising deleting (15), from the file-level backup maintained at the second storage (4), a file that is not anymore included in the file system of the first storage (3).

4. The method (1) of any one of the claims 1 to 3, further comprising emptying (16) the data structure (21).

5. The method (1) of any one of the claims 1 to 4, wherein the descriptor (5) comprises a unique identifier (51) of the file being subject to the write operation; and a number (52) of a logical block of the file being subject to the write operation.

6. The method of claim 5, wherein the unique identifier (51) of the file comprises one of: a file path of the file being subject to the write operation; and a serial number of the file being subject to the write operation.

7. The method (1) of claim 5 or claim 6, wherein the number (52) of the logical block of the file depends on a block size of the file system.

8. The method (1) of any one of the claims 1 to 7, wherein the data structure (21) comprises a Bloom filter, comprising an array of m bits; and k different hash functions, being configured to generate respective bit positions associated with the descriptor (5) within the array in accordance with a uniform random distribution.

9. The method (1) of claim 8, wherein the quantity m is proportional to the quantity k and to an expected total quantity of write operations.

10. The method (1) of claim 8 or claim 9, wherein the quantity k comprises a constant which depends on a given false error rate e.

11. The method (1) of any one of the claims 8 to 10, wherein storing (12) the descriptor (5) of the write operation in the data structure (21) comprises: generating ( 121 ) k bit positions in accordance with the k different hash functions and the descriptor (5); and setting (122) the bits of the array corresponding to the k generated bit positions.

12. The method (1) of any one of the claims 8 to 11, wherein copying (13) the logical block of the file to the file-level backup maintained at the second storage (4) in accordance with the occurrence of the descriptor (5) in the data structure (21) comprises: encoding (131) a descriptor (5’) of a potential write operation to the logical block of the file; testing (132) the occurrence of the descriptor (5) in the data structure (21) using the descriptor (5’) of the potential write operation; and transferring (133) the logical block of the file to the file-level backup maintained at the second storage (4) in accordance with the tested occurrence.

13. The method (1) of claim 12, wherein encoding (131) the descriptor of the potential write operation to the logical block of the file comprises: encoding a descriptor of a respective potential write operation to each logical block of the file.

14. The method (1) of any one of the claims 1 to 13, wherein the file-level backup comprises a snapshot-based incremental backup; and wherein the method (1) applies between consecutive snapshots of the snapshot-based incremental backup.

15. The method (1) of any one of the claims 1 to 13, wherein the file-level backup comprises a real-time incremental backup; and wherein the method (1) applies during delay periods of the real-time incremental backup.

16. A computer program comprising a program code for performing the method (1) of any one of the claims 1 to 15, when executed on a computer.

17. A system (2) for making a file-level backup from a first storage (3) to a second storage (4), the system (2) comprising a data structure (21), being preventive of false negative testing of an occurrence of a descriptor (5) in the data structure (21); and a processing device (22), being configured to intercept (11) a write operation to a file of a file system of the first storage (3); and to store (12) the descriptor (5) of the write operation in the data structure (21); and to copy (13) a logical block of the file to the file-level backup maintained at the second storage (4) in accordance with the occurrence of the descriptor (5) in the data structure (21).

18. The system (2) of claim 17, being configured to perform the method (1) of any one of the claims 1 to 15.

Description:
METHOD AND SYSTEM FOR MAKING A FILE-LEVEL BACKUP FROM A FIRST STORAGE TO A SECOND STORAGE

TECHNICAL FIELD

The present disclosure relates generally to the field of data backups, and more particularly to a method and a system for making a file-level backup from a first storage to a second storage.

BACKGROUND ART

In a file-level backup system for first storage that supports volume snapshots, there is a need to ship the snapshots to a second storage in an efficient way.

Typically, the backup system scans the latest snapshot made from the first storage and the latest snapshot shipped to the second storage and looks for differences, as indicated by the last modified time attribute of the files, for example.

For each file of the first storage that was created, the new file is transferred to the second storage. For each file of the first storage that was deleted, a delete command for the deleted file is sent to the second storage.

In the case of a file that existed in the previous snapshot and still exists in the current snapshot, the backup system can either (i) transfer the entire file or (ii) transfer only the changes made to the file.

For the former option, the disadvantage is clear if, for example, only 1% of a huge file were changed, since the backup system would nevertheless copy the entire file.

For the latter option, the backup system needs to intercept the changes written to the file system, to track the changes (i.e., the logical blocks that were written) in each file, and to transfer only these logical blocks. However, in connection with the (potentially huge) number of tracked changes in the file system, an excessive memory consumption may result. SUMMARY

It is an object to overcome these and other drawbacks. This and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect, a method is provided for making a file-level backup from a first storage to a second storage. The method comprises intercepting a write operation to a file of a file system of the first storage; storing a descriptor of the write operation in a data structure being preventive of false negative testing of an occurrence of the descriptor in the data structure; and copying a logical block of the file to the file-level backup maintained at the second storage in accordance with the occurrence of the descriptor in the data structure.

Precluding false negatives ensures that all the changed/written logical blocks are replicated to the file-level backup, such that a reliable file-level backup is achieved in snapshot-based backup systems (in other words, there may be a delay between the storing and copying steps).

Since the solution merely relies on snapshots, it applies to storage systems supporting a Diff API describing the difference between snapshots as well as to storage systems supporting only snapshots without Diff API.

A (data) backup as used herein may refer to (a process of making) a copy of computer data stored at a remote location so that the original computer data may be restored after a data loss event, such as data deletion or corruption.

A file-level backup as used herein may refer to making an incremental backup on a per-file basis.

An incremental backup as used herein may refer to making an backup that includes only the data that has changed since the preceding backup.

Logical blocks as used herein may refer to a linear addressing of data stored on a computer storage device, according to which the logical blocks are located by an integer index starting from zero. As such, each file comprises an ordered sequence of logical blocks being addressed by respective unordered numbers (indices). In a possible implementation form, the method may further comprise adding, to the file-level backup, a file of the file system of the first storage that is not yet included in the file-level backup maintained at the second storage.

This ensures a most efficient handling of new files, i.e., copying them as a whole.

In a possible implementation form, the method may further comprise deleting, from the filelevel backup maintained at the second storage, a file that is not anymore included in the file system of the first storage.

This ensures a most efficient handling of deleted files, i.e., merely sending a delete command.

In a possible implementation form, the method may further comprise emptying the data structure.

In a possible implementation form, the descriptor may comprise a unique identifier of the file being subject to the write operation; and a number (integer index) of a logical block of the file being subject to the write operation.

In a possible implementation form, the unique identifier of the file may comprise one of: a file path of the file being subject to the write operation; and a serial number of the file being subject to the write operation.

In a possible implementation form, the number of the logical block of the file may depend on a block size of the file system.

In a possible implementation form, the data structure may comprise a Bloom filter, comprising an array of m bits; and k different hash functions, being configured to generate respective bit positions associated with the descriptor within the array in accordance with a uniform random distribution. The Bloom filter implies that

- the time needed either to add a write operation or to check whether a write operation occurs in the data structure is a fixed constant, O(k), being completely independent of the quantity of write operations already inserted in the data structure, and a potentially huge number of changes in the file system may be tracked in a constant-size data structure, at the expense of a few false positives (i.e., transferring logical blocks which have not been changed).

In a possible implementation form, the quantity m may be proportional to the quantity k and to an expected total quantity n of write operations:

This relation minimizes the false positive probability e.

In a possible implementation form, the quantity k may comprise a constant which depends on a given false error rate e.

This relation minimizes the number of bits m per n.

In a possible implementation form, storing the descriptor of the write operation in the data structure may comprise: generating k bit positions in accordance with the k different hash functions and the descriptor; and setting the bits of the array corresponding to the k generated bit positions.

In a possible implementation form, copying the logical block of the file to the file-level backup maintained at the second storage in accordance with the occurrence of the descriptor in the data structure may comprise: encoding a descriptor of a potential write operation to the logical block of the file; testing the occurrence of the descriptor in the data structure using the descriptor of the potential write operation; and transferring the logical block of the file to the file-level backup maintained at the second storage in accordance with the tested occurrence.

In a possible implementation form, encoding the descriptor of the potential write operation to the logical block of the file may comprise: encoding a descriptor of a respective potential write operation to each logical block of the file. This ensures a comprehensive testing of logical blocks of a file (or of a data storage device) for occurrence of their descriptors 5 in the data structure 21 at minor computational complexity.

In a possible implementation form, the file-level backup may comprise a snapshot-based incremental backup; and the method may apply between consecutive snapshots of the snapshotbased incremental backup.

In a possible implementation form, the file-level backup may comprise a real-time incremental backup; and the method may apply during delay periods of the real-time incremental backup.

Since the solution merely relies on snapshots, it applies in individual snapshots of snapshotbased incremental backup as well as in response to unexpected issues preventing immediate replication in real-time incremental backup.

According to a second aspect, a system is provided for making a file-level backup from a first storage to a second storage. The system comprises a data structure and a processing device. The processing device is configured to intercept a write operation to a file of a file system of the first storage; and to store a descriptor of the write operation in the data structure. The data structure is preventive of false negative testing of an occurrence of the descriptor in the data structure. The processing device is further configured to copy a logical block of the file to the file-level backup maintained at the second storage in accordance with the occurrence of the descriptor in the data structure.

In a possible implementation form, the system may be configured to perform the method of the first aspect or any of its implementations.

According to a third aspect, a computer program is provided, comprising a program code for performing the method of the first aspect or any of its implementations when executed on a computer.

BRIEF DESCRIPTION OF DRAWINGS

The above-described aspects and implementations will now be explained with reference to the accompanying drawings, in which the same or similar reference numerals designate the same or similar elements. The drawings are to be regarded as being schematic representations, and elements illustrated in the drawings are not necessarily shown to scale. Rather, the various elements are represented such that their function and general purpose become apparent to those skilled in the art.

FIG. 1 illustrates a system in accordance with the present disclosure for making a file-level backup from a first storage to a second storage;

FIG. 2 illustrates a descriptor in accordance with the present disclosure of a (potential) write operation to a file of a file system of the first storage; and

FIG. 3 illustrates a method in accordance with the present disclosure for making a file-level backup from a first storage to a second storage.

DETAILED DESCRIPTIONS OF DRAWINGS

In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and which show, by way of illustration, specific aspects of embodiments of the invention or specific aspects in which embodiments of the present invention may be used. It is understood that embodiments of the invention may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.

For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding apparatus or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

FIG. 1 illustrates a system 2 in accordance with the present disclosure for making a file-level backup from a first storage 3 to a second storage 4.

The system 2 comprises a data structure 21 and a processing device 22, such as a microprocessor, a field-programmable gate array (FPGA), a single-board computer (SBC), a plug-in processing card of a server computer, and the like.

The system 2, and more specifically the processing device 22, may be connected to the first storage 3 and to the second storage 4 by means of local/bus connectivity and/or remote/network connectivity, for example.

As will be explained in detail in connection with FIG. 3 below, the system 2 may be configured to perform a method 1 in accordance with the present disclosure for making a file-level backup from a first storage 3 to a second storage 4. A driver, i.e., a computer program comprising a program code for performing the method 1 when executed on a computer represents a typical implementation of the method 1.

In terms of device features, the processing device 22 is configured to intercept I l a write operation to a file of a file system of the first storage 3; to store 12 a descriptor 5 of the write operation in the data structure 21; and to copy 13 a logical block of the file to the file-level backup maintained at the second storage 4 in accordance with the occurrence of the descriptor 5 in the data structure 21.

The data structure 21 is preventive of false negative testing of an occurrence of the descriptor 5 in the data structure 21, and thereby ensures that all the changed/written logical blocks are replicated to the file-level backup, such that a reliable file-level backup is achieved in snapshotbased backup systems.

FIG. 2 illustrates a descriptor 5, 5’ in accordance with the present disclosure of a (potential) write operation to a file of a file system of the first storage 3. The descriptor 5 may comprise a unique identifier 51 of the file being subject to the write operation.

For example, the unique identifier 51 of the file may comprise one of a file path of the file being subject to the write operation; and a serial number, such as an inode number with or without a corresponding indoe generation number, of the file being subject to the write operation.

The descriptor 5 may further comprise a number 52 of a logical block (“block number”) of the file being subject to the write operation.

For example, N logical blocks may be represented by logical block numbers 0 to N-l.

The number 52 of the logical block of the file may depend on a block size of the file system.

For example, if the block size is 4 kB, and the write operation is to file a. txt in offset 4 kB + 100, size of 200 bytes, the descriptor 5 may comprise a unique identifier 51 (file path) of “a txt ” and a number 52 of 1 (the write operation affected logical block 1).

The block size is not limited to 4 kB, it can be any other size (e.g. 4 kB, 8 kB, 64 kB, 1 MB, and the like).

As indicated by the dashed box in FIG. 2, more than one number 52 corresponding to more than one logical block may form part of the descriptor 5.

FIG. 3 illustrates a method 1 in accordance with the present disclosure for making a file-level backup from a first storage 3 to a second storage 4.

A typical implementation of the method 1 may be a driver, i.e., a computer program comprising a program code for performing the method 1 when executed on a computer.

Initially, the method 1 comprises intercepting I l a write operation to a file of a file system of the first storage 3. However, the write operation is carried out despite this interception. The method 1 further comprises storing 12 a descriptor 5 of the write operation in a data structure 21 being preventive of false negative testing of an occurrence of the descriptor 5 in the data structure 21.

As such, false positive matches are possible, but false negatives are not - in other words, a test whether a particular descriptor 5 has already been written into the data structure 21 returns either "possibly in data structure" or "definitely not in data structure".

Exemplary data structures 21 having this characteristic may comprise a hash map/hash table, i.e., an associative array or dictionary relating keys to values, and a Bloom filter - the latter also being beneficial for reasons of time and memory efficiency.

The Bloom filter may comprise an array of m bits, initially all set to zero; and k different/independent hash functions, being configured to generate respective bit positions associated with the descriptor 5 within the array in accordance with a uniform random distribution.

For example, the k different hash functions may comprise non-cryptographic hash function such as MurmurHash or Jenkins hash functions.

To add a descriptor 5 of a write operation to the Bloom filter, the descriptor 5 may be fed to each of the k hash functions to obtain k array positions, and the bits at all these positions may be set to a logical 1.

To test for a descriptor 5 of a write operation (i.e., whether it has already been added to the Bloom filter), the descriptor 5 may again be fed to each of the k hash functions to get k array positions. If any of the bits at these positions corresponds to a logical 0, the descriptor 5 is definitely not in the set; if it were, then all the bits at these positions would have been set to logical 1 when it was added. If all bits correspond to a logical 1, then either the descriptor 5 has been added previously, or the bits at these positions have by chance been set to logical 1 during addition of other descriptors 5, resulting in a false positive match.

Regarding the aforementioned time and memory efficiency, using a Bloom filter as the data structure 21 implies that the time needed either to add a write operation or to test whether a write operation has been added to the data structure is a fixed constant, O(k being completely independent of the quantity of write operations already inserted in the data structure, and further implies that a potentially huge number of changes (write operations) in the file system may be tracked in a constant-size data structure, at the expense of a few false positives (i.e., transferring / backing up logical blocks which have not been changed).

Given the size m of the bit array, the quantity k of hash functions and an expected total quantity n of write operations, it can be proven that the false positive probability e is at most:

The quantity m may be proportional to the quantity k and to an expected total quantity n of write operations: m k = — In 2 n

Applying this dimensioning relation minimizes the false positive probability e.

The quantity k may comprise a constant which depends on a given false error rate E:

Applying this dimensioning relation (in addition to the minimized false positive probability E) minimizes the number of bits m per n.

The storing 12 step may comprise: generating 121 & bit positions in accordance with the k different hash functions and the descriptor 5; and setting 122 the bits of the array corresponding to the k generated bit positions.

The system 2 is snapshot-based - in other words, there may be a delay between the storing and copying steps, being indicated in FIG. 3 by a lapse of time symbol between steps 12 and 13. For example, the method 1 may apply between consecutive snapshots of an incremental backup, or during delay periods of a real-time incremental backup.

After the lapse of time, the method 1 further comprises copying 13 a logical block of the file to the file-level backup maintained at the second storage 4 in accordance with the occurrence of the descriptor 5 in the data structure 21.

The copying 13 step may comprise: encoding 131 a descriptor 5’ of a potential write operation to the logical block of the file.

Encoding 131 the descriptor of the potential write operation to the logical block of the file may comprise: encoding a descriptor 5’ of a respective potential write operation to each logical block of the file.

That is to say, the system 2 performs the following pseudo-code:

Num_bl ocks = Fil e_Si ze / Bl ock_Si ze

Num_bl ocks += Fil e_Si ze % Bl ock_Si ze : 1 ? 0

For 1=0 to Num_bl ocks :

If ( testing ( fil ename, i ) == true)

Send bl ock 1 to the target

Note that the test applies only for files that have been changed since the last snapshot, not for all files of the first storage 3. This approach enables comprehensive testing of all the logical blocks of a changed file (or of all the changed files of a data storage device) for occurrence of their descriptors 5 in the data structure 21 at modest computational complexity.

The copying 13 step may further comprise: testing 132 the occurrence of the descriptor 5 in the data structure 21 using the descriptor 5’ of the potential write operation.

The copying 13 step may further comprise: transferring 133 the logical block of the file to the file-level backup maintained at the second storage 4 in accordance with the tested occurrence. That is to say, if the descriptor 5’ of the potential write operation occurred in the data structure 21, this means that the descriptor 5 of the write operation to the logical block of the file has previously been added to the data structure 21, and that the logical block must be copied 13 to the file-level backup maintained at the second storage 4.

Given direct access between the first storage 3 and the second storage 4, the transferring 133 step may alternatively include instructing the first storage 3 to provide the logical block to the second storage 4 directly, or instructing the second storage 4 to retrieve the logical block from the first storage 3 directly.

The method may further comprise adding 14, to the file-level backup, a file of the file system of the first storage 3 that is not yet included in the file-level backup maintained at the second storage 4. This particularly applies to new files of the file system of the first storage 3.

The method may further comprise deleting 15, from the file-level backup maintained at the second storage 4, a file that is not anymore included in the file system of the first storage 3. This particularly applies to deleted files of the file system of the first storage 3.

The method may further comprise emptying 16 the data structure 21, for example by setting all the bits of the array of m bits of the data structure 21 to logical 0 so as to prepare the data structure 21 for a next snapshot.

The present disclosure has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed matter, from the studies of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in the mutual different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.