Title:
METHOD AND DEVICE FOR BACKING UP FILE SYSTEM
Document Type and Number:
WIPO Patent Application WO/2022/135690
Kind Code:
A1
Abstract:
The present disclosure relates to a computer-implemented method and apparatus for backing up a file system. The method comprises receiving, by a processor, a user input providing an early backup time window, where the early backup window is prior to a scheduled daily back up time for the file system; identifying, by the processor, one or more files in the file system that are suitable for early back up; initiating, by the processor, an early back up for the one or more determined files during the early backup time window; and initiating, by the processor, a scheduled daily back up at the scheduled daily back up time for a plurality of remaining files in the file system.

Inventors:
YEGER ASAF (DE)
KUVENT AVIV (DE)
NATANZON ASSAF (DE)
MOR YARON (DE)
Application Number:
PCT/EP2020/087598
Publication Date:
June 30, 2022
Filing Date:
December 22, 2020
Assignee:
HUAWEI TECH CO LTD (CN)
YEGER ASAF (DE)
International Classes:
G06F16/11; G06F11/14
Foreign References:
US9336250B12016-05-10
US9588847B12017-03-07
Attorney, Agent or Firm:
KREUZ, Georg (DE)
Claims:
CLAIMS

1. A computer-implemented method (100) for backing up a file system (202, 710, 814, 914); the method (100) comprising: receiving, by a processor (812), a user input providing an early backup time window, where the early backup window is prior to a scheduled daily back up time for the file system (202, 710, 814, 914); identifying, by the processor (812), one or more files in the file system (202, 710, 814, 914) that are suitable for early back up; initiating, by the processor (812), an early back up for the one or more determined files during the early backup time window; and initiating, by the processor (812), a scheduled daily back up at the scheduled daily back up time for a plurality of remaining files in the file system (202, 710, 814, 914).

2. The method (100) of claim 1, wherein identifying includes analyzing file metadata to identify one or more files that have been edited since a preceding scheduled daily back-up time.

3. The method (100) of claim 2, further comprising receiving, by the processor (812), the file metadata from an agent installed on the file system (202, 710, 814, 914), in response to a file edit event.

4. The method (100) of claim 2, further comprising sending, by the processor (812), queries to the file system (202, 710, 814, 914) at a predetermined time interval and receiving the file metadata in response.

5. The method (100) of any one of claims 2 to 4, wherein identifying includes determining a suitability for each file based on file native metadata, wherein the file native metadata comprises one or more of a file name, file size, file user permissions, file group permissions, creation time, last accessed time, last modification time and file type.

6. The method (100) of claim 5, wherein the file type indicates that edits to the file are append-only.

7. The method (100) of any one of claims 2 to 6, wherein identifying includes determining an expected last access time by analyzing file metadata collected over time.

8. The method (100) of any preceding claim, wherein the plurality of remaining files include those files not backed up during the early back up window.

9. The method (100) of any preceding claim, wherein the scheduled daily back up includes checking, by the processor (812), whether a file that was backed up during the early back up window has been edited further.

10. The method (100) of any preceding claim, wherein the processor (812) is part of a file system server (810) and initiating a back up includes sending the files to a backup server (204, 820).

11. The method (100) of any one of claims 1 to 9, wherein the processor (812) is part of a backup server (204, 830) and initiating a back up includes requesting the files from the file system server (810).

12. A computer-readable medium configured to store instructions which, when executed by a processor (812), cause the processor (812) to perform the method (100) of any preceding claim.

13. An apparatus (700, 912) for controlling back up of a file system (202, 710, 814, 914), comprising: an interface (702) configured to receive a user input providing an early backup time window, where the early backup window is prior to a scheduled daily back up time for the file system (202, 710, 814, 914); a file identification module (704) configured to identify one or more files in the file system (202, 710, 814, 914) that are suitable for early back up; and a file back up module (706) configured to: initiate an early back up for the one or more determined files during the early backup time window; and initiate a scheduled daily back up at the scheduled daily back up time for a plurality of remaining files in the file system (202, 710, 814, 914).

14. The apparatus (700, 912) of claim 13, wherein the file identification module (704) is further configured to analyze file metadata to identify one or more files that have been edited since a preceding scheduled daily back-up time.

15. The apparatus (700, 912) of claim 14, wherein the file identification module (704) is further configured to receive the file metadata from an agent installed on the file system (202, 710, 814, 914), in response to a file edit event.

16. The apparatus (700, 912) of claim 14, wherein the file identification module (704) is further configured to send queries to the file system (202, 710, 814, 914) at a predetermined time interval and receive the file metadata in response.

17. The apparatus (700, 912) of any one of claims 14 to 16, wherein the file identification module (704) is further configured to determine a suitability for each file based on file native metadata, wherein the file native metadata comprises one or more of a file name, file size, file user permissions, file group permissions, creation time, last accessed time, last modification time and file type.

18. The apparatus (700, 912) of claim 17, wherein the file type indicates that edits to the file are append-only.

19. The apparatus (700, 912) of any one of claims 14 to 18, wherein the file identification module (704) is further configured to determine an expected last access time by analyzing file metadata collected over time.

20. The apparatus (700, 912) of any one of claims 13 to 19, wherein the plurality of remaining files include those files not backed up during the early back up window.

21. The apparatus (700, 912) of any one of claims 13 to 20, wherein the scheduled daily back up includes checking whether a file that was backed up during the early back up window has been edited further.

22. The apparatus (700, 912) of any one of claims 13 to 21, wherein the apparatus (912) is part of a file system server (910) and initiating a back up includes sending the files to a backup server (920).

23. The apparatus (700, 912) of any one of claims 13 to 21, wherein the apparatus (912) is part of a backup server (930) and initiating a back up includes requesting the files from the file system server (910).

Description:
METHOD AND DEVICE FOR BACKING UP FILE SYSTEM

TECHNICAL FIELD

The present disclosure relates generally to the field of data protection and backup; and more specifically, to methods and apparatus for backing up a file system.

BACKGROUND

In this data-driven world, a vast amount of data is being generated and stored globally, on a regular basis. For example, social networks, the internet of things, scientific experiments, commercial services, industrial services, banking services, business interactions, and the like, play a vital role in generating said data. In such instances, data backup becomes very important to ensure data protection, as the data can easily get lost or corrupted on account of disruptive events such as system failures, power hazards, cyberattacks, natural disasters, communication failures, and the like. Presently, various data protection techniques are being employed for the data backup.

However, existing data protection techniques are associated with several limitations. Firstly, the existing data protection techniques generally do not allow the backup to be defined by a user, such as during times of less activity. Secondly, existing data protection techniques provide only a limited number of points in time for the data backup associated with any data, such as a file. For example, in a snapshot-based data protection technique, snapshots (i.e. backup data images) are generated temporarily (i.e. they eventually need to be deleted), and since such snapshots consume considerable space they are generated periodically, i.e. at a certain time interval. This limits the number of points in time available for the data backup associated with the file. Moreover, generating said snapshots is expensive, and their generation and deletion require considerable computational resources. The snapshot-based data protection technique requires a long backup time when the difference between two consecutive snapshots is considerable, for example between 15 minutes and several hours. This also leads to a considerable recovery point objective. In the snapshot-based data protection technique, when snapshots are mounted on an array to be read by a backup server, the snapshots reduce the bandwidth provided by the array to production workloads. Moreover, the existing data protection techniques require considerable bandwidth, are sensitive to bandwidth fluctuations, face difficulty in implementing data reduction (such as compression and wide-area network (WAN) deduplication) using advanced algorithms, are prone to significant latency, and so forth. Thirdly, the existing data protection techniques require considerable computational resources and time in parsing a file system for changes in files of the file system required for data backup, and consequently a considerable backup window.

Typically, existing data utilization techniques are employed to optimize the amount of data to be backed up during a backup window, for example by performing host-side deduplication of the data in order to reduce the network utilization and bandwidth. Additionally, in the case of blocks of data, some data utilization techniques aim to back up "cold" blocks (such as the cold storage blocks) prior to the scheduled backup window. However, when performing this at block level, the solution cannot utilize properties that exist at the file level to identify which data changes should be backed up at any point in time. Moreover, existing data utilization techniques further comprise data replication techniques. The data replication is done to a secondary location via either a continuous or a snapshot-based technique. Specifically, continuous replication mirrors every input-output (I/O) operation to a remote server (such as a backup server) that maintains the latest states of the data, although at a larger network bandwidth cost.

Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with existing data protection or backup techniques.

SUMMARY

The present disclosure seeks to provide an improved method, apparatus and system for backing up a file system. The present disclosure seeks to provide a solution to the existing problems of a limited number of points in time available for data backup, the requirement of considerable computational resources for data backup, and the limitation to file system level data protection. An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in the prior art and to provide an improved method and system for continuous data protection by employing continuous cataloguing, which provides data backup for a required number of points in time, with minimal computational resources, and extends to block level data protection.

The object of the present disclosure is achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.

In an aspect, the present disclosure provides a computer-implemented method for backing up a file system. The method comprises: receiving, by a processor, a user input providing an early backup time window, where the early backup window is prior to a scheduled daily back up time for the file system; identifying, by the processor, one or more files in the file system that are suitable for early back up; initiating, by the processor, an early back up for the one or more determined files during the early backup time window; and initiating, by the processor, a scheduled daily back up at the scheduled daily back up time for a plurality of remaining files in the file system.

The method enables a user to perform or execute data backup (namely, restore or recover) for data, such as a file, to any required point in time by providing an input relating to a user-defined backup window, so as to minimize the implementation of backup during times of higher usage. This allows the user to back up the data at more than one instance, such as at the end of the day and in a preferred time window. The processor may continuously record the log of metadata operations made to the files in the file system, such as in the form of a continuous catalog. In such an instance, the processor maintains up-to-date information about the state of any file at a specific point in time, i.e. when any file is subjected to any operation or change. Typically, the processor recovers the file to the state it had at a given point in time by searching for the state of the file at that point in time in the continuous catalog. In this regard, the method facilitates a near-zero recovery point objective. In this way, the method provides data security in case of loss of data on account of any disruptive events. The method requires minimal computational resources and time in parsing the file system for changes in files of the file system required for the data backup. The method is well suited for both file system level data protection and block level data protection. The method can be suitably implemented with a conventional computing system without making changes to the architecture of the computing system.
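By way of a non-limiting illustration only, the division of files between the early back up and the scheduled daily back up may be sketched as follows. The names BackupPlan, window_precedes_schedule and plan_backup are hypothetical and do not appear in the disclosure; the sketch further assumes that the early backup time window and the scheduled daily back up time fall on the same calendar day.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable, List, Set


@dataclass(frozen=True)
class BackupPlan:
    early: List[str]      # files to back up during the early backup time window
    scheduled: List[str]  # remaining files, left for the scheduled daily back up


def window_precedes_schedule(start: time, end: time, scheduled: time) -> bool:
    """Check that the user-supplied window lies before the scheduled back up time.

    Assumption: all three times are on the same calendar day (no midnight wrap).
    """
    return start < end <= scheduled


def plan_backup(all_files: Iterable[str], suitable: Set[str]) -> BackupPlan:
    """Split files into an early set and a remaining set, preserving input order."""
    files = list(all_files)
    early = [f for f in files if f in suitable]
    scheduled = [f for f in files if f not in suitable]
    return BackupPlan(early=early, scheduled=scheduled)
```

In this sketch, how the set of suitable files is obtained is left open; the implementation forms below describe metadata-based ways of identifying it.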

In an implementation form, identifying includes analyzing file metadata to identify one or more files that have been edited since a preceding scheduled daily back-up time. By virtue of analyzing file metadata to identify one or more files that have been edited, i.e. the files that are subjected or exposed to any operation and the operations made to such files, the information required for the backup of the files is precisely identified and maintained. This facilitates performing the backup based on the file metadata, which in turn lessens the computational burden.
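As a minimal sketch of this implementation form, and assuming that the last modification time reported by the operating system is the metadata being analyzed, the edited files may be identified as follows. The function name edited_since is hypothetical; os.stat and its st_mtime field are standard Python library facilities.

```python
import os
from typing import Iterable, List


def edited_since(paths: Iterable[str], last_backup_epoch: float) -> List[str]:
    """Return the paths modified after the preceding scheduled daily back up.

    last_backup_epoch is the time of the preceding back up, in seconds since
    the epoch. Files that no longer exist are skipped (an assumption of this
    sketch, not a requirement of the disclosure).
    """
    changed = []
    for path in paths:
        try:
            if os.stat(path).st_mtime > last_backup_epoch:
                changed.append(path)
        except FileNotFoundError:
            continue
    return changed
```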

In another implementation form, the method further comprises receiving, by the processor, the file metadata from an agent installed on the file system, in response to a file edit event.

The method allows the files in the file system to be backed up by virtue of the file edit events tracked by the agent installed on the file system. This allows continuous protection with finer granularity, but requires dealing with cases of inconsistencies in the files, and discarding such inconsistent files.

In yet another implementation form, the method further comprises sending, by the processor, queries to the file system at a predetermined time interval and receiving the file metadata in response.

The method facilitates the data backup by sending queries to the file system at the predetermined time interval, to ensure an efficient operation.

In an implementation form, identifying includes determining a suitability for each file based on file native metadata, wherein the file native metadata comprises one or more of a file name, file size, file user permissions, file group permissions, creation time, last accessed time, last modification time and file type.

In operation, the method determines the suitability of each file from the state of the file, so that the file can be recovered to that state. This allows a user to reliably recover (namely, restore) a required file having a specific state defined by the file native metadata. In other words, the method of the present disclosure provides improved searchability of files which are subjected to changes, and thereafter recovery of such files.

In another implementation form, the file type indicates that edits to the file are append-only.
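Purely as an illustrative sketch, a suitability determination based on file native metadata may look as follows. The disclosure does not fix a particular rule at this point. The rule shown here, under which a file is treated as suitable if its file type is append-only or if it has not been modified for a quiet period, is an assumption, as are the names NativeMetadata, APPEND_ONLY_TYPES and is_suitable_for_early_backup and the one-hour default.

```python
from dataclasses import dataclass

# Assumption: file types whose edits are treated as append-only in this sketch.
APPEND_ONLY_TYPES = {".log"}


@dataclass
class NativeMetadata:
    name: str
    size: int         # bytes
    mtime: float      # last modification time, seconds since the epoch
    file_type: str    # e.g. a file extension such as ".log"


def is_suitable_for_early_backup(
    meta: NativeMetadata, now: float, quiet_seconds: float = 3600.0
) -> bool:
    """Hypothetical rule: append-only type, or unmodified for quiet_seconds."""
    if meta.file_type in APPEND_ONLY_TYPES:
        return True
    return (now - meta.mtime) >= quiet_seconds
```

Other native metadata listed above, such as permissions or file size, could be added to the rule in the same way.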

In an implementation form, identifying includes determining an expected last access time by analyzing file metadata collected over time. The determination of the expected last access time by analysis of the file metadata collected over time enables identification of the files suitable for backup, and reduces the amount of data or files to be backed up and the time taken during backup at the scheduled daily backup time.
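A minimal sketch of such a determination is given below, assuming that the metadata collected over time is reduced to one last access time per day, expressed in seconds since midnight, and that the expected last access time is taken as the arithmetic mean of those values. The disclosure does not prescribe this estimator; the function names and the margin parameter are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def expected_last_access(daily_last_access: Sequence[float]) -> Optional[float]:
    """Estimate the expected last access time (seconds since midnight).

    Returns None when no history has been collected yet.
    """
    if not daily_last_access:
        return None
    return mean(daily_last_access)


def likely_idle_by(
    daily_last_access: Sequence[float],
    window_start: float,
    margin: float = 0.0,
) -> bool:
    """True if the file is expected to be idle before the early window starts."""
    expected = expected_last_access(daily_last_access)
    return expected is not None and expected + margin <= window_start
```

For example, a file last accessed at 16:50, 17:00 and 17:10 on three previous days has an expected last access time of 17:00, and would be treated as idle ahead of an early backup time window starting at 18:00.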

In an implementation form, the plurality of remaining files include those files not backed up during the early back up window.

The method allows the backup of all the remaining files in the file system that have not yet been backed up, in view of the latest state of the files at the scheduled daily backup time. The backup of the remaining files at the end reduces the time taken for parsing the files and consequently reduces the chances of an incomplete backup due to shortage of time.

In an implementation form, the scheduled daily back up includes checking, by the processor, whether a file that was backed up during the early back up window has been edited further.

This checking of the file enables the method to exclude the files that have already been backed up and reduces the associated computational effort and time.
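The check may be sketched as follows, assuming that the modification time of each file is recorded when the file is backed up during the early back up window and is compared with the current modification time at the scheduled daily back up time. The function name and the dictionary-based bookkeeping are hypothetical.

```python
from typing import Dict, List


def files_for_scheduled_backup(
    current_mtimes: Dict[str, float],
    early_backup_mtimes: Dict[str, float],
) -> List[str]:
    """Select the files still to be backed up at the scheduled daily time.

    A file is selected if it was not backed up early, or if it has been
    edited since its early back up (its modification time has advanced).
    """
    pending = []
    for path, mtime in current_mtimes.items():
        recorded = early_backup_mtimes.get(path)
        if recorded is None or mtime > recorded:
            pending.append(path)
    return sorted(pending)
```

Files whose modification time is unchanged since the early back up are excluded, which is the saving in effort and time described above.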

In an implementation form, the processor is part of a file system server and initiating a back up includes sending the files to a backup server.

This implementation enables remote replication of data (by a computing device employing the computer implemented method) for continuous data protection as the primary storage i.e. the file system server being located onsite, and the secondary storage i.e. the backup server being located offsite (as a remote storage).

In another implementation form, the processor is part of a backup server and initiating a back up includes requesting the files from the file system.

This increases the versatility of the method by allowing the backup to be initiated by the processor of the backup server instead of the conventional way, i.e. via the processor of the file system server.

In another aspect, the present disclosure provides an apparatus for controlling back up of a file system, comprising: an interface configured to receive a user input providing an early backup time window, where the early backup window is prior to a scheduled daily back up time for the file system, a file identification module configured to identify one or more files in the file system that are suitable for early back up and a file back up module configured to: initiate an early back up for the one or more determined files during the early backup time window and initiate a scheduled daily back up at the scheduled daily back up time for a plurality of remaining files in the file system.

The apparatus enables a user to perform or execute data backup (namely, restore or recover) for data, such as a file, to any required point in time by providing an input relating to a user-defined backup window, so as to minimize the implementation of backup during times of higher usage. This allows the user to back up the data at more than one instance, such as at the end of the day and in a preferred time window. The apparatus may continuously record the log of metadata operations made to the files in the file system, such as in the form of a continuous catalog. In such an instance, the apparatus maintains up-to-date information about the state of any file at a specific point in time, i.e. when any file is subjected to any operation or change. Typically, the apparatus recovers the file to the state it had at a given point in time by searching for the state of the file at that point in time in the continuous catalog. In this regard, the apparatus facilitates a near-zero recovery point objective. In this way, the apparatus provides data security in case of loss of data on account of any disruptive events. The apparatus requires minimal computational resources and time in parsing the file system for changes in files of the file system required for the data backup. The apparatus is well suited for both file system level data protection and block level data protection. The apparatus can be suitably implemented with a conventional computing system without making changes to the architecture of the computing system.

The apparatus of this aspect achieves all the advantages and effects of the method.

In an implementation form, the file identification module is further configured to analyze file metadata to identify one or more files that have been edited since a preceding scheduled daily back-up time.

In another implementation form, the file identification module is further configured to receive the file metadata from an agent installed on the file system, in response to a file edit event.

In yet another implementation form, the file identification module is further configured to send queries to the file system at a predetermined time interval and receive the file metadata in response.

In an implementation form, the file identification module is further configured to determine a suitability for each file based on file native metadata, wherein the file native metadata comprises one or more of a file name, file size, file user permissions, file group permissions, creation time, last accessed time, last modification time and file type.

In another implementation form, the file type indicates that edits to the file are append-only.

In an implementation form, the file identification module is further configured to determine an expected last access time by analyzing file metadata collected over time.

In an implementation form, the plurality of remaining files include those files not backed up during the early back up window.

In an implementation form, the scheduled daily back up includes checking whether a file that was backed up during the early back up window has been edited further.

In an implementation form, the apparatus is part of a file system server and initiating a back up includes sending the files to a backup server.

In another implementation form, the apparatus is part of the backup server and initiating a back up includes requesting the files from the file system server.

Various implementation forms of the system achieve all the advantages and effects of the corresponding implementation forms of the method.

It has to be noted that all devices, elements, circuitry, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.

Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.

Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:

FIG. 1 illustrates a flowchart of a computer implemented method for backing up a file system, in accordance with an embodiment of the present disclosure;

FIGs. 2-6 illustrate an exemplary implementation scenario of a method for backing up a file system, in accordance with an embodiment of the present disclosure;

FIG. 7 is a block diagram of an apparatus for controlling back up of a file system, in accordance with an embodiment of the present disclosure;

FIG. 8A illustrates a block diagram of a system for backing up a file system, in accordance with an embodiment of the present disclosure;

FIG. 8B illustrates a block diagram of a system for backing up a file system, in accordance with another embodiment of the present disclosure;

FIG. 9A illustrates a block diagram of a system for controlling back up of a file system, in accordance with an embodiment of the present disclosure; and

FIG. 9B illustrates a block diagram of a system for controlling back up of a file system, in accordance with another embodiment of the present disclosure.

In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.

DETAILED DESCRIPTION OF EMBODIMENTS

The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.

Referring to FIG. 1, illustrated is a flowchart of a computer implemented method 100 for backing up a file system, in accordance with an embodiment of the present disclosure. As shown, the method 100 includes steps 102, 104, 106 and 108.

Throughout the present disclosure, the term "file system" refers to a data structure (or a process) implemented for accessing, organizing and storing files (or data) on a computing device, specifically, in the memory of the computing device. Examples of the file system may include, but are not limited to, a File Allocation Table (FAT) file system, a New Technology File System (NTFS), a Hierarchical File System (HFS), a Unix File System (UFS), a Virtual Machine File System (VMFS), and an Extended (EXT) file system. Further, the term "file" refers to a resource for storing information (or data) in the computing system. Notably, the term "file" may be used interchangeably with the term "data" without limiting the scope of the disclosure. The file may be configured to store an image, a text, a video, an executable program, and the like. Typically, the file system comprises a plurality of files and is part of a primary memory of any computing system, such as the hard-disk drive (HDD) of a computing device. Furthermore, the term "backing up" or "data back up" refers to the reserve or backup of data or files that may be recovered in the event of a primary memory data failure. Generally, primary data failures are a result of hardware or software failure, data corruption, or a human-caused event, such as a malicious attack (virus or malware), or accidental deletion of data. To avoid such an issue, the backup of the files on the file system is performed at a secondary memory such as a remote server, such that the file system may be accessed from the secondary memory when required.

The term "primary memory" refers to a memory (a volatile memory and/or a non-volatile memory) that is directly accessible to, and associated with, the computing system to handle live data (for example, a file in the file system upon which operation(s) is/are being carried out). The primary memory may also be referred to as "primary storage", "main memory", "internal memory" or "primary volume". Examples of the primary storage may include, but are not limited to, a hard disk or storage array of the computing system. Typically, the computing system mentioned herein, also referred to as a host computing device, includes at least one of a storage array, a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a cache memory, and a static random access memory (SRAM).

The term "secondary storage" refers to a non-volatile memory that is directly or indirectly accessed by the computing system or the processor. The secondary storage is a storage array that is used for storing files for backup. The secondary storage may be on-site or off-site with respect to the computing system. In an embodiment, the secondary storage is implemented as a remote storage. The term "remote storage" refers to an off-site storage that is physically distant from the computing system. In an example, the remote storage may be a hard disk of a remote computing device, or a cloud-based storage, or a remote server. In another example, the primary storage and the secondary storage may refer to two different storage sections of the same hard disk (for example, in the on-site scenario), amongst which the primary storage typically deals with the live data, whereas the secondary storage deals with stored or backup data. Examples of the secondary storage may include, but are not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, a portable computer diskette, a hard disk, a memory stick, and an arrangement of servers (for example, cloud servers).

Throughout the present disclosure, the term "computing system" or "computing device" refers to hardware, software, firmware or a combination of these configured to perform at least one computing task based upon input from the user. Examples of the computing system may include, but are not limited to, a computer and a virtual machine (VM). Generally, the computing system or the computing device includes computational elements such as a memory, a processor, a data communication interface, a network adapter and the like, to store, process and/or share files or information with other computing devices, such as another computation device, a server and the like.

Throughout the present disclosure, the term "processor" refers to a computational element that is operable to respond to and process instructions to perform the data backup operations. In an example, the processor may be a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, or any other type of processing circuit, for example as aforementioned. Notably, the processor may be operated individually or as a part of a computation system. Herein, the processor is configured to perform a plurality of operations according to the method 100 to back up the file system.

Existing data utilization techniques are employed to optimize the amount of data to be backed up during a backup window, for example by performing host-side deduplication of the data in order to reduce network utilization and bandwidth. Additionally, in the case of blocks of data, some data utilization techniques aim to back up “cold” blocks (such as the cold storage blocks) prior to the scheduled backup window. However, when this is performed at block level, the solution cannot utilize properties that exist at the file level to identify which data changes should be backed up at any point in time. Moreover, existing data utilization techniques further comprise data replication techniques. The data replication is done to a secondary location via either a continuous or a snapshot-based technique. Typically, continuous replication mirrors every input-output (I/O) operation to a remote server (such as a backup server) that maintains the latest state of the data, although at a larger network bandwidth cost. Further, the snapshot replication technique comprises periodic transmission of a snapshot of the file system, such that lower network bandwidth requirements are achieved, since only the latest state or update of the data is transferred and, as a result, repeated and redundant transmissions are avoided. Moreover, the data utilization techniques further comprise hybrid replication of data, wherein both continuous replication and snapshot-based replication are employed at specific time windows or for specific data in accordance with the implementation. In an example, in operation, the hybrid replication technique selects data regions having a high probability of change to be protected via the snapshot technique, while data regions with a lesser probability of change are protected via continuous replication. Moreover, existing data protection techniques offer a limited number of points in time for backup (namely, recovery points). In such a case, when a data loss occurs on account of a disruptive event, the data backup is available only up to the last available recovery point. It will be appreciated that the method 100 for data backup allows a user to restore (or recover) the data from a required or specific point in time (i.e. a point in time when a last change or a corresponding last operation was made to the data), unlike the existing data protection techniques, which offer restoration to only a limited number of points in time. The method 100 also allows the user to perform the backup in a more efficient and faster manner.

At step 102, the method 100 comprises receiving, by a processor, a user input providing an early backup time window, where the early backup window is prior to a scheduled daily back up time for the file system. In other words, at step 102, the processor receives the user input providing the early backup time window. The term “user input” refers to any information or data provided by a user for further processing. Generally, the user input includes text or numbers indicating a user requirement. In an example, the user provides the user input to specify an early backup time at which the user employs the computing device minimally or not at all. In another example, the user provides the user input to specify the frequency and/or duration of the early backup time window. The term “time window” refers to an interval of time during which an operation or activity (such as data backup) may take place. The time window may also be referred to as a “back up time window” or “backup window” throughout the present disclosure. Herein, the “early backup time window” refers to a user-defined time window in which an early data backup may take place. The early data backup corresponding to the early backup time window takes place prior to the scheduled data backup corresponding to the scheduled daily backup time window. The “scheduled daily back up time” refers to a predefined point of time at which a daily data backup takes place. The daily data backup refers to the comprehensive data back up of the file system taking place at a specified time daily. Optionally, the scheduled daily back up time is set based on the activity of the user, i.e. the scheduled daily back up is implemented at times of minimal activity. The user input includes information relating to the early backup time window, wherein the early backup time window relates to the event of data backup.
The user input includes at least one of a start time of the time window, an end time of the time window, a duration of the time window, a number and duration of one or more time windows and so forth. Typically, the early backup time window is defined by the user input received by the processor, wherein the user input may define the time window in a linear or non-linear manner. In an example, the start time of the early backup window is defined, such as 2 AM or 4 PM. In another example, the duration of the early backup time window is defined, wherein the duration may be 30 seconds, 1 minute, 2 minutes, 3 minutes, 5 minutes, 10 minutes, 30 minutes, 1 hour, 2 hours and so forth. In an exemplary scenario, the scheduled daily back up time is set at a time T1 at 2 AM, wherein the early backup time window is set at an earlier time than the scheduled daily back up time. Notably, the method 100 may employ one or more early backup windows. In an example, the one or more time windows are defined or set in a linear and discrete manner, such as in equal time intervals of 30 minutes. In another example, the one or more time windows are defined or set in a non-linear manner, such as one early backup time window of 1 hour duration and another early backup time window of 2 hours duration. Notably, the early backup time window is defined by the user, whereas the scheduled daily back up time may or may not be defined by the user.
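The user input described above can be illustrated with a minimal sketch. The class name `EarlyBackupWindow`, the function `validate_windows` and the rule that windows must not overlap are assumptions made for illustration only; the disclosure requires only that each early backup window be prior to the scheduled daily back up time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EarlyBackupWindow:
    """Hypothetical container for one user-defined early backup time window."""
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def validate_windows(windows, scheduled_backup):
    """Return the windows sorted by start time, or raise ValueError.

    Each window must end no later than the scheduled daily back up time.
    Rejecting overlapping windows is an assumption of this sketch.
    """
    ordered = sorted(windows, key=lambda w: w.start)
    for window in ordered:
        if window.end > scheduled_backup:
            raise ValueError("early backup window runs past the scheduled back up time")
    for earlier, later in zip(ordered, ordered[1:]):
        if earlier.end > later.start:
            raise ValueError("early backup windows overlap")
    return ordered


# Scheduled daily back up at 2 AM; two windows of unequal ("non-linear") duration.
scheduled = datetime(2020, 12, 23, 2, 0)
windows = [
    EarlyBackupWindow(datetime(2020, 12, 22, 22, 0), timedelta(hours=2)),
    EarlyBackupWindow(datetime(2020, 12, 22, 20, 0), timedelta(hours=1)),
]
ordered = validate_windows(windows, scheduled)
```

A window that would end after the scheduled daily back up time is rejected with a `ValueError` rather than being silently truncated; an implementation could equally clip such a window.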

At step 104, the method 100 comprises identifying, by the processor, one or more files in the file system that are suitable for early back up. In other words, at step 104, the processor identifies the one or more files suitable for early backup. Typically, the processor is configured to determine the state of the files in the file system, i.e. whether a change has occurred or not, and accordingly identify the files that are suitable for early backup. The file system comprises a plurality of files, wherein the plurality of files may or may not change prior to the scheduled backup time. Typically, the early back up is implemented to back up one or more files from the file system in which a change is less likely to occur. The early backup reduces the amount of data or files to be backed up at the scheduled daily backup time and ensures a smooth backup operation. The processor determines the probability of a change occurring in each file before the scheduled daily backup time to determine the one or more files suitable for early backup. The probability of the change occurring in the file may be determined using a machine learning algorithm while considering a plurality of parameters such as, but not limited to, past changes or history of the file, type of the file, scheduled tasks and so forth. Consequently, if the probability is below a pre-determined threshold value, for example, the probability of a change occurring is 0.25 and the threshold value is 0.4, the file is determined suitable for early backup. In another example, if the probability of a change occurring is zero, the file is determined to be suitable for early backup.
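The threshold rule of step 104 can be sketched as follows. The change probabilities are taken as given inputs here (the disclosure obtains them, for example, from a machine learning algorithm); the function name `select_for_early_backup` and the sample values are hypothetical.

```python
def select_for_early_backup(change_probability, threshold=0.25):
    """Split files into (early, remaining) lists.

    change_probability maps a file name to the estimated probability that the
    file changes before the scheduled daily back up time. A file is suitable
    for early back up when its probability is below the threshold.
    """
    early = sorted(name for name, p in change_probability.items() if p < threshold)
    remaining = sorted(name for name, p in change_probability.items() if p >= threshold)
    return early, remaining


# Hypothetical probabilities; in practice these come from a trained model.
probabilities = {"X1": 0.0, "X2": 0.6, "X3": 0.9, "X4": 0.1}
early, remaining = select_for_early_backup(probabilities)
```

Files in `early` are candidates for the early backup time window, while files in `remaining` wait for the scheduled daily back up.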

In an embodiment of the method 100, identifying includes analyzing file metadata to identify one or more files that have been edited since a preceding scheduled daily back-up time. The processor is configured to analyze the file metadata to identify the one or more files in which a change or edit has been made since the preceding scheduled daily back-up time. The term “file metadata” refers to the information stored in any type of file in the file system. Generally, the file metadata includes, but is not limited to, author name, date created, date modified, file size, company or organization's name, identity of the computing system, identity (such as Internet protocol (IP) address or media access control (MAC) address) of the network server or storage drive (primary or secondary storage) wherein the files are stored, personalized comments and the names and times of previous authors, revisions, or versions. The “preceding scheduled daily back up time” refers to the most recent earlier occurrence of the scheduled daily backup. For example, in case the scheduled daily backup time is 2 AM, any changes made after 2 AM in the one or more files of the file system are determined suitable to be backed up. In an example, the processor analyzes the metadata information by comparing snapshots taken at two or more different times, wherein specifically, one snapshot is taken at the preceding scheduled daily back-up time and the other snapshot is taken at the current time. Alternatively stated, the processor identifies the changes in one or more files by analyzing the metadata between the two versions of the file, i.e. a snapshot taken at the preceding scheduled daily backup time and a snapshot taken at the current time. Optionally, the processor is configured to identify any change in the file, such as an addition, deletion or any update to the file that may occur after the scheduled daily backup time.
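As an illustration of comparing two metadata snapshots, the following sketch assumes that each snapshot is a mapping from a file path to a `(size, last-modified time)` pair. This representation, the function name `changed_since` and the sample paths are assumptions, not part of the disclosure.

```python
def changed_since(previous, current):
    """Compare two metadata snapshots and classify the differences.

    Both arguments map a file path to a (size, last_modified) tuple.
    """
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(
        path for path in current.keys() & previous.keys()
        if current[path] != previous[path]
    )
    return {"added": added, "deleted": deleted, "modified": modified}


# Hypothetical snapshots: path -> (size in bytes, last-modified timestamp).
at_preceding_backup = {"report.doc": (1200, 100), "app.log": (300, 100), "old.tmp": (10, 90)}
at_current_time = {"report.doc": (1200, 100), "app.log": (450, 160), "notes.txt": (80, 150)}
changes = changed_since(at_preceding_backup, at_current_time)
```

Only metadata is compared, so an edit that leaves both size and modification time unchanged would go unnoticed in this simplified form.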

In another embodiment, the method 100 further comprises receiving, by the processor, the file metadata from an agent installed on the file system, in response to a file edit event. The processor is configured to receive the file metadata from the agent installed on the file system. The “agent” refers to a small application installed on a server, such as the backup server, employed to perform a specific job or operation. Generally, each agent supports a specific function and consequently the method 100 may employ multiple agents installed on the file system to perform one or more operations. Herein, the agent installed on the file system is configured to transmit the metadata regarding files that are exposed to change(s) as a response to the file edit event. The file edit event may include, but is not limited to, opening a file, closing a file, creating a file, deleting a file or a part of the file, renaming a file, writing to a file or adding information to a file, flushing a file, reading from a file, and moving a file. In operation, the agent is a filter driver used by the processor to capture all input/outputs (I/Os) to a file. This allows continuous protection of the files in the file system and a finer-granularity protection, since a part of the file may also be tracked by the agent, but it requires dealing with cases of inconsistencies in the files, and consequently discarding such inconsistent files.

In yet another embodiment, the method 100 further comprises sending, by the processor, queries to the file system at a predetermined time interval and receiving the file metadata in response. The processor is configured to send or transmit queries to the file system at the predetermined time interval. The “predetermined time interval” refers to a pre-defined time window or interval at which the processor communicates with the file system. Typically, the processor transmits the queries to the file system to receive the file metadata as the response to the queries. The query is sent to the file system to facilitate the transmission of the file metadata from the file system as the response. For example, the file system may be an Online Analytical Processing (OLAP) database or data source, which facilitates business-intelligence queries and is optimized for querying and reporting instead of processing transactions.
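A single metadata query of the kind sent at each predetermined time interval might look like the following sketch for an ordinary directory tree. It uses Python's standard `os.walk` and `os.stat`; the function name `query_file_metadata` and the choice of returned fields are assumptions, and a scheduler that repeats the call at the chosen interval is left out.

```python
import os
import stat
import tempfile


def query_file_metadata(root):
    """Walk root once and return {relative path: metadata dict}."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            result[os.path.relpath(full_path, root)] = {
                "size": info.st_size,
                "permissions": stat.filemode(info.st_mode),
                "last_accessed": info.st_atime,
                "last_modified": info.st_mtime,
                "file_type": os.path.splitext(name)[1],
            }
    return result


# Demonstration on a temporary directory standing in for the file system.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "app.log"), "wb") as handle:
        handle.write(b"started\n")
    metadata = query_file_metadata(root)
```

Compared with an agent that reports every file edit event, polling of this kind only observes the state at each query, so changes that occur and are reverted between two queries are not seen.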

In an embodiment of the method 100, identifying includes determining a suitability for each file based on file native metadata, wherein the file native metadata comprises one or more of a file name, file size, file user permissions, file group permissions, creation time, last accessed time, last modification time and file type. The processor is configured to analyze each file in the file system to further determine the suitability for each file. For example, a suitable file is indicative of a change or update in the corresponding file, hence presenting a requirement to be backed up. Typically, the processor determines the suitability for each file based on the file native metadata. The term “file native metadata” refers to any objects such as models, packages, and queries that are based on any data source, such as the OLAP data source. The file native metadata comprises one or more of the file name, file size, file user permissions, file group permissions, creation time, last accessed time, last modification time and file type. The processor identifies the changes based on at least the file native metadata, and correspondingly determines the suitability for the file. For example, a file is determined suitable for backup when a change in the file occurs after the preceding backup window (if any). Optionally, the processor identifies the files suitable for back up by analyzing the log files to infer a certain change pattern, wherein the change may be append-based and no overwrite or update is allowed, and therefore changes to parts of the file can be backed up immediately upon identification by the processor. Moreover, optionally, the processor determines the file suitable for backup using prior knowledge regarding past behavior of the applications that generate the files.
For example, if a file belongs to an application X, and the application is configured either to only append to the file or is unlikely to change the file after a certain time, then the processor may determine the file suitable for backup. Moreover, optionally, the processor determines the file suitable for backup using a machine learning algorithm, wherein the backups performed and the snapshots taken are stored, accumulated and analyzed to infer a time period in which a change in the files is likely to occur and a time period in which a change is less likely to occur. Additionally, the analysis of the files may be performed with finer granularity, i.e. further identifying the parts of the file changed at certain times. For example, a change in a first part of the file may be likely to occur during the morning, for example at 10 AM, whereas a change in a second part of the file may be likely to occur in the evening, for example at 7 PM. Optionally, the processor is supplemented with additional computation methods, such as neural networks, and hierarchical clusters of pseudo-analog variable state machines implementing machine learning and artificial intelligence models and algorithms.

In another embodiment, the file type indicates that edits to the file are append-only. The processor is configured to identify the suitability of the file based on at least the file type, wherein the file type is indicative of the type of edits or changes made to the file. For example, the change or edit made to the file is append-only. In other words, only additions are made to the file in the file system, and update or deletion of the files in the file system is disabled. Each file is analyzed separately and the files are correspondingly grouped together for efficient operation. In an example, one or more files determined suitable for backup are grouped together, whereas the remaining files are grouped together separately to be backed up at a later time window or a later point in time.
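For an append-only file, only the newly appended part needs to be transferred. The sketch below checks whether the current content is a pure extension of the content already backed up and, if so, returns the appended bytes. Comparing full contents in memory is a simplification assumed here for clarity; the function name `appended_suffix` is hypothetical.

```python
def appended_suffix(backed_up: bytes, current: bytes):
    """Return the bytes appended since the last backup.

    Returns None when the current content is not a pure append of the
    backed-up content, in which case the whole file must be backed up again.
    """
    if current.startswith(backed_up):
        return current[len(backed_up):]
    return None


# A log file that grew by one line since the last backup.
delta = appended_suffix(b"line1\n", b"line1\nline2\n")

# A file that was overwritten in place is not append-only.
overwritten = appended_suffix(b"line1\n", b"LINE1\n")
```

A practical variant would likely compare the stored length and a checksum of the backed-up prefix instead of holding both versions in memory.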

In an embodiment of the method 100, identifying includes determining an expected last access time by analyzing file metadata collected over time. The processor is configured to determine the expected last access time for each file in the file system. Upon determining the expected last access time, the processor determines whether a change has occurred at the expected last access time to further determine the suitability of the file for back up. For example, if the expected last access time is before the scheduled daily backup time, the file may be backed up early in the early backup window defined by the user.
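One simple way to estimate an expected last access time from metadata collected over several days is to take the latest time of day observed so far. This estimator, the function names, and the comparison against the window start (rather than the scheduled daily back up time) are assumptions of the sketch; it also assumes that all times fall on the same side of midnight.

```python
from datetime import datetime, time


def expected_last_access(history):
    """Latest time of day at which the file was last accessed on any observed day."""
    return max(timestamp.time() for timestamp in history)


def suitable_for_early_backup(history, window_start):
    """True when the file is expected to be idle by the start of the early window."""
    return expected_last_access(history) <= window_start


# Hypothetical last-access times recorded on three consecutive days.
history = [
    datetime(2020, 12, 19, 17, 5),
    datetime(2020, 12, 20, 18, 30),
    datetime(2020, 12, 21, 17, 45),
]
expected = expected_last_access(history)
ready = suitable_for_early_backup(history, time(22, 0))
```

Taking the maximum is deliberately conservative; a percentile or a learned model could be substituted where occasional late accesses should not block early back up.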

At step 106, the method 100 comprises initiating, by the processor, an early back up for the one or more determined files during the early backup time window. In other words, at step 106, the processor initiates the early back up for the one or more files determined suitable for the early back up during the early backup time window. The “early back up” refers to the backup operation performed for the one or more files during the early backup time window prior to the scheduled backup time window. Specifically, a copy or a snapshot of the one or more files is transmitted to or stored at a remote location or a secondary memory, such as a backup memory, configured to store the copy of the one or more files as a backup.

In an embodiment, the scheduled daily back up includes checking, by the processor, whether a file that was backed up during the early back up window has been edited further. The processor is configured to check prior to the scheduled daily backup time, whether the file backed up during the early backup time window has undergone any changes or updates. Typically, the processor compares the snapshots of the file backed up during the early backup time window against a current snapshot of the file. If a change or edit is inferred from the comparison, the processor is configured to back up the edited file and optionally, replace the early backup of the file.
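The check performed at the scheduled daily back up can be sketched by recording a content digest for each file at early back up time and comparing it with the current content. Using a SHA-256 digest in place of a snapshot comparison is an assumption of this sketch, as are the function names and sample data.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest used as a stand-in for a snapshot of the file content."""
    return hashlib.sha256(data).hexdigest()


def files_to_recopy(early_digests, current_files):
    """Names of early-backed-up files whose content has changed since then."""
    return sorted(
        name for name, recorded in early_digests.items()
        if name in current_files and digest(current_files[name]) != recorded
    )


# Digests recorded during the early backup time window (hypothetical data).
early_digests = {"X1": digest(b"alpha"), "X4": digest(b"delta")}

# Content found at the scheduled daily back up time: X4 was edited further.
current_files = {"X1": b"alpha", "X4": b"delta, edited"}
stale = files_to_recopy(early_digests, current_files)
```

Files listed in `stale` are backed up again at the scheduled time, and their early copies may optionally be replaced.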

At step 108, the method 100 comprises initiating, by the processor, a scheduled daily back up at the scheduled daily back up time for a plurality of remaining files in the file system. In other words, at step 108, upon initiating the early backup for the one or more files in the file system, the processor is configured to initiate the scheduled daily backup for the plurality of remaining files in the file system. Generally, the scheduled daily backup takes place at a fixed point in time, at which all the remaining files are stored at a remote location. Beneficially, initiating the early backup in the early backup window reduces the amount of data to be backed up during the scheduled backup time. As a result, the scheduled backup takes less time to complete the backup operation. Moreover, optionally, at the scheduled backup time, the earlier versions of the one or more files corresponding to the early backup windows are accessible if any specific version of a file is required.

In another embodiment, the plurality of remaining files includes those files not backed up during the early back up window. Alternatively stated, the plurality of remaining files refers to the files in the file system that were not backed up during the early backup time window. In an example, the plurality of remaining files includes the files that were determined by the processor to be not suitable for the early backup due to a probability of change or update at a later point in time.

The present disclosure also provides a computer program product comprising a non-transitory computer-readable storage medium configured to store instructions or computer program code thereon, the instructions being executable by a processor to execute the method 100. Typically, the method 100 is for a computing device for carrying out data backup operations. Examples of implementation of the non-transitory computer-readable storage medium include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, a Secure Digital (SD) card, Solid-State Drive (SSD), a computer readable storage medium, and/or CPU cache memory. A computer readable storage medium for providing a non-transient memory may include, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.

Referring to FIGs. 2-6, illustrated is an exemplary implementation scenario of the computer implemented method 100 for backing up a file system 202 (similar to the file system), in accordance with an embodiment of the present disclosure. Typically, the method 100 relates to the backing up of one or more files in a file system 202 to a backup server 204 during an early backup time window, wherein the early backup window is provided by a user input. As shown in FIGs. 2-5, depicted are snapshots taken prior to the scheduled daily backup time, starting at a time T0 and ending at a time T8. The term “snapshot” refers to a set of reference markers for data or files in the computation system or file system 202 at a specific point in time. Generally, the snapshot comprises a detailed table of contents, providing the user with accessible copies of files that they may access at a later point of time. Beneficially, the snapshots are configured to detect a change in the files of the file system 202 and correspondingly store a copy of the file upon meeting a certain criterion such as, but not limited to, the threshold criterion relating to the probability of changes occurring in the files. As shown in FIGs. 2-6, illustrated are the snapshots of the file system 202 and the backup server 204 at various points in time. The “backup server” 204 refers to a type of server configured to facilitate data, files, applications, and/or database backups. Typically, the backup server 204 comprises both hardware and software capabilities to manage and recover files from the backup. The backup server 204 is either a locally-based server or a remote backup server installed at a remote location, such as a cloud server.

Referring to FIG. 2, illustrated is a first snapshot 210 of the file system 202 and the backup server 204 during an early backup time window T, in accordance with an embodiment of the present disclosure. Typically, the file system 202 comprises one or more files that are backed up during the early backup window T. As shown in FIG. 2, depicted is the first snapshot 210 at time T0, wherein the file system 202 comprises three files, namely a first file X1, a second file X2 and a third file X3. Notably, at a starting stage when the files are yet to be backed up in the backup server 204, the first snapshot 210 depicts only the files wherein a change is detected, since the backup server 204 is empty at time T0. The early backup time window T as defined by the user input may include one or more smaller time windows, such as T0, T1, T2, T3 till T8. Typically, a time of the time window refers to the starting time of the specific time window, for example, time T4 refers to the starting time of the time window T4. However, the time T4 may also refer to the ending time of the time window T4 based on the implementation.

Referring now to FIG. 3, illustrated is a second snapshot 310 of the file system 202 and the backup server 204 during the early backup time window T, in accordance with an embodiment of the present disclosure. Typically, the file system 202 comprises one or more files that are backed up during the early backup window T. As shown in FIG. 3, depicted is the second snapshot 310 at time T0, wherein the backup server 204 comprises a copy (or backup) of the first file X1. Typically, the backup server 204 is configured to determine the suitability of the remaining files in the file system 202 based on the properties of the files or on a machine learning algorithm or model considering the prior changes in the files of the file system 202. Specifically, the backup server 204 determines whether the second file X2 and the third file X3 in the file system 202 are likely to be changed again before the end of the early backup window at T8 or prior to a scheduled daily backup time T10. In an example, the backup server 204 determines that the first file X1 is unlikely to be changed till time T8 and hence determines it to be suitable for early backup. Moreover, the backup server 204 determines that the second file X2 and the third file X3 are likely to be changed before the end of the early backup time window at T8. As a result, the backup server 204 is configured to perform a first partial backup 320 of only the first file X1 at the time T0 during the early backup time window T.

Referring now to FIG. 4, illustrated is a third snapshot 410 of the file system 202 and the backup server 204 during the early backup time window T, in accordance with an embodiment of the present disclosure. As shown in FIG. 4, depicted is the third snapshot 410 at time T5, wherein the backup server 204 comprises a copy (or backup) of the first file X1. Herein, the backup server 204 continues to take snapshots after every time window till the time window T5, and at time T5, the backup server 204 determines that the second file X2 and the third file X3 have changed to X2’ and X3’ respectively. Further, the backup server 204 determines that a fourth file X4 has also changed during any of the earlier time windows (such as from T0 till T5) in the early backup time window T. Furthermore, the backup server 204 is configured to determine the suitability of the remaining files in the file system at time T5. In an example, the backup server 204 determines that the files X2’ and X4 are unlikely to be changed again until time T8 and are ready for early backup at time T5. As a result, the backup server 204 performs a second partial backup 420 of the changes in the files X2’ and X4 at time T5.

Referring now to FIG. 5, illustrated is a fourth snapshot 510 of the file system 202 and the backup server 204 during the early backup time window T, in accordance with an embodiment of the present disclosure. As shown in FIG. 5, depicted is the fourth snapshot 510 at time T8, wherein the backup server 204 comprises a copy (or backup) of the first file X1 backed up via the first partial backup 320, and the second file X2’ and the fourth file X4 backed up via the second partial backup 420. Herein, the backup server 204 continues to take snapshots after every time window till the time window T8, and at time T8, the backup server 204 detects any changes in the files of the file system 202. In an example, the backup server 204 determines that the third file X3’ has been changed or updated again to X3” and the fourth file X4 has changed to X4’. Notably, the determined likelihood of the change occurring in the fourth file X4 was wrong, since a change was detected and the file X4 was changed to X4’. Moreover, the backup server 204 determines that a fifth file X5 has also changed during any of the earlier time windows in the early backup time window T. Since the time window T8 is the last time window of the early backup time window T, the backup server 204 performs the third partial backup 520 of the changes in the remaining files, namely the files X3”, X4’ and X5, regardless of whether a change may occur in the remaining files or not. Beneficially, at time T8, the backup server 204 is configured to back up fewer files in comparison to a conventional backup process, allowing a faster backup. Typically, at the third partial backup 520, the backup server 204 is required to back up the files X3”, X4’ and X5, whereas otherwise the backup server 204 may be required to back up the files X1, X2’, X3”, X4’ and X5 during the time window T8.

Referring now to FIG. 6, illustrated is a fifth snapshot 610 of the backup server 204, in accordance with an embodiment of the present disclosure. As shown in FIG. 6, depicted is the fifth snapshot 610 at time T8, wherein the backup server 204 comprises a copy (or backup) of the first file X1 backed up via the first partial backup 320, the second file X2’ and the fourth file X4 backed up via the second partial backup 420, and the third file X3”, the updated fourth file X4’ and the fifth file X5 backed up via the third partial backup 520. The backup server 204 is configured to combine the partial backups, namely the first partial backup 320, the second partial backup 420 and the third partial backup 520, to form a complete backup 610 of the file system 202. Optionally, the backup server 204 is configured to discard the earlier version of the files backed up due to an incorrect determination of the suitability of the files in the file system 202.
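The scenario of FIGs. 2-6 can be traced with a minimal sketch in which each partial backup is a mapping from a file to the version that was copied. Merging the partial backups in chronological order lets a later version replace an earlier one, which corresponds to the optional discarding of the earlier version of the fourth file. The dictionary representation and the function name `combine_partial_backups` are assumptions made for illustration.

```python
def combine_partial_backups(partials):
    """Merge partial backups in chronological order.

    Each partial backup maps a file name to the version that was backed up.
    A later version of the same file replaces the earlier one.
    """
    complete = {}
    for partial in partials:
        complete.update(partial)
    return complete


first_partial = {"X1": "X1"}                              # taken at T0
second_partial = {"X2": "X2'", "X4": "X4"}                # taken at T5
third_partial = {"X3": "X3''", "X4": "X4'", "X5": "X5"}   # taken at T8

complete = combine_partial_backups([first_partial, second_partial, third_partial])

# Files copied at T8 with early back up, versus a conventional back up at T8.
copied_at_t8 = sorted(third_partial)
copied_conventionally = sorted(complete)
```

In this trace three files are copied at T8 instead of five, at the cost of one redundant early copy of X4.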

The snapshot technique enables a user to perform data backup of a file and access the file at any point in time. Optionally, the data backup comprises a continuous backup to automatically save a copy (or backup) of the data or file with respect to every change made to the data or file, thereby essentially capturing each version of the data that a user (of the computing system) saves or reserves in the computing system. Optionally, a file system tracker is configured to continuously record the metadata operations made to the files in a continuous catalogue. In such a case, the continuous catalogue maintains up-to-date information about the state of any file at any point in time. This enables the backup server 204 to recover the file as of a specific point in time by searching the continuous catalogue for the state of the file at that point in time. The snapshot technique for the method 100 is robust, may be implemented reliably in the real world and requires minimal computational resources in parsing the file system 202 for changes in files. However, existing data protection techniques are associated with several limitations. Firstly, the existing data protection techniques provide only a limited number of points in time for the data backup associated with any data, such as a file. For example, in a snapshot-based data protection technique, snapshots (i.e. backup data images) are generated temporarily (i.e. they eventually need to be deleted), and since such snapshots consume considerable space, they are generated in a periodic manner, i.e. at a certain time interval. This results in a limited number of points in time being available for the data backup associated with the file. Moreover, generating said snapshots is expensive, and their generation and deletion require considerable computational resources.
The snapshot-based data protection technique requires a long backup time when the interval between two consecutive snapshots is considerable, for example from 15 minutes to several hours. This also leads to a considerable recovery point objective. In the snapshot-based data protection technique, when snapshots are mounted on an array to be read by the backup server 204, the snapshots reduce the bandwidth provided by the array to production workloads.
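The continuous catalogue mentioned above can be pictured as a per-file, time-ordered list of recorded states that is searched for the latest entry at or before a requested point in time. The class `ContinuousCatalogue`, its method names and the use of plain numbers as timestamps are assumptions of this sketch; the lookup uses Python's standard `bisect` module.

```python
from bisect import bisect_right


class ContinuousCatalogue:
    """Hypothetical catalogue of file states recorded in time order."""

    def __init__(self):
        self._times = {}   # path -> list of timestamps, non-decreasing
        self._states = {}  # path -> list of states aligned with _times

    def record(self, path, timestamp, state):
        """Append one metadata operation for a file."""
        times = self._times.setdefault(path, [])
        if times and timestamp < times[-1]:
            raise ValueError("operations must be recorded in time order")
        times.append(timestamp)
        self._states.setdefault(path, []).append(state)

    def state_at(self, path, timestamp):
        """Return the most recent state at or before timestamp, or None."""
        times = self._times.get(path, [])
        index = bisect_right(times, timestamp)
        return self._states[path][index - 1] if index else None


catalogue = ContinuousCatalogue()
catalogue.record("a.txt", 10, "created")
catalogue.record("a.txt", 20, "modified")
catalogue.record("a.txt", 30, "renamed")
state = catalogue.state_at("a.txt", 25)
```

Because every operation is recorded, the number of recoverable points in time is bounded by the number of changes rather than by a snapshot interval.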

FIG. 7 illustrates a block diagram of an apparatus 700 for controlling back up of a file system 710, in accordance with an embodiment of the present disclosure. As shown, the apparatus 700 comprises an interface 702, a file identification module 704 and a file back up module 706 for controlling back up of the file system 710. The apparatus 700 of FIG. 7 should be read in line with FIGs. 1-6. Typically, the apparatus 700 is operable to perform the method 100 or 200 for controlling the backing up of the file system 710. It will be apparent that the apparatus 700 may be operatively coupled to other components, such as a processor, a memory and a backup server (as explained in FIGs. 2-6), for implementing the backing up of the file system 710. The term “apparatus” refers to hardware, software, firmware or a combination of these configured to perform at least one computing task based upon input from the user. Examples of the apparatus 700 may include, but are not limited to, a computer and a virtual machine (VM). Generally, the apparatus 700 includes computational elements such as a memory, a processor, a data communication interface, a network adapter and the like, to store, process and/or share files or information with other apparatuses, such as another computing device, a server and the like.

The apparatus 700 comprises an interface 702 configured to receive a user input providing an early backup time window, where the early backup window is prior to a scheduled daily back up time for the file system 710. The term “interface” refers to a shared boundary across which two or more separate components of a computing system or apparatus (such as the apparatus 700) act on or communicate information. The exchange can be between software, computer hardware, peripheral devices, humans, and combinations of these. The interface 702 is employed to receive the user input associated with the early backup time window. For example, the interface 702 may be at least one of a command line interface (CLI), a graphical user interface (GUI), a menu driven interface (MDI), a form based interface (FBI) or a natural language interface (NLI). It will be appreciated that any type of interface may be employed by the apparatus 700 based on different implementations without limiting the scope of the disclosure.
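By way of a non-limiting illustration, the constraint that the early backup time window precedes the scheduled daily back up time may be checked when the user input is received. The following Python sketch assumes that the window and the scheduled time fall on the same calendar day; the function name validate_early_window and its parameters are hypothetical and are not part of the disclosure.

```python
from datetime import time


def validate_early_window(start: time, end: time, scheduled: time) -> bool:
    """Return True if the window [start, end] is non-empty and ends no later
    than the scheduled daily back up time (same-day assumption)."""
    return start < end <= scheduled
```

A window that overlaps or follows the scheduled time is rejected, so the interface could prompt the user for a different window in that case.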

The apparatus 700 further comprises a file identification module 704 configured to identify one or more files in the file system 710 that are suitable for early back up. The “file identification module” 704 refers to any hardware, software, firmware or a combination of these configured to perform at least one computing task based on the implementation. In operation, the file identification module 704 analyses each file in the file system 710 to identify the one or more files suitable for early back up. Typically, the file system 710 comprises a plurality of files, wherein the plurality of files may or may not change prior to the scheduled backup time. Typically, the early backup is implemented to back up one or more files from the file system 710 that are determined suitable, i.e. files in which a change is less likely to occur. The file identification module 704 determines the probability of a change occurring in each file before the scheduled daily backup time to determine the one or more files suitable for early backup. The probability of the change occurring in the file may be determined by the file identification module 704 using a machine learning algorithm or model while considering a plurality of parameters such as, but not limited to, past changes or history of the file, type of the file, user input, scheduled tasks and so forth. Optionally, the file identification module 704 is supplemented with additional computation methods, such as neural networks, and hierarchical clusters of pseudo-analog variable state machines implementing machine learning and artificial intelligence models and algorithms.
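As a simplified illustration only, the probability of a change may be approximated from the past change history of each file. The sketch below substitutes a smoothed frequency estimate for the machine learning model mentioned above; the function names, the representation of the history as one boolean per past day and the threshold value of 0.2 are assumptions made for the example.

```python
from typing import Dict, List


def change_probability(history: List[bool]) -> float:
    """Estimate the chance that a file changes between the early window and
    the scheduled back up. history[i] is True if it did so on past day i.
    Add-one smoothing keeps the estimate away from exactly 0 or 1."""
    return (sum(history) + 1) / (len(history) + 2)


def select_for_early_backup(histories: Dict[str, List[bool]],
                            threshold: float = 0.2) -> List[str]:
    """Return the paths whose estimated change probability is below threshold."""
    return sorted(path for path, history in histories.items()
                  if change_probability(history) < threshold)
```

With this estimate, a file with no recorded history receives a probability of 0.5 and is therefore left for the scheduled daily back up.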

The apparatus 700 further comprises a file back up module 706 configured to initiate an early back up for the one or more determined files during the early backup time window. The file backup module 706 initiates the early back up for the one or more files determined suitable for the early back up during the early backup time window. The “early back up” refers to the backup operation performed for the one or more files during the early backup time window prior to the scheduled backup time window. Specifically, a copy or a snapshot of the one or more files is transmitted to or stored at a remote location such as a remote server or a secondary memory such as a backup memory, configured to store the copy of the one or more files as a backup. The “file backup module” 706 refers to any hardware, software, firmware or a combination of these configured to perform at least one computing task based on the implementation. In operation, the file backup module 706 is configured to back up one or more files in the file system 710 that are determined suitable for early back up.

The file backup module 706 is further configured to initiate a scheduled daily back up at the scheduled daily back up time for a plurality of remaining files in the file system. Upon initiating the early backup for the one or more files in the file system 710, the file backup module 706 is configured to initiate the scheduled daily back up for the plurality of remaining files in the file system 710.
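The division of files between the early back up and the scheduled daily back up may be sketched as follows. This is an illustrative example only; the function name plan_backups and the representation of files as path strings are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def plan_backups(all_files: Iterable[str],
                 early_candidates: Set[str]) -> Tuple[List[str], List[str]]:
    """Split files into those backed up in the early window and the
    remaining files left for the scheduled daily back up."""
    early: List[str] = []
    remaining: List[str] = []
    for path in all_files:
        (early if path in early_candidates else remaining).append(path)
    return early, remaining
```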

In an embodiment, the file identification module 704 is further configured to analyze file metadata to identify one or more files that have been edited since a preceding scheduled daily back-up time. The file identification module 704 is configured to analyze the file metadata to identify the one or more files in which a change or edit has been made since the preceding scheduled daily backup time. The file identification module 704 analyses the metadata information by comparing snapshots taken at two or more different times, wherein specifically, one of the snapshots is taken at the preceding scheduled daily backup time and the other is taken at the current time. Alternatively stated, the file identification module 704 identifies the changes in one or more files by analyzing the metadata between the two versions of the file, i.e. a snapshot taken at the preceding scheduled daily backup time and a snapshot taken at the current time. The file identification module 704 is configured to identify any change such as an addition, deletion or any updates on the file that may occur after the scheduled daily backup time.
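The comparison of two metadata snapshots may be illustrated by the following sketch. It assumes, purely for the example, that each snapshot is a mapping from file path to a (size, last modification time) pair; the function name diff_snapshots is hypothetical.

```python
from typing import Dict, List, Tuple

Meta = Tuple[int, float]  # (size in bytes, last modification time)


def diff_snapshots(previous: Dict[str, Meta],
                   current: Dict[str, Meta]) -> Dict[str, List[str]]:
    """Classify paths as added, deleted or modified between two snapshots."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(path for path in current.keys() & previous.keys()
                      if current[path] != previous[path])
    return {"added": added, "deleted": deleted, "modified": modified}
```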

In another embodiment, the file identification module 704 is further configured to receive the file metadata from an agent installed on the file system 710, in response to a file edit event. The file identification module 704 is configured to receive the file metadata from the agent installed on the file system 710. In operation, the agent installed on the file system 710 is configured to transmit the metadata regarding files that are exposed to change(s) as a response to the file edit event. The file edit event may include, but is not limited to, opening a file, closing a file, creating a file, deleting a file or a part of the file, renaming a file, writing to a file or adding information to a file, flushing a file, reading from a file, or moving a file.
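One possible way for the receiving side to keep track of the metadata transmitted by the agent is sketched below. The FileEvent and ChangeCatalogue names, the event fields and the use of numeric timestamps are assumptions made for illustration; the disclosure does not prescribe a data format for the agent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileEvent:
    path: str
    kind: str         # hypothetical label, e.g. "write", "rename", "delete"
    timestamp: float  # seconds since an arbitrary epoch


class ChangeCatalogue:
    """Keeps the most recent event reported by the agent for each file."""

    def __init__(self) -> None:
        self._latest: Dict[str, FileEvent] = {}

    def on_event(self, event: FileEvent) -> None:
        # Store the event only if it is newer than the one already recorded.
        known = self._latest.get(event.path)
        if known is None or event.timestamp >= known.timestamp:
            self._latest[event.path] = event

    def changed_since(self, moment: float) -> List[str]:
        """Return the paths with an event reported after the given moment."""
        return sorted(path for path, event in self._latest.items()
                      if event.timestamp > moment)
```

Calling changed_since with the time of the preceding scheduled daily back up yields the files for which the agent has reported an event since then.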

In yet another embodiment, the file identification module 704 is further configured to send queries to the file system 710 at a predetermined time interval and receive the file metadata in response. The file identification module 704 is configured to send or transmit queries to the file system 710 at the predetermined time interval. Typically, the file identification module 704 transmits the queries to the file system 710 to receive the file metadata as the response to the queries. The query is sent to the file system 710 to facilitate the transmission of the file metadata from the file system 710 as the response.

In an embodiment, the file identification module 704 is further configured to determine a suitability for each file based on file native metadata, wherein the file native metadata comprises one or more of a file name, file size, file user permissions, file group permissions, creation time, last accessed time, last modification time and file type. The file identification module 704 is configured to analyze each file in the file system 710 to further determine the suitability for each file. For example, a suitable file is one in which a change or update has occurred and which therefore needs to be backed up. Typically, the file identification module 704 determines the suitability for each file based on the file native metadata. The file identification module 704 identifies the changes based on at least the file native metadata, and correspondingly determines the suitability for the file. For example, a file is determined suitable for early backup by the file identification module 704 when a change (if any) in the file occurs after the preceding scheduled daily back-up time.
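For illustration, most of the file native metadata listed above can be read with the standard Python os.stat call. The sketch below is one possible way of collecting it and is not the implementation of the disclosure; the function name native_metadata and the dictionary keys are hypothetical, and the file type is approximated here by the file name extension. Creation time is omitted because its availability through os.stat depends on the platform.

```python
import os
import stat
from typing import Any, Dict


def native_metadata(path: str) -> Dict[str, Any]:
    """Collect native metadata for one file using os.stat."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    return {
        "name": os.path.basename(path),
        "size": info.st_size,
        "user_permissions": (mode >> 6) & 0o7,   # owner rwx bits
        "group_permissions": (mode >> 3) & 0o7,  # group rwx bits
        "last_accessed": info.st_atime,
        "last_modified": info.st_mtime,
        "file_type": os.path.splitext(path)[1].lower(),  # extension as a proxy
    }
```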

In another embodiment, the file type indicates that edits to the file are append-only. The file identification module 704 is configured to identify the suitability of the file based on at least the file type, wherein the file type is indicative of the type of edits or changes made to the file. For example, the change or edit made to the file is append-only. In other words, only additions are made to the file in the file system 710. Collectively, each file is analysed separately and correspondingly grouped together for efficient operation. In an example, one or more files determined suitable for backup are grouped together, whereas the remaining files are grouped together separately to be backed up at a later time window or a later point in time.

In an embodiment, the file identification module 704 is further configured to determine an expected last access time by analyzing file metadata collected over time. The file identification module 704 is configured to determine the expected last access time for each file in a file system 710. Upon determining the expected last access time, the file identification module 704 determines whether a change has occurred at the expected last access time to further determine the suitability of the file for back up. For example, if the expected last access time is before the scheduled daily backup time, the file may be backed up early at the early backup window defined by the user.
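A minimal sketch of such a determination is given below. It assumes that the last access time of the file has been recorded on a number of past days and, as a conservative choice made only for this example, takes the latest observed time of day as the expected last access time. The function names and the cutoff parameter are hypothetical.

```python
from datetime import time
from typing import List


def expected_last_access(observed: List[time]) -> time:
    """Conservative estimate: the latest time of day at which the file was
    last accessed on any of the observed past days."""
    return max(observed)


def may_back_up_early(observed: List[time], cutoff: time) -> bool:
    """True if history exists and the expected last access precedes cutoff
    (for example, the scheduled daily backup time)."""
    return bool(observed) and expected_last_access(observed) < cutoff
```

Other estimators, such as a mean or a high percentile of the observed times, could be substituted without changing the comparison against the cutoff.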

In another embodiment, the plurality of remaining files include those files not backed up during the early back up window. Alternatively stated, the plurality of remaining files refer to the files in the file system 710 that were not backed up during the early backup time window. In an example, the plurality of remaining files include the files determined not suitable for the early backup by the file identification module 704, due to a probability of change or update at a later point in time.

In an embodiment, the scheduled daily back up includes checking whether a file that was backed up during the early back up window has been edited further. The file identification module 704 is configured to check, prior to the scheduled daily backup time, whether the file backed up during the early backup time window has undergone any changes or updates. Typically, the file identification module 704 compares the snapshots of the file backed up during the early backup time window against a current snapshot of the file. If a change or edit is inferred from the comparison, the file identification module 704 is configured to back up the edited file and, optionally, replace the early backup of the file.
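The check described above may be sketched as a comparison between the metadata recorded at the time of the early back up and the current metadata. The mapping from path to a (size, last modification time) pair and the function name files_to_recopy are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Meta = Tuple[int, float]  # (size in bytes, last modification time)


def files_to_recopy(early_state: Dict[str, Meta],
                    current_state: Dict[str, Meta]) -> List[str]:
    """Return early-backed-up files whose metadata has changed since the
    early back up. Files that no longer exist are skipped here, since
    there is nothing left to copy for them."""
    return sorted(path for path, meta in early_state.items()
                  if path in current_state and current_state[path] != meta)
```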

Various embodiments, operations, and variants disclosed above, with respect to the method 100 or 200, apply mutatis mutandis to the apparatus 700.

Referring to FIGs. 8A and 8B, illustrated are block diagrams of systems 800A and 800B, respectively, for backing up a file system 814, in accordance with an embodiment of the present disclosure. Typically, the system 800A or 800B relates to a computing system operable to perform the backing up of the file system 814. The system 800A or 800B may be referred to as hardware, software, firmware or a combination of these configured for backing up the file system 814. The systems 800A and 800B of FIGs. 8A and 8B, respectively, should be read in line with FIGs. 1-7. Typically, the system comprises a processor 812 for performing the backup operation.

Referring to FIG. 8A, illustrated is a block diagram of the system 800A for backing up the file system 814, in accordance with an embodiment of the present disclosure. Herein, the processor 812 is part of a file system server 810 and initiating a back up includes sending the files to a backup server 820. In operation, the processor 812 of the file system server 810 is configured to initiate a backup of the files on the file system 814, wherein initiating the back up includes sending or transmitting the files to the backup server 820. Typically, the back up is initiated by the file system server 810 using the processor 812 to back up the files on the file system 814.

The file system server 810 and the backup server 820 refer to a type of server configured to facilitate data, files, applications, and/or database backups. Notably, each of the file system server 810 and the backup server 820 comprises both hardware and software capabilities to manage and recover files from the backup.

Referring to FIG. 8B, illustrated is a block diagram of the system 800B for backing up the file system 814, in accordance with an embodiment of the present disclosure. Herein, the processor 812 is part of a backup server 830 and initiating a back up includes requesting the files from the file system server 810. In operation, the processor 812 of the backup server 830 is configured to initiate a backup of the files in the file system 814, wherein initiating the back up includes requesting and receiving the files from the file system server 810. Typically, the back up is initiated by the backup server 830 using the processor 812 to back up the files in the file system 814. Notably, the backup server 830 comprises the processor 812 for initiating the back up in contrast to the file system server 810 as depicted in FIG. 8A.

Referring to FIGs. 9A and 9B, illustrated are block diagrams of systems 900A and 900B, respectively, for controlling back up of a file system 914, in accordance with an embodiment of the present disclosure. Typically, the system 900A or 900B relates to a computing system operable to perform the backing up of the file system 914. The system 900A or 900B may be referred to as hardware, software, firmware or a combination of these configured for controlling the back up of the file system 914. The systems 900A and 900B of FIGs. 9A and 9B, respectively, should be read in line with FIGs. 1 to 7, 8A and 8B.

Referring to FIG. 9A, illustrated is a block diagram of the system 900A for controlling the back up of the file system 914, in accordance with an embodiment of the present disclosure. Typically, the system 900A comprises an apparatus 912 (similar to the apparatus 700 of FIG.7) for controlling the back up of the file system 914. Herein, the apparatus 912 is part of a file system server 910 and initiating a back up includes sending the files to a backup server 920. In operation, the apparatus 912 of the file system server 910 is configured to initiate a backup of the files on the file system 914, wherein initiating the back up includes sending or transmitting the files to the backup server 920. Typically, the back up is initiated by the file system server 910 using the apparatus 912 to back up the files on the file system 914. The file system server 910 and the backup server 920 refer to a type of server configured to facilitate data, files, applications, and/or database backups. Notably, the file system server 910 comprises both hardware and software capabilities to manage and recover files from the backup.

Referring to FIG. 9B, illustrated is a block diagram of the system 900B for controlling the back up of the file system 914, in accordance with an embodiment of the present disclosure. Typically, the system 900B comprises the apparatus 912 for controlling the back up of the file system 914. Herein, the apparatus 912 is part of the backup server 930 and initiating a back up includes requesting the files from the file system server 910. In operation, the apparatus 912 of the backup server 930 is configured to initiate a backup of the files on the file system 914, wherein initiating the back up includes requesting and receiving the files from the file system server 910. Typically, the back up is initiated by the backup server 930 using the apparatus 912 to back up the files on the file system 914. The file system server 910 and the backup server 930 refer to a type of server configured to facilitate data, files, applications, and/or database backups. Notably, the backup server 930 comprises both hardware and software capabilities to manage, control and recover files from the backup in contrast to the file system server 910 as depicted in FIG. 9A.

Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.