Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
HIERARCHICAL CATALOG FOR STORAGE TAPES
Document Type and Number:
WIPO Patent Application WO/2024/099541
Kind Code:
A1
Abstract:
A system is described, the system including one magnetic recording tape of a plurality of magnetic recording tapes, each one magnetic recording tape including an identifier, the one magnetic recording tape including at least one stored data item a full catalog including metadata items for each magnetic recording tape of the plurality of magnetic recording tapes, the metadata objects including metadata of stored data a backup catalog stored on a first fast storage media, the backup catalog including a mapping of one of devices or virtual devices to the identifier anda sparse catalog stored on a second fast storage media, the sparse catalog including a mapping of at least one file or object to the identifier. Related methods and apparatus are also described.

Inventors:
KUVENT AVIV (DE)
NATANZON ASSAF (DE)
TOAFF YAIR (DE)
ZACH IDAN (DE)
STERNBERG MICHAEL (DE)
Application Number:
PCT/EP2022/081145
Publication Date:
May 16, 2024
Filing Date:
November 08, 2022
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HUAWEI TECH CO LTD (CN)
KUVENT AVIV (DE)
International Classes:
G06F3/06
Foreign References:
US20130290388A12013-10-31
US20170293439A12017-10-12
US20200341694A12020-10-29
US7693878B22010-04-06
US9063666B22015-06-23
Attorney, Agent or Firm:
HUAWEI EUROPEAN IPR (DE)
Download PDF:
Claims:
WHAT IS CLAIMED IS:

1. A system comprising: one magnetic recording tape of a plurality of magnetic recording tapes, each one magnetic recording tape comprising an identifier, the one magnetic recording tape comprising at least one stored data item; a full catalog comprising metadata items for each magnetic recording tape of the plurality of magnetic recording tapes, the metadata objects comprising metadata of stored data; a backup catalog stored on a first fast storage media, the backup catalog comprising a mapping of one of devices or virtual devices to the identifier; and a sparse catalog stored on a second fast storage media, the sparse catalog comprising a mapping of at least one file or object to the identifier.

2. The system of claim 1, wherein the metadata object comprise a plurality of attributes of the data.

3. The system of claim 1 or claim 2, wherein the metadata object comprises at least the identifier and at least one attribute of the data.

4. The system of any of claims 1 - 3, wherein either storing or updating one of the backup catalog or the sparse catalog on the first or the second fast storage media is performed when the one magnetic recording tape is full.

5. The system of any of claims 1 - 4 and further comprising: a search engine operative to search for a data item by searching the sparse catalog for the data item, and operative to retrieve an identity of at least one magnetic recording tape of the plurality of magnetic recording tapes as a result of the searching; a second search engine operative to search for the data item by searching the backup catalog on the basis of the result of the retrieved identity if the identity of at least two magnetic recording tapes is retrieved by the first search engine; a third search engine operative to search the full catalog on the basis of the result of the search by the second search engine, and to return a location of the data item.

6. The system of any of claims 1 - 5, wherein the first fast storage media and the second fast storage media comprise the same fast storage media.

7. The system of any of claims 1 - 5, wherein the first fast storage media and the second fast storage media comprise the different fast storage media.

8. The system of any of claims 1 - 7 wherein the full catalog is stored on fast storage media until the one magnetic recording tape is full.

9. The system of claim 8, wherein when the one magnetic recording tape is full, the full catalog is copied to the one magnetic recording tape.

10. A method comprising: storing at least one data item on one magnetic recording tape of a plurality of magnetic recording tapes, each one magnetic recording tape comprising an identifier; cataloging a full catalog comprising metadata items for each magnetic recording tape of the plurality of magnetic recording tapes, the metadata objects comprising metadata of data stored on each one of the magnetic tapes; mapping one of a device or a virtual device in a backup catalog, the backup catalog stored on fast storage media, to one of the plurality of magnetic recording tapes comprising a backup of a device or virtual device; mapping at least one file or object to at least one magnetic recording tape of the plurality of magnetic recording tapes in a sparse catalog, the sparse catalog stored on fast storage media.

11. The method of claim 10 wherein the metadata object comprise a plurality of attributes of the data.

12. The method of claim 10 or claim 12 wherein the metadata object comprises at least an identification of the one magnetic recording tape among the plurality of magnetic recording tapes, and at least one attribute of the data.

13. The method of any of claims 10 - 12, wherein either storing or updating one of the backup catalog or the sparse catalog on a storage medium external to the plurality of magnetic recording tapes is performed when the one magnetic recording tape is full.

14. The method of any of claims 10 - 13, further comprising; searching for an item in the data by searching the sparse catalog for the item; retrieving an identity of at least one magnetic recording tapes of the plurality of magnetic recording tapes; searching the backup catalog for the item on the basis of the result of the identity if the identity of at least two least two magnetic recording tapes is retrieved; determining an identity of at least one magnetic recording tape of the plurality of magnetic recording tapes as a result of the searching the backup catalog; searching the full catalog on the basis of the result of the identity of the at least one magnetic recording tapes; determining a location the item on the basis of the searching the full catalog.

15. The method of claim 14 and further comprising retrieving the item.

16. The method of any of claims 10 - 15 and wherein the first fast storage media and the second fast storage media comprise the same fast storage media.

17. The method of any of claims 10 - 15 and wherein the first fast storage media and the second fast storage media comprise the different fast storage media.

18. The method of any of claims 10 - 17 wherein the full catalog is stored on fast storage media until the one magnetic recording tape is full.

19. The method of claim 18, wherein when the one magnetic recording tape is full, the full catalog is copied to the one magnetic recording tape.

20. A system comprising: a plurality of magnetic recording tapes, each one magnetic recording tape comprising an identifier, the one magnetic recording tape comprising at least one stored data item; a full catalog comprising metadata items for each magnetic recording tape of the plurality of magnetic recording tapes, the metadata objects comprising metadata of stored data; a backup catalog stored on a first fast storage media, the backup catalog comprising a mapping of one of devices or virtual devices to the identifier; a sparse catalog stored on a second fast storage media, the sparse catalog comprising a mapping of at least one file or object to the identifier; a first search engine operative to search the sparse catalog in order to locate a desired data item; a second search engine operative to search the backup catalog on the basis of a result returned by the first search engine; and a third search engine operative to search the full catalog on the basis of a result returned by the second search engine.

21. The system of claim 20 wherein the metadata objects comprise a plurality of attributes of the data.

22. The system of claim 20 or claim 21 wherein the metadata objects comprises at least an identifier of the one magnetic recording tape among the plurality of magnetic recording tapes, and at least one attribute of the data.

23. The system of any of claims 20-22 wherein the full catalog comprises a full catalog of data stored on one magnetic recording tape.

24. A method comprising: storing at least one data item on one magnetic recording tape among a plurality of magnetic recording tapes, the one magnetic recording tape comprising an identifier; creating a full catalog comprising metadata items stored on the one magnetic recording tape, the full catalog stored on the one magnetic recording tape; mapping one of devices or virtual devices to the identifier in a backup catalog, and storing the backup catalog on a first fast storage media; mapping the at least one file or object to the identifier in a sparse catalog, the sparse catalog stored on a second fast storage media; searching the sparse catalog in order to locate a desired data item, by a first search engine; searching the backup catalog on the basis of a result returned by the first search engine by a second search engine; and searching the full catalog by a third search engine on the basis of a result returned by the second search engine.

25. A system comprising: a sparse catalog stored on a fast storage media, the sparse catalog comprising a mapping of at least one file or object to an identifier of a magnetic recording tape among a plurality of magnetic recording tapes; a search engine executed by a processor, the search engine operative to search the sparse catalog, wherein if a file or object searched for in the sparse catalog is not found in the sparse catalog, the file or object searched for is not on the plurality of magnetic recording tapes, and if the file or object searched for in the sparse catalog is located in the sparse catalog, the file or object searched for might be stored on the magnetic recording tape associated with a returned identifier.

26. The system of claim 25 and further comprising a second search engine for searching the magnetic recording tape associated with the returned identifier for the file or object.

Description:
HIERARCHICAL CATALOG FOR STORAGE TAPES

BACKGROUND

The present disclosure, in some embodiments thereof, relates to a method and system for locating objects on magnetic recording tapes and, more particularly, but not exclusively, to a method and system utilizing hierarchical catalogs in order to locate objects on magnetic recording tapes.

The need for data storage is reaching quantities such that typical data storage disks are no longer big enough to accommodate data storage needs. The focus for secondary storage and archiving is accordingly moving from storage disks and deduplication to magnetic recording tape technologies. At least in part, this is due to the magnetic recording tape being less expensive than storage disks and having long data retention.

In typical usage of magnetic recording tapes for data storage, the data stored on the tapes is usually not deduplicated. Deduplication results in more seeks of the data, which is a highly computationally expensive operation on magnetic recording tapes. Since the data is not deduplicated, the number of files or objects per magnetic recording tapes is significantly smaller (typically, dependent on the deduplication ratio) then if the data was stored deduplicated on the magnetic recording tapes.

Since the seek time in magnetic recording tape is typically relatively long (e.g., tens of seconds), most application uses tapes without any optimizations beside built-in compression. There is a need to allow for quickly locating files or objects on magnetic recording tape, to recover the files or objects for the user. It is appreciated that long seek time in magnetic recording tape is an issue regardless of deduplication. Deduplication further exacerbates the problem by having each tape represent many more files than it would represent without deduplication.

Additional background art includes US Patent No. 7693878 of Anglin et al., which describes system, apparatus, and process which create a table of contents (TOC), including one or more table of contents (TOC) entries, to manage data in a hierarchical storage management system. Each TOC entry contains metadata describing the contents and attributes of a data object within an image, which is an aggregation of multiple data objects into a single object for storage management purposes. The TOC is stored in a storage hierarchy, such as magnetic disk, for fast access of and efficient operation on the aggregated TOC entries. The system, apparatus, and process also provide for aggregating the TOC entries from one or more TOCs into a TOC set in the storage management server database. The TOC set may be manipulated and queried in order to find a particular data object or image referenced by a TOC entry. The TOC entries, TOCs, and TOC sets may be dynamically managed by the hierarchical data storage management system through implementation of a set of policy management constructs that define appropriate creation, retention, and movement of the objects within the database and storage hierarchy.

Still additional background art includes US Patent No. 9063666 of Amir, et al., describes a method which includes loading a tape cartridge into at least one tape drive installed in an automated tape library, where a tape of the tape cartridge has at least two partitions; writing plurality of data blocks on a first of the partitions; and writing an index on a second of the partitions, wherein the index includes information about at least one of files and the blocks on the first partition.

The shortcomings of the above cited background art leaves unsolved at least both of the problems of fast location of files and objects across multiple tapes and efficiently representing large numbers of files.

SUMMARY

According to an aspect of some embodiments of the present disclosure there is provided a system including one magnetic recording tape of a plurality of magnetic recording tapes, each one magnetic recording tape including an identifier, the one magnetic recording tape including at least one stored data item, a full catalog including metadata items for each magnetic recording tape of the plurality of magnetic recording tapes, the metadata objects including metadata of stored data, a backup catalog stored on a first fast storage media, the backup catalog including a mapping of one of devices or virtual devices to the identifier and a sparse catalog stored on a second fast storage media, the sparse catalog including a mapping of at least one file or object to the identifier.

According to an aspect of some embodiments of the present disclosure there is provided a method including storing at least one data on one magnetic recording tape of a plurality of magnetic recording tapes, each one magnetic recording tape including an identifier, cataloging a full catalog including metadata items for each magnetic recording tape of the plurality of magnetic recording tapes, the metadata objects including metadata of data stored on each one of the magnetic tapes, mapping one of a device or a virtual device in a backup catalog, the backup catalog stored on fast storage media, to one of the plurality of magnetic recording tapes including a backup of a device or virtual device, mapping at least one file or object to at least one magnetic recording tape of the plurality of magnetic recording tapes in a sparse catalog, the sparse catalog stored on fast storage media.

According to an aspect of some embodiments of the present disclosure there is provided a system comprising a plurality of magnetic recording tapes, each one magnetic recording tape including an identifier, the one magnetic recording tape including at least one stored data item, a full catalog including metadata items for each magnetic recording tape of the plurality of magnetic recording tapes, the metadata objects including metadata of stored data, a backup catalog stored on a first fast storage media, the backup catalog including a mapping of one of devices or virtual devices to the identifier, a sparse catalog stored on a second fast storage media, the sparse catalog including a mapping of at least one file or object to the identifier, a first search engine operative to search the sparse catalog in order to locate a desired data item, a second search engine operative to search the backup catalog on the basis of a result returned by the first search engine, and a third search engine operative to search the full catalog on the basis of a result returned by the second search engine.

According to an aspect of some embodiments of the present disclosure there is provided a method including storing at least one data item on one magnetic recording tape among a plurality of magnetic recording tapes, the one magnetic recording tape including an identifier, creating a full catalog including metadata items stored on the one magnetic recording tape, the full catalog stored on the one magnetic recording tape, mapping one of devices or virtual devices to the identifier in a backup catalog, and storing the backup catalog on a first fast storage media, mapping the at least one file or object to the identifier in a sparse catalog, the sparse catalog stored on a second fast storage media, searching the sparse catalog in order to locate a desired data item, by a first search engine, searching the backup catalog on the basis of a result returned by the first search engine by a second search engine, and searching the full catalog by a third search engine on the basis of a result returned by the second search engine. According to an aspect of some embodiments of the present disclosure there is provided a system including a sparse catalog stored on a fast storage media, the sparse catalog including a mapping of at least one file or object to an identifier of a magnetic recording tape among a plurality of magnetic recording tapes, a search engine executed by a processor, the search engine operative to search the sparse catalog, wherein if a file or object searched for in the sparse catalog is not found in the sparse catalog, the file or object searched for is not on the plurality of magnetic recording tapes, and if the file or object searched for in the sparse catalog is located in the sparse catalog, the file or object searched for might be stored on the magnetic recording tape associated with a returned identifier.

According to some embodiments of the disclosure, the metadata object include a plurality of attributes of the data.

According to some embodiments of the disclosure, wherein the metadata object includes at least the identifier and at least one attribute of the data.

According to some embodiments of the disclosure, either storing or updating one of the backup catalog or the sparse catalog on the first or the second fast storage media is performed when the one magnetic recording tape is full.

According to some embodiments of the disclosure, at least the first and second aspects of the disclosure further include a search engine operative to search for a data item by searching the sparse catalog for the data item, and operative to retrieve an identity of at least one magnetic recording tape of the plurality of magnetic recording tapes as a result of the searching, a second search engine operative to search for the data item by searching the backup catalog on the basis of the result of the retrieved identity if the identity of at least two magnetic recording tapes is retrieved by the first search engine, a third search engine operative to search the full catalog on the basis of the result of the search by the second search engine, and to return a location of the data item.

According to some embodiments of the disclosure, the first fast storage media and the second fast storage media include the same fast storage media. According to some embodiments of the disclosure, the first fast storage media and the second fast storage media include the different fast storage media.

According to some embodiments of the disclosure, the full catalog is stored on fast storage media until the one magnetic recording tape is full.

According to some embodiments of the disclosure, when the one magnetic recording tape is full, the full catalog is copied to the one magnetic recording tape.

According to some embodiments of the disclosure, the metadata objects comprise at least an identifier of the one magnetic recording tape among the plurality of magnetic recording tapes, and at least one attribute of the data.

According to some embodiments of the disclosure, at least the fifth aspect of the disclosure further include a second search engine for searching the magnetic recording tape associated with the returned identifier for the file or object.

Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the disclosure, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

Implementation of the method and/or system of embodiments of the disclosure can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the disclosure, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.

For example, hardware for performing selected tasks according to embodiments of the disclosure could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the disclosure could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the disclosure, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic harddisk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS )

Some embodiments of the disclosure are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the disclosure. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the disclosure may be practiced.

In the drawings:

FIG. 1 is a depiction of a various catalogs used for searching for an object stored on a magnetic recording tape;

FIG. 2 is a first depiction of locating an object on a magnetic recording tape using a full catalog;

FIG. 3 is a depiction of an exemplary computer, upon which a search engine may be implemented or may be executed;

FIG. 4 is a flow chart of one exemplary method of storing creating the sparse catalog, the backup catalog and the full catalog; and

FIG. 5 is a is a flow chart of one exemplary method of searching the sparse, backup, and full catalogs for a file or object stored on magnetic recording tape.

DESCRIPTION OF SPECIFIC EMBODIMENTS

The present disclosure, in some embodiments thereof, relates to a method and system for locating objects on magnetic recording tapes and, more particularly, but not exclusively, to a method and system utilizing hierarchical catalogs in order to locate objects on magnetic recording tapes. When objects are written to a magnetic recording tape, a full catalog is stored on fast storage media. The full catalog comprises metadata items for the files and objects written on each magnetic recording tape. Such metadata items may include identifiers, such as file names and suffixes, as well as various file attributes as are associated with the stored object. The fast storage media may then be searched when trying to locate the objects on the magnetic recording tape. Once the magnetic recording tape is full (i.e., there is no space remaining on the magnetic recording tape on which to write further objects), and the magnetic recording tape is no longer actively used for copying objects, the full catalog may be written to a partition, referred to hereinafter as a catalog partition, at the beginning of the tape. Alternatively, the full catalog may be stored in the same partition as the stored objects, as will be described below. Information from the full catalog may now be added to other catalogs of object locations, including other types of catalogs of object locations, as will be described below.

As used herein, the term “fast storage”, and by contrast, the term “slow storage” are relative terms which refer to the relative amount of time to read objects from or write objects to a given type of storage media. For example, reading an object from a solid state drive, a hard drive, or from flash memory is considered “fast”, while, by contrast, reading an object from magnetic recording tape is considered “slow”, since seeking objects on the magnetic recording tape is typically requires rolling the tape, which is a lengthy mechanical operation.

As used herein, the term “object” may be anything stored on memory medium, and more particularly, in the present context, may be anything stored on long-term memory medium. As such, a data file, an executable file, data in a database table, and so forth may all be non-limiting examples of objects. The term file may be used herein and is understood to refer to an object as defined herein.

Typically, objects stored on magnetic recording tapes are not deduplicated. Deduplication typically results in more seeks for the object, and seeks are, a very expensive operation when performed on magnetic recording tape due to the amount of time required to roll the magnetic recording tape. Accordingly, fewer objects are stored on the magnetic recording tape than if the magnetic recording tape was subjected to deduplication. On the other hand, if the objects stored on the magnetic recording tape are subjected to deduplication, it is to be expected that the number of obj ects stored on the magnetic recording tape will increase, possibly significantly. An increase in the number of objects stored on the magnetic recording tape will, correspondingly, lead to an increase in the size of the catalog of objects stored on the magnetic recording tape. Storing this large catalog on fast media has costs, such as space, as well as an expected degradation of overall performance of searches in the catalog.

Referring now to the drawings, Figure 1 illustrates various catalogs used for searching for an object stored on a magnetic recording tape, as will now be explained.

A full catalog of contents of a magnetic recording tape is stored in a catalog partition at the start of each magnetic recording tape. That is to say that each object stored on the magnetic recording tape is cataloged in the full catalog, as will be explained below, with reference to Fig. 2. More specifically, the catalog partition of Magnetic Recording Tape 1 of Fig. 1 has, stored thereupon, the full catalog of objects stored on Magnetic Recording Tape 1. The catalog partition of Magnetic Recording Tape 2 of Fig. 1 has, stored thereupon, the full catalog of objects stored on Magnetic Recording Tape 2. The catalog partition of Magnetic Recording Tape 3 of Fig. 1 has, stored thereupon, the full catalog of objects stored on Magnetic Recording Tape 3. Similarly, the catalog partition of Magnetic Recording Tape N (not depicted in Fig. 1) has, stored thereupon, the full catalog of objects stored on Magnetic Recording Tape N. The full catalog may include a list of objects stored on the magnetic recording tape, and at least some of the attributes associated with the stored object. For example, in the case of stored files, the file name, the file identifier (for example, a unique tag or number used within file system), the file type, the file location on the magnetic recording tape, the file size, date and time of the last usage of the file, and so forth. It is appreciated that when a magnetic recording tape is deduplicated, the deduplication is applied on a per magnetic recording tape basis, and, as such an object may be stored on more than one magnetic recording tape.

A backup catalog, typically stored on a fast medium, such as, and without limiting the generality of the foregoing, a solid state drive (SSD) or a hard disk drive (HDD). The backup catalog may comprise a mapping of machines or virtual machines (VM) to magnetic recording tapes on which objects are backed up. Records in the backup catalog may comprise an identifier of the backup a time of the backup, an identifier of the VM, and an identifier of the magnetic recording tape containing the backup. A sparse catalog, typically stored on a fast medium, such as, and without limiting the generality of the foregoing, SSD or HDD. The backup catalog and the sparse catalog may be stored on the same storage drive, e.g., the same HDD or SSD, or alternatively, each of the catalogs may be stored on different storage devices. The sparse catalog might not be exact, and maps objects to magnetic recording tapes. By way of an example, a search of the sparse catalog might return that a certain object being searched for exists on certain magnetic recording tapes, when in fact the searched for object is not extant on the returned magnetic recording tapes. The sparse catalog is typically constructed so that the probability of erroneous result of searching of the sparse catalog is small. In addition, the sparse catalog does not return a false result for an object that exists on a particular magnetic recording tape. While a magnetic recording tape is being written to (the tape is “active”), its full catalog is stored on fast media (e.g., SSD or HDD, etc.). The full catalog stored on the fast media may be used for searching the magnetic recording tape. Once the magnetic recording tape is full and no longer “active”, its full catalog can be written to the catalog partition at beginning of the magnetic recording tape. Information about the objects cataloged in the full catalog may then be added to the backup catalog and the sparse catalog.

When an object is searched for, an initial search may be performed on the sparse catalog. Results from the search of the sparse catalog may then be used to search the backup catalog. A search of the backup catalog might return a list of candidate magnetic recording tapes to be searched for the desired object. Alternatively, if the result of the search of the sparse catalog returns only one magnetic recording tape, the full catalog from the one magnetic recording tape may be searched for the desired object.

Examples of methods of search will be provided below. Prior to doing so, more details are provided of searching for objects using the full catalog.

Reference is now made to Fig. 2, which is a depiction of locating an object on a magnetic recording tape using a full catalog. Fig. 2 show a magnetic recording tape 200. The magnetic recording tape 200 is partitioned into a catalog partition 210 and a data partition 220. The presence of each object Fl, F2, F3 stored in the data partition, is indicated by information in the full catalog, stored in the catalog partition 210. Because files may be moved, deleted, and so forth while the magnetic recording tape 200 is active, the full catalog stored in the catalog partition 210 is not necessarily a linear listing of files stored in the data partition 220 of the magnetic recording tape 200. Rather, the pointers from the catalog partition 210 to the data partition 220 depicted in Fig. 2 are symbolic. That is to say the full catalog stored in the catalog partition 210 does not hold an actual location of data of any given object in the data partition 220.

It is appreciated that Fig. 2 depicts one possible embodiment of the present disclosure. In other embodiments, the full catalog does not have to be placed in a separate partition. The full catalog may, alternatively, be disposed in the same partition as data on the magnetic recording tape 200. By way of example, the full catalog may be spread out across the data partition, located at the end of the data partition, and so forth.

The sparse catalog may be implemented in a number of ways. Two possible implementation are now described, by way of example.

In a first implementation, a Bloom filter is created for objects stored on each one magnetic recording tape of a plurality of magnetic recording tapes. As noted above, each magnetic recording tape may comprise a plurality of objects. The Bloom filter may comprise a catalog of hashes of file names, file attributes, or both. When searching for a given object, such as a file, an appropriate hash function or several hash functions is/are run on a file name associated with the object and/or one or more file attributes associated with the object. It may then be determined which Bloom filters in the sparse catalog contain hashes relevant for the desired object. For each Bloom filter containing the hashes relevant for the desired object, the full catalog is loaded from the corresponding magnetic recording tape of the Bloom filter, and the object is searched for on the magnetic recording tape. It is appreciated that this is a space efficient implementation. However, a search containing a wildcard is problematic in Bloom filters. Additionally, removing a deleted object from the Bloom filter is problematic. By way of example, if the desired file has a file name starting with the letters “ab”, a search would ordinarily be executed for the string “ab*”.

Bloom filters space-efficient probabilistic data structures used to test whether an element (such as the desired object) is a member of a set (such as all objects cataloged in the sparse catalog). False positive matches are possible when using Bloom filters, but false negatives are not. Typically, a query of Bloom filters returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed. Additionally, the more items added, the larger the probability of false positives. Bloom filters are usually implemented by starting with an empty Bloom filter, e.g., a bit array of m bits, all set to 0. A plurality, k, of hash functions are defined, each of which maps or hashes some set element to one of the m array positions, generating a uniform random distribution. Typically, k is a small constant which depends on the desired false error rate, while m is proportional to k and the number of elements to be added. To add an element, it is fed it to each of the k hash functions to get k array positions. The bits at all these positions are set to 1. To query for an element (test whether it is in the set), it is fed to each of the k hash functions to get k array positions. If any of the bits at these positions is 0, the element is definitely not in the set; if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have by chance been set to 1 during the insertion of other elements, resulting in a false positive.

In a second implementation, the sparse catalog may comprise a map of object name prefixes and object name suffixes. More specifically, the sparse catalog may be comprise record keys and values. The record key may be a prefix of a file name of a given length, such as 3 or 4 characters, and a suffix of the file name of a given length, such as 2 or 3 characters. The value may be a list of magnetic recording tapes on which the files with the given prefix and suffix combination exist. As opposed to the first implantation, which uses Bloom filters, the presently described implementation supports wildcards, and may be extended to support deletions from the magnetic recording tapes.

Each record in the prefix map will be smaller than a record in the full catalog, since the prefix map typically comprises a small set of file attributes to be used as part of the key, and the file name, often the “largest” attribute (e.g., most costly in terms of record storage), is truncated to a small number of bytes representing the prefix and suffix. Accordingly, records will be correspondingly shorter. It is appreciated that extremely short prefixes, e.g., only 1 or two characters in length may not be helpful. A prefix of ba* may point to a very large number of magnetic recording tapes, while adding a character to the “ba” prefix, bac* may eliminate a substantial number of potential files (all those having any character except “c” in the third position of their file name). It is further noted that the above discussion applies to lengths of file suffixes as well. Accordingly, there may be a tradeoff between a size of prefix and suffix and a number of magnetic recording tapes returned in a search of the sparse catalog. By way of example of the sparse catalog, the following table provides a mapping between tape IDs (e.g., a tape number), file names, and an example file creation time, as one file attribute. For ease of description, assume that suffix size is 0, and the prefix size is 3. Furthermore, the present example will be case insensitive, so that the file name Aaaa and the file name aaaa will refer to the same file. One file has the prefix “aaa” (file Aaaa). It has a creation time of 4, and is found on tape 1. There are seven files with the prefix “aba” (e.g., Abaa, Abade, abara, abaaa, abaxyz, abaqed, and abazzz. The creation time of these files range from 3, in the case of Abade to 15 in the case of abaqed. Tow files have the prefix “der” with creation times ranging from 9 to 13.

Accordingly, the sparse catalog for these files will look as follows: Thus, to search for the file abaqed, the sparse catalog is searched for the prefix “aba”, and a result is returned that both magnetic recording tapes 1 and 2 have files matching the search term. Both tapes may then be loaded, and their full catalog may be checked. A search of the full catalog of magnetic recording tape 1 will return no result. However, a search of the full catalog of magnetic recording tape 2 will return that the file is stored on magnetic recording tape 2.

It is appreciated that the above technique may be used to keep the sparse catalog a small size. Furthermore, objects may be distributed across sparse catalog records such that typically, a record does not correspond to a large number of files and or objects stored on many magnetic recording tapes. As an enhancement, records in the sparse catalog may be divided according to file size categories, e.g., files may be divided, for example into groups such as: smaller than 1KB; larger than 1KB and smaller than 1MB, and larger than 1GB. In such a case, file size may then be added as an attribute in the search of the sparse catalog.

As an additional enhancement, records in the sparse catalog may be split using an appropriate splitting method, as is known in the art. By way of example, records may be split according to file creation time. If, for example, there are around 10,000 files with a given prefix and suffix, which are mapped to 4 magnetic recording tapes, these 10,000 files may be split according to ranges of creation times. For example - one record will have files created before 2018, another from 2018 to 2020, and another from 2020 until present day. While this will increase the size of the sparse catalog, the number of magnetic recording tapes returned in a search may be accordingly reduced.

Additionally, when contents of a “completed” magnetic recording tape are cataloged and added to the sparse catalog, a preferable distribution of the file and object names for prefixes and suffixes may be determined in order to determine an optimal prefix and suffix length. It is noted that if a record contains a very large amount of references to tapes (i.e. the files represented by it appear in many tapes), greater than some pre-defined number of references, the record may be deleted, since it creates a need to search many magnetic recording tapes, required loading and performing a search of the full catalog of each loaded magnetic recording tape for the desired file. Such a file may accordingly be listed as requiring a full search of all full catalogs.

Referring back to Fig. 1, a sparse catalog is depicted as a table comprising a file prefix, such as ABC, ABA, and DRE, in a first column; a file suffix, such as no suffix, DOC, and MP3 in a second column; columns providing a minimum and maximum crate time for file having such prefixes and suffixes are provided, as well as a mapping to magnetic recording tapes which have files or objects meeting with the characteristics which may be searched for in the sparse catalog.

A backup catalog is depicted as a table mapping, backup times, and IDs of virtual machines to magnetic recording tapes on which objects backed up from the virtual machines may be found.

Accordingly, to know if a backup of a virtual machine exists, or, on which magnetic recording tape the backup is stored, the backup catalog may be used. The backup catalog may provide a list of magnetic recording tapes on which these backups exist. A result provided after consulting the backup catalog provides complete certainty. Accordingly, there is no need to load the full catalog once the result is provided.

To know if a particular object has been backed up to a magnetic recording tape, and it is known that the object belongs to a backup of a given virtual machine, the sparse catalog is searched. If the object does not exist in the system, sparse catalog will return a negative response, and the search is then terminated. On the other hand, if the sparse catalog returns that the object exists on particular magnetic recording tapes, the object is presumed to exist in the system. If only one magnetic recording tape is returned in this case, only the full catalog of that magnetic recording tape need be searched (without consulting the backup catalog) to locate the object. If the sparse catalog returns that the object exists on several magnetic recording tapes, then, in principle, the full catalog from each of those magnetic recording tapes could be consulted to locate the desired object. However, if it is known that the object belongs to a backup of the given virtual machine, the backup catalog may then be consulted to reduce the number of full catalogs which will be needed to be loaded, sparing the time of loading those magnetic recording tapes, and consulting their respective full catalogs. Only those magnetic recording tapes returned in the search of the backup catalog are loaded, and their respective full catalogs are then consulted to locate the desired object.

Thus, if a search of the sparse catalog for a file having the prefix “ABA” returns that magnetic recording tapes 1, 2, and 3 need to be searched to locate the desired file, virtual machines with device IDs 1, 2, 3, 4, and 5 need to be utilized to load and search the full catalogs of the desired magnetic recording tapes. Reference is now made to Fig. 3, which is a depiction of an exemplary computer, upon which a search engine may be implemented or may be executed. The magnetic recording tapes may be searched by an appropriate search engine implemented on or executed remotely by a computer, such as a computer 300. Computer 300 may comprise one or more processors, such as processor 301, providing an execution platform for executing machine-readable instructions such as software. One of the processors 301 may be a special purpose processor operative for executing the searches of the catalogs described herein above. Commands and data from the processor 301 may be communicated over a communication bus 302. The computer 300 may include a main memory 303, such as a Random Access Memory (RAM) 304, where machine readable instructions may reside during runtime, and a secondary memory 305. The secondary memory 305 may include, for example, a hard disk drive 307, a tape drive 308, for reading magnetic recording tapes, and/or a removable storage drive 309 (which may be not generally accessible on a regular basis, but possibly accessible by service personnel or installers, etc.), such as a floppy diskette drive, a compact disk drive, a flash drive, etc., or a nonvolatile memory where a copy of the machine readable instructions or software may be stored. The secondary memory 305 may also include ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM). A removable storage drive 310 may read from or write to a removable storage unit 309.

The computer 300 may have a user may interface which includes input devices 311, such as a touch screen, a keyboard, a mouse, a stylus, and the like, as well as interfaces for input via the wireless interface, in order to provide user input data or other commands. A display adaptor 315 interfaces with the communication bus 302 and a display 317 and receives display data from the processor 301 and converts the display data into display commands for the display 317.

A network interface 319 is provided for communicating with other systems and devices via a network. The network interface 319 typically includes a wireless interface for communicating with wireless devices in the wireless community. A wired network interface (an Ethernet interface, by way of example) may be present as well.

It is appreciated that each of the full catalog, the backup catalog, and the sparse catalog may be searched by search engines implemented or executed by a single computer, or each of the catalogs may be searched by a dedicated search engine. Reference is now made to FIG. 4 is a flow chart 400 of one exemplary method of storing creating the sparse catalog, the backup catalog and the full catalog. At step 410, at least one data item is stored on one magnetic recording tape of a plurality of magnetic recording tapes, each one magnetic recording tape including an identifier. At step 420, a full catalog including metadata items for each magnetic recording tape of the plurality of magnetic recording tapes is catalogued, the metadata objects including metadata of data stored on each one of the magnetic tapes. At step 430, one of a device or a virtual device in a backup catalog is mapped to one of the plurality of magnetic recording tapes including a backup of a device or virtual device, the backup catalog stored on fast storage media. At step 440, at least one file or object is mapped to at least one magnetic recording tape of the plurality of magnetic recording tapes in a sparse catalog, the sparse catalog stored on fast storage media.

Reference is now made to FIG. 5, which is a is a flow chart 500 of one exemplary method of searching the sparse, backup, and full catalogs for a file or object stored on magnetic recording tape. At step 510, at least one data item may be stored on one magnetic recording tape among a plurality of magnetic recording tapes, the one magnetic recording tape including an identifier. At step 520 a full catalog is created, the full catalog including metadata items stored on the one magnetic recording tape, the full catalog stored on the one magnetic recording tape. At step 530, one of devices or virtual devices is mapped to the identifier in a backup catalog, and storing the backup catalog on a first fast storage media. At step 540, the at least one file or object is mapped to the identifier in a sparse catalog, the sparse catalog stored on a second fast storage media. At step 550, the sparse catalog is searched in order to locate a desired data item, by a first search engine. At step 560, the backup catalog is searched on the basis of a result returned by the first search engine by a second search engine. At step 570, the full catalog is searched by a third search engine on the basis of a result returned by the second search engine.

It is expected that during the life of a patent maturing from this application many relevant magnetic recording tape search technologies will be developed and the scope of the term file and object is intended to include all such new technologies a priori.

The terms "comprises", "comprising", "includes", "including", “having” and their conjugates mean "including but not limited to".

The term “consisting of’ means “including and limited to”. The term "consisting essentially of' means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.

As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "an object" or "at least one object" may include a plurality of objects.

Throughout this application, various embodiments of this disclosure may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicate number and a second indicate number and “ranging/ranges from” a first indicate number “to” a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.

It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the disclosure. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements. Although the disclosure has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

It is the intent of the Applicant(s) that all publications, patents and patent applications referred to in this specification are to be incorporated in their entirety by reference into the specification, as if each individual publication, patent or patent application was specifically and individually noted when referenced that it is to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present disclosure. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in it s/their entirety.