

Title:
A DATA STORAGE SYSTEM AND A METHOD FOR ITS OPERATION
Document Type and Number:
WIPO Patent Application WO/2006/090412
Kind Code:
A2
Abstract:
The invention relates to a data storage system in which a virtual tape drive or disk drive is emulated, and the emulated drive achieves data compression without any changes or modifications to the backup software, thereby facilitating cost effectiveness and speed. The present invention enables the client to save network bandwidth and storage space.

Inventors:
SHAH MONISH (IN)
Application Number:
PCT/IN2005/000341
Publication Date:
August 31, 2006
Filing Date:
October 20, 2005
Assignee:
SHAH MONISH (IN)
International Classes:
G06F3/06
Foreign References:
US20040153614A1 (2004-08-05)
US5410671A (1995-04-25)
US6493811B1 (2002-12-10)
Other References:
"Virtual Tape Systems for IBM Mainframes, a comparative analysis" [Online] 6 February 2005 (2005-02-06), UNIVERSAL SOFTWARE, XP002394183. Retrieved from the Internet: URL:<http://web.archive.org/web/20050206034815/http://www.universalsoft.com/vtacomparisons.htm> [retrieved on 2006-08-08]; the whole document
Attorney, Agent or Firm:
Deshpande, Anand (E 301 Riviresa, Baner Road, S. No. 287/3, 5 Pune 5, IN)
Claims:
I claim:
1. A data storage system comprising one host CPU or a back up client containing primary storage, a virtual tape drive being emulated in the said back up client, a hardware compression engine that works with the said virtual tape drive to compress the data, a network to transfer the compressed data to the back up server and a storage device attached to the backup server to store the compressed data transferred to it.
2. A data storage system as claimed in claim 1, wherein the said virtual tape drive formats, writes and compresses the data in the said primary storage of the said host CPU itself with the help of the said hardware compression engine.
3. A data storage system as claimed in claim 1, wherein there is more than one host CPU or back up client, each containing a virtual tape drive and hardware compression.
4. A data storage system as claimed in claim 1, wherein the said virtual tape drive is implemented as a kernel level driver in the host system.
5. A data storage system as claimed in claim 1, wherein the secondary storage is a disk drive or an array of disk drives.
6. A data storage system comprising one host CPU or a back up client containing primary storage, wherein at least one virtual disk drive is emulated in the said host CPU or back up client, a hardware compression engine that works with the said virtual disk drive to compress the data, a network to transfer the compressed data to the back up server and a storage device attached to the backup server to store the compressed data transferred to it.
7. A data storage system as claimed in claim 6, wherein the said virtual disk drive is implemented as a kernel level driver for creating a file system within the operating system.
8. A data storage system as claimed in claim 6, wherein there is more than one host CPU or back up client, each containing a virtual disk drive and hardware compression.
9. A data storage system as claimed in claim 6, wherein the secondary storage is a disk drive or an array of disk drives.
10. A method of operation for storing data comprising emulating a virtual tape drive on at least one host CPU, retrieving the data from disc of the said host CPU, formatting the data by backup application on the backup client, writing the data to the said virtual tape in the backup client, compressing the data by the said split virtual tape and the said hardware compression engine, transferring the compressed data to the storage area network, writing the compressed data to the secondary storage of back up server and storing the compressed and transferred data in a device.
11. A method of operation for storing data as claimed in claim 10, wherein the data is compressed by the said virtual tape in the back up client itself.
12. A method of operation for storing data comprising emulating a virtual disk drive on at least one host CPU, retrieving the data from disc of the said host CPU, formatting the data by backup application on the backup client, writing the data to the said virtual disk in the backup client, compressing the data by the said split virtual disk and the said hardware compression engine, transferring the compressed data to the storage area network, writing the compressed data to the secondary storage of back up server and storing the compressed and transferred data in a device.
13. A method of operation for storing data as claimed in claim 12, wherein the data is compressed by the said virtual disk in the back up client itself.
14. A data storage system and a method of operation for storing data as claimed in claims 1 to 13 above, with reference to the accompanying specification and drawings.
Description:
A DATA STORAGE SYSTEM AND A METHOD FOR ITS OPERATION

TECHNICAL FIELD OF THE INVENTION:

The invention relates to a data storage system in which a virtual tape drive is emulated, and such virtual tape drive achieves data compression without any changes or modifications to the backup software, thereby facilitating cost effectiveness and speed. The present invention enables the client to save network bandwidth and storage space.

In another embodiment, a virtual disk drive is emulated instead of the virtual tape drive. The data written by the back-up software is compressed. This approach also achieves the same object of saving network bandwidth and disk space.

The present invention is a data storage system in which a Virtual Tape Drive or Virtual Disk Drive is actually emulated on the host and the data is compressed in the host itself. Here, the host is the machine whose data is to be backed up. In the preferred embodiment, compression is done in hardware, which ensures that the host CPU (Central Processing Unit) is not burdened by the task of compression. Normally, the host CPU's data is stored in its primary storage. During backup, data from the primary storage is accessed and formatted for backup by the host CPU. With the present invention, the data is then compressed and transferred over the network to the secondary storage server, which in turn stores the data on disk. Here, the secondary storage is a means of storing data, and such data is a backup copy of the data in the primary storage. In the preferred embodiment, the secondary storage is disk based, although it could also be tape based.

BACKGROUND OF THE INVENTION

Several technologies, such as virtual magnetic tape, storage area networking, and centralized systems management tools have evolved over the years to address data storage management challenges, including data backup.

A popular device for backing up large amounts of information in a data storage system is the automated cartridge library. Automated cartridge libraries store and manage large numbers of tape cassettes containing magnetic tape on which data is recorded. Typically an automated cartridge library comprises a tape library having arrays of uniquely identified cells, each cell being formed to contain a single tape cartridge. A robot assembly is operable within the tape library to locate a particular cell, retrieve a tape cartridge from the particular cell, convey the tape cartridge to a tape drive, and insert the tape cartridge into the tape drive.

The tape drive then reads or writes data to the magnetic tape of the tape cartridge. In this way, a large number of tape cartridges are automatically accessible to a single tape drive. The operation of the automated cartridge library and the tape drive is typically controlled by a host computer communicating with a library server associated with the tape library.

Further, in typical magnetic tape storage environments, many tape cartridges, tape drives, and library devices are distributed among various clients. These clients may be single user workstations, work groups, departmental servers, or large multi-user platforms. The management complexity of these environments increases as a function of how much data is stored and the heterogeneity of the computational environment. Storage management costs are a significant portion of the information technology budget.

Further, a virtual tape storage system is a hardware and software product configured to interact with a host computer. Application programs running on the host computer store data output on tape volumes. These tape volumes are embodied in the virtual tape storage system as virtual volumes on virtual tape drives (VTD). A virtual volume is a collection of data, organized to appear as a normal tape volume, residing in the virtual tape storage system. To the host computer and to the application programs, the tape volume contents appear to be stored on a physical tape device of a particular model, with the properties and behavior of that model emulated by the actions of the virtual tape storage system. However, the data may actually be stored as a virtual volume on any of a variety of different storage mediums such as disk, tape, or other non-volatile storage media, or combinations of the above. The virtual volume may be spread out over multiple locations, and copies or "images" of the virtual volume may be stored on more than one kind of physical device, e.g., on tape and on disk. When an image of the virtual volume is stored on disk, different portions of the volume's contents may be stored on different disk drives and on different, non-contiguous areas of each of the disk drives. The virtual tape storage system maintains indexes which allow the contents of any virtual volume whose image is stored on disk to be read by the host, the virtual tape storage system retrieving scattered parts as needed to return them in correct sequence.
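The index described above can be pictured as a list of extents. The following Python sketch is a hypothetical model only (the `Extent` and `VirtualVolumeIndex` names are illustrative and do not come from any product): each extent records which disk holds a piece of the virtual volume and where, and reading the volume back sorts the scattered pieces by their position in the volume before joining them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One contiguous piece of a virtual volume (hypothetical model)."""
    volume_offset: int  # where this piece starts within the virtual volume
    disk: str           # which disk drive holds the piece
    disk_offset: int    # where the piece starts on that disk
    length: int         # number of bytes in the piece


class VirtualVolumeIndex:
    """Maps a virtual volume onto scattered, non-contiguous disk extents."""

    def __init__(self) -> None:
        self.extents: list[Extent] = []

    def add(self, extent: Extent) -> None:
        self.extents.append(extent)

    def read_volume(self, disks: dict[str, bytes]) -> bytes:
        # Sort by position in the volume so the pieces come back in sequence,
        # regardless of which disk or disk offset each one was written to.
        parts = []
        for ext in sorted(self.extents, key=lambda e: e.volume_offset):
            data = disks[ext.disk]
            parts.append(data[ext.disk_offset:ext.disk_offset + ext.length])
        return b"".join(parts)
```

For example, with `disks = {"d0": b"xxWORLDxx", "d1": b"HELLO, "}`, adding `Extent(7, "d0", 2, 5)` and then `Extent(0, "d1", 0, 7)` still reads back as `b"HELLO, WORLD"`, because ordering comes from the index rather than from the order the pieces were written.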

When an image of a virtual volume is stored on tape, it may be stored on a single tape together with images of other virtual volumes, or different parts of the image may be stored on more than one different tape with each part again placed with images, or parts of images, of other virtual volumes. In both of these approaches to tape storage of virtual volume images, the images are said to be "stacked."

The virtual volume images may be stored on a variety of different tape device models other than the one being emulated. As with images stored on disk, the virtual tape storage system maintains indexes which allow it to retrieve the contents of any virtual volume stored in a stacked image from the tape or tapes on which it is stored.

Further, tape libraries provide an effective and economical means for storing large amounts of information. Each tape library includes one or more tape drives performing access operations such as writing to tape and reading from tape. Information stored on magnetic tape is logically broken into tape volumes. The magnetic tape is typically packaged in tape cassettes or cartridges. There may be a one-to-one correspondence between tape volumes and cartridges, there may be multiple volumes on a single cartridge, or one volume may be split between several tape cartridges. In order to access information in a particular tape volume, the tape cartridge containing that volume must be loaded into a tape drive. Tape libraries are typically controlled by a tape library server which may perform functions such as tracking which cassettes hold each tape volume, tracking which library holds each cassette, controlling cassette loading and unloading mechanisms, controlling inter-library transport mechanisms, tracking cartridge use, managing data archiving, and the like.

PRIOR ART

References:

U.S. 6,324,497; U.S. 6,067,481; U.S. 6,779,077; U.S. 6,842,841; U.S. 6,834,324.

In the prior art, there exists a method and system for connecting a host to a tape drive for accessing a tape cartridge in a data storage system having a tape library provided with cartridges and tape drives. The method includes the host requesting a desired cartridge to access. In response to the request, a proxy agent queries the library to determine which of the tape drives are available to receive the desired cartridge. The proxy agent then selects a tape drive from the available tape drives to receive the desired cartridge. The desired cartridge is then mounted into the selected tape drive. An access controller then connects a virtual tape drive associated with the host to the selected tape drive to provide the host with access to the desired cartridge. The selected tape drive may be selected as a function of its location with respect to the storage location of the desired cartridge. While the host computer is accessing the desired cartridge, the access controller may simultaneously connect another virtual tape drive associated with a second host to another tape drive to provide the second host with access to a different cartridge.

In another prior-art system, virtual tape libraries are accessed by computer elements such as host computers. A canonical tape module services each tape drive. Each computer element is provided access to at least one logical tape module. Each logical tape module is mapped to a canonical tape module in response to a request to access a tape volume. Thus, each computer element has a logical tape drive capable of accessing any volume on any cartridge in any tape library within the storage system.

Yet another form of prior art is the "Client Compression" option in commercially available backup software. Here, different parts of the backup software run on different hosts. The host, or backup client, runs the client software that reads and formats the data to be backed up. This operation is controlled by a different host, the backup server, which receives this formatted data over the network and writes it to secondary storage. In this setup, some backup software packages offer the option of Client Compression. If this feature is enabled, the client software running on the backup client compresses the data before sending it to the backup server. However, this places an additional load on the CPU of the backup client.

There exist many compression techniques. The better-known techniques are:

Run-length encoding

Huffman coding

Adaptive Huffman coding

LZ77

LZ78

LZW

LZS

Burrows-Wheeler Transform (BWT), used in the bzip2 algorithm
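Run-length encoding, the first technique in the list, is simple enough to show in full. This is a minimal Python sketch, assuming the input is a byte string; the `rle_encode` and `rle_decode` names are illustrative. Each run of identical bytes becomes a (value, count) pair.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (byte_value, run_length) pair."""
    runs: list[tuple[int, int]] = []
    for b in data:  # iterating over bytes yields ints
        if runs and runs[-1][0] == b:
            value, count = runs[-1]
            runs[-1] = (value, count + 1)  # extend the current run
        else:
            runs.append((b, 1))  # start a new run
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (byte_value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


print(rle_encode(b"AAAABBBCCD"))  # [(65, 4), (66, 3), (67, 2), (68, 1)]
```

Run-length encoding only pays off on data with long runs; on text-like data each byte tends to become its own pair, which is why dictionary methods such as the LZ family are more common in backup.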

We have listed only lossless compression techniques, as those are the only ones relevant in data backup. Lossless compression means that after the data is compressed and subsequently decompressed, the final output exactly matches the input. In contrast, lossy compression techniques are such that after compression and decompression, the final data is similar to, but not necessarily the same as, the original data.
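The distinction can be checked directly. In this minimal Python sketch, `zlib` (DEFLATE, a lossless scheme combining LZ77 and Huffman coding) stands in for a lossless compressor, and `lossy_roundtrip` is a hypothetical toy that quantizes each byte to a multiple of 16; it is not a real lossy codec, it only shows what "similar but not the same" means.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress then decompress with zlib; the output must equal the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(data: bytes, step: int = 16) -> bytes:
    """Toy lossy scheme: round each byte down to a multiple of `step`.
    The remainder of each byte is discarded and cannot be recovered."""
    return bytes((b // step) * step for b in data)


sample = b"backup data " * 100
print(lossless_roundtrip(sample) == sample)  # True
print(lossy_roundtrip(sample) == sample)     # False
```

A backup restored through the lossy path would be close to the original but not byte-identical, which is unacceptable for file backup.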

Prior art may be explained with the help of the following steps:

VIRTUAL TAPE APPLIANCE

Data retrieved from disk on the backup client

Data formatted by backup application on the backup client

Data sent over the network to Virtual Tape Appliance via tape drive

Data received by Virtual Tape Appliance by emulating tape drive commands

Data compressed by Virtual Tape Appliance

Compressed data written by Virtual Tape Appliance to secondary storage

CLIENT COMPRESSION BY BACKUP SOFTWARE

Data retrieved from disk on the backup client

Data compressed and formatted by backup application on the backup client


Compressed data sent over the network to backup server

Compressed data received by backup server

Compressed data written to secondary storage by backup server
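The two prior-art flows differ in where compression happens and therefore in what crosses the network. The Python sketch below models both under stated assumptions: `zlib` stands in for whatever compressor the appliance or backup software uses, and `format_for_backup` is a hypothetical placeholder for the backup application's formatting step.

```python
import zlib
from dataclasses import dataclass


@dataclass
class BackupResult:
    bytes_on_network: int  # bytes the backup client puts on the network
    compressed_by: str     # which machine spends cycles on compression
    bytes_on_storage: int  # bytes written to secondary storage


def format_for_backup(data: bytes) -> bytes:
    # Hypothetical stand-in for the backup application's formatting step.
    return b"HDR" + data


def via_virtual_tape_appliance(data: bytes) -> BackupResult:
    formatted = format_for_backup(data)  # on the backup client
    on_wire = formatted                  # sent over the network uncompressed
    stored = zlib.compress(on_wire)      # appliance compresses after receipt
    return BackupResult(len(on_wire), "virtual tape appliance", len(stored))


def via_client_compression(data: bytes) -> BackupResult:
    formatted = format_for_backup(data)  # on the backup client
    on_wire = zlib.compress(formatted)   # software compression on the client CPU
    return BackupResult(len(on_wire), "backup client CPU", len(on_wire))
```

Both flows end up storing the same compressed bytes; the appliance flow pays in network traffic, while the client-compression flow pays in backup-client CPU time.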

DISADVANTAGES OF PRIOR ART:

The disadvantage of the Virtual Tape Appliance approach is that the appliance receives data over the network in uncompressed form. This limits the speed of the backup.

The backup application can compress the data in the host CPU before sending it for backup, but compression is a very burdensome task for the CPU and slows down the system considerably during backup. Hence, in client compression techniques, there is an additional load on the CPU of the backup client, users are reluctant to enable this option, and it is not used very often.

Thus, it is an object of the present invention to rectify the deficiencies of the prior art. It is also an object of the present invention to compress the data before it is transferred over the network, so that the transfer takes less time and the backup completes sooner.

DESCRIPTION OF THE INVENTION:

The foregoing objects of the invention are accomplished and the problems and shortcomings associated with prior art techniques and approaches are overcome by the present invention described in the preferred embodiment.

The present invention is a data storage system and a method for its operation. The host whose data is being backed up is known as the backup client. The invention emulates a Virtual Tape Drive or Virtual Disk Drive in the backup client. The data is compressed in the backup client itself, which is the object of the invention. A compression engine is provided in the backup client that works in tandem with the said virtual tape drive or virtual disk drive to compress the data accepted by that drive. The compressed data is transferred over the network to the secondary storage server, which in turn stores the data on the secondary storage. In the preferred embodiment, the secondary storage is disk based, although it could also be tape based. The object of compressing the data in hardware in the backup client is to reduce traffic on the network.

In a slightly different embodiment, the implementation of the Virtual Tape Drive may be split between at least two machines.

The Virtual Tape Drive would be implemented as a kernel level driver in the host system. A driver is software that controls a hardware device. In this case, however, there is no physical tape drive, so this driver would emulate the behavior of a tape drive. The data that would have been written to tape is instead saved in memory. It is then compressed and written to the secondary storage.
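The buffering behavior described above can be sketched in user space. This is a hypothetical Python illustration, not a kernel driver: the `VirtualTapeDrive` class and its `write`, `write_filemark` and `read_file` methods are invented names, `zlib` stands in for the hardware compression engine, and a Python list stands in for secondary storage. Records are held in memory until a filemark ends the tape file, and only then compressed and handed on.

```python
import zlib


class VirtualTapeDrive:
    """User-space sketch of an emulated tape drive (all names hypothetical)."""

    def __init__(self, secondary_storage: list[bytes]) -> None:
        self._buffer: list[bytes] = []  # records held in memory, not on tape
        self._storage = secondary_storage

    def write(self, record: bytes) -> int:
        # Accept a record exactly as a tape drive would; nothing is sent yet.
        self._buffer.append(record)
        return len(record)

    def write_filemark(self) -> None:
        # End of a tape file: compress everything buffered and ship it.
        # zlib stands in here for the hardware compression engine.
        if self._buffer:
            self._storage.append(zlib.compress(b"".join(self._buffer)))
            self._buffer.clear()

    def read_file(self, index: int) -> bytes:
        # Reading back decompresses transparently, so the backup application
        # sees the original data (record boundaries are not kept in this sketch).
        return zlib.decompress(self._storage[index])
```

Because the backup application only ever issues ordinary tape writes, compression is added without modifying the backup software, which is the point the description makes.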

In the case where a Virtual Disk Drive is implemented, a kernel level driver is used to create a file system within the OS. Commands sent to the file system are redirected to the secondary storage, except that commands that write or read files transparently compress or decompress the file.
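The redirect-and-compress behavior of the virtual disk can be illustrated with a small user-space model. This Python sketch is hypothetical: `CompressingFileSystem` and its methods are invented names, a directory stands in for the secondary storage, and `zlib` stands in for the compression engine. Reads and writes compress or decompress transparently, while other commands are simply redirected.

```python
import zlib
from pathlib import Path


class CompressingFileSystem:
    """Sketch of transparent compression behind a file-system interface."""

    def __init__(self, backing_dir: Path) -> None:
        # backing_dir plays the role of the secondary storage.
        self.backing_dir = backing_dir
        self.backing_dir.mkdir(parents=True, exist_ok=True)

    def _backing_path(self, name: str) -> Path:
        return self.backing_dir / name

    def write_file(self, name: str, data: bytes) -> None:
        # Write command: compress before the bytes reach storage.
        self._backing_path(name).write_bytes(zlib.compress(data))

    def read_file(self, name: str) -> bytes:
        # Read command: decompress so the caller sees the original bytes.
        return zlib.decompress(self._backing_path(name).read_bytes())

    def list_files(self) -> list[str]:
        # Other commands are redirected unchanged; no compression involved.
        return sorted(p.name for p in self.backing_dir.iterdir())

    def delete_file(self, name: str) -> None:
        self._backing_path(name).unlink()
```

The backup software writes ordinary files; only the bytes on the backing store are compressed.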

It is an object of the present invention to compress the data within a virtual tape drive or virtual disk drive before it is transferred over the network, to facilitate fast data transfer and complete the backup sooner.

It is also the object of the present invention to allow introduction of new compression techniques, including hardware compression, without modification of backup software.

These and other objects, features and advantages will be readily apparent upon consideration of the following detailed description in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS:

Figure 1 is a block diagram illustrating the data storage system with its components (virtual tape/disk drive not shown).

Figure 2 is a block diagram of the hardware compression engine.

A detailed description of the preferred embodiment is provided herein; however, it is to be understood that the present invention may be embodied in various forms. Therefore, specific details disclosed herein are not to be interpreted as limiting, but rather as a basis for the claims and as a representative basis for teaching one skilled in the art to employ the present invention in virtually any appropriately detailed system, structure or manner.

The present invention will now be described with reference to the drawings. Referring to FIG. 1, a block diagram illustrating a data storage system emulating a virtual tape/disk drive according to a preferred embodiment of the present invention is shown. The said data storage system emulating a virtual tape/disk drive comprises the following:

Backup client (100), Host platform (110), Hardware compression (120), Network adaptor (130), Local Area Network (140), Backup Server (150), Secondary storage (160), Backup clients (in case of more than one) (170).

The host whose data is being backed up is known as the backup client (100).

The data is compressed in the backup client (100) itself. The invention emulates a Virtual Tape Drive or Virtual Disk Drive in the backup client. A compression engine (120) is provided in the backup client (100) that works in tandem with the said virtual tape drive or virtual disk drive to compress the data accepted by that drive. The compressed data is transferred over the network, or LAN (140), to the secondary storage server, or backup server (150), which in turn stores the data in the secondary storage (160).

In the preferred embodiment, there is at least one host CPU from which data is to be transferred and stored. A virtual tape drive is emulated on at least one such host CPU, and a hardware compression engine is also provided on the same host CPU.

Figure 2 shows the said hardware compression engine, comprising the following:

Host Interface (210), Input buffer (220), Hash calculator (230), Hash table (240), Linked list Storage (250), String matcher (260), Huffman coder (270) and a CRC calculator (280).
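The block names in Figure 2 (hash calculator, hash table, linked list storage, string matcher, Huffman coder, CRC calculator) match the structure of an LZ77 matcher with hash chains as used in DEFLATE. The patent does not say which algorithm the engine implements, so the Python sketch below is an assumption: a software model of the matching stages only. The Huffman coding stage that would entropy-code the tokens is omitted, and the constants and function names are illustrative.

```python
import zlib

MIN_MATCH = 3       # shortest match worth encoding
MAX_MATCH = 258     # longest match, as in DEFLATE
WINDOW = 32 * 1024  # how far back a match may reach
MAX_CHAIN = 64      # candidates the string matcher tries per position


def hash3(data: bytes, i: int) -> int:
    """Hash calculator: fold the next three bytes into a 15-bit key."""
    return ((data[i] << 10) ^ (data[i + 1] << 5) ^ data[i + 2]) & 0x7FFF


def lz77_tokens(data: bytes) -> list:
    """Return literal bytes (ints) and (distance, length) back-references."""
    head: dict[int, int] = {}  # hash table: key -> most recent position
    prev: dict[int, int] = {}  # linked-list storage: position -> older position with same key
    tokens: list = []
    n, i = len(data), 0
    while i < n:
        best_len, best_dist = 0, 0
        if i + MIN_MATCH <= n:
            cand, tries = head.get(hash3(data, i)), 0
            limit = min(MAX_MATCH, n - i)
            while cand is not None and i - cand <= WINDOW and tries < MAX_CHAIN:
                # String matcher: compare actual bytes, since hash keys can collide.
                length = 0
                while length < limit and data[cand + length] == data[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - cand
                cand, tries = prev.get(cand), tries + 1
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            step = best_len
        else:
            tokens.append(data[i])
            step = 1
        # Thread every consumed position into the hash chains.
        for j in range(i, i + step):
            if j + MIN_MATCH <= n:
                key = hash3(data, j)
                if key in head:
                    prev[j] = head[key]
                head[key] = j
        i += step
    return tokens


def lz77_decode(tokens: list) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            dist, length = tok
            start = len(out) - dist
            for k in range(length):  # byte by byte so overlapping copies work
                out.append(out[start + k])
    return bytes(out)


def crc_calculator(data: bytes) -> int:
    """CRC calculator: CRC-32 of the uncompressed input, re-checked on restore."""
    return zlib.crc32(data)
```

For input such as `b"abcabcabc"` the sketch emits three literals followed by one back-reference `(3, 6)`; a hardware engine would perform the same hash, chain-walk and compare steps in parallel logic rather than in a loop.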

The said virtual tape/disk drive and the said hardware compression engine together compress the data in the host CPU itself before it is transferred over the local area network. The local area network carries the compressed data from the host CPU to a secondary storage server, which in turn stores it on disk; alternatively, tape may be used. The compression achieved may be measured by the compression ratio or by throughput. Throughput is purely implementation dependent, while the compression ratio is highly data dependent, commonly ranging from 2:1 to 3:1.
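The effect of the 2:1 to 3:1 range on network time is simple arithmetic. The Python sketch below assumes a hypothetical 100 GB backup over a 1 Gbit/s link and ignores protocol overhead; these figures are illustrative, not from the patent.

```python
def compression_ratio(original_bytes: float, compressed_bytes: float) -> float:
    """Ratio of original to compressed size: 2.0 means 2:1."""
    return original_bytes / compressed_bytes


def transfer_seconds(payload_bytes: float, link_bits_per_second: float) -> float:
    """Idealized wire time: ignores protocol overhead and assumes the link,
    not the compression engine, is the bottleneck."""
    return payload_bytes * 8 / link_bits_per_second


GB = 10**9
backup_size = 100 * GB  # hypothetical backup size
link = 1e9              # hypothetical 1 Gbit/s LAN

for ratio in (1, 2, 3):
    print(f"{ratio}:1 -> {round(transfer_seconds(backup_size / ratio, link))} s")
# 1:1 -> 800 s
# 2:1 -> 400 s
# 3:1 -> 267 s
```

The saving holds only while the compression engine's throughput keeps pace with the link; if it does not, the engine rather than the network sets the backup speed, which is why throughput is the second measure named above.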

The present invention may be explained clearly by the following flow chart:

A DATA STORAGE SYSTEM AND A METHOD OF ITS OPERATION

Data retrieved from disk on the host CPU or backup client.

Data formatted by backup application on the backup client.

Data written to the front end of virtual tape in backup client

Data compressed by the front end of virtual tape.

Compressed data sent over the network to back end of virtual tape.

Compressed data received by back end of virtual tape.

Compressed data written to secondary storage by backup server.
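The split virtual tape in this flow can be modelled as two halves joined by a network connection. In this Python sketch, which is an assumption rather than the patent's protocol, a local `socket.socketpair()` stands in for the LAN, each compressed block is sent with a 4-byte length prefix, and `zlib` stands in for the hardware compression engine. The function names are hypothetical. Only compressed bytes ever cross the connection.

```python
import socket
import struct
import threading
import zlib
from typing import Optional


def send_frame(sock: socket.socket, payload: bytes) -> None:
    # Frame = 4-byte big-endian length prefix + payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> Optional[bytes]:
    first = sock.recv(1)
    if not first:
        return None  # clean end of stream: the front end has finished
    (length,) = struct.unpack(">I", first + recv_exact(sock, 3))
    return recv_exact(sock, length)


def front_end(sock: socket.socket, records: list[bytes]) -> None:
    """Backup-client half: compress each record locally, send only compressed bytes."""
    for record in records:
        send_frame(sock, zlib.compress(record))
    sock.shutdown(socket.SHUT_WR)  # signal the end of the backup


def back_end(sock: socket.socket, secondary_storage: list[bytes]) -> None:
    """Backup-server half: receive compressed frames and store them unchanged."""
    while (frame := recv_frame(sock)) is not None:
        secondary_storage.append(frame)


def run_local_demo(records: list[bytes]) -> list[bytes]:
    # A local socket pair stands in for the LAN between the two machines.
    client_sock, server_sock = socket.socketpair()
    storage: list[bytes] = []
    sender = threading.Thread(target=front_end, args=(client_sock, records))
    sender.start()
    back_end(server_sock, storage)
    sender.join()
    client_sock.close()
    server_sock.close()
    return storage
```

The back end never decompresses; it writes the compressed frames straight to secondary storage, which saves disk space as well as bandwidth.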

Another embodiment, where a virtual disk is utilized instead of a virtual tape, is explained by the flow chart below:

A DATA STORAGE SYSTEM AND A METHOD OF ITS OPERATION

Data retrieved from disk on the host CPU or backup client.

Data formatted by backup application on the backup client.

Data written to virtual disk in the backup client

Data compressed by the virtual disk.

Compressed data sent over the network to backup server.

Compressed data received by backup server.

Compressed data written to secondary storage by backup server.

The embodiments of the invention as described above and the methods disclosed herein will suggest further modifications and alterations to those skilled in the art. Such further modifications and alterations may be made without departing from the spirit and scope of the invention, which is defined by the scope of the following claims.

ADVANTAGES OF THE PRESENT INVENTION:

a) It compresses the data in the backup client itself.

b) As the compression is done before the data is transferred, the transfer becomes easier and faster.

c) Data backup completes faster, because less data crosses the network.

d) It saves network bandwidth and disk space.