Title:
SYSTEMS AND METHODS FOR OPTIMIZED SERIALIZATION
Document Type and Number:
WIPO Patent Application WO/2008/054790
Kind Code:
A3
Abstract:
Systems, methods, and computer-readable media for serialization are described. One embodiment includes a method having the steps of reading a data set to be serialized, finding a value of a data structure in the data set, and writing an entry to a data structure map, the entry comprising the value. The method further includes the steps of associating an index key with the entry and writing the index key to a data array.

Inventors:
REIMER CLAY ANDRE (US)
STOTER DARREN (US)
Application Number:
PCT/US2007/023008
Publication Date:
June 26, 2008
Filing Date:
October 31, 2007
Assignee:
LIVE CARGO INC (US)
REIMER CLAY ANDRE (US)
STOTER DARREN (US)
International Classes:
G06F7/00
Foreign References:
US20050097110A12005-05-05
US20050259594A12005-11-24
Attorney, Agent or Firm:
CALKINS, Charles, W. et al. (LLP, 1001 West Fourth Street, Winston-Salem, NC, US)
Claims:

CLAIMS

That which is claimed is:

1. A method for serializing data, the method comprising: reading a data set to be serialized; finding a value of a particular data structure in the data set; writing an entry to a map of the particular data structure, the entry comprising the value; associating an index key with the entry; and writing the index key to a data array.

2. The method of claim 1, wherein the particular data structure is a string and the map is a map of strings.

3. The method of claim 1, wherein the map comprises an array of bytes.

4. The method of claim 1, wherein the data array comprises a serialized byte array.

5. The method of claim 1, wherein the map comprises values of a plurality of particular data structures.

6. The method of claim 1, further comprising: determining whether the map contains the value.

7. The method of claim 1, further comprising: transmitting the map and the data array across a network.

8. The method of claim 1, further comprising: reading the index key from the data array; and locating the value in the map based at least in part on the index key.

9. The method of claim 1, further comprising: determining if the value is a number; and converting the number to a byte array.

10. The method of claim 1, wherein the entry further comprises a type of the value.

11. A method for serializing data, the method comprising: reading a data set to be serialized; finding a string in the data set; determining if the string has been written to a string map; writing a string entry to the string map, the string entry comprised of the string and a type of the string; associating an index key with the string entry; and writing the index key to a serialized data array.

12. A computer-readable medium on which is encoded program code, the program code comprising: program code for reading a data set to be serialized; program code for finding a value of a data structure in the data set; program code for writing an entry to a data structure map, the entry comprising the value; program code for associating an index key with the entry; and program code for writing the index key to a data array.

13. The computer-readable medium of claim 12, further comprising program code for transmitting the data structure map and the data array across a network.

14. The computer-readable medium of claim 12, further comprising: program code for reading the index key from the data array; and program code for locating the value in the data structure map based at least in part on the index key.

15. The computer-readable medium of claim 12, further comprising: program code for determining if the value is a number; and program code for converting the number to a byte array.

16. The computer-readable medium of claim 12, further comprising: program code for reading the index key from the data array; and program code for locating the value in the data structure map based at least in part on the index key.

Description:

SYSTEMS AND METHODS FOR OPTIMIZED SERIALIZATION

RELATED APPLICATIONS

This application claims priority to provisional U.S. patent application No. 60/855,469, filed on October 31, 2006, titled "Systems and Methods for Serialization," the entirety of which is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to methods, systems and computer readable media for the serialization of digital information. Some embodiments of the present invention may allow serialization of large data sets for purposes including storage and transmission over a computer network.

BACKGROUND

Users may typically only access and manipulate the state of individual computer software components, such as objects, while the software program is running. Once the program stops running, the specific states of these objects are lost.

To escape this limitation, various computer programming languages implement methods of serialization. Serialization generally refers to the process of saving an object's temporary state onto a storage medium, or transmitting the object's state across a network. Further, deserialization refers to the process of reconstructing an object's state from its serialized form, and loading this saved state into a software program. In this fashion, software developers can save the specific state of a computer software component while the program is executing, and recreate the saved state at a later point in time, or at a different location.

Current methods of serialization pose several problems for enterprises. One significant problem is compatibility. Computers using different software platforms, such as Sun Microsystems's Java, J2ME, or Microsoft's .NET, may not be able to interact with each other. More specifically, a Java client may not be able to reconstruct the state of an object serialized on a .NET client. Some software platforms may not even include methods for serialization. A further problem of current serialization techniques is efficiency. Java's serialization method may produce excessively large serialized data as data sets grow. Larger serialized data, in turn, leads to delays in transmitting and deserializing the data.

More efficient and interoperable methods of serializing data are always in demand. Although storage devices have grown in speed and capacity, and computer networks have increased in bandwidth, individuals and businesses can still benefit in many ways from a more efficient method of serializing large sets of data. Businesses may also leverage common software platforms to their advantage across disparate networks.

SUMMARY

The present invention provides systems and methods that may be used by individuals and businesses for serialization of data. Embodiments of the present invention may advantageously facilitate the speed, performance, and convenience of serializing and deserializing data.

Embodiments of the invention may reduce or eliminate the duplication of information in serialized data. These efficiency advantages may lead to faster serialization and deserialization operations, as well as smaller serialized data sets.

More specifically, some embodiments of the invention may create a library, or a map of certain data which is normally written several times to a serialized byte array. Instead of writing data to a serialized byte array numerous times, some embodiments of the invention may write a map index in the serialized byte array. Each map index located in the serialized byte array may point to specific data already written in the separate map. In this manner, embodiments of the invention may avoid writing data multiple times into the serialized byte array, by writing an index to the data in the map instead.

One embodiment includes a method having the steps of reading a data set to be serialized, finding a value of a data structure in the data set, and writing an entry to a data structure map, the entry comprising the value. The method further includes the steps of associating an index key with the entry and writing the index key to a data array. Another embodiment comprises a computer-readable medium for performing the same steps.

As should be understood by those of ordinary skill in the art, the foregoing description illustrates possible embodiments of the present invention. The present invention is capable of being implemented in a variety of different ways, for a variety of different purposes, without departing from the scope of the invention. These illustrative embodiments are mentioned not to limit or define the invention, but to provide examples to aid understanding thereof. Illustrative embodiments are discussed in the Detailed Description, and further description of the invention is provided therein. Advantages offered by various embodiments of this invention may be further understood by examining this specification.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects, and advantages of the present invention are better understood when the following Detailed Description is read with reference to the accompanying drawings, wherein:

Figure 1 is a block diagram illustrating a system for serialization according to one embodiment of the present invention;

Figure 2 is a flow chart illustrating a first method for serialization according to one embodiment of the present invention; and

Figure 3 is a flow chart illustrating a second method for serialization according to one embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention provide methods, systems, and products for serialization. One embodiment of the present invention may comprise a method for converting a set of data objects into a stream of data, and later reconstructing the set of data objects from that stream of data.

In one illustrative embodiment of the present invention, an array of bytes may be created with the data from a data set to be serialized. A map of strings in the data set may also be created. Instead of writing a string to the serialized array of bytes, each unique string may be written to the map of strings. When the serialization method calls for a string to be written to the array of bytes, an index to the string on the separate map may be written instead.

Since each string may be written to the string map only once, instead of being written multiple times into the serialized byte array, the overall size of the serialized data array may be much smaller than that produced by current methods of serialization. Further advantages may be seen in the speed of serializing and deserializing the data, since long or duplicative read and write operations may be avoided.
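The string-map idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of any embodiment: the names serialize_strings, string_map, key_of, and index_array are hypothetical, and a plain list of integers stands in for the serialized byte array.

```python
# Hypothetical sketch: all names here are illustrative and do not come
# from the specification.

def serialize_strings(strings):
    """Write each unique string once to a map; write only index keys to the array."""
    string_map = []    # index key -> string (each unique string stored once)
    key_of = {}        # string -> index key, used to detect repeated strings
    index_array = []   # stands in for the serialized byte array
    for s in strings:
        if s not in key_of:
            key_of[s] = len(string_map)
            string_map.append(s)
        index_array.append(key_of[s])
    return string_map, index_array


data_set = ["report.doc", "/home/a", "report.doc", "/home/b", "report.doc"]
string_map, index_array = serialize_strings(data_set)

# Bytes of string data written with and without the map.
naive_size = sum(len(s.encode("utf-8")) for s in data_set)
map_size = sum(len(s.encode("utf-8")) for s in string_map)
```

In this small data set the repeated file name is stored once in string_map, so map_size is smaller than naive_size. The saving in a real serialized stream would also depend on how the index keys themselves are encoded.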

Some embodiments of the invention may be particularly advantageous for large data sets, which may require identical strings to be serialized many times in one array. For instance, an improvement in one embodiment of the present invention resulted in a serialized data size 40% smaller than in implementations of other serialization methods. As another example, an embodiment of the invention reduced serialization time by 16% over other serialization methods, and reduced deserialization time by 62%.

As an example of an embodiment of the present invention, a user may want to remotely store a file over a Live Cargo storage system. When the data in a request to remotely store the file is serialized, the string representing the file name may be written to a map, and a numeric index to that string may be generated. That numeric index may then be written to the serialized byte array, rather than the entire string representing the file name. As the data is being serialized, embodiments of the present invention may avoid writing repeated instances of the string to the serialized byte array. Rather, a numeric index to the string in the map may be written to the serialized byte array in place of the string.

In other embodiments of the invention, other maps of data may be created. In some embodiments, a map of classes in the data to be serialized may be created, along with an index key to each class in the map. In such embodiments, when a class is to be written to the serialized array of bytes, an index key pointing to that class in the map is written to the serialized byte array, rather than the entire class. Such embodiments may lead to increased performance in deserializing data, since classes may be loaded from the class map, rather than created again by the deserialization process.
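A class map of this kind might look like the following Python sketch. It is a loose illustration only: the File and Folder classes, the function names, and the known_classes lookup table are all hypothetical, and the object state is kept as a plain dictionary rather than as bytes.

```python
# Hypothetical sketch; File, Folder, and the helper names are illustrative only.

class File:
    def __init__(self, name):
        self.name = name


class Folder:
    def __init__(self, name):
        self.name = name


def serialize_objects(objects):
    """Store each class once in a class map; records refer to it by index key."""
    class_map = []   # index key -> class name, written once per class
    key_of = {}      # class name -> index key
    records = []     # (class index key, object state) pairs
    for obj in objects:
        cls_name = type(obj).__name__
        if cls_name not in key_of:
            key_of[cls_name] = len(class_map)
            class_map.append(cls_name)
        records.append((key_of[cls_name], dict(vars(obj))))
    return class_map, records


def deserialize_objects(class_map, records, known_classes):
    """Resolve each class once from the class map, then rebuild the objects."""
    resolved = [known_classes[name] for name in class_map]  # one lookup per class
    result = []
    for class_key, state in records:
        cls = resolved[class_key]
        obj = cls.__new__(cls)        # allocate without calling __init__
        obj.__dict__.update(state)    # restore the saved state
        result.append(obj)
    return result
```

Here each class name is resolved once when deserialization begins, and every record then reuses the resolved class through its index key.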

Other advantages of the present invention may be seen in these and other embodiments. Some embodiments of the invention may reduce or eliminate the number of string manipulations performed by a serialization routine. For example, in some embodiments of the invention, an algorithm may convert numbers into a byte array through a MOD function. More specifically, a MOD function may create a byte for each digit of a number, in order to convert the number into a byte array.

Illustrative System for Optimized Serialization

Various systems in accordance with the present invention may be constructed. Referring now to the drawings in which like numerals indicate like elements throughout the several figures, Figure 1 is a block diagram illustrating a system for serialization according to one embodiment of the present invention. The system 100 shown in Figure 1 comprises a first computer 101 in communication with a second computer 111 over a network 120. As shown, network 120 is the Internet. In other embodiments, the network 120 may comprise an intranet, a Local Area Network (LAN), a telephone network, or a combination of suitable networks. The first computer 101 and the second computer 111 may connect to network 120 through wired, wireless, or optical connections.

Examples of a computer 101 include personal computers, digital assistants, personal digital assistants, cellular phones, mobile phones, smart phones, pagers, digital tablets, laptop computers, Internet appliances, and other processor-based devices. In general, a computer 101 may be any type of processor-based platform that operates on any operating system, such as Microsoft® Windows® or Linux, capable of supporting one or more applications. The computer 101 may store one or more applications in memory 103, and the computer 101 may execute applications with the processor 102.

Specific embodiments of the invention may be platform independent. That is, unlike some methods of serialization, which may only run on one software platform, embodiments of the present invention may operate on several different software platforms, such as Java, J2ME, and .NET. Embodiments of the present invention may operate on many different clients, such as RIM Blackberry devices and Microsoft Windows workstations.

Applications executed by processor 102 may be operable to read and write files to memory 103, and transmit files over the network 120. An application being executed by processor 102 may serialize data in memory 103 and transmit the serialized data over the network 120. An application stored in the memory 113 and being executed by the processor 112 of the second computer 111 may be configured to receive serialized data over the network, and deserialize the data.

Illustrative Methods for Optimized Serialization

Figure 2 is a flow chart illustrating a first method for serialization according to one embodiment of the present invention. In the first method 200, the computer 101 serializes a data set. The data set may be a large data set comprising many thousands of files and folders. In certain embodiments, the data set may comprise many thousands of strings or sequences of characters. For example, each file name and file location in a data set may be represented as a character string comprising a sequence of letters and symbols.

First, the computer 101 may create an array of bytes from a data set to be serialized 202. The array of bytes may be a serialized byte array. In certain embodiments, the serialized byte array may be read by the computer 101 at a later time to reconstruct, or deserialize, the data set. In another embodiment, the computer 101 may transmit the serialized byte array across a network 120 to a second computer 111. The second computer 111 may then read the serialized byte array to deserialize the data set.

Then, the computer 101 may create a map of strings found in the data set 204. If the data set contains a file, the computer 101 may write the file name to the map of strings. In some embodiments, the computer 101 may create a map of data types other than strings. For example, the computer 101 may create an array of floating point data types.

The map of strings may contain only unique strings. In one embodiment, the computer 101 may determine if a string has already been written to the map of strings. If the string has not been written to the map of strings, the computer 101 may write the string to the map of strings. If the string has already been written to the map of strings, then the computer may determine an index key associated with the string in the map of strings. Although step 202 and step 204 are shown as discrete steps in Figure 2, these steps may occur at substantially the same time. In particular, in certain embodiments, a serialized byte array may be created before, or at the same time as, the map is created.

Next, the first computer 101 transmits the array of bytes and the map of strings 206 to the second computer 111. The first computer 101 may transmit the serialized byte array and the string map across a network 120. In another embodiment, the first computer 101 may store the serialized byte array and the string map on a storage medium that the second computer 111 is configured to access. In certain other embodiments, a computer 101 that serializes a data set does not transmit the serialized byte array and the string map to a second computer.

After the second computer 111 receives the array of bytes and the map of strings 208, the second computer 111 may proceed to deserialize the data set. To deserialize the data set, the second computer 111 may examine the array of bytes 210. When the computer 111 encounters an index key in the array of bytes, the computer 111 may look up the corresponding value in the string map 212. As the second computer reads each index key from the serialized byte array, the second computer reads each corresponding value from the string map. By traversing the entire serialized byte array, the second computer 111 may deserialize the entire data set.
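The round trip of the first method 200 can be sketched as follows. This is a minimal Python illustration, not the byte layout of any embodiment: the function names are hypothetical, and the encoding of each index key as a 4-byte big-endian unsigned integer is an assumption chosen for simplicity. The network transfer of step 206 is omitted; the two return values are simply passed to the deserializer.

```python
import struct


def serialize(strings):
    """Steps 202-204: build the serialized byte array and the string map together."""
    string_map = []           # index key -> unique string
    key_of = {}               # string -> index key
    byte_array = bytearray()
    for s in strings:
        if s not in key_of:   # write the string to the map only the first time
            key_of[s] = len(string_map)
            string_map.append(s)
        # Assumed encoding: each index key is a 4-byte big-endian unsigned int.
        byte_array += struct.pack(">I", key_of[s])
    return bytes(byte_array), string_map


def deserialize(byte_array, string_map):
    """Steps 210-212: read each index key and look up its value in the string map."""
    strings = []
    for (index_key,) in struct.iter_unpack(">I", byte_array):
        strings.append(string_map[index_key])
    return strings


original = ["/docs", "plan.txt", "/docs", "notes.txt", "/docs"]
payload, smap = serialize(original)   # what the first computer would send
rebuilt = deserialize(payload, smap)  # what the second computer would do
```

After this runs, rebuilt holds the same strings in the same order as original, while smap stores "/docs" only once.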

Figure 3 is a flow chart illustrating a second method for serialization according to one embodiment of the present invention. In the embodiment shown, the computer 101 first reads a data set to be serialized 302.

Next, the computer 101 finds a value of a particular data structure in the data set 304. For example, the particular data structure may be a string. The value may comprise a value of a string, such as a file name or a directory name.

The method then comprises the step of writing an entry to a map of the particular data structure 306. For example, the computer 101 may write the string value to a map of string values. The entry in the map may also comprise a type of the value. For example, an entry comprising a file name may also comprise the type of the file name string, such as a character string. In another example, the type of the string may be an integer string.

In step 308, the method associates an index key with the entry. The index key may point to a particular location in the map to find a value.

In step 310, the method writes the index key to a data array. The data array may be a serialized data array comprising an array of bytes. The computer 101 may create the data array and the map to serialize the data set. Although step 310 is shown to follow step 306 in Figure 3, these steps may occur substantially simultaneously. That is, computer 101 may write the string value to the map at substantially the same time that the computer 101 writes the index key to the serialized byte array. In the

To deserialize the data set, or reconstruct the data set from the data array and the map, the computer 101 may look up each value in the map which is referenced by an index key in the data array. For example, the computer 101 may look up each string in the map indexed in the data array.

Because certain embodiments of the map of the present invention may comprise only unique values, the combined size of the map and the data array may be smaller than the output of other traditional methods for serialization. Specifically, the map may contain no duplicated data values. This is particularly advantageous for data sets with multiple instances of the same value, such as a data set that has the same file located in multiple directories. Because a string map may contain only one instance of the file name string, the string map and the data array may be smaller than a serialized byte array comprising every string in the data set.
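The second method can be sketched in Python with entries that carry a type, together with the per-digit number conversion mentioned earlier in this description. This is an illustrative sketch under several assumptions: the names are hypothetical, numbers are taken to be non-negative integers in base 10, the type tags "int" and "str" are invented for the example, and a list of index keys stands in for the data array.

```python
def number_to_digit_bytes(number):
    """Convert a non-negative integer to one byte per decimal digit using modulo."""
    if number == 0:
        return bytes([0])
    digits = []
    while number > 0:
        digits.append(number % 10)   # MOD extracts the lowest remaining digit
        number //= 10
    return bytes(reversed(digits))   # most significant digit first


def serialize_values(values):
    """Steps 302-310: typed map entries, index keys, and a data array."""
    value_map = []    # index key -> (type tag, value bytes)
    key_of = {}       # (type tag, value bytes) -> index key
    data_array = []   # index keys in data-set order
    for value in values:
        if isinstance(value, int):   # is the value a number?
            entry = ("int", number_to_digit_bytes(value))
        else:
            entry = ("str", value.encode("utf-8"))
        if entry not in key_of:      # keep the map free of duplicates
            key_of[entry] = len(value_map)
            value_map.append(entry)
        data_array.append(key_of[entry])
    return value_map, data_array


def deserialize_values(value_map, data_array):
    """Look up each index key in the map and rebuild the original value."""
    out = []
    for index_key in data_array:
        type_tag, raw = value_map[index_key]
        if type_tag == "int":
            number = 0
            for digit in raw:        # rebuild the number digit by digit
                number = number * 10 + digit
            out.append(number)
        else:
            out.append(raw.decode("utf-8"))
    return out
```

For a data set such as ["a.txt", 2008, "a.txt", 2008, "docs"], this sketch produces three map entries and five index keys, and deserialize_values returns the original list.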

Illustrative Platforms for Optimized Serialization

Referring again to Figure 1, embodiments of the present invention can be implemented in digital electronic circuitry, computer hardware, firmware, software, or in combinations of them.

In one embodiment, a computer 101 comprises a processor 102 or processors. The processor 102 comprises or is in communication with a computer-readable medium, such as a random access memory (RAM) coupled to the processor. The processor 102 executes computer-executable program instructions stored in memory 103, such as executing instructions for serializing data. Such processors may comprise a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), field programmable gate arrays (FPGAs), and state machines. Such processors may further comprise programmable electronic devices such as PLCs, programmable interrupt controllers (PICs), programmable logic devices (PLDs), programmable read-only memories (PROMs), electronically programmable read-only memories (EPROMs or EEPROMs), or other similar devices.

Such processors may comprise, or may be in communication with, media, for example computer-readable media, that may store instructions that, when executed by the processor, can cause the processor to perform the steps described herein as carried out, or assisted, by a processor. Embodiments of computer-readable media may comprise, but are not limited to, an electronic, optical, magnetic, or other storage or transmission device capable of providing a processor, such as the processor in a web server, with computer-readable instructions. Other examples of media comprise, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, ASIC, configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read. Also, various transmission devices or channels, such as a router, private or public network, may comprise other forms of computer-readable media to facilitate transmitting or carrying instructions to a computer. The processor, and the processing, described may occur in one or more structures, and may be dispersed through one or more structures. The processor may comprise code for carrying out one or more of the methods (or parts of methods) described herein.

General

The foregoing description of the embodiments, including preferred embodiments, of the invention has been presented only for the purpose of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Numerous modifications and adaptations thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention.