METHOD OF DATA OPTIMIZATION FOR LOSSLESS COMPRESSION, DATA COMPRESSION, AND DECOMPRESSION APPARATUS

Title:

METHOD OF DATA OPTIMIZATION FOR LOSSLESS COMPRESSION, DATA COMPRESSION, AND DECOMPRESSION APPARATUS

Document Type and Number:

WIPO Patent Application WO/2023/172156

Kind Code:

Abstract:

A method for optimization of lossless compression provides that a sample data sequence of an input data is split into data words of a fixed size. Further, one or more byte positions in the data words are marked according to different permutation patterns and the marked bytes are moved in the beginning to obtain permuted data that is compressed to obtain compressed permuted data. Further, the compressed permuted data is analyzed to select the permutation pattern that shows the highest compression ratio. The selected permutation pattern is then applied to the whole data.

Inventors:

BARDASHEVICH ADAM VICTOROVICH (CN)
KRINOV PETR SERGEEVICH (CN)
ROMANOVSKII ALEKSEI VALENTINOVICH (CN)
ILISAVSKII SERGEY SERGEEVICH (CN)

Application Number:

PCT/RU2022/000072

Publication Date:

September 14, 2023

Filing Date:

March 11, 2022

Export Citation:

Click for automatic bibliography generation Help

Assignee:

HUAWEI TECH CO LTD (CN)
BARDASHEVICH ADAM VICTOROVICH (CN)

International Classes:

H03M7/30

Foreign References:

US20100281079A1

2010-11-04

Other References:

"Data Compression Explained", INTERNET CITATION, 15 April 2013 (2013-04-15), XP002796495, Retrieved from the Internet [retrieved on 20191212]
ANONYMOUS: "bzip2", WIKIPEDIA, 12 January 2022 (2022-01-12), pages 1 - 7, XP093010305, Retrieved from the Internet [retrieved on 20221221]

Attorney, Agent or Firm:

LAW FIRM "GORODISSKY & PARTNERS" LTD. et al. (RU)

Download PDF:

View/Download PDF PDF Help

Claims:

CLAIMS

1. A method (100) of data optimization for lossless compression, the method (100) comprising, at a compression side: obtaining a sample data sequence of a predefined size from an input data, splitting the sample data sequence into data words of a fixed size in bytes, defining a set of permutation patterns that mark one or more byte positions in the data words, wherein one permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions, and an identifier, ID, for each permutation pattern, obtaining a set of permuted data sequences by reordering bytes in the sample data sequence using the permutation patterns, wherein each permuted data sequence is obtained by sequentially moving, to a beginning of the permuted data sequence, bytes from marked byte positions in the data words of the sample data sequence, in accordance with certain permutation pattern, and then sequentially moving bytes from non-marked byte positions, wherein different permuted data sequences are obtained using different permutation patterns, compressing each of the permuted data sequences by a compression algorithm, selecting a permutation pattern used in a compressed permuted data sequence of a minimum size among the compressed permuted data sequences, encoding an ID of the selected permutation pattern, obtaining a compressed data by reordering bytes in the input data using the selected permutation pattern to obtain a permuted data and compressing the permuted data by the compression algorithm, and obtaining an output data by appending the encoded ID to the compressed data.

2. The method (100) of claim 1, wherein the predefined size of the sample data sequence is defined by an external source and/or based on characteristics of one or more of the input data and the compression algorithm.

3. The method (100) of claim 2, wherein obtaining the sample data sequence comprises extracting a sample of the predefined size from the input data as the sample data sequence.

4. The method (100) of claim 1, wherein obtaining the sample data sequence comprises selecting the whole input data as the sample data sequence, wherein obtaining the compressed data comprises selecting the compressed permuted data sequence of the minimum size as the compressed data.

5. The method (100) of any of claims 1 to 4, wherein the fixed size of data words is defined by an external source and/or based on characteristics of the compression algorithm.

6. The method (100) of any of claims 1 to 5, further comprising, at a decompression side: receiving the output data from the compression side, obtaining the set of permutation patterns with IDs of the permutation patterns and the fixed size of data words, obtaining the compressed data by detaching the encoded ID from the output data, decoding an ID of the selected permutation pattern used in the compressed data from the encoded ID, decompressing the permuted data from the compressed data by a decompression algorithm corresponding to the compression algorithm, and obtaining the input data by reverse reordering bytes in the permuted data using the permutation pattern corresponding to the decoded ID.

7. A data compression apparatus (200), comprising: a compression controller (204) configured for obtaining a sample data sequence of a predefined size from an input data, and splitting the sample data sequence into data words of a fixed size in bytes, a permutation module (206) configured for defining a set of permutation patterns that mark one or more byte positions in the data words, wherein one permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions, and an identifier, ID, for each permutation pattern, and obtaining a set of permuted data sequences by reordering bytes in the sample data sequence using the permutation patterns, wherein each permuted data sequence is obtained by sequentially moving, to a beginning of the permuted data sequence, bytes from marked byte positions in the data words of the sample data sequence, in accordance with certain permutation pattern, and then sequentially moving bytes from non-marked byte positions, wherein different permuted data sequences are obtained using different permutation patterns,

3U a compression module (208) configured for compressing each of the permuted data sequences by a compression algorithm, a comparison module (210) configured for selecting a permutation pattern used in a compressed permuted data sequence of a minimum size among the compressed permuted data sequences, an ID encoder (212) configured for encoding an ID of the selected permutation pattern, wherein the compression controller (204) is configured for controlling the permutation module (206) to obtain a permuted data by reordering bytes in the input data using the selected permutation pattern and controlling the compression module (208) to obtain a compressed data by compressing the permuted data by the compression algorithm, and an output module (214) configured for obtaining an output data by appending the encoded ID to the compressed data.

8. The data compression apparatus (200) of claim 7, wherein the predefined size of the sample data sequence is defined by an external source and/or based on characteristics of one or more of the input data and the compression algorithm.

9. The data compression apparatus (200) of claim 8, wherein the compression controller (204) is configured for obtaining the sample data sequence by extracting a sample of the predefined size from the input data.

10. The data compression apparatus (200) of claim 7, wherein the compression controller (204) is configured for obtaining the sample data sequence by selecting the whole input data as the sample data sequence, and the compression controller (204) is configured for controlling the permutation module (206) and the compression module (208) for obtaining the compressed data by selecting the compressed permuted data sequence of the minimum size as the compressed data.

11. The data compression apparatus (200) of any of claims 7 to 10, wherein the fixed size of data words is defined by an external source and/or based on characteristics of the compression algorithm.

12. A data decompression apparatus (700), comprising: a decompression controller (704) configured for receiving an output data comprising a compressed data and an encoded ID of a selected permutation pattern used in the compressed data, obtaining a set of permutation patterns with IDs of the permutation patterns and a fixed size of data words, and obtaining the compressed data by detaching the encoded ID from the output data, an ID decoder (706) configured for decoding an ID of the selected permutation pattern used in the compressed data from the encoded ID, a decompression module (708) configured for decompressing a permuted data from the compressed data by a decompression algorithm, and a reverse permutation module (710) configured for obtaining an input data by reverse reordering bytes in the permuted data using the permutation pattern corresponding to the decoded ID.

Description:

METHOD OF DATA OPTIMIZATION FOR LOSSLESS COMPRESSION, DATA COMPRESSION, AND DECOMPRESSION APPARATUS

TECHNICAL FIELD

The present disclosure relates generally to the field of data compression and more specifically, to a method of data optimization for lossless compression, data compression, and decompression apparatus.

BACKGROUND

With the rapid advancement in recent technology, the world is moving towards digitalization. Today, most of the work is performed digitally, which helps in maintaining data records easily without carrying files or documents. But a large size of data requires a large memory space for storage and takes time to transfer or download, due to which data compression technique was introduced to reduce the size of stored, transmitted, or loaded data. Conventionally, compression techniques process data in small words, such as one word being used as a single piece of information, and does not allow to find in-word data redundancy

Conventionally, certain attempts have been made to find in-word redundancy, such as by slowly transforming an original data into a pair of alphabets. Since the transformation is based on a combinatorial approach, therefore needs to traverse data several times along with big number computations, which is not desirable. In certain scenarios, lossless data compression and decompression were disclosed in which an encoder was used to perform the compression. However, such scenarios do not disclose about applying specific data transformations related to data type. To overcome this, another method was introduced that discloses lossless video encryption compression transmission based on ranking permutation. In such a case, the data was rearranged based on order replacement to increase compressibility. Furthermore, such a conventional method does not disclose about checking compressed data size after compression of each permutation result. To address the challenge, another method was introduced for files compression that was based on permutation transforming before compression to reduce output file size and reverse permutation transforming during decompression. However, the permutation described in such a method is based on an explicit property of sound files, which is ambiguous for general data, and not desirable, because the problem is still not resolved. As a result, there exists a technical problem of how to perform lossless compression of data without facing in-word data redundancy and with an improved compression ratio. i Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional compression techniques.

SUMMARY

The present disclosure provides a method of data optimization for lossless compression. The present disclosure further provides a data compression apparatus and a data decompression apparatus. The present disclosure provides a solution to the existing problem of how to perform data compression without facing the problem of in-word data redundancy and with an improved compression ratio. An objective of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in the prior art and provides an improved method of data optimization for lossless compression, an improved data compression apparatus, and an improved data decompression apparatus.

One or more objectives of the present disclosure is achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.

In one aspect, the present disclosure provides a method of data optimization for lossless compression. The method includes, at a compression side obtaining a sample data sequence of a predefined size from an input data, splitting the sample data sequence into data words of a fixed size in bytes, defining a set of permutation patterns that mark one or more byte positions in the data words. Further, one permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions, and an identifier, ID, for each permutation pattern. Further, obtaining a set of permuted data sequences by reordering bytes in the sample data sequence using the permutation patterns, wherein each permuted data sequence is obtained by sequentially moving, to a beginning of the permuted data sequence, bytes from marked byte positions in the data words of the sample data sequence, in accordance with certain permutation pattern, and then sequentially moving bytes from non-marked byte positions. Further, different permuted data sequences are obtained using different permutation patterns, compressing each of the permuted data sequences by a compression algorithm, selecting a permutation pattern used in a compressed permuted data sequence of a minimum size among the compressed permuted data sequences. Further, encoding an ID of the selected permutation pattern, obtaining compressed data by reordering bytes in the input data using the selected permutation pattern to obtain a permuted data and compressing the permuted data by the compression algorithm, and obtaining an output data by appending the encoded ID to the compressed data. Beneficially, the sample data sequence is split into data words of a fixed size to match the size of data words with the size used in a compression algorithm. Further, the one or more byte positions in the data words are marked to move the marked byte at the beginning of a data block that is being compressed. Further, the bytes in the sample data sequence are reordered using the permutation patterns to obtain a set of permutations that helps in preprocessing the input data before performing the compression. Moreover, the best permutation pattern is selected according to the size of the compressed permuted data sequence. In addition, the ID of the selected permutation pattern is encoded to use that permutation pattern in the input data. Further, the output data is obtained after appending the encoded ID with the compressed data to perform lossless compression. The method reorders bytes of input data before compression to transform the data into a more compressible format and increases the productivity of the compression algorithm. The reordering of bytes performed by the method is lossless reversible data transformation because the method marks the bytes before reordering and saves the pattern to reverse the process easily. Moreover, reordering the bytes and selecting the best pattern is suitable for providing the minimum size of data. The original bytes order is restored by reversing the pattern selected in pre-processing, thus the method recovers the data without loss of information. Beneficially, the restoration of data can be performed in one pass.

In an implementation form, the predefined size of the sample data sequence is defined by an external source and/or based on characteristics of one or more of the input data and the compression algorithm.

It is advantageous to define the size of the sample data sequence based on characteristics of the one or more of the input data and the compression algorithm for matching the size of the sample data sequence with the size used in a compression algorithm.

In a further implementation form, obtaining the sample data sequence comprises extracting a sample of the predefined size from the input data as the sample data sequence.

It is advantageous to use the sample from the input data to find the most appropriate permutation pattern, and then the selected permutation pattern is applied to the input data.

In a further implementation form, obtaining the sample data sequence comprises selecting the whole input data as the sample data sequence. Further, obtaining the compressed data comprises selecting the compressed permuted data sequence of the minimum size as the compressed data.

It is advantageous to obtain sample data sequence from the input data as it optimizes the performance. Moreover, the compression ratio of the compressed data is also improved. In a further implementation form, the fixed size of data words is defined by an external source and/or based on characteristics of the compression algorithm.

It is advantageous to define fixed size to data words based on characteristics of the compression algorithm for matching the size of the sample data sequence with the size used in a compression algorithm.

In a further implementation form, the method includes, at a decompression side, receiving the output data from the compression side, obtaining the set of permutation patterns with IDs of the permutation patterns and the fixed size of data words, obtaining the compressed data by detaching the encoded ID from the output data, decoding an ID of the selected permutation pattern used in the compressed data from the encoded ID, decompressing the permuted data from the compressed data by a decompression algorithm corresponding to the compression algorithm, and obtaining the input data by reverse reordering bytes in the permuted data using the permutation pattern corresponding to the decoded ID.

The output data is received for performing the decompression, the received output data is in the compressed and permuted form. Moreover, the output data has the encoded ID of the permutation pattern that was applied during permutation. Further, the encoded ID is detached from the output data to obtain the compressed data, which is further decoded to obtain the permutation pattern. Further, decompression is performed to obtain the decompressed data, without losing any information.

In another aspect, the present disclosure provides a data compression apparatus, comprising a compression controller that is configured for obtaining a sample data sequence of a predefined size from an input data, and splitting the sample data sequence into data words of a fixed size in bytes, a permutation module configured for defining a set of permutation patterns that mark one or more byte positions in the data words. One permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions and an identifier, ID, for each permutation pattern. Further, obtaining a set of permuted data sequences by reordering bytes in the sample data sequence using the permutation patterns, wherein each permuted data sequence is obtained by sequentially moving, to a beginning of the permuted data sequence, bytes from marked byte positions in the data words of the sample data sequence, in accordance with certain permutation pattern, and then sequentially moving bytes from nonmarked byte positions. Further, different permuted data sequences are obtained using different permutation patterns, a compression module configured for compressing each of the permuted data sequences by a compression algorithm, a comparison module configured for selecting a permutation pattern used in a compressed permuted data sequence of a minimum size among the compressed permuted data sequences. Further, a comparison module configured for selecting a permutation pattern used in a compressed permuted data sequence of a minimum size among the compressed permuted data sequences, an ID encoder configured for encoding an ID of the selected permutation pattern, and the compression controller is configured for controlling the permutation module to obtain a permuted data by reordering bytes in the input data using the selected permutation pattern and controlling the compression module to obtain a compressed data by compressing the permuted data by the compression algorithm. Further, an output module is configured for obtaining an output data by appending the encoded ID to the compressed data.

The data compression apparatus achieves all the advantages and technical effects of the method of the present disclosure

In yet another aspect, the present disclosure further provides a data decompression apparatus, comprising a decompression controller configured for receiving an output data comprising a compressed data and an encoded ID of a selected permutation pattern used in the compressed data, obtaining a set of permutation patterns with IDs of the permutation patterns and a fixed size of data words. Further, obtaining the compressed data by detaching the encoded ID from the output data, an ID decoder configured for decoding an ID of the selected permutation pattern used in the compressed data from the encoded ID. Further, a decompression module is configured for decompressing permuted data from the compressed data by a decompression algorithm, and a reverse permutation module is configured for obtaining an input data by reverse reordering bytes in the permuted data using the permutation pattern corresponding to the decoded ID.

The decompression controller of the decompression apparatus receives the output data that is in compressed and permuted form. Moreover, the output data has the encoded ID of the permutation pattern that was applied during permutation. Further, the decompression controller detaches the encoded ID from the output data to obtain the compressed data. Further, the ID decoder decodes the detached encoded ID to obtain the permutation pattern. Further, the decompression is performed by the decompression algorithm of the decompression module to obtain the decompressed data, without losing any information.

It has to be noted that all devices, elements, circuitry, units, and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.

Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.

Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:

FIG. 1 is a flow chart of a method of data optimization for lossless compression, in accordance with an embodiment of the present disclosure;

FIG. 2 is a block diagram of a data compression apparatus, in accordance with an embodiment of the present disclosure;

FIG. 4 is an illustration that depicts different types of permutation patterns, in accordance with an embodiment of the present disclosure;

FIG. 5 is a flow chart of compressing the different types of permutation patterns, in accordance with an embodiment of the present disclosure; FIG. 6 is an illustration of a sequence of different types of permutation patterns, in accordance with an embodiment of the present disclosure;

FIG. 7 is a block diagram of a data decompression apparatus, in accordance with an embodiment of the present disclosure;

FIG. 8 is another flow chart that depicts steps to obtain decompressed data, in accordance with an embodiment of the present disclosure; and

FIG. 9 is a flow chart that depicts steps for data recovery after decompression, in accordance with an embodiment of the present disclosure.

In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the nonunderlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.

DETAILED DESCRIPTION OF EMBODIMENTS

The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.

FIG. 1 is a flow chart of a method of data optimization for lossless compression, in accordance with an embodiment of the present disclosure. With reference to FIG. 1, there is shown a flow chart of a method 100 of data optimization for lossless compression. The method 100 includes steps 102 to 118.

There is provided the method 100 of data optimization for lossless compression. The method 100 provides an improved compression ratio in a compression algorithm that manipulates fixed-size data words during compression. The method 100 assumes pre-processing data before compression to make in-word data redundancy explicit to the compression algorithm. In other words, the method 100 includes, pre-processing of the data to increase the data compression ratio.

At step 102, the method 100 includes, obtaining a sample data sequence of a predefined size from an input data. In other words, to perform the pre-preprocessing of the input data, firstly, the sample data sequence of predefined size is used, and the predefined size is a small part of the input data. Examples of the input data may include but are not limited to images, documents, videos, and the like. In an implementation, the input data is referred to as a data block that is a bigger piece of consecutive data usually from 4-kilo bytes (KB) to a few megabytes (MB). In an example, the predefined size of the sample data sequence is measured in bytes, such as 512 bytes. In an embodiment, the predefined size of the sample data sequence is defined by an external source and/or based on characteristics of one or more of the input data and the compression algorithm. In an implementation, the external source defines the predefined size of the sample data sequence. In another implementation, the predefined size of the sample data sequence depends upon the characteristics of one or more of the input data and the compression algorithm. In yet another implementation, the predefined size of the sample data sequence is defined by an external source and based on characteristics of one or more of the input data and the compression algorithm.

In accordance with an embodiment, the method 100 comprises, obtaining the sample data sequence that includes extracting a sample of the predefined size from the input data as the sample data sequence. In other words, the sample data sequence is a small part of the input data with the predefined size. In an implementation, a compression controller is used for obtaining the sample data sequence. Further, the sample data sequence is extracted by the method 100 for optimization.

At step 104, the method 100 includes, splitting the sample data sequence into data words of a fixed size in bytes. In other words, the sample data sequence that is extracted from the input data is split into the data words of fixed size (e.g., in bytes). In an implementation, a compression controller is used for splitting the sample data sequence into data words. In an example, the data word is a small piece of data that includes a sequence of a few bytes (e.g., from 2 to 8 bytes). For example, the predefined size of the sample data sequence is 512 bytes, and the fixed size of the data words is around 4 bytes, as further shown and described in FIG. 4. In accordance with an embodiment, the fixed size of data words is defined by an external source and/or based on characteristics of the compression algorithm. In an implementation, the external source defines the fixed size of the data words. In another implementation, the fixed size of the data words depends upon the compression algorithm, such as LZ4 compression algorithm is used for a size of 4 bytes. In yet another implementation, the fixed size of data words is defined by an external source and based on the characteristics of the compression algorithm.

At step 106, the method 100 includes, defining a set of permutation patterns that mark one or more byte positions in the data words. Further, one permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions, and an identifier (ID) for each permutation pattern. In other words, the set of permutation patterns is defined to mark the one or more byte positions in the data words of fixed size to reorder the bytes of the data words. In an implementation, a permutation module is used to define the set of permutation patterns. Further, one permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions and an identifier ID (e.g., permutation ID) for each permutation pattern. In other words, the byte position marked in one data word is the same as marked in the consecutive data word. In an example, one permutation pattern marks a third-byte position in each data word. In another example, another permutation pattern marks a fourth-byte position in each data word. In yet another example, one permutation pattern marks a second-byte position and fourth-byte position in each data word, as further shown in FIG. 6. For example, if the fourth-byte position is marked in one data word of a first permutation pattern, then the fourth-byte position is needed to be marked of a second data word, and similarly for subsequent data words. Such marking of bytes positions in different permutation patterns is beneficial to select the appropriate permutation pattern, which brings the input data in a more compressible format. Further, the method 100 includes checking the byte positions without marking, and the order of the byte positions that are not marked is not changed. In other words, unmarked byte positions preserve their original order.

At step 108, the method 100 includes, obtaining a set of permuted data sequences by reordering bytes in the sample data sequence using the permutation patterns. Further, each permuted data sequence is obtained by sequentially moving, to a beginning of the permuted data sequence, bytes from marked byte positions in the data words of the sample data sequence, in accordance with certain permutation patterns, and then sequentially moving bytes from non-marked byte positions. Further, different permuted data sequences are obtained using different permutation patterns. In other words, to obtain the set of permutation data sequences, the bytes of the sample data sequence is reordered using the permutation patterns. In an implementation, a permutation module is used to obtain the set of permuted data sequences. Further, in certain permutation patterns the bytes are reordered in such a way that firstly the bytes are marked in each of the data words, and then sequentially moving the bytes from marked byte positions in the data words of the sample data sequence to the beginning of the permuted data sequence.

In other words, the bytes from the marked bytes positions are moved forward, and then sequentially moving bytes from non-marked byte positions, such as the rest of the bytes are placed sequentially after the marked byte positions. In an example, the arrangement of bytes is different in each permutation pattern. For example, if the fourth-byte position is marked in one data word of a first permutation pattern, then the fourth-byte position is needed to be marked of a second data word, and the like. After marking the bytes in the data words, the bytes are reordered in such a way that the fourth byte of each data word is placed at the beginning of the permuted data sequence, and then the other bytes are placed sequentially, as further shown and described in FIG. 6. Similarly, if a third-byte position is marked in one data word of a permutation pattern, then the third byte of each data word is placed at the beginning of the permuted data sequence, and then the other bytes are placed sequentially, and the like. In such a way, multiple permutation patterns are obtained. In an example, the sample data sequence with all the bytes marked in the data word is skipped because the order of bytes is the same as in the original data.

In an implementation, the method 100 reorders bytes of the input data before compression, and such reordering is lossless reversible data transformation. Moreover, the reordering of bytes follows a specific repeating pattern that is known as permutation patterns. The permutation pattern is applied to the data words, word-by-word, repeatedly by moving predefined bytes of a word to the beginning of the data block, which is being compressed. In an example, the reordering follows the same pattern for each fixed-size data word to combine defined bytes from all data words together.

At step 110, the method 100 includes, compressing each of the permuted data sequences by a compression algorithm. In an implementation, a compression module is used for compressing each of the permuted data sequences. In an example, the LZ4 (number 4 stands for 4-byte data words) compression algorithm is used to compress each of the permutated data sequences. However, another compression algorithm can also be used, without limiting the scope of the disclosure. Further, the compression algorithm is to reduces the size of the data. Further, each of the permuted data sequences is compressed separately by the compression algorithm to provide multiple permuted data sequences in compressed form. The obtained multiple permuted data sequence after compression are of different sizes because of the different permutation patterns used in each of the sample data sequences.

At step 112, the method 100 includes, selecting a permutation pattern used in a compressed permuted data sequence of a minimum size among the compressed permuted data sequences. In other words, after performing compression on each of the permuted data sequences, multiple compressed permuted data sequence is obtained that are of various sizes. In an implementation, a comparison module is used for selecting the permutation pattern, such as for selecting the best permutation pattern from the compressed permuted data sequences. Moreover, the variation of size in each compressed permuted data sequence is because of the different permutation patterns used by the method 100. Further, the compressed permuted data sequence, which contains the least size among all the compressed permuted data sequences is selected. At step 114, the method 100 includes, encoding the ID of the selected permutation pattern. In other words, the ID (or permutation ID) of the permutation pattern that is selected after compression of the permuted data sequence is encoded to use that particular permutation pattern on the whole input data. In an implementation, an identifier encoder (or a permutation ID encoder) is used for encoding the ID of the selected permutation pattern. Further, the encoded ID of the selected permutation pattern is beneficial to recover the original data by reading the applied permutation pattern with the help of the ID of the corresponding permutation pattern.

At step 116, the method 100 includes, obtaining compressed data by reordering bytes in the input data using the selected permutation pattern to obtain permuted data and compressing the permuted data by the compression algorithm. In other words, after analyzing the compression ratio of the results obtained after performing permutation and compression on the sample data sequence, the method 100 selects the best permutation pattern and apply on the whole input data. In other words, the bytes of the input data are reordered according to the selected permutation pattern to obtain the compressed data and the permuted data. Further, the compression algorithm compresses the permuted data that is obtained by reordering the whole input data. In an embodiment, obtaining the sample data sequence includes selecting the whole input data as the sample data sequence. Further, obtaining the compressed data includes selecting the compressed permuted data sequence of the minimum size as the compressed data. In other words, the whole input data is selected as the sample data sequence of predefined size and the sample data sequence is the small part of the input data. Further, the compressed permutated data sequence of the minimum size is selected to obtain the compressed data of various sizes. After that, the compressed permuted data with the least size among all the compressed permuted data is selected as the finest compressed data.

At step 118, the method 100 includes, obtaining an output data by appending the encoded ID to the compressed data. In other words, the compressed data is appended with the encoded ID of the selected permutation pattern to obtain the output data. The output data helps in recovering the original data easily by reading the applied permutation pattern with the help of the encoded ID of that permutation pattern. Further, the output data is obtained after appending the encoded ID with the compressed data to perform lossless compression. In an example, the difference analyzed in compression ratio by using the method is around 8.390607% and 11.68099%.

In accordance with an embodiment, the method 100 further includes, receiving the output data from the compression side, and obtaining the set of permutation patterns with IDs of the permutation patterns and the fixed size of data words. The output data is permuted, compressed, and appended with the encoded ID. Further, the method 100 includes receiving the output data for performing a decompression process, and to recover the original data without losing any information. Further, the output data may include but is not limited to images, documents, videos, and the like. In an implementation, a decompression controller is used for receiving the output data. Further, to recover the original data, the permutation pattern according to which the original data was permuted, and the ID of the permutation pattern are required to obtain the set of permutation patterns and the IDs of the permutation patterns. Furthermore, the data words with fixed size are also required to recover the original data without losing any information. Further, the obtained data blocks are in compressed format. In an example, the data block of 4 bytes is used. However, the size of the data blocks can have other possible values without limiting the scope of the present disclosure.

In an implementation, the method 100 further includes, obtaining the compressed data by detaching the encoded ID from the output data and decoding an ID of the selected permutation pattern used in the compressed data from the encoded ID. In other words, firstly the encoded ID of the permutation pattern is detached from the output data to obtain the compressed data and to perform the decompression on the compressed data. After detaching the encoded ID from the output data, the ID is decoded of the selected permutation pattern according to which the original data was permuted. In an implementation, an identifier decoder (or a permutation ID decoder) is used for decoding the ID of the selected permutation pattern. The bytes of the permuted data are arranged back to their original position in which they were before permutation.

In an implementation, the method 100 further includes, decompressing the permuted data from the compressed data by a decompression algorithm corresponding to the compression algorithm. In other words, the permuted data is decompressed by the decompression algorithm. In an implementation, a decompression module is used for decompressing each of the permuted data sequences. In an example, the LZ4 (number 4 stands for 4-byte data words) decompression algorithm is used for performing the decompression. However, another decompression algorithm can also be used, without limiting the scope of the disclosure. The decompression is performed to recover the data by reversing the compression applied on the permuted data. By reversing the compression, the permuted data comes back in its original size. In an implementation, the method 100 further includes, obtaining the input data by reverse reordering bytes in the permuted data using the permutation pattern corresponding to the decoded ID. The permutation pattern applied on the permuted data is detected by decoding the ID of the permutation pattern. In an implementation, a reverse permutation module is used for reverse reordering bytes of the permuted data sequences. After decompressing the permuted data, the bytes of the permuted data are arranged back to their original position in which they were before permutation. Further, the i2 arrangement of bytes back to the original position is performed with respect to the permutation pattern used when the bytes were permuted. Further, after reversing the permutation pattern, the original data is obtained that is the input data, which was received for compression.

In an exemplary embodiment, the permutation is analyzed, and a permutation (P) is used for the analysis of data sequence ‘a’ on which the permutation (P) is applied, and resulting in data sequence is P(a). Moreover, P. |a| corresponds to a length of a compressed sequence ‘a’, and |P(a)| corresponds to a length of compressed sequence after transformation. In an implementation, if the permutation is applied, then inequality |a| < |P(a)| + C holds for most of the cases for some small constant C. Further, the given in-equation holds for some generated set of sequences and the compression ratio is compared by performing the compression with permutation and without permutation.

The method 100 is used for improving the compression ratio and avoids the problem of loss of data after compression. The method 100 increases the capabilities of any compression algorithm by transforming the data into a more compressible format. Moreover, the method 100 preprocess the data before compression and analyzes samples of the data before compressing the whole data. Further, the method 100 performs permutation in the data words in a repetitive manner word-by- word that allow to turn in-word data redundancy to block-level data redundancy. Beneficially, as compared to conventional approaches, the method 100 is used to increase compression ratio and compression speed that helps in minimizing the size of the data in very less time. As the method 100 receives the data for compression, it splits the sample of the data into bytes and rearranges those bytes to make various permutation patterns. The sampling approach allows decreasing the time of permutation choice. After rearranging the bytes, the method 100 compresses the samples and analyses the size of the samples to select the permutation pattern that minimized the size of the sample. In such a way, the method 100 compares all possible permutation patterns for a given size of the data word in terms of end-to-end compression ratio of the data block, which allows finding optimal permutation for each data block. The method 100 reorders bytes of input data before compression to transform the data into a more compressible format and increases the productivity of the compression algorithm. The reordering of bytes performed by the method 100 is lossless reversible data transformation because the method 100 marks the bytes before reordering and saves the pattern to reverse the process easily and obtain the data after decompression. Moreover, the method 100 analyses multiple patterns of reordering the bytes and selects the best pattern that is suitable for providing the minimum size of data. The reordering of bytes follows the same pattern for each fixed-size data word to combine defined bytes from all words together. Further, the method 100 restores the original bytes order of data after decompression. The original bytes order is restored by reversing the pattern selected in preprocessing, thus the method 100 recovers the data without loss of information and the restoration of data is performed in one pass. It can be noted that if one go over the data is negligible compared to decompression complexity; then decompression speed is not affected.

The steps 102 to 118 are only illustrative, and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.

FIG. 2 is block diagram of a data compression apparatus, in accordance with an embodiment of the present disclosure. FIG. 2 is described in conjunction with elements from FIG 1. With reference to FIG 2, there is shown a data compression apparatus 200, a data source 202, a compression controller 204, a permutation module 206, a compression module 208, a comparison module 210, an ID encoder 212, and an output module 214.

The data compression apparatus 200 is used for improving the compression ratio of any compression algorithm that manipulates the fixed-size data words during compression. The data compression apparatus 200 increases the capabilities of any compression algorithm by transforming the data into a more compressible format. Moreover, the data compression apparatus 200 pre-process the data before compression and analyzes samples of the data before compressing the whole data.

The data source 202 is the element from which the data is received for compression. The compression controller 204 is used to control all the other elements for performing the compression. The compression controller 204 receives the data from the data source 202 and extracts a sample from the data, and transmits the sample for the permutation. Further, the compression controller 204 receives the permuted data with the finest permuted pattern and applies that permutation pattern on the whole data. After performing permutation on the whole data, the compression controller 204 transmits the data for compression. Examples of implementation of the compression controller 204 may include but are not limited to a central data processing device, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set (RISC) processor, a very long instruction word (VLIW) processor, a state machine, and other processors or control circuitry.

The permutation module 206 applies various permutation patterns on the sample of the data received from the compression controller 204. After applying the permutation patterns on the samples, the permutation module 206 sends permuted samples for comparison. The compression module 208 is to compress the data received from the compression controller 204. The compression module 208 uses a compression algorithm to compress the data. The comparison module 210 receives the permuted data from the permutation module 206 for comparing the permutation data to select the finest permutation pattern that has minimizes the size of the sample. Further, the comparison module 210 sends the selected permutation pattern to the compression controller 204.

The ID encoder 212 is to encode the ID of the selected permutation pattern to make the task easier to recover the original data by reading the applied permutation pattern with the help of the ID of that permutation pattern. The output module 214 is to provide the compressed data with a highly compressed ratio. Moreover, the output module 214 receives the compressed data with appended encoded ID.

There is provided a data compression apparatus 200 that includes the compression controller 204 configured to obtain a sample data sequence of a predefined size from the input data. In other words, to perform the pre-preprocessing of the input data, firstly the compression controller 204 receives the input data from the data source 202. Further, the compression controller 204 extracts the sample data sequence of predefined size from the input data the extracted sample data sequence is a small part of the input data. Examples of the input data may include but are not limited to images, documents, videos, and the like. In an implementation, the input data is referred to as a data block that is a bigger piece of consecutive data usually from 4-kilo bytes (KB) to a few megabytes (MB). In an example, the predefined size of the sample data sequence is measured in bytes, such as 512 bytes. In an embodiment, the predefined size of the sample data sequence is defined by an external source and/or based on characteristics of one or more of the input data and the compression algorithm. In an implementation, the external source defines the predefined size of the sample data sequence. In another implementation, the predefined size of the sample data sequence depends upon the characteristics of one or more of the input data and the compression algorithm, as the input data may include but not be limited to images, documents, videos, and the like. In yet another implementation, the predefined size of the sample data sequence is defined by an external source and based on characteristics of one or more of the input data and the compression algorithm.

In an embodiment, the compression controller 204 is configured for obtaining the sample data sequence by extracting a sample of the predefined size from the input data. In other words, the sample data sequence is a small part of the input data, with the predefined size. Further, the sample data sequence is extracted by the compression controller 204 for optimization. The compression controller 204 is further configured to split the sample data sequence into data words of a fixed size in bytes. In other words, the sample data sequence that is extracted from the input data by the compression controller 204 is split into the data words of fixed size (e.g., in bytes). In an example, the data word is a small piece of data that includes a sequence of a few bytes (e.g., from 2 to 8 bytes). For example, the predefined size of the sample data sequence is 512 bytes, and the fixed size of the data words is around 4 bytes, as further shown and described in FIG. 4. In an accordance with an embodiment, the fixed size of data words is defined by an external source and/or based on characteristics of the compression algorithm. In an implementation, the external source defines the fixed size of the data words. In another implementation, the fixed size of the data words depends upon the compression algorithm, such as the LZ4 compression algorithm is used for a size of 4 bytes. In yet another implementation, the fixed size of data words is defined by an external source and based on the characteristics of the compression algorithm.

There is provided in the data compression apparatus 200, the permutation module 206 configured to define a set of permutation patterns that mark one or more byte positions in the data words. Further, one permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions, and an identifier, ID, for each permutation pattern. In other words, the set of permutation patterns is defined to mark the one or more byte positions in the data word of fixed size to reorder the bytes data words by the permutation module 206. Further, one permutation pattern marks the same byte positions in each data word, including an empty permutation pattern that marks no byte positions, and an identifier ID (e.g., permutation ID) for each permutation pattern. In an example, the byte position marked in one data word is the same as marked in the consecutive data word. In an example, one permutation pattern marks a third-byte position in each data word. In another example, another permutation pattern marks a fourth-byte position in each data word. In yet another example, one permutation pattern marks a second-byte position and a fourth-byte position in each data word. For example, if the fourth-byte position is marked in one data word of a first permutation pattern, then the fourth-byte position is needed to be marked of a second data word, and similarly for subsequent data words. Such marking of byte positions in different permutation patterns is beneficial to select the appropriate permutation pattern, which brings the input data in a more compressible format. Further, the permutation module 206 check the byte position without marking, and the order of the byte position that is not marked is not changed. In other words, unmarked byte positions preserve their original order. The permutation module 206 is further configured to obtain a set of permuted data sequences by reordering bytes in the sample data sequence using the permutation patterns. Further, each permuted data sequence is obtained by sequentially moving, to a beginning of the permuted data sequence, bytes from marked byte positions in the data words of the sample data sequence, in accordance with certain permutation patterns, and then sequentially moving bytes from nonmarked byte positions. Further, different permuted data sequences are obtained using different permutation patterns. In other words, to obtain the set of permutation data sequences, the bytes of the sample data sequence is reordered by using the permutation patterns through the permutation module 206. Further, in certain permutation patterns, the bytes are reordered in such a way that firstly the bytes are marked in each of the data words, and then sequentially moving the bytes from marked byte positions in the data words of the sample data sequence to the beginning of the permuted data sequence.

In other words, the bytes from the marked bytes positions are shifted forward, and then sequentially moving bytes from non-marked byte positions, such as the rest of the bytes are placed sequentially after the marked bytes. In an example, the arrangement of bytes is different in each permutation pattern. For example, if a fourth-byte position is marked in one data word of a first permutation pattern, then the fourth-byte position is needed to be marked of a second data word, and the like. After marking the bytes in the data words, the bytes are reordered in such a way that the fourth byte of each data word is placed at the beginning of the permuted data sequence, and then the other bytes are placed sequentially, as further shown and described in FIG. 6. Similarly, if a third-byte position is marked in one data word of a permutation pattern, then the third byte of each data word is placed at the beginning of the permuted data sequence, and then the other bytes are placed sequentially, and the like. In such a way, multiple permutation patterns are obtained. In an example, the sample data sequence with all the bytes marked in the data word is skipped because the order of bytes is the same as in the original data.

In an implementation, the permutation module 206 reorders bytes of the input data before compression, and such reordering is lossless reversible data transformation. Moreover, the reordering of bytes follows a specific repeating pattern that is known as permutation patterns. The permutation pattern is applied to the data words, word-by-word, repeatedly by moving predefined bytes of a word to the beginning of the data block, which is being compressed. In an example, the reordering follows the same pattern for each fixed-size data word to combine defined bytes from all data words together.

The data compression apparatus 200 includes the compression module 208 that is configured to compress each of the permuted data sequences by the compression algorithm. In an example, the LZ4 (number 4 stands for 4-byte data words) compression algorithm is used by the compression module 208 to compress each of the permutated data sequences. However, another compression algorithm can also be used, without limiting the scope of the disclosure. Further, the compression algorithm is to reduces the size of the data. Further, each of the permuted data sequences is compressed separately by the compression module 208 to provide multiple permuted data sequences in compressed form. The obtained multiple permuted data sequence after compression is of different sizes because of different permutation patterns used in each of the sample data sequences.

The data compression apparatus 200 includes the comparison module 210 that is configured to select the permutation pattern used in a compressed permuted data sequence of a minimum size among the compressed permuted data sequences. In other words, after performing compression on each of the permuted data sequences, multiple compressed permuted data sequence is obtained that are of various sizes. The comparison module 210 is used for selecting the permutation pattern, such as for selecting the best permutation pattern from the compressed permuted data sequences. Moreover, the variation of size in each compressed permuted data sequence is because of the different permutation patterns used by the data compression apparatus 200. Further, the compressed permuted data sequence, which contains the least size among all the compressed permuted data sequences is selected by the comparison module 210.

The data compression apparatus 200 includes the ID encoder 212 that is configured to encode an ID of the selected permutation pattern. Further, the compression controller 204 is configured for controlling the permutation module to obtain permuted data by reordering bytes in the input data using the selected permutation pattern and controlling the compression module 208 to obtain compressed data by compressing the permuted data by the compression algorithm. In other words, the ID (or permutation ID) of the permutation pattern that is selected after compression of a permuted data sequence, is encoded by the ID encoder 212 to use that particular permutation pattern on the input data. Further, the encoded ID of the selected permutation pattern is beneficial to recover the original data by reading the applied permutation pattern with the help of the ID of the corresponding permutation pattern. After selecting the permutation pattern, the permutation module 206 reorders the bytes of the input data with respect to the selected permutation pattern to obtain the permuted data and compressed data. Further, the compression module to compresses the permuted data that is obtained by reordering the whole input data. In an embodiment, the compression controller 204 is configured for obtaining the sample data sequence by selecting the whole input data as the sample data sequence. Further, the compression controller 204 is configured for controlling the permutation module and the compression module 208 for obtaining the compressed data by selecting the compressed permuted data sequence of the minimum size as the compressed data. In other words, the whole input data is selected as the sample data sequence by the compression controller 204. The sample data is of predefined size and the sample data sequence is the small part of the input data. Further, the compressed permutated data sequence of the minimum size is selected to obtain the compressed data of various sizes. After that, the compressed permuted data with the least size among all the compressed permuted data is selected as the finest compressed data.

The data compression apparatus 200 includes the output module 214 that is configured to obtain an output data by appending the encoded ID to the compressed data. In other words, the compressed data is appended with the encoded ID of the selected permutation pattern to obtain the output data. The output data helps in recovering the original data easily by reading the applied permutation pattern with the help of the encoded ID of that permutation pattern. Further, the output data is obtained after appending the encoded ID with the compressed data to perform lossless compression.

FIG. 3 is a flow chart that depicts an algorithm for implementing data optimization for lossless compression in a data compression apparatus, in accordance with an embodiment of the present disclosure. FIG. 3 is described in conjunction with elements from the FIG. 2. With reference to FIG. 3, there is shown a flow chart 300 that depicts steps of the algorithm for implementing data optimization for lossless compression in the data compression apparatus as depicted in FIG. 2. In other words, the algorithm from FIG. 3 is an exemplary implementation form of the method from FIG.l . The flow chart 300 includes steps 302 to 322.

At step 302, the input data is provided by the data source 202 to the compression controller 204. Further, the input data that is received for performing the compression can be of any type (for example image, document, video, and the like). In an example, the input data is taken from a memory unit, in this scenario the data source 202 is the memory unit from which the input data is fetched. Moreover, after receiving the input data from the data source 202, the compression controller 204 extracts a sample data sequence from the input data. Further, the sample data sequence of the input data is split into data words of fixed size in bytes. The compression controller 204 optimizes the performance by using only a small part of the input data.

At step 304, the compression controller 204 defines the permutation list according to which the one or more byte positions in the data words are marked and shifted.

At step 306, after defining the list of permutations by the compression controller 204, the permutation pattern from the defined list is picked and the one or more byte positions in the data words are marked and shifted with respect to the permutation pattern. One-by-one each permutation pattern is picked to perform pre-processing of the sample data sequence.

At step 308, the permutation pattern is applied by the permutation module 206 on the sample data sequence received from the compression controller 204. Further, the permutation pattern is applied by the permutation module 206 in such a way that the bytes are marked and rearranged according to the permutation patterns defined in the list. The arrangement of bytes is different in each permutation pattern.

At step 310, after applying the permutation pattern on the sample data sequence, the compression module 208 starts compression. The sample data sequence with different permutation patterns is compressed separately by the compression module 208 to obtain permuted data sequence. In an example, the LZ4 compression algorithm is used by the compression module 208. However, another compression algorithm can also be used, without limiting the scope of the disclosure.

At step 312, after performing the compression by the compression module 208, there are various compressed permuted data sequences obtained that are of different sizes because of the different permutation patterns used in each permuted data sequence. Further, the compressed permuted data sequence is compared by the comparison module 210 to check if the current result is better than the previous. In an example, if the compressed permuted data sequence has a smaller size than the previous compressed permuted data sequence, then the ID (or permutation ID) of the compressed selected permuted data sequence is stored. If the size of the compressed permuted data sequence is bigger than the previous compressed permuted data sequence, then the comparison module 210 checks for other permutation pattern lists. In other words, if the current result is better than the previous, then step 314 is executed, otherwise step 316 is executed directly.

At step 314, the ID of the selected compressed permuted data sequence is stored by the compression controller 204. In an example, if the comparison module 210 compares two compressed permuted data sequences with each other, then the ID of the compressed permuted data sequence with a smaller size is saved in the compression controller 204. Further, other compressed permuted data sequences are compared and checked by the comparison module 210.

At step 316, the compression controller 204 checks if the permutation list is empty or not. Moreover, if the permutations list is empty, then step 318 is executed, otherwise, the previous step 306 is executed to pick the next permutation from the permutation list. In an example, if there are some permutation patterns left in the permutation list then the compression controller 204 pick the permutation patterns one by one to apply the sample data sequence. Further, the

2u same procedure of compression and comparison is repeated. If the permutation list is empty, then the best permutation pattern selected by the comparison module 210 is applied on the whole data by the compression controller 204. Further, the permuted data is compressed by the compression module 208 to obtain compressed permuted data.

At step 318, the ID of the selected permutation pattern is encoded by the ID encoder 212. Moreover, at step 320, the ID encoded by the ID encoder 212 is appended with the compressed permuted data. Further, the ID of the selected permutation pattern is encoded to recover the original data by reading the applied permutation pattern with the help of the ID of that permutation pattern.

At step 322, the compressed data is obtained by the output module 214. The output module 214 receives the compressed data with appended encoded ID from the compression controller 204.

FIG. 4 is an illustration that depicts different types of permutation patterns, in accordance with an embodiment of the present disclosure. FIG. 4 is described in conjunction with elements from the FIG. 2. With reference to FIG 4, there is shown an illustration 400 of different types of permutation patterns.

In an embodiment, the input data that is needed to be compressed is received from the data source 202. Further, the compression controller 204 splits the original data into data words of four bytes. However, the input data can be split in other possible bytes without limiting the scope of the present disclosure. Further, the bytes are marked according to the permutation applied on the bytes by the permutation module 206. Initially, no bytes are marked in the data word, so it corresponds to the input data and refers to an empty permutation pattern (or permutation 0). Further, when a first permutation pattern (or permutation 1) is applied to the data word then, the first-byte position of each data word is marked. Similarly, in a second permutation pattern (or permutation 2), the second-byte position of each data word is marked. Further, in a third permutation pattern (or permutation 3), the first-byte position and second-byte position of each data word is marked after that in a fourth permutation pattern (or permutation 4), the third-byte position of each data word is marked. Similarly, in a fifth permutation pattern (or permutation 5), the first-byte position and the third-byte position of each data word are marked, after that in a sixth permutation pattern (or permutation 6), the second-byte position and the third-byte position of each data word is marked. Moreover, in a seventh permutation pattern (or permutation 7), the first-byte position, the second-byte position, and the third-byte position of each data word are marked. In an eighth permutation pattern (or permutation 8), the fourth-byte position of each data word is marked. Further, in a ninth permutation pattern (or permutation 9), the first-byte

2i position and the fourth-byte position of each data word are marked, after that in a tenth permutation pattern (or permutation 10), the second-byte position and fourth-byte position of each data word are marked. Similarly, in an eleventh permutation pattern (or permutation 11), the first-byte position, the second-byte position, and the fourth-byte position of each data word are marked. In a twelfth permutation pattern (or permutation 12), the third-byte position and the fourth-byte position of each data word are marked. In addition, in a thirteenth permutation pattern (or permutation 13), the first-byte position, the third-byte position, and the fourth-byte position of each data word are marked. In a fourteenth permutation pattern (or permutation 14), the second-byte position, the third-byte position, and the fourth-byte position of each data word are marked. Therefore, fifteen permutation patterns are represented, which follows their binary sequence to mark one or more byte positions in the data words. According to the binary sequence, the fifteenth permutation pattern marks all the byte positions in the data word, so the position of the bytes remains unchanged, thus the fifteenth permutation pattern is skipped by the permutation module 206. Further, the permutation 0 to permutation 14 is useful, so there are fifteen useful permutation patterns that are equal to 2 ⁴- 1. In an example, the eighth permutation pattern shows the finest result by providing the highest compression ratio and the thirteenth permutation pattern is least effective. However, the other permutation patterns can also show better permutation, without limiting the scope of the disclosure.

FIG. 5 is a flow chart that depicts steps for compressing different types of permutation patterns, in accordance with an embodiment of the present disclosure. FIG. 5 is described in conjunction with elements from the FIG. 2. With reference to FIG. 5, there is shown flow chart 500 that includes an original data 502, a first permutation data 504A, a second permutation data 504B, a third permutation data 504C, a first compressed data 506A, a second compressed data 506B, and a third compressed data 506C.

In an embodiment, the flow chart 500 discloses the selection of the finest permutation pattern by analysing various permutation patterns. Firstly, the compression controller 204 receives the original data 502 (or input data) and extracts a sample from the original data 502. Further, the compression controller 204 splits the sample into data word of fixed size in bytes, and the sample is provided to the permutation module 206. In the FIG 5, examples of three permutation patterns are shown. However, more permutation patterns can be applied without limiting the scope of the present disclosure. The permutation module 206 applies the first permutation pattern to the sample for obtaining the first permutation data 504A. Further, the permutation module 206 applies a second permutation pattern to the sample for obtaining the second permutation data 504B. Furthermore, the permutation module 206 applies a third permutation pattern to the sample for obtaining the third permutation data 504C. To apply the permutation patterns on the sample, the permutation module 206 initially marks the bytes of the sample, which is performed with respect to the applied permutation pattern. After, marking the bytes, the permutation module 206 sequentially moves the marked bytes to a beginning. The permutation module 206 does not change the position of the un-marked bytes. Further, the arrangement of the bytes of the first permutation data 504A, the second permutation data 504B, and the third permutation data 504C are different from each other because of the difference in the applied permutation pattern.

Further, the compression module 208 compresses the first permutation data 504A, the second permutation data 504B, and the third permutation data 504C separately to obtain the first compressed data 506A, the second compressed data 506B, and the third compressed data 506C respectively. Further, the comparison module 210 compares the first compressed data 506A, the second compressed data 506B, and the third compressed data 506C with respect to the size. In an example, if the second compressed data 506B has the minimum size among all, then the comparison module 210 selects the second compressed data 506B as the compressed data. Further, the permutation pattern applied on the second compressed data 506B is further applied on the original data 502.

FIG. 6 is an illustration that depicts a sequence of different types of permutation patterns, in accordance with an embodiment of the present disclosure. FIG. 6 is described in conjunction with elements from the FIG. 2, and FIG. 5. With reference to FIG. 6, there is shown and illustration 600 that depicts the sequence of different types of permutation patterns.

In an embodiment, the original data 502 is split into data words of fixed bytes by the compression controller 204, the byte positions are marked by the permutation module 206. Further, the marking of bytes positions in the data word is performed according to the applied permutation pattern. In an example, (as shown in FIG 6) the third-byte position of each data word is marked with respect to a permutation pattern A. Further, the byte from the marked byte position in each data word is sequentially moved in the beginning and the un-marked byte positions remain at their position to obtain a permuted data sequence (e.g., sequence ‘A’). Further, the permutation module 206 applies a permutation pattern B, and the fourth-byte position of each data word is marked. Further, the bytes from the fourth-byte position of each data word are sequentially moved in the beginning to obtain a corresponding permuted data sequence (e.g., sequence ‘B’). Further, the permutation module 206 applies permutation pattern ‘C’, a second-byte position and a fourth-byte position of each data word are marked. Further, bytes from the second-byte position and fourth-byte position of each data word are sequentially moved in the beginning to obtain a corresponding permuted data sequence (e.g., sequence ‘C’). It can be noted that the marked byte positions in all the permutation patterns preserve their original positing before moving data and then the un-marked byte positions preserve their original position. Further, each of the sequence ‘A’, the sequence ‘B’, and the sequence ‘C’ can be compressed by the compression algorithm of the compression module 208. Further, non-permuted data is also compressed to analyse the compression ratio with the permutation patterns and without permutation. Further, the permutation pattern that provides the highest compression ratio is used in the original data. In an example, if the compression ratio of non-permuted data is highest then the original data is compressed without permutation.

FIG. 7 is a block diagram of a data decompression apparatus, in accordance with an embodiment of the present disclosure. With reference to FIG 7, there is shown a data decompression apparatus 700, a compressed data source 702, a decompression controller 704, an ID decoder 706, a decompression module 708, a reverse permutation module 710, and an output data module 712.

The data decompression apparatus 700 is for decompressing a compressed data. Further, the decompression performed by the data decompression apparatus 700 is to retrieve original data by reversing the compression applied on the compressed data.

The compressed data source 702 is the element from which the compressed data is received for decompression. Although, the data source 702 is not a part of the disclosure, thus mentioned only for explaining the disclosure explicitly.

The decompression controller 704 is to control all the other elements for performing the decompression. The decompression controller 704 receives the compressed data from the compressed data source 702. Examples of implementation of the decompression controller 704 may include but are not limited to a central data processing device, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set (RISC) processor, a very long instruction word (VLIW) processor, a state machine, and other processors or control circuitry.

The ID decoder 706 is to decode the ID of a permutation pattern that is applied on the compressed data. Further, the permutation ID decoder helps in retrieving original data from the compressed data by providing the permutation pattern applied to the compressed data.

The decompression module 708 is to decompress the data received from the decompression controller 704. The decompression module 708 uses a decompression algorithm to decompress the data. The reverse permutation module 710 is to solve the permutation pattern applied on the decompressed data received from the decompression controller 704. Moreover, the reverse permutation module 710 moves the bytes of the decompressed data to their original position in which they were before permutation.

The output data module 712 is used to provide original data after decompression. There is provided the data decompression apparatus 700 that includes the decompression controller 704 configured to receive an output data comprising compressed data and an encoded ID of a selected permutation pattern used in the compressed data. In other words, the output data is provided to the decompression controller 704 by the compressed data source 702. Further, the output data is permuted, compressed, and appended with the encoded ID. Further, the output data is received for performing a decompression process and to recover the original data without losing any information. Further, the output data may include but is not limited to images, documents, videos, and the like.

The decompression controller 704 is further configured to obtain a set of permutation patterns with IDs of the permutation patterns and a fixed size of data words. The decompression controller 704 further obtain the compressed data by detaching the encoded ID from the output data. In other words, to recover the original data, the received output data from the compressed data source 702 is needed to be decompressed by the decompression controller 704. Further, to recover the original data, the permutation pattern according to which the original data was permuted, and the ID of the permutation pattern are required, so the set of permutation patterns and the IDs of the permutation patterns are obtained. Furthermore, the data words with fixed size are also required for the recovery of original data without losing any information. Moreover, the obtained data blocks are in compressed format. In addition, the encoded ID of the permutation pattern is detached from the output data by the decompression controller 704 to obtain the compressed data. The data decompression apparatus 700 further includes the ID decoder 706 that is configured to decode an ID of the selected permutation pattern used in the compressed data from the encoded ID. After detaching the encoded ID from the output data, the encoded ID is decoded of the selected permutation pattern according to which the original data was permuted by the ID decoder 706. The permuted data of the compressed data and the bytes of the permuted data are arranged back to their original position in which they were before permutation.

The data decompression apparatus 700 further includes the decompression module 708 that is configured to decompress a permuted data from the compressed data by a decompression algorithm. In other words, the permuted data is decompressed by the decompression module 708. Further, the decompression module 708 uses the decompression algorithm to perform the decompression. In an example LZ4 (number 4 stands for 4-byte data words) decompression algorithm is used by the decompression module 708 for performing the decompression. However, another decompression algorithm can also be used, without limiting the scope of the disclosure. The decompression is performed by the decompression module 708 to recover the data by reversing the compression applied on the permuted data. By reversing the compression, the permuted data is received in the original size.

The data decompression apparatus 700 further includes the reverse permutation module 710 that is configured to obtain an input data by reverse reordering bytes in the permuted data using the permutation pattern corresponding to the decoded ID. The permutation pattern applied on the permuted data is detected after decoding the ID of the permutation pattern by the ID decoder 706. After decompressing the permuted data, the reverse permutation module 710 arranges the bytes of the permuted data back to their original position in which they were before permutation. Further, the reverse permutation module 710 arranges the bytes back to the original position with respect to the permutation pattern used when the bytes were permuted. Moreover, the output of the permutation module 710 is received by the output data module 712. Finally, after reversing the permutation pattern, the original data is obtained, that is the input data that was received for compression.

FIG. 8 is another flow chart that depicts steps to obtain a decompressed data, in accordance with an embodiment of the present disclosure. FIG. 8 is described in conjunction with elements from the FIG 7. With reference to the FIG. 8, there is shown a flow chart 800 that depicts steps to obtain a decompressed data.

In an implementation, the decompression is performed on the compressed data to recover original data. The decompression helps the compressed data in obtaining the original size

At step 802, the data for decompression is provided to the decompression controller 704 by the compressed data source 702. The data provided by the compressed data source 702 is the compressed data that is permuted, compressed, and appended with the encoded ID. Further, the received compressed data is needed to be decompressed for recovering the original data without losing any information. Further, the compressed data may include but is not limited to images, documents, videos, and the like.

At step 804, firstly the encoded ID of the permutation pattern is detached from the compressed data by the decompression controller 704. After detaching the encoded ID from the compressed data, the ID of the selected permutation pattern according to which the original data was permuted, is decoded by the ID decoder 706. At step 806, the compressed data is decompressed by the decompression module 708. Further, the decompression module 708 uses the decompression algorithm to perform the decompression. In an example, the LZ4 decompression algorithm is used by the decompression module 708 for performing the decompression. However, another decompression algorithm can also be used, without limiting the scope of the disclosure. The decompression is performed by the decompression module 708 to recover the data by reversing the compression applied on the compressed data.

At step 808, the decompressed data is received from the decompression module 708. Further, the reverse permutation module 710 arranges the bytes of the decompression data back to their original position in which they were before permutation.

At step 810, the decompressed data is obtained with all the information without loss. The obtained decompressed data is the original data that was received for performing the compression.

FIG. 9 is a flow chart of that depicts steps for data recovery after decompression, in accordance with an embodiment of the present disclosure. FIG. 9 is described in conjunction with elements from the FIG 7. With reference to the FIG. 9, there is shown a flow chart 900 that depicts steps for data recovery after decompression

In an embodiment, after performing the compression of data for the reduction in size, the original data is needed to be recovered. Further, to recover the original data, decompression is performed. The decompression process helps the compressed data in obtaining the original size.

At step 902, the compressed data is received from the compressed data source 702. The compressed data is appended with the encoded ID. The compressed data may include but is not limited to images, documents, videos, and the like. Further, the decompression module 708 uses the decompression algorithm to perform the decompression on the compressed data to obtain decompressed data.

At step 904, the encoded ID of the permutation pattern applied on the obtained decompressed data is decoded by the ID decoder 706. After decompressing the permuted data, the reverse permutation module 710 arrange the bytes of the permuted data back to their original position in which they were before permutation.

At step 906, the reverse permutation module 710 arranges the bytes back to the original position with respect to the permutation pattern used when the bytes were permuted. Further, after reversing the permutation pattern, the original data is obtained. The method Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a nonexclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.

Previous Patent: METHOD FOR DETECTING VULNERABILITIES IN SOFTWARE CODE

Next Patent: DIGITAL STANDARDIZED MODULE (CASSETTE) OF A RADIANT HEATING/COOLING SYSTEM