Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
METHOD FOR FAST RECURSIVE CODING IN ELECTRONIC DEVICES
Document Type and Number:
WIPO Patent Application WO/2006/051151
Kind Code:
A1
Abstract:
The invention relates to method for coding data in an electronic device. In the method a source text is obtained in the memory. At least one symbol in the source text is grouped in at least one symbol group based on the probability of said at least one symbol in the source text. A prefix and a suffix stream are formed using codewords for said at least one symbol. Each subsequent prefix pair in said prefix stream is concatenated to form a concatenated prefix stream. The concatenated prefix stream is set as a new source text in said memory and the coding procedure is repeated if the number of symbol groups among said at least one symbol group is less than or equal to a predefined threshold value.

Inventors:
ASTOLA JAAKKO (FI)
EGIAZARIAN KAREN (FI)
PONOMARENKO NIKOLAY (UA)
LUKIN VLADIMIR (UA)
RYABKO BORIS (RU)
Application Number:
PCT/FI2005/000428
Publication Date:
May 18, 2006
Filing Date:
October 10, 2005
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
ASTOLA JAAKKO (FI)
EGIAZARIAN KAREN (FI)
PONOMARENKO NIKOLAY (UA)
LUKIN VLADIMIR (UA)
RYABKO BORIS (RU)
International Classes:
H03M7/30; H03M7/40; H03M7/42; H03M
Foreign References:
US6690306B12004-02-10
US20040051653A12004-03-18
EP0661668A21995-07-05
Other References:
LIDDELL ET AL: "Hybrid Prefix Codes for Practical Use", IEEE PROCEEDINGS OF THE DATA COMPRESSION CONFERENCE (DCC'03), March 2003 (2003-03-01), pages 392 - 401
RYABKO ET AL: "Fast Codes for Large Alphabets", COMMUNICATIONS IN INFORMATION AND SYSTEMS, vol. 3, no. 2, October 2003 (2003-10-01), pages 139 - 152
Attorney, Agent or Firm:
PAPULA OY (P.O. Box 981, Helsinki, FI)
Download PDF:
Claims:
CLAIMS :
1. A method for coding data in an electronic device comprising a processor and a memory, the method comprising: obtaining a source text in said memory; grouping at least one symbol in said source text in at least one symbol group based on the probability of said at least one symbol in said source text; mapping the codeword of said at least one symbol in said source text to a prefix and a suffix to form a prefix stream and a suffix stream; concatenating each subsequent prefix pair in said prefix stream to form a concatenated prefix stream; setting said concatenated prefix stream as a new source text in said memory and repeating said group¬ ing, mapping and concatenating steps if the number of symbol groups among said at least one symbol group is less than or equal to a predefined threshold value; and outputting a coded text from said memory to at least one of a second memory and a data network.
2. The method according to claim 1, wherein said predefined threshold value is the square root of the alphabet size in said source text.
3. The method according to claim 1, wherein said second memory is a nonvolatile memory.
4. The method according to claim 3, wherein said second memory is removable from said electronic device.
5. The method according to claim 1, the method further comprising: providing information on said at least one symbol group to said coded text; providing said suffix stream to said coded text; and providing said prefix stream to said coded text.
6. A method for decoding data in an elec¬ tronic device comprising a processor and a memory, the method comprising: obtaining a coded text comprising at least one symbol group entry, a prefix stream and a suffix stream in said memory from at least one of a second memory and a data network; setting said prefix stream as an input prefix stream and said suffix stream as an input suffix stream; reading a symbol group entry from said coded text; processing said input prefix stream using informa¬ tion in said symbol group entry; processing said input suffix stream using said in put prefix stream and information in said symbol group entry; forming an output symbol stream based on each pre¬ fix in said input prefix stream and each suffix in said input suffix stream; and if there are more symbol group entries in said coded text, dividing each symbol in said output symbol stream into two prefixes, setting said output symbol stream as a new input prefix stream and repeating said reading, processing and forming steps.
7. The method according to claim 6, wherein said second memory is a nonvolatile memory.
8. The method according to claim 7, wherein said second memory is removable from said electronic device.
9. An electronic device comprising a proces¬ sor and a memory, the electronic device further com¬ prising: a coding entity in said electronic device config¬ ured to: obtaining a source text in said memory, group at Least one symbol in said source text in at least one symbol group based on the probability of said at least one symbol in said source text, map the codeword of said at least one symbol in said source text to a prefix and a suffix to form a prefix stream and a suffix stream, concatenate each subsequent prefix pair in said prefix stream to form a concatenated prefix stream, set said concatenated prefix stream as a new source text in said memory, check if the number of symbol groups among said at least one symbol group is less than or equal to a predefined threshold value, and output a coded text from said memory to at least one of a second memory and a data network.
10. An electronic device comprising a proces¬ sor and a memory, the electronic device further com¬ prising: a decoding entity configured to: obtain a coded text comprising at least one sym¬ bol group entry, a prefix stream and a suffix stream in said memory from at least one of a second memory and a data network, set said prefix stream as an input prefix stream and said suffix stream as an input suffix stream, read a symbol group entry from said coded text, process said input prefix stream using informa¬ tion in said symbol group entry, process said input suffix stream using said in put prefix stream and information in said symbol group entry, form an output symbol stream based on each pre¬ fix in said input prefix stream and each suffix in said input suffix stream, divide each symbol in said output symbol stream into two prefixes if there are more symbol group en¬ tries in said coded text, and set said output symbol stream as a new input prefix stream if there are more symbol group entries in said coded text.
11. A computer program comprising code 5 adapted to perform the following steps when executed on a dataprocessing system: obtaining a source text, grouping at least one symbol in said source text in at least one symbol group based on the probability 0 of said at least one symbol in said source text; mapping the codeword of said at least one symbol in said source text to a prefix and a suffix to form a prefix stream and a suffix stream; concatenating each subsequent prefix pair in said 5 prefix stream to form a concatenated prefix stream; setting said concatenated prefix stream as a new source text and repeating said grouping, mapping and concatenating steps if the number of symbol groups among said at least one symbol group is less than or ϋ equal to a predefined threshold value; and outputting a coded text.
12. The computer program according to claim11 wherein said computer program is stored on a com¬ puter readable medium. 5 13.
13. The computer program according to claim 12 wherein said computer readable medium is a remov¬ able memory card.
14. The computer program according to claim 12, wherein said computer readable medium is a mag 0 netic or an optical disk.
15. A computer program comprising code adapted to perform the following steps when executed on a dataprocessing system: obtaining a coded text comprising at least one 5 symbol group entry, a prefix stream and a suffix stream; setting said prefix stream as an input prefix stream and said suffix stream as' an input suffix stream; reading a symbol group entry from said coded text; processing said input prefix stream using informa¬ tion in said symbol group entry; processing said input suffix stream using said in¬ put prefix stream and information in said symbol group entry; forming an output symbol stream based on each pre¬ fix in said input prefix stream and each suffix in said input suffix stream; and if there are more symbol group entries in said coded text, dividing each symbol in said output symbol stream into two prefixes, setting said output symbol stream as a new input prefix stream and repeating said reading, processing and forming steps.
16. The computer program according to claim15 wherein said computer program is stored on a com puter readable medium.
17. The computer program according to claim16 wherein said computer readable medium is a remov¬ able memory card.
18. The computer program according to claim 16, wherein said computer readable medium is a mag¬ netic or an optical disk.
Description:
TITLE OF THE INVENTION

METHOD FOR FAST RECURSIVE CODING IN ELECTRONIC DEVICES

BACKGROUND OF THE INVENTION Field of the invention:

The invention relates to data compression in electronic devices. Particularly, the invention re¬ lates to the coding algorithms based on the grouping of source text symbols.

Description of the Related Art:

Most files and other data structures stored in the secondary or primary memory of an electronic device contain much redundant information from infor- mation theoretic point of view. File formats and the coding standards for characters contained therein are optimized for fast and easy presentation on a display device. Examples of such file formats are ASCII (American Standard for Information Exchange) and Unicode text files. Most file formats have not been designed with considering minimum file size as a pri¬ ority and thus do not deal with the redundancy of the information stored in the files. Because of this it is necessary to provide algorithms for data compression whenever the storage size is limited or there is a need to transfer files or other uncompressed over a limited bandwidth data communication channel. Data compression is an important feature in source coding, which is a process that converts data from a source to a sequence of binary digits in such a way that the se¬ quence of binary digits may be converted back to pro¬ duce a corresponding reproduction of the original data.

Reference is now made to Figure 1, which il- lustrates the theoretical model of source coding. In Figure 1 there is a source electronic device 100,

which forms a message 104, which is a sequence of bi¬ nary digits. Message 104 is consumed in a destination electronic device 102. Message 104 is used to carry a sequence of n symbols, each of which belong to an al- phabet 106, which comprises the symbols a x , a 2/ a 3/ ..,a k . The probability P(ai) of receiving of a given symbol ai wherein l≤i≤k is sometimes briefly referred to as pi- Assuming independency between each symbol

K

^P(α,.) =l. A given symbol a± is represented as a binary

digit sequence, in other words a codeword, in a code as b (t) , wherein the length L(b (i) )=li. The aver¬ age length of a binary digit sequence in a code is ■ A code is used to represent the sym- bols in message 104. According to information theory in an optimal code L =L(b M ) =Iog 2 ( ), which makes the

PO,.) average binary digit sequence length representing a symbol / . However, it must be noted that in an optimal code the binary digit sequence length representing a given symbol may have a decimal fraction. This is possible due to the fact that a group of symbols coded together may carry state infor¬ mation, which is recovered by the decoder.

In arithmetic coding the internal state of a coder is represented using two state variables L and R, which are similarly computed by the coder and the decoder. L is the lower end of a bounding interval and R is the width of the bounding interval and at all times O≤Z≤l and 0 <R ≤1. Initially, let L=O and R=I. In order to encode a symbol ai the values for L and R are set to and At the end of the message as all symbol probabilities have been com¬ puted and the L and R variables the transmitted code

is any number c satisfying the condition L≤c<L+R. The optimal average binary digit sequence length is achieved only in arithmetic codes. The disadvantage of arithmetic coding is the computational complexity in- volved in the computing of the state variables. Arith¬ metic coding is disclosed in the publication "Arithme¬ tic coding", Rissanen J. and Langdon G.G., IBM Journal of Research and Development, 23(2), pages 149-162, 1979. Reference is now made to Figure 2, which il¬ lustrates Huffman coding, which achieves optimal bi¬ nary digit sequence length, when each symbol in the alphabet is encoded separately. In Figure 2 there is a column 203, which contains the alphabet, a column 201 containing their respective probabilities. The code¬ words, that is, the binary digit sequences represent¬ ing the symbols are given in column 202. There is also illustrated a binary tree 210, which is the binary tree used to map the codewords to their respective symbols. A binary tree branch is selected at each level using the bit in the codeword the index of which corresponds to the number of the level. The disadvan¬ tages of Huffman coding are the computational complex¬ ity and the separate encoding of each symbol in the coded message, which results in non-optimal compres¬ sion.

Reference is now made to Figure 3, which il¬ lustrates a tree structure in K-flat coding. K-flat coding is disclosed in publication "Hybrid Prefix Codes for Practical Use", Liddell M. and Moffat A., Proceedings of the Data Compression Conference (DCC'03), IEEE, pages 392-401, 2003. Flat codes have a simple structure, which facilitates faster decoding compared to Huffman coding and arithmetic coding. A K- flat codeword comprises a k-bit prefix and a variable length suffix. The length of the suffix is determined by the prefix. The prefix is used to find the right

sub-tree, which may be indexed using the suffix in or¬ der to find the symbol corresponding to the codeword. In Figure 3 there is a binary tree structure 300. In binary tree structure 300 k=2 and at level 2 there are four sub-trees, namely sub-trees 310-340. The symbols are identified in leaf nodes 311, 321, 331, 332, 341, 342, 343. The leaf nodes 311 and 321 represent sub¬ trees 310 and 320, respectively. The leaf nodes 331 and 332 are associated with a single sub-tree root node for sub-tree 330. The leaf nodes 341-343 are as¬ sociated with sub-tree 340. Because of the uniform structure of the sub-trees, it is possible to compute the location of a symbol leaf node using the codeword suffix. Reference is now made to Figure 4, which il¬ lustrates a super-letter based coding method. The method is disclosed in publication "Fast Codes for Large Alphabets", Ryabko B., Astola J. and Egiazarian K., Communications in Information and Systems, Vol. 3, No. 2, pages 139-152, October 2003. In the super- letter based method, symbols, in other words, letters with small probabilities are grouped in subsets, which are called super-letters. The letters in the subsets have the same codeword length and may be coded fast. In Figure 4 there is an alphabet 400, which has sym¬ bols, that is, letters a 0 , a. lt a 2 , a 3 and a 4 that con¬ tained in column 412. The respective probabilities are contained in column 414. Codewords for the symbols are contained in column 416. Based on the closeness of their probabilities letters a 0 , ai, a 2 and a 3 are as¬ signed to super-letter SL 2 , whereas symbol a 4 remains separate or is equivalently comprised in its own su¬ per-letter SL 1 . In Figure 4 there is also a binary tree structure, which comprises a root node 420 and leaf nodes 422 and 424 linked directly to root node 420. The leaf nodes comprise table structures for storing the letters belonging to the respective super-

letters. Based on a prefix in the codeword for a sym¬ bol, the correct super-letter is selected. Thereupon, the suffix is used to index the. super-letter table to find the letter corresponding to the suffix. In Figure 4 first bit of letter codewords is checked and the sub-tree corresponding to the bit value is selected.

The problem associated with the coding meth¬ ods disclosed in association with Figures 2, 3 and 4 is related to non-optimal compression. The problem as- sociated with arithmetic encoding is computational complexity. A further problem associated with the com¬ putational complexity of a coding method is the delay introduced when the coding is used in data transmis¬ sion. A message in its entirety or individual symbols from it must be first encoded before the message or the symbols may be transmitted. In the former case an initial delay is introduced and in the latter case data transmission speed is adversely affected.

SUMMARY OF THE INVENTION:

The invention relates to a method for coding data in an electronic device comprising a processor and a memory. The method comprises obtaining a source text in the memory, grouping at least one symbol in the source text in at least one symbol group based on the probability of ' the at least one symbol in the source text, mapping the codeword of the at least one symbol in the source text to a prefix and a suffix to form a prefix stream and a suffix stream, concatenat- ing each subsequent prefix pair in the prefix stream to form a concatenated prefix stream, setting the con¬ catenated prefix stream as a new source text in the memory and repeating the grouping, mapping and con¬ catenating steps if the number of symbol groups among the at least one symbol group is less than or equal to a predefined threshold value, and outputting a coded

text from the memory to at least one of a second mem¬ ory and a data network.

The invention relates also to a method for decoding data in an electronic device comprising a processor and a memory. The method comprises obtaining a coded text comprising at least one symbol group en¬ try, a prefix stream and a suffix stream in the memory from at least one of a second memory and a data net¬ work, setting the prefix stream as an input prefix stream and the suffix stream as an input suffix stream, reading a symbol group entry from the coded text, processing the input prefix stream using infor¬ mation in the symbol group entry, processing the input suffix stream using the input prefix stream and infor- mation in the symbol group entry, forming an output symbol stream based on each prefix in the input prefix stream and each suffix in the input suffix stream, and if there are more symbol group entries in the coded text, dividing each symbol in the output symbol stream into two prefixes, setting the output symbol stream as a new input prefix stream and repeating the reading, processing and forming steps.

The invention relates also to an electronic device comprising a processor and a memory, the elec- tronic device further comprising: a coding entity in the electronic device configured to obtain a source text in the memory, to group at least one symbol in the source text in at least one symbol group based on the probability of the at least one symbol in the source text, to map the codeword of the at least one symbol in the source text to a prefix and a suffix to form a prefix stream and a suffix stream, to concate¬ nate each subsequent prefix pair in the prefix stream to form a concatenated prefix stream, to set the con- catenated prefix stream as a new source text in the memory, to check if the number of symbol groups among the at least one symbol group is less than or equal to

a predefined threshold value, and to output a coded text from the memory to at least one of a second mem¬ ory and a data network.

The invention relates also to an electronic device comprising a processor and a memory, the elec¬ tronic device further comprising: a decoding entity configured to obtain a coded text comprising at least one symbol group entry, a prefix stream and a suffix stream in the memory from at least one of a second memory and a data network, to set the prefix stream as an input prefix stream and the suffix stream as an in¬ put suffix stream, to read a symbol group entry from the coded text, to process the input prefix stream us¬ ing information in the symbol group entry, to process the input suffix stream using the input prefix stream and information in the symbol group entry, to form an output symbol stream based on each prefix in the input prefix stream and each suffix in the input suffix stream, to divide each symbol in the output symbol stream into two prefixes if there are more symbol group entries in the coded text, and to set the output symbol stream as a new input prefix stream if there are more symbol group entries in the coded text.

The invention relates also to a computer pro- gram comprising code adapted to perform the following steps when executed on a data-processing system: ob¬ taining a source text, grouping at least one symbol in the source text in at least one symbol group based on the probability of the at least one symbol in the source text, mapping the codeword of the at least one symbol in the source text to a prefix and a suffix to form a prefix stream and a suffix stream, concatenat¬ ing each subsequent prefix pair in the prefix stream to form a concatenated prefix stream, setting the con- catenated prefix stream as a new source text and re¬ peating the grouping, mapping and concatenating steps if the number of symbol groups among the at least one

symbol group is less than or equal to a predefined threshold value, and outputting a coded text.

The invention relates also to a computer pro¬ gram comprising code adapted to perform the following steps when executed on a data-processing system: ob¬ taining a coded text comprising at least one symbol group entry, a prefix stream and a suffix stream, set¬ ting the prefix stream as an input prefix stream and the suffix stream as an input suffix stream, reading a symbol group entry from the coded text, processing the input prefix stream using information in the symbol group entry, processing the input suffix stream using the input prefix stream and information in the symbol group entry, forming an output symbol stream based on each prefix in the input prefix stream and each suffix in the input suffix stream, and if there are more sym¬ bol group entries in the coded text, dividing each symbol in the output symbol stream into two prefixes, setting the output symbol stream as a new input prefix stream and repeating the reading, processing and form¬ ing steps.

In one embodiment of the invention, the pre¬ defined threshold value is the square root of the al¬ phabet size in the source text. In one embodiment of the invention, the form¬ ing of a prefix and a suffix stream using the- code¬ words of the symbols in the source text is performed so that the codeword for the at least one symbol is mapped to a corresponding prefix and suffix. The map- ping may be performed, for example, by indexing a ta¬ ble using the codeword and obtaining the corresponding prefix and suffix from the table entry. The prefixes and suffixes for the individual symbols in the source text form the prefix and suffix streams. For example, if there are N symbols in the source text, the prefix and suffix streams formed by mapping the codewords for the N symbols to a prefix and a suffix will have N

suffixes and N prefixes, respectively. The table is formed in the step of grouping at least one symbol in the source text in at least one symbol group based on the probability of the at least one symbol in the source text.

In one embodiment of the invention, the sec¬ ond memory is a non-volatile memory such as, for exam¬ ple, a magnetic or an optical disk or a flash memory. In one embodiment of the invention, the first memory is a Random Access Memory (RAM) :

In one embodiment of the invention, the sec¬ ond memory is removable from the electronic device. An example of a removable second memory is a diskette or a memory card. In one embodiment of the invention, the cod¬ ing entity provides information on the at least one symbol group to the coded text, provides the suffix stream to the coded text, and provides the prefix stream to the coded text. The symbol group information is provided, for example, as a symbol group entry com¬ prising at least information on the symbols grouped in the group.

In one embodiment of the invention, the com¬ puter program is stored on a computer readable medium. The computer readable medium may be a removable memory card, magnetic disk,, optical disk or magnetic tape.

In one embodiment of the invention, the elec¬ tronic device is a mobile device, for example, a lap¬ top computer, a palmtop computer, a mobile terminal or a personal digital assistant (PDA) . In one embodiment of the invention, the electronic device is a desktop computer or a mainframe computer.

The benefits of the invention are related to the improved efficiency of data compression. On the other hand, the method is faster than arithmetic cod¬ ing. The advantage of the method as disclosed over arithmetic coding is higher coding speed with insig-

nificantly larger code redundancy. Besides, due to re¬ cursion in the coding algorithm, at each step of which symbol probabilities are taken into account, the method is able to effectively code symbols for very large alphabets.

BRIEF DESCRIPTION OF THE DRAWINGS:

The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illus¬ trate embodiments of the invention and together with the description help to explain the principles of the invention. In the drawings:

Fig. 1 is a block diagram illustrating the theoretical model of source coding in prior art;

Fig. 2 illustrates Huffman coding in prior art;

Fig. 3 illustrates a tree structure in K-flat coding in prior art; Fig. 4 illustrates a super-letter based cod¬ ing method in prior art;

Fig. 5A is a table illustrating the grouping of symbols from a sample alphabet to super-letters in one embodiment of the invention; Fig. 5B is a processing diagram illustrating fast recursive coding for a source text sample accord¬ ing to the invention;

Fig. 6 is a flow chart depicting one embodi¬ ment of a recursive coding method according to the in- vention;

Fig. 7 is a flow chart depicting a super- letter determination method in one embodiment of the invention;

Fig. 8 is a flow chart depicting one embodi- ment of a recursive decoding method according to the invention;

Fig. 9A is a block ' diagram illustrating an electronic device in one embodiment of the invention;

Fig. 9B is a block diagram illustrating a transmitting station and a receiving station communi- eating over a network in one embodiment of the inven¬ tion; and

Fig. 10 is a block diagram illustrating a coded .text in one embodiment of the invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS:

Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings.

Figure 5A is a table illustrating the group- ing of symbols from a sample alphabet to super-letters in one embodiment of the invention. In Figure 5A there is a source text 540, from which 14 first symbols are shown. The symbols from source text 540 belong to an alphabet 541, which' comprises 255 different symbols. In Figure 5A there is also a table 542, which contains columns 543, 544, 545 and 546. Column 543 shows super- letter indexes and the number -of symbols grouped in •each respective super-letter. Column 544 shows the symbols from the original alphabet. For example, in the case of super-letter 1 there are two symbols grouped in that super-letter, namely the two symbols associated with codewords "01101100" and "01100010". Column 545 shows the prefixes assigned for each super- letter. Column 546 shows the suffixes used to separate the different symbols grouped in a given super-letter. The prefix assigned for super-letter 1 is "000-0" and the suffixes used to separate the two symbols in that super-letter are "0" and "1" .

Figure 5B is a processing diagram illustrat- ing fast recursive coding for a source text sample ac¬ cording to the invention. Column 552 represents a se¬ quence using codewords for the 14 first symbols from

source text 540. The codewords for encoding alphabet 541 have in this case a uniform length of eight bits.

Arrow 501 represents a step in which the 14 first symbols from source text 540 are set as the in- put source text. Thereupon, as illustrated with arrow

502 the super-letters in alphabet 542 are determined. Grouping symbols with close probabilities forms the super-letters. The closeness of probabilities is de¬ termined, for example, by subtracting the probabili- ties and comparing the result to a threshold value. The sizes of the resulting super-letters and their composite probabilities may also be used in the super- letter determination. A super-letter is uniquely iden¬ tified using a prefix. The individual letters con- tained in a super-letter are separated from each other using a suffix. The symbols from input source text are each given a prefix, which corresponds to the super- letter in which that symbol belongs, and the super- letter specific suffix. For example, symbol with code "01101100" is represented as prefix "0000" and suffix "0" . The prefix and suffix parts are determined for each symbol in the input source text and used to form two streams. The prefix stream is represented as list 554 and suffix stream as list 556. The suffix stream may be directly provided to output data stream, that is, there is no further compression required for it. The suffix stream may be outputted, for example, to an output file or to a data transmission channel. Arrow

503 represents a step where each subsequent codeword prefix pair q 2 i, q 2 i + i and O≤i≤n is concatenated to form a concatenated prefix pair q2i|q2i+i• The concate¬ nated prefix pairs are used to form a concatenated prefix pair stream. The first seven concatenated pairs from the concatenated prefix pair stream is illus- trated in Figure 5B as list 558. If the condition N≥N* is true, wherein N is the original alphabet size and N s denotes the super-letter alphabet size,

the concatenated pair stream ' is set as a new source text and the steps represented by arrows 502 and 503 are recursively applied for the new source text. The setting of the concatenated pair stream as a new source text is illustrated in Figure 5B with arrow 504.

Figure 6 is a flow chart depicting one em¬ bodiment of a recursive coding method according to the invention. At step 602 a source text is obtained. The source text may be read en bloc from a secondary stor¬ age of an electronic device. The secondary storage may be, for example, a magnetic or optic disk or another non-volatile memory. The source text may comprise a file, a directory or a set of files stored in the sec¬ ondary storage. The source text is, however, consid¬ ered as a single entity. The source text may also be readily available in a primary memory, for example, Random Access Memory (RAM) . The source text is read by a coding entity, which is, for example, a software program executed in the electronic device. It is as¬ sumed in accordance with the example of Figures 5A and 5B that the source text consists of N symbols. For simplicity, it is assumed that the number N is even. If this is not the case, the source text may be padded with an extra padding character.

At step 604 the coding entity determines the symbol frequencies and the alphabet in the source text. This involves the computation of symbol prob- abilities.

At step 606 super-letters in the source text are determined by the coding entity. In one embodiment of the invention, the super-letter determination is performed as explained in association of Figure 7. In the coding entity the result of the determination may also comprise a table, which is indexed using the sym¬ bol codewords code(ai) . The table entries provide the

associated prefix pi and suffix si for each symbol codeword.

At step 608 super-letter data is outputted to an output data stream by the coding entity. The output data stream is, for example, a file stored in the sec¬ ondary storage of the electronic device. The super- letter data comprises at least the number of super- letters and the symbols included in each super-letter. The super-letter data may also be outputted at another phase of the method or at the end of the method. In the output data stream the super-letter data is iden¬ tified so that the decoder may directly determine its position in the output data stream. The super-letter data entries in the output data stream reveal the re- cursive coding cycles performed during the coding pro¬ cess.

At step 610 the source text codewords code(ai) are used to form the prefix pi and suffix s± streams by the coding entity. The mapping of codewords to corresponding prefixes and suffixes may use the ta¬ ble mentioned in association with step 606.

At step 612 the suffix stream is outputted by the coding entity to the output data stream. The suf¬ fix-stream may also be outputted at another phase of the method or at the end of the method.

At step 614 each subsequent codeword prefix pair q 2 i, q 2 i + i and O≤i≤n is concatenated in the coding entity to form a concatenated prefix pair q 2 i|q 2 i + i• The concatenated prefix pairs are used to form a concate- nated prefix pair stream. In one embodiment of the in¬ vention, the concatenation is performed as simple shifting of the first super-letter q 2 i by 4 bits and using logical bit-wise or-operation with the second super-letter q 2 i + i. In this embodiment it is assumed that prefixes for each super-letter have a uniform length of 4 bits.

At step 616 the coding entity checks if the condition N≥N^ is true, wherein N is the original alphabet size and N s denotes the super-letter alphabet size, the concatenated pair stream is set as a new source text at step 620 and the method continues at step 602. From step 602 a' new recursive coding cycle is started, comprising .the ' steps of super-letter de¬ termination, diving of source text and concatenation. If the condition N≥N* is false, the method continues at step 618.

At step 618 the original prefix stream qi is outputted to the output data stream by the coding en¬ tity and the method is complete. The output data stream is the coded text. The original prefix stream may also be outputted at another phase of the method as soon as it has been determined.

Figure 7 is a flow chart depicting a super- letter determination method in one embodiment of the invention. The determination method is performed in a coding entity within an electronic device-.

At step 702 a threshold value T for grouping alphabet symbols into a super-letter is initialized. The threshold value is compared with a ratio R of av¬ erage code length before and after grouping of symbols in super-letters of given size. The threshold value depends on alphabet siz-e. In practice for providing N s <=16 for N=2 8 it is enough to set T=O.02 although in most practical cases the value T=O.01 may be used. Let us assume that T is set to value 0.01. At step 704 the symbols present in source text are sorted in ascending order based on their probabilities.

At step 706 a variable P is set to initial value. The variable P is used in computing exponents of 2 in descending order. For alphabet with N=2 8 P is initialized to value 8, which corresponds to 2 8 . The value 2 P is used as the super-letter size.

At step 708 the ratio R of average code length before and after grouping of symbols in super- letters of size 2 P is computed, wherein R = -l-p s( -l og2Ps + log 2 M) f M=2P is fche number of symbolg

i=\ grouped in the super-letter, pi is the probability of a symbol a t and p s is the aggregate probability p s — ∑/>, of symbols in the super-letter S. cii≡S

At step 710 it is checked if the condition R≤T is true. If the condition is false, the method continues at step 716 where it is set P=P-I and there¬ upon M=2 P . Otherwise, the method continues at step 712. It should be noted that the reaching of P=O is impossible under the condition that R≤T is false.

At step 712 M symbols are grouped to form a super-letter S. The super-letter is assigned a prefix. In one embodiment of the invention, the super-letter prefixes have a uniform length of, for example, 4 bits.

At step 714 it is checked if there are. any ungrouped symbols in the alphabet remaining. If there are, the method continues at step 718, where the M symbols comprised in super-letter are removed from the alphabet .

Figure 8 is a flow chart depicting one em- bodiment of a recursive decoding method according to the invention. The decoding method inverts the recur¬ sive coding method as disclosed in association with Figure 6.

At step 802 the coded data stream is obtained by decoding means in a decoding electronic device. The coded data stream is the output data stream as pro¬ duced using the fast recursive coding method. The de¬ coding electronic device obtains the coded data stream, for example, via a network from a coding elec-

tronic device. The coded data stream may also be read from a secondary memory associated with the decoding electronic device. In this case the coding and the de¬ coding electronic device may be the same electronic device. The decoding is performed using a decoding en¬ tity in association with the decoding electronic de¬ vice. In the coded data stream there are separate su¬ per-letter information entries for each recursive cod¬ ing cycle and separate coded prefix and coded suffix streams. The coded prefix stream is set as the input prefix stream and the coded suffix stream is set as the input suffix stream.

Af step 804 a super-letter information entry corresponding to one recursive coding cycle is read by the decoding entity. The super-letter information en¬ try comprises the symbols that have been grouped in super-letters during the processing for one source text. Using super-letter information is determined symbols contained in any super-letter, the prefixes and suffixes associated with each super-letter and the corresponding original codeword.

At step 806 a prefix is read from input pre¬ fix stream using super-letter data.

At step 808 the suffix length is determined based on the prefix and the corresponding super- letter. At step 810. the suffix is read from the input suffix stream. In the coded data stream the suffix and prefix streams are ' separate and may be read simultane¬ ously by the decoding entity. At step 812 the symbol codeword corresponding to the prefix and the suffix is output to an output data stream. The output data stream is, for example, a file, a database or arbi¬ trary media block, which will store the recovered source text at the end of the decoding process. At step 814 it is determined whether there are more prefixes remaining in the coded data stream. If there are, the method continues at step 806.

At step 816 it is determined whether there are more recursions remaining for the decoding proc¬ ess. The determination is based on the presence of further super-letter data in the coded data stream. This step checks whether the decoding is complete or whether a further recursive coding cycle is yet to be reversed. If there are more recursions remaining, at step 818 each symbol in the output data stream is di¬ vided into two prefixes, which are placed sequentially in the output data stream. The division is the reverse process for the concatenation performed at step 614 in the coding process as illustrated in Figure 6. There¬ upon, the output data stream is set as a new input prefix stream and the method continues at step 804. Figure 9A is a block diagram illustrating an electronic device 900 in one embodiment of the inven¬ tion. Electronic device 900 comprises at least a Cen¬ tral Processing Unit (CPU) 902, a first memory 904 and a second memory 908. First memory 904 is, for example, a Random Access Memory (RAM) . Second memory 908 is a non-volatile memory, for example, based on a magnetic or optic medium. Second memory 908 may be, for exam¬ ple, a magnetic or optical disk. Second memory 908 may be fixed in association with electronic device 900 or it may be removable. First memory 904 comprises at least a coding entity 906, which is, for example, a computer program configured to perform the fast recur¬ sive coding according to the invention. Second memory 908 comprises at least a source text 910 and an output data stream 912. Source text 910 and output data stream 912 are data, which is stored, for example, in a file or a database or arbitrary physical storage blocks .

Figure 9B is a block diagram illustrating a transmitting station 952 and a receiving station 972 communicating over a network 960 in one embodiment of the invention. In transmitting station 952 there is a

coding entity 954, which is, for example, a computer program configured to perform the fast recursive cod¬ ing according to the invention. In transmitting sta¬ tion 952 there is a decoding entity 974, which is, for example, a computer program configured to perform the decoding -of the data that has been compressed accord¬ ing to the invention. Decoding of the data in one em¬ bodiment of the invention is illustrated in Figure 8. The network may be, for example, a fixed network or a wireless network or a combination of both. The coded source text is transmitted by transmitting station 952 to receiving station 972.

Figure 10 is a block diagram illustrating a coded text 1000 in one embodiment of the invention. The coded text, that is, the output data stream com¬ prises symbol group entries 1002, 1004 and 1006. There are symbol group entries for each recursive coding cy¬ cle. The symbol group entries comprise the super- letter information. There is also a prefix stream 1008 and a suffix stream 1010. Suffix stream 1010 comprises suffixes obtained during each recursive coding cycle.

It is obvious to a person skilled in the art that with the advancement of technology, the basic idea of the invention may be implemented in various ways. The invention and its embodiments are thus not limited to the examples described above; instead they may vary within the scope of the claims.