Online calculator: LZW

The calculators in this article are used to compress and decompress a string using the LZW algorithm. The LZW method is simple and reliable, and it does not require storing a dictionary - the dictionary is dynamically generated during compression and decompression.
To study the algorithm's logic, check the "Show details" box - the calculators will display the work log and generated phrase dictionaries.

LZW text compression

Text

Text to count words

Initial dictionary

Encoding

Create from the input text

Enter

Encoding

Initial dictionary

Details

Empty space marker

A character to replace invisible symbol.

Missing byte marker

A symbol that is used to indicate the position of the missing byte of an incomplete multibyte character.

Number of bits in the original text

Number of bits in the compressed message

Zero bits added

Compressed message in binary form

File

Initial dictionary

Dictionary

The file is very large. Browser slowdown may occur during loading and creation.

The binary dump or binary file generated by the previous calculator can be passed to the following calculator, which will restore the original string. It is necessary to set the method of forming the source dictionary in the same way as it is set for the compression calculator. (choose the same encoding or specify the same dictionary explicitly).

LZW decompression

Input format

010000010001000000011100100011011110011000100011011110011101000001000000011011010011000010011110010001000000011011101000001010001000000011010010011011100011010100011101010011100100011001010001000000011000010001000000011010000011101011000010000011011100001000000011000100011001011000011110011001110001000000011011110011100100001011000001000000011101000011010001000000100011101010011001110011010001000011100011011100011000010011000110011101000011010010011011110011011101001001000011000010011011000011011000011011110011101111000101011000101111000110010011000011000110111000111011000111111001001010011011110001000000011000110011011110011011011000101000011101001010000100011010000011000010011100100011011010001011100001000001000000001000000101000001001000001100011011010011101010011100111000001101000000110011001011000010101001001101000101001001000100011001000011001010011100100011100110001000000011001110011010010011101100011001011000110111010010001000011101000001100011000101000010101000110001000110101000111001000111100011011100011001110011100111001001000011001010011110000011000110011001010011100001000001100011101110011010001010111111000101000011100110011101010011000111001010111010111011010111111011000010011101110011011110011101010011011000011001001010000111001100100011001100011011000011010011001011110001000000011101110011010011001001101001001011011111000001000000010001100011010011011000001000001100010011000011000010011101111010011101010100001000000111000011011010101001010101100001000000011100001000000100011101000011001011100100001100100111011000011001110001000110111011101010011010011010101101011001101011101110001011100000

Message

An encoded message. If a file is given, then the input data will be taken from the file.

File

Drag files here

Initial dictionary

Encoding

Dictionary

Initial dictionary

Encoding

Details

Empty space marker

A character to replace invisible symbol.

Missing byte marker

A symbol that is used to indicate the position of the missing byte of an incomplete multibyte character.

Text

The file is very large. Browser slowdown may occur during loading and creation.

Lempel-Ziv-Welch algorithm

The lossless compression algorithm LZ78 was published in 1978 by Abraham Lempel and Jacob Ziv and then modified by Terry Welch in 1984. After Welch's publication, the algorithm was named LZW after the authors' surnames (Lempel, Ziv, Welch). The algorithm has been patented, but all patents have expired by now, which gives us a great opportunity to publish our implementation here.

LZW algorithm description

Create the initial dictionary containing all possible characters.
Put the first character to the input phrase W.
Read next character K.
If EOF returns W, else:
If WK phrase is already in the dictionary, W ⟵ WK, go to 3,
Else return the code of W, add WK to the dictionary, W ⟵ K. Go to 3.

To decode the data generated in this way, you do not need to store the dictionary. It is recreated by itself in the process of decompression:

Create the initial dictionary containing all possible characters.
Put the first code to the input phrase W.
Read the next code K.
4.If EOF, return the character having the code W, else:
If the phrase with the WK code is not in the dictionary, return the phrase with the W code, and add the phrase with the WK code to the dictionary.
Else, assign the WK code to the input phrase and go to 3.

Welch's original article intended to encode a phrase in a dictionary with a fixed-size 12-bit code. Still, this approach may even increase the length of the encoded message for small messages compared to the original text. Therefore, it is quite common to use dynamic code length, which changes every time the dictionary limit is reached.

Dictionary size growth handling

In the compression algorithm described above, the size of the dictionary is not limited. In practice, this can lead to resource constraints when packing large amounts of data.
There are known modifications to the algorithm trying to address this problem:

LZC - the implementation of the algorithm in the compress utility limits the maximum dictionary size to 16 bits. When the maximum size is reached, the dictionary stops changing. The algorithm monitors the compression ratio and, if it degrades significantly, resets the dictionary and forms it anew.
LZT - on overflow, removes from the dictionary a phrase that has not been used for the longest time

Our implementation features

Initial dictionary

Our calculators implement an endlessly growing vocabulary, which can be too expensive for huge data. The size of phrase codes starts at 8 bits for the initial dictionary specified by standard encoding, or less for a dictionary entered by hand. In the latter case, the phrase codes' size is the minimum number of bits required to express the number of characters in the dictionary. Bit length.

Multibyte encodings

Some encodings can take more than 1 byte per character, for example, UTF-8 encoding contains characters of variable length from 1 to 4 bytes. In any case, the initial dictionary will have 256 items with codes from 0 to 255 for whatever encoding you choose.
For multibyte encodings, a dynamically generated dictionary will look quite exotic - its phrases will inevitably be composed of different parts of adjacent characters. In this case, for clarity in the output tables, we use a missing byte marker (by default, this is the '~' character). For example, when compressing the phrase "9 €", represented in UTF-8 encoding, the dynamic dictionary can be filled with combinations of the following characters:
9 - nine,
' ' - space,
€~~ - first byte of euro symbol,
~€~ - second byte of euro symbol,
~~€ - third byte of euro symbol,
€~ - the first two bytes of the euro symbol,
~€ - the last two bytes of the euro symbol,
€ - all three bytes of the euro symbol.
We have introduced this character description for convenience only. Since the constituent parts of many multibyte characters are just bytes of 8 bits, they may be the same for different characters.
for example:
€ ~~ - the first byte of the euro symbol, its code in utf-8 = 226
₽ ~~ - the first byte of the ruble symbol, its code in utf-8 = 226.

Trailing with zeros

The compression algorithm can produce a bit array with a size that is not a multiple of 8. In this case, the compression calculator fills the end with zero bits. Since our implementation uses a variable size of the phrase code, this approach can cause problems if the manually set initial dictionary is tiny.
In this example the final message will be only 18 bits and 6 bits would have to be padded with zeros to store the entire message in a binary file.
Since with the specified parameters, the phrase code's size is only 4 bits, then the extra word will be read during unpacking, and if we were to create the original dictionary with the code of the first character = 0, then the extra character would be unpacked.
We solve this problem by adding a null character to the beginning of the dictionary, which is commonly interpreted as the end of a string.
When using the encoding as the initial dictionary, this problem will never arise since the phrase code's size is always at least 8 bits.

PLANETCALC Online calculators

LZW

LZW text compression

LZW decompression

Lempel-Ziv-Welch algorithm

LZW algorithm description

Dictionary size growth handling

Our implementation features

Initial dictionary

Multibyte encodings

Trailing with zeros

Similar calculators

Comments

PLANETCALC Online calculators

LZW text compression

LZW decompression

Lempel-Ziv-Welch algorithm

LZW algorithm description

Dictionary size growth handling

Our implementation features

Initial dictionary

Multibyte encodings

Trailing with zeros

Similar calculators

Comments

Share this page