Faultless #09
24 мая 1998

LZW and GIF - Description of image formats. LZW and. GIF.

<b>LZW and GIF</b> - Description of image formats. LZW and. GIF.
Subject: Graphic formats. LZW and. GIF
Author: Steve Blackstock
---------------------------------------

                 - 1 -
 I hope this little document
will help to enlighten those who want to know
a little more about the compression algorithm Lempel-Ziv Welch, 
and specifically on its realization tion for the format of GIF.

 Before we begin, a bit of ter
terminology in the light of this document:
Symbol: a fundamental element of the data.
In ordinary text files a separate
bayt.V bitmaps, which
you are interested in this index, which
specifies the color of the pixel. I will
refer to any character as a
"K".
The stream of characters: a stream of characters such
as a data file. "Chain": a somewhat
consecutive characters. Chain length
can vary from 1 to very large
number of characters. I can indicate productivity
flax string as "[...] K ".
"Prefix": almost the same as the chain
ka, but it is understood that the prefix immediately precedes a 
character, and prefix can have zero length. I will refer to an 
arbitrary prefix as on "[...]".

"Root": a single-character string.
For most purposes, this is just a symbol,
but sometimes it can be otherwise. This
[...] K, where [...] is empty.
"Code": the number determined by the known number of bits that 
encodes the chain. The stream codes: output codes such as 
"raster data". "Element": the code and its chain.

"String table": a list of items usually
but not necessarily unique. This
should be sufficient for understanding the document.
 LZW - a method of data compression, which
extracts benefits for repeated
chains of data. Since raster data
nye usually contain quite a lot of
repetition, LZW is a good method
to compress them, and disclosure.
 At the moment, let's consider the usual
Noe encoding and decoding using the LZW-algorithm. In the GIF 
used variation of this algorithm. Compression and

Disclosure LZW manipulates three objects
mi: a character stream into a flood of codes and tables
Lyceum chains. When compressing a stream of characters
is an input and the codestream - vyhodnym.Pri Mouth is a stream 
of codes, and a stream of characters - vyhodnym.Tabli

sample of chains generated by and under pressure, and
the disclosure, but it is never transmitted from the 
compression to the disclosure and vice versa. 

                 - 2 -
 The first thing we do in LZWszhatii is to initialize our 
strings. To do this, we must choose a code size (number of

in bits) and know how many possible values ​​can take our 
characters. Let's put the code size to 12 bits, which means you 
can store 0FFF, or 4096, elements in our table

chains. Let's also assume that
We have 32 different possible characters.
(This corresponds, for example, a picture with
32 possible colors for each pixel.) To initialize the table, we
establish compliance with the code # 0 character
# 0, code # 1 is the # 1, etc., to
Code # 31 and # 31 symbols. In fact, we
indicated that each code from 0 to 31 is the root. More in the 
table will not be other codes with this property.

 Now we will begin to compress the data. Let
first define something called the "current prefix". This 
prefix, we always remember and compare

with him here in the future. I will refer to it as "[. C.]". 
Initially, the current schy prefix contains nothing. Let

and also define the "current string"
which forms the prefix and the current
the next character in the stream of characters. I
will refer to the current string as
"[. C.] K", where K - some character.
 Now look at the first character of
current characters. Let's call it P. We make
[. C.] P the current chain. (At this point,
This, of course, the root P.) Now let's run
Search string table to determine
Does it [. c.] P.
Of course, now it will happen, because
at our table during initialization were
mescheny all the roots. In this case we Nitsche
of not doing. Now make the current prefix [. C.] P. Take the 
next character from the stream of characters. Let's call it Q. 
Add the current prefix to form

[. C.] Q, ie current string. Perform
Search string table to determine
Does it [. c.] Q. In this case,
Of course, this will not happen. Aha! Now
we need something sdelat.Dobavim [. c.] Q
(Which in this case is the PQ) in the string table as code # 
32, and derive code for [. c.] to the codestream. Now begin 
again with the current prefix corresponding to the root of P. 
We continue to add characters to [. c.], to form

[. C.] K, until such time until we can find
T [. c.] K in the string table. Then output the code for [. C.] 
and add [. C.] K in string table. In pseudo code algorithm

be described approximately as follows:

    [1] Initialize string table;

    [2] [. C.] <- empty;

    [3] K <- next character in the stream

        characters;

    [4] Does [. C.] K in the table of the chains
        check?

        (Yes: [. C.] <- [. C.] K; go to [3])

        (No: add [. C.] K in the table

        chains;

        output the code for [. c.] in the stream to

        transitions;

        [. C.] <- K; go to [3])

                 - 3 -
 How easy it is! Of course, when we
execute step [3] and the input stream is not
remains more characters, you draw code
for [. c.] and leave the table. All done.
 Want an example? Let's assume that
we have a 4-character alphabet: A, B, C, D.
Stream of characters looks like ABACABA. Yes
White compress it. First, we initialize our string table:

          # 0 = A, # 1 = B,

          # 2 = C, # 3 = D.
 The first character is A, which is included in
string table, so [. c.] hundred
becomes equal to A. Next we get AB, which is not included in 
the table, so we derive the code # 0 (for [. c.]), and adding

AB it in the string table as code # 4.
[. C.] becomes B. Next we get
[. C.] A = BA, which is not included in the table
chains, therefore the output code # 1, and
add BA to the string table with code
# 5. [. C.] becomes A. Next, we
take the AC, which is not included in the table chain
kidneys. Output code # 0, and add the AC in
string table as code # 6. Now [. C.]
equals C. Next we get [. C.] A = CA, which is not included in 
tablitsu.Vyvodim # 2 C, and add CA to the table as code # 7.

Now [. C.] = A. Next we get AB, which is a string table, so [. 
C.] becomes equal to AB, and we looking for ABA, which is not 
in string table therefore, we derive the code for AB, which

is # 4, and add ABA to the table of the chains
check under the code # 8. [. C.] is equal to A. We do not
can no longer take the characters, so we
output code # 0 for A and ends. Consequently, the codestream is 
# 0 # 0 # 1 # 2 # 4 # 0.  A few words (three), it should be 
said about efficiency: use hashing strategy. Search for the 
string table can have resulted in significant computation and 
hashing reduces these costs. Note that "direct

my LZW "compression runs the risk of overflow string table - 
and you can code that can not be represented

number of bits previously set for which
dov.Suschestvuet several ways for
To cope with this problem, and GIF
implements the most simple of them. We will
do well.
 An important point to which you should pay attention to is the 
fact that in any point during the compression of the condition: 
if [...] K goes into the string table, [...] is also included 
in it. This leads to an effective method of storing chain 
vtablitse. Instead of memorizing the table

the whole chain, use the fact that any
chain can be represented as a prefix plus a character: [...] K. 
If you make [...] K to the table, you know that [...]

already there, so you can
remember the code for [...] plus closing
symbol K.
 It's all about what care should be taken when
compression. Disclosure, perhaps more difficult
conceptually, but the software implementation of it easier.

                 - 4 -
 We describe how to do it. We again begin by initializing the 
table tsepochek.Eta table is formed based on the knowledge

that we have of generators in
Eventually the stream of symbols, for example,
about the possible values ​​simvolov.V GIF-fi
crystals, this information is in the header, the number of 
possible pixel values However, the beauty of LZW is that

that's all we need. Compression was carried out in such a way 
that we never meet in a stream of codes code, which we

could not be transformed into a chain.
 We need to define something called
emoe "current Code" to which we
referred to as "", and "old code"
to which we refer to as "<old>".
To begin unpacking take the first
code. Now it becomes . This
code will initialize the string table as the root. Derive the 
root a stream of characters. This code does the old

Code <old>. (*) Now, take the following
code and assigning it to . Maybe
that this code is not included in the string table, but let's 
assume that it there. Derive the chain corresponding  in 
a stream of characters. Now we find We shall first character in 
the chain, you just received. Call it K. Doba

Wim him to the prefix [...], generated by <old>, to get
new string [...] K. Add this string to the string table and set 
the old <old> code to the current code .

Repeat on the place that I train
nachil asterisk and you're all done. Read this paragraph again 
if you just "Run" on it!

 Now let's consider the possibility that  not included in 
the table chain kidneys. Get back to the compression and try to 
understand what happens when you input stream appears the 
chain-type P [...] P [...] PQ. Assume that P [...]

is already in the table, but P [...] P - no.
The encoder performs parsing P [...], and finds that P [...] P 
is not in the table. This will lead to the conclusion between 
the code for P [...] and add P [...] P the string table. Then 
he will take P [...] P for the chain and find that P [...] P is 
in the table and give output code for P [...] P, if it turns out

that P [...] PQ in the table is missing.
 The decompressor is always "on one
step back "encoder. When the decoder sees the code for P [...] 
P, it will not add this code to your table right away because 
he needed the beginning character P [...] P to add to the chain 
for settlement glacial code P [...], to form

code for P [...] P. However, when decoding
schik find the code that it is still unknown, he will always be 
on a more recent addition to the table. Investigator but he can 
guess that the chain for This code should be and, in fact, only

yes is correct.
If I decoder, and I saw the code # 124,
and my table-chain contains the last
code only # 123, I can assume that
code # 124 must be, add it to the model
her string table and to bring the very chain. If code # 123 
generates a chain on I will refer to here as the prefix

[...], Then code # 124 in this particular case
will be [...] plus the first character of [...].
So I have to add the first symbol
[...] To itself. Not so bad.

                 - 5 -
As an example (often encountered
radiating), let's assume that we
we have a bitmap image in which
the first three pixels have the same color
Ie My character stream looks like: QQQ
For definiteness, let us say that we
we have 32 colors and Q corresponds to the color
# 12.Kodirovschik generate a sequence
sequence codes 12.32 ,....( if you do not understand
why, take a minute to understand.)
Recall that # 32 is not included in the initial
nuyu table that contains codes from # 0
to # 31. Decoder will see # 12 and
translates it as color Q. Then he shall see
leads # 32, about the significance of which he is still
do not know. But if he thinks about it long enough, he will be 
able to understand that QQ must be an element of # 32 in the 
table and QQ shall be the following chain of inference.

 Thus, the pseudo-code decoding
can be represented as follows:

    [1] Initialize string chains;

    [2] to take the first code: ;

    [3] output circuit for a 

         stream of characters;

    [4] <old> = ;

    [5]  <- the following code in the stream

         codes;

    [6], there is  table

         chains?

         (Yes: output chain for  in

         stream of characters;

         [...] <- Translation for <old>;
K <- first character translation for ;

    add [...] K to the string table;

    <old> <- ;)

    (No: [...] <- translation for <old>;

    K <- first character [...];

    output [...] K in a stream of characters and to
    addition of him to his table to the chains
    check; <old> <- )

    [7] go to [5];
Again, if you discover in step [5]
there are no more characters, you must Act
chit. Conclusion chains and finding the initial character in 
them pose themselves efficiency problems, but I'm not going to

here to offer their solutions. Half the fun of programming
is to allow these pieces!

                 - 6 -

                  - And now GIF'a variations on this theme. In
the header part of GIF-file there is a field called the stream 
of raster data "Code size". This is very confusing

name for this field, but we need to
him to accept. In fact, this "size
root. "actual size (in bytes) of compression codes actually 
changes in the process of compression / decompression, and I 
will refer to it here as the "size

compression. "Primary table, as usual,
contains codes for all the roots, but its
added to the top of the two special codes. Suppose we have the 
"size Code ", which is usually equal to the number of bits

per pixel. N. If we denote it by the number of bi
Comrade per pixel is 1, N must be 2: the roots take up slots # 
0 and # 1 in the initial table and two special code

will take up slots # 4 # 5. In any other
GOM case, N equals the number of bits per pixel, the roots take 
up slots from # 0 to # (2 ** N-1), and special codes are

(2 ** N) and (2 ** N + 1). Initial size
compression will be equal to N +1 bit of the code. If
you're encoding, you output, we first
LA codes of length (N +1) bits, and if you Veda
those decoding, you choose first
(N +1) bits from the codestream. As
special codes are used:  or
cleanup code, is (2 ** N), and <EOI> or
end information, is (2 ** N +1).  Said encoder, which need 
to re-ini ized string table, and pereusta

novit compression size equal to (N +1). <EOI>
means that the more codes you've net.Esli
Dete encoding or decoding, you
need to start adding items to a table
face chains  + 2. If you're
coding, you should bring in 
as the very first code, and then
again each time, once you reach
those code # 4095 (hex FFF),
because GIF does not allow compression sizes greater than 12 
bits. If you're opening, you should reinitialize your string 
table as soon as you find .  Variable compression size is 
actually does not cause much trouble. If you're encoding you 
start with a size compression in the (N +1) bits, and as soon 
as you output the code (2 ** (compression size) -1), you

increase the compression size by one bit.
Consequently, the following code to your conclusions
Let there be one bit longer. Remember
that the largest compression size is 12
bits, which corresponds to code 4095. If
you have reached this limit, you must withdraw  as the next 
code start. If you're decoding, you must increase your 
compression size Tia Once you burn an element

# (2 ** (compression size) -1) in the string table. The 
following code, that you read will be one bit longer. Do not do

mistake of waiting until you will need to
add to the table code (2 ** compression size). You already 
missed the last bit of koda.Upakovka codes bitstream raster 
data is also a potential stumbling block for beginners encoding 
and decoding. Mlad shy-bit code must match with the younger

available bit of the first available byte
in a stream of codes. For example, if you started
with a 5-bit compression codes, and your three lane
Out of code, say, , <fghij>,
<klmno>, where e, j, and bits o # 0, your code will flow as:

                  - 7 -

             byte # 0: hijabcde

             byte # 1:. klmnofg
 Thus the distinction between ordinary
LZW and GIF LZW to lie in the presence of
two additional special codes and
variable amount of compression. If you understand LZW, and you 
understand these differences, you get everything! As P.S. You 
may have noticed that the encoder has a little bit flexible 
during compression. I described "Greedy" way, choosing to output

code, so many characters, how
possible. In fact, this method of explosion
us to a standard for LZW and results in the best compression 
ratio. However, there is no rule that forbids Lo would you stop 
and output the code for the current prefix, regardless of what

On whether he was already on the table or not, and
add this string plus the next character in the string table. 
There are different nye reasons to wish to do so,

especially if the chain is too long and
creates difficulties in heshirovanii.Esli
you need it, do it. I hope this
help you.

                        Steve Blackstock

PS Steve Blackstock helped porusski speak at the Institute of 
Applied Mathematics AH CCCP A. Samotohin






Other articles:

ASM4KOFF - Run the compiled melodies. Loader at ACME. Using on-screen file ArtStudio (display). Maximum speed on the conclusion of sprites. As quickly as possible output points. Fastest up the stack.

CCLFSTM - All disk copier 128/512K.

CCLFSTM - The album "Backstreet Boys"

CCLFSTM - News from the Spark group

CCLFSTM - Description of system software: Power Code Decrusher v6.2

Demo Design - On the history demomeykinga.

Demo Design - Useful and interesting solution algorithms (implementation Fong).

Flash - The extension of the color palette ZX-Spectrum.

GFK Fraktiuit - Secrets of the graphic standards GX1.

LZW and GIF - Description of image formats. LZW and. GIF.

MUSICNEWS1 - Music kaleidoscope of pop ...

MUSICNEWS2 - METALL NEWS.

NEWS of Picon - The draft of the future ROMs. BASIC subroutine 48.

OPERATEXT - From the history of the demo Oper'y.

PRICE - Price list for products firm Scorpio.

RUSH - On the coterie in Chernigov, in April 1998.

SPECCY AF - Immortal Speccy.

Did you know - Passwords, eternal time and the bombs in the game Last Courier. Passwords for the game: X-Reversy; and muzykalkam: Branch of Mind demo, Diesirae demo. Hidden part in Faultless 2, 3, 4, 5, 8 (passwords) ...

Introduction - On the pros and cons of numbers.

Medem - The history of Zaporizhzhya modem.



Similar articles:

В этот день...   23 November