Faultless #09
24 мая 1998 |
|
LZW and GIF - Description of image formats. LZW and. GIF.
Subject: Graphic formats. LZW and. GIF
Author: Steve Blackstock
---------------------------------------
- 1 -
I hope this little document
will help to enlighten those who want to know
a little more about the compression algorithm Lempel-Ziv Welch,
and specifically on its realization tion for the format of GIF.
Before we begin, a bit of ter
terminology in the light of this document:
Symbol: a fundamental element of the data.
In ordinary text files a separate
bayt.V bitmaps, which
you are interested in this index, which
specifies the color of the pixel. I will
refer to any character as a
"K".
The stream of characters: a stream of characters such
as a data file. "Chain": a somewhat
consecutive characters. Chain length
can vary from 1 to very large
number of characters. I can indicate productivity
flax string as "[...] K ".
"Prefix": almost the same as the chain
ka, but it is understood that the prefix immediately precedes a
character, and prefix can have zero length. I will refer to an
arbitrary prefix as on "[...]".
"Root": a single-character string.
For most purposes, this is just a symbol,
but sometimes it can be otherwise. This
[...] K, where [...] is empty.
"Code": the number determined by the known number of bits that
encodes the chain. The stream codes: output codes such as
"raster data". "Element": the code and its chain.
"String table": a list of items usually
but not necessarily unique. This
should be sufficient for understanding the document.
LZW - a method of data compression, which
extracts benefits for repeated
chains of data. Since raster data
nye usually contain quite a lot of
repetition, LZW is a good method
to compress them, and disclosure.
At the moment, let's consider the usual
Noe encoding and decoding using the LZW-algorithm. In the GIF
used variation of this algorithm. Compression and
Disclosure LZW manipulates three objects
mi: a character stream into a flood of codes and tables
Lyceum chains. When compressing a stream of characters
is an input and the codestream - vyhodnym.Pri Mouth is a stream
of codes, and a stream of characters - vyhodnym.Tabli
sample of chains generated by and under pressure, and
the disclosure, but it is never transmitted from the
compression to the disclosure and vice versa.
- 2 -
The first thing we do in LZWszhatii is to initialize our
strings. To do this, we must choose a code size (number of
in bits) and know how many possible values can take our
characters. Let's put the code size to 12 bits, which means you
can store 0FFF, or 4096, elements in our table
chains. Let's also assume that
We have 32 different possible characters.
(This corresponds, for example, a picture with
32 possible colors for each pixel.) To initialize the table, we
establish compliance with the code # 0 character
# 0, code # 1 is the # 1, etc., to
Code # 31 and # 31 symbols. In fact, we
indicated that each code from 0 to 31 is the root. More in the
table will not be other codes with this property.
Now we will begin to compress the data. Let
first define something called the "current prefix". This
prefix, we always remember and compare
with him here in the future. I will refer to it as "[. C.]".
Initially, the current schy prefix contains nothing. Let
and also define the "current string"
which forms the prefix and the current
the next character in the stream of characters. I
will refer to the current string as
"[. C.] K", where K - some character.
Now look at the first character of
current characters. Let's call it P. We make
[. C.] P the current chain. (At this point,
This, of course, the root P.) Now let's run
Search string table to determine
Does it [. c.] P.
Of course, now it will happen, because
at our table during initialization were
mescheny all the roots. In this case we Nitsche
of not doing. Now make the current prefix [. C.] P. Take the
next character from the stream of characters. Let's call it Q.
Add the current prefix to form
[. C.] Q, ie current string. Perform
Search string table to determine
Does it [. c.] Q. In this case,
Of course, this will not happen. Aha! Now
we need something sdelat.Dobavim [. c.] Q
(Which in this case is the PQ) in the string table as code #
32, and derive code for [. c.] to the codestream. Now begin
again with the current prefix corresponding to the root of P.
We continue to add characters to [. c.], to form
[. C.] K, until such time until we can find
T [. c.] K in the string table. Then output the code for [. C.]
and add [. C.] K in string table. In pseudo code algorithm
be described approximately as follows:
[1] Initialize string table;
[2] [. C.] <- empty;
[3] K <- next character in the stream
characters;
[4] Does [. C.] K in the table of the chains
check?
(Yes: [. C.] <- [. C.] K; go to [3])
(No: add [. C.] K in the table
chains;
output the code for [. c.] in the stream to
transitions;
[. C.] <- K; go to [3])
- 3 -
How easy it is! Of course, when we
execute step [3] and the input stream is not
remains more characters, you draw code
for [. c.] and leave the table. All done.
Want an example? Let's assume that
we have a 4-character alphabet: A, B, C, D.
Stream of characters looks like ABACABA. Yes
White compress it. First, we initialize our string table:
# 0 = A, # 1 = B,
# 2 = C, # 3 = D.
The first character is A, which is included in
string table, so [. c.] hundred
becomes equal to A. Next we get AB, which is not included in
the table, so we derive the code # 0 (for [. c.]), and adding
AB it in the string table as code # 4.
[. C.] becomes B. Next we get
[. C.] A = BA, which is not included in the table
chains, therefore the output code # 1, and
add BA to the string table with code
# 5. [. C.] becomes A. Next, we
take the AC, which is not included in the table chain
kidneys. Output code # 0, and add the AC in
string table as code # 6. Now [. C.]
equals C. Next we get [. C.] A = CA, which is not included in
tablitsu.Vyvodim # 2 C, and add CA to the table as code # 7.
Now [. C.] = A. Next we get AB, which is a string table, so [.
C.] becomes equal to AB, and we looking for ABA, which is not
in string table therefore, we derive the code for AB, which
is # 4, and add ABA to the table of the chains
check under the code # 8. [. C.] is equal to A. We do not
can no longer take the characters, so we
output code # 0 for A and ends. Consequently, the codestream is
# 0 # 0 # 1 # 2 # 4 # 0. A few words (three), it should be
said about efficiency: use hashing strategy. Search for the
string table can have resulted in significant computation and
hashing reduces these costs. Note that "direct
my LZW "compression runs the risk of overflow string table -
and you can code that can not be represented
number of bits previously set for which
dov.Suschestvuet several ways for
To cope with this problem, and GIF
implements the most simple of them. We will
do well.
An important point to which you should pay attention to is the
fact that in any point during the compression of the condition:
if [...] K goes into the string table, [...] is also included
in it. This leads to an effective method of storing chain
vtablitse. Instead of memorizing the table
the whole chain, use the fact that any
chain can be represented as a prefix plus a character: [...] K.
If you make [...] K to the table, you know that [...]
already there, so you can
remember the code for [...] plus closing
symbol K.
It's all about what care should be taken when
compression. Disclosure, perhaps more difficult
conceptually, but the software implementation of it easier.
- 4 -
We describe how to do it. We again begin by initializing the
table tsepochek.Eta table is formed based on the knowledge
that we have of generators in
Eventually the stream of symbols, for example,
about the possible values simvolov.V GIF-fi
crystals, this information is in the header, the number of
possible pixel values However, the beauty of LZW is that
that's all we need. Compression was carried out in such a way
that we never meet in a stream of codes code, which we
could not be transformed into a chain.
We need to define something called
emoe "current Code" to which we
referred to as "", and "old code"
to which we refer to as "<old>".
To begin unpacking take the first
code. Now it becomes . This
code will initialize the string table as the root. Derive the
root a stream of characters. This code does the old
Code <old>. (*) Now, take the following
code and assigning it to . Maybe
that this code is not included in the string table, but let's
assume that it there. Derive the chain corresponding in
a stream of characters. Now we find We shall first character in
the chain, you just received. Call it K. Doba
Wim him to the prefix [...], generated by <old>, to get
new string [...] K. Add this string to the string table and set
the old <old> code to the current code .
Repeat on the place that I train
nachil asterisk and you're all done. Read this paragraph again
if you just "Run" on it!
Now let's consider the possibility that not included in
the table chain kidneys. Get back to the compression and try to
understand what happens when you input stream appears the
chain-type P [...] P [...] PQ. Assume that P [...]
is already in the table, but P [...] P - no.
The encoder performs parsing P [...], and finds that P [...] P
is not in the table. This will lead to the conclusion between
the code for P [...] and add P [...] P the string table. Then
he will take P [...] P for the chain and find that P [...] P is
in the table and give output code for P [...] P, if it turns out
that P [...] PQ in the table is missing.
The decompressor is always "on one
step back "encoder. When the decoder sees the code for P [...]
P, it will not add this code to your table right away because
he needed the beginning character P [...] P to add to the chain
for settlement glacial code P [...], to form
code for P [...] P. However, when decoding
schik find the code that it is still unknown, he will always be
on a more recent addition to the table. Investigator but he can
guess that the chain for This code should be and, in fact, only
yes is correct.
If I decoder, and I saw the code # 124,
and my table-chain contains the last
code only # 123, I can assume that
code # 124 must be, add it to the model
her string table and to bring the very chain. If code # 123
generates a chain on I will refer to here as the prefix
[...], Then code # 124 in this particular case
will be [...] plus the first character of [...].
So I have to add the first symbol
[...] To itself. Not so bad.
- 5 -
As an example (often encountered
radiating), let's assume that we
we have a bitmap image in which
the first three pixels have the same color
Ie My character stream looks like: QQQ
For definiteness, let us say that we
we have 32 colors and Q corresponds to the color
# 12.Kodirovschik generate a sequence
sequence codes 12.32 ,....( if you do not understand
why, take a minute to understand.)
Recall that # 32 is not included in the initial
nuyu table that contains codes from # 0
to # 31. Decoder will see # 12 and
translates it as color Q. Then he shall see
leads # 32, about the significance of which he is still
do not know. But if he thinks about it long enough, he will be
able to understand that QQ must be an element of # 32 in the
table and QQ shall be the following chain of inference.
Thus, the pseudo-code decoding
can be represented as follows:
[1] Initialize string chains;
[2] to take the first code: ;
[3] output circuit for a
stream of characters;
[4] <old> = ;
[5] <- the following code in the stream
codes;
[6], there is table
chains?
(Yes: output chain for in
stream of characters;
[...] <- Translation for <old>;
K <- first character translation for ;
add [...] K to the string table;
<old> <- ;)
(No: [...] <- translation for <old>;
K <- first character [...];
output [...] K in a stream of characters and to
addition of him to his table to the chains
check; <old> <- )
[7] go to [5];
Again, if you discover in step [5]
there are no more characters, you must Act
chit. Conclusion chains and finding the initial character in
them pose themselves efficiency problems, but I'm not going to
here to offer their solutions. Half the fun of programming
is to allow these pieces!
- 6 -
- And now GIF'a variations on this theme. In
the header part of GIF-file there is a field called the stream
of raster data "Code size". This is very confusing
name for this field, but we need to
him to accept. In fact, this "size
root. "actual size (in bytes) of compression codes actually
changes in the process of compression / decompression, and I
will refer to it here as the "size
compression. "Primary table, as usual,
contains codes for all the roots, but its
added to the top of the two special codes. Suppose we have the
"size Code ", which is usually equal to the number of bits
per pixel. N. If we denote it by the number of bi
Comrade per pixel is 1, N must be 2: the roots take up slots #
0 and # 1 in the initial table and two special code
will take up slots # 4 # 5. In any other
GOM case, N equals the number of bits per pixel, the roots take
up slots from # 0 to # (2 ** N-1), and special codes are
(2 ** N) and (2 ** N + 1). Initial size
compression will be equal to N +1 bit of the code. If
you're encoding, you output, we first
LA codes of length (N +1) bits, and if you Veda
those decoding, you choose first
(N +1) bits from the codestream. As
special codes are used: or
cleanup code, is (2 ** N), and <EOI> or
end information, is (2 ** N +1). Said encoder, which need
to re-ini ized string table, and pereusta
novit compression size equal to (N +1). <EOI>
means that the more codes you've net.Esli
Dete encoding or decoding, you
need to start adding items to a table
face chains + 2. If you're
coding, you should bring in
as the very first code, and then
again each time, once you reach
those code # 4095 (hex FFF),
because GIF does not allow compression sizes greater than 12
bits. If you're opening, you should reinitialize your string
table as soon as you find . Variable compression size is
actually does not cause much trouble. If you're encoding you
start with a size compression in the (N +1) bits, and as soon
as you output the code (2 ** (compression size) -1), you
increase the compression size by one bit.
Consequently, the following code to your conclusions
Let there be one bit longer. Remember
that the largest compression size is 12
bits, which corresponds to code 4095. If
you have reached this limit, you must withdraw as the next
code start. If you're decoding, you must increase your
compression size Tia Once you burn an element
# (2 ** (compression size) -1) in the string table. The
following code, that you read will be one bit longer. Do not do
mistake of waiting until you will need to
add to the table code (2 ** compression size). You already
missed the last bit of koda.Upakovka codes bitstream raster
data is also a potential stumbling block for beginners encoding
and decoding. Mlad shy-bit code must match with the younger
available bit of the first available byte
in a stream of codes. For example, if you started
with a 5-bit compression codes, and your three lane
Out of code, say, , <fghij>,
<klmno>, where e, j, and bits o # 0, your code will flow as:
- 7 -
byte # 0: hijabcde
byte # 1:. klmnofg
Thus the distinction between ordinary
LZW and GIF LZW to lie in the presence of
two additional special codes and
variable amount of compression. If you understand LZW, and you
understand these differences, you get everything! As P.S. You
may have noticed that the encoder has a little bit flexible
during compression. I described "Greedy" way, choosing to output
code, so many characters, how
possible. In fact, this method of explosion
us to a standard for LZW and results in the best compression
ratio. However, there is no rule that forbids Lo would you stop
and output the code for the current prefix, regardless of what
On whether he was already on the table or not, and
add this string plus the next character in the string table.
There are different nye reasons to wish to do so,
especially if the chain is too long and
creates difficulties in heshirovanii.Esli
you need it, do it. I hope this
help you.
Steve Blackstock
PS Steve Blackstock helped porusski speak at the Institute of
Applied Mathematics AH CCCP A. Samotohin
Other articles:
Similar articles:
В этот день... 23 November