Inferno #04
22 June 2003
Software

Softinka - Format RAR 2.x. Technical information.

...At this point Alone Coder moved over to the Pentagon and went on
typing the articles on it. Despite the stubborn keyboard, it was
easier that way :)


            The RAR 2.x format

I really liked this format for how fast its unpacker is compared
with the quality of its compression, so I decided to tell you
about it.
To begin with, here is the official technical information, so
that it is a bit clearer what lies where in the file :)

--------- Begin of techinfo.txt ---------
Technical Information for RAR version 2.70
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 THE FORMAT DESCRIBED BELOW IS VALID

   ONLY FOR RAR VERSION 1.50 AND LATER

==========================================

       RAR archive file format
==========================================

An archive file consists of variable-length blocks. The order of
these blocks may vary, but the first block must always be a marker
block, followed by the archive header block.

Each block begins with the following fields:

HEAD_CRC    2 bytes   CRC of the whole block or part of it
HEAD_TYPE   1 byte    Block type
HEAD_FLAGS  2 bytes   Block flags
HEAD_SIZE   2 bytes   Block size
ADD_SIZE    4 bytes   Optional field: added block size

The ADD_SIZE field is present only if
(HEAD_FLAGS & 0x8000) != 0.

The total block size is HEAD_SIZE if
(HEAD_FLAGS & 0x8000) == 0, or HEAD_SIZE + ADD_SIZE
if the ADD_SIZE field is present, i.e. when
(HEAD_FLAGS & 0x8000) != 0.

In every block the following bits of HEAD_FLAGS
have the same meaning:

0x4000 - if set, older RAR versions will ignore
         this block and remove it when the
         archive is updated; if clear, the block
         is copied to the new archive file when
         the archive is updated;
0x8000 - if set, the ADD_SIZE field is present,
         and the full block size is
         HEAD_SIZE + ADD_SIZE.
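A minimal Python sketch of reading the fields common to every block. It assumes the fields are little-endian, which is what the marker block implies (its first bytes 0x52 0x61 read as HEAD_CRC = 0x6152); `parse_block_header` and the keys of the returned dictionary are hypothetical names chosen for illustration.

```python
import struct

def parse_block_header(buf, pos=0):
    """Read the fields common to every block, starting at offset pos."""
    head_crc, head_type, head_flags, head_size = struct.unpack_from("<HBHH", buf, pos)
    add_size = 0
    if head_flags & 0x8000:                      # ADD_SIZE follows HEAD_SIZE
        (add_size,) = struct.unpack_from("<I", buf, pos + 7)
    return {
        "crc": head_crc,
        "type": head_type,
        "flags": head_flags,
        "head_size": head_size,
        "add_size": add_size,
        "total_size": head_size + add_size,      # HEAD_SIZE (+ ADD_SIZE)
        "old_rar_skips": bool(head_flags & 0x4000),
    }

marker = parse_block_header(bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00]))
print(hex(marker["type"]), marker["total_size"])  # 0x72 7
```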

Declared block types:

HEAD_TYPE = 0x72   marker block
HEAD_TYPE = 0x73   archive header
HEAD_TYPE = 0x74   file header
HEAD_TYPE = 0x75   comment header
HEAD_TYPE = 0x76   electronic signature
                   (old style)
HEAD_TYPE = 0x77   subblock
HEAD_TYPE = 0x78   recovery record
HEAD_TYPE = 0x79   electronic signature

The comment block is only used inside other blocks.

The archive is processed as follows:

1. Read and check the marker block.
2. Read the archive header.
3. Read or skip HEAD_SIZE - sizeof(MAIN_HEAD) bytes.
4. If the end of the archive is reached, stop processing;
   otherwise read 7 bytes into the fields
   HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE.
5. Check HEAD_TYPE.
   If HEAD_TYPE == 0x74:
     read the file header (its first 7 bytes are already read);
     read or skip HEAD_SIZE - sizeof(FILE_HEAD) bytes;
     read or skip PACK_SIZE bytes (the packed file data).
   Otherwise, read the block of the corresponding HEAD_TYPE:
     read HEAD_SIZE - 7 bytes;
     if (HEAD_FLAGS & 0x8000) != 0,
       read ADD_SIZE bytes.
6. Go to step 4.
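The steps above can be sketched in Python as a generator that lists the blocks. This is a sketch under assumptions: fields are little-endian, and to keep it short the archive header is treated like any other block (its flag 0x8000 is clear, so its total size is just HEAD_SIZE), which folds steps 2 and 3 into the loop. The HIGH_PACK_SIZE handling follows the file header description (flags 0x100 and 0x8000); `walk_blocks` is a hypothetical name.

```python
import struct

MARKER = b"Rar!\x1a\x07\x00"   # 0x52 0x61 0x72 0x21 0x1a 0x07 0x00

def walk_blocks(data):
    """Yield (offset, HEAD_TYPE, total size) for every block after the marker."""
    start = data.find(MARKER)        # searching also skips an SFX module
    if start < 0:
        raise ValueError("marker block not found")
    pos = start + len(MARKER)
    while pos + 7 <= len(data):      # step 4: stop at the end of the archive
        _crc, head_type, flags, head_size = struct.unpack_from("<HBHH", data, pos)
        if head_size < 7:
            raise ValueError("corrupt block header at offset %d" % pos)
        total = head_size
        if flags & 0x8000:           # ADD_SIZE (PACK_SIZE in a file header)
            total += struct.unpack_from("<I", data, pos + 7)[0]
        if head_type == 0x74 and flags & 0x100:
            # HIGH_PACK_SIZE follows the 32 fixed bytes of the file header
            total += struct.unpack_from("<I", data, pos + 32)[0] << 32
        yield pos, head_type, total
        pos += total                 # step 6: go on to the next block
```

For example, `for off, kind, size in walk_blocks(data): print(off, hex(kind), size)` prints the offset, type and full size of each block.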

==========================================

             Block formats
==========================================

Marker block (MARK_HEAD)
~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   Always 0x6152
HEAD_TYPE   1 byte    Header type: 0x72
HEAD_FLAGS  2 bytes   Always 0x1a21
HEAD_SIZE   2 bytes   Block size = 0x0007

The marker block is actually treated as a
fixed sequence of bytes:
0x52 0x61 0x72 0x21 0x1a 0x07 0x00

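(AlCo's note: a well-chosen CRC!!! It yields an interesting
sequence: "Rar!", then 0x1a (Ctrl-Z, the DOS end-of-file
character), Bell and NUL. This is convenient both for recognizing
the file by eye and for checking whether the storage medium has
mangled the data.)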

(Shaitan's note: in fact, the developer of this format, Eugene
Roshal, simply wanted the archive to be identifiable by eye. The
story is much the same with other archive formats; in general, a
unique, human-readable signature in files of various kinds has
long been considered good manners.)
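A small Python sketch of the marker check (`starts_with_marker` is a hypothetical helper). Reading the same seven bytes as an ordinary little-endian block header gives exactly the "always" values from the table, which is why the marker can be treated as a fixed byte string.

```python
import struct

MARKER = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])

def starts_with_marker(data):
    """True if data begins with the fixed RAR marker block."""
    return data[:7] == MARKER

# Read as a normal block header, the marker bytes give the fixed field values
head_crc, head_type, head_flags, head_size = struct.unpack("<HBHH", MARKER)
print(MARKER, hex(head_crc), hex(head_type), hex(head_flags), head_size)
# b'Rar!\x1a\x07\x00' 0x6152 0x72 0x1a21 7
```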

Archive header (MAIN_HEAD)
~~~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   CRC of fields from HEAD_TYPE to RESERVED2
HEAD_TYPE   1 byte    Header type: 0x73
HEAD_FLAGS  2 bytes   Bit flags:
                      0x01 - volume attribute (the archive is
                             a volume of a multi-volume archive)
                      0x02 - archive comment present
                      0x04 - archive lock attribute
                      0x08 - solid attribute (solid archive)
                      0x10 - not used
                      0x20 - author information or electronic
                             signature (AV) present
                      Other bits in HEAD_FLAGS are reserved
                      for internal use.
HEAD_SIZE   2 bytes   Total archive header size, including
                      archive comments
RESERVED1   2 bytes   Reserved
RESERVED2   4 bytes   Reserved

Comment block
present if (HEAD_FLAGS & 0x02) != 0
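A Python sketch of decoding MAIN_HEAD, assuming little-endian fields packed in table order without padding (13 bytes up to RESERVED2). The flag labels are my own shorthand and `parse_main_head` is a hypothetical helper; `extra_bytes` is the HEAD_SIZE - sizeof(MAIN_HEAD) amount from step 3, where an archive comment would live.

```python
import struct

MAIN_HEAD = struct.Struct("<HBHHHI")   # HEAD_CRC .. RESERVED2, 13 bytes
MAIN_FLAGS = [(0x01, "volume"), (0x02, "comment"), (0x04, "locked"),
              (0x08, "solid"), (0x20, "authenticity info")]

def parse_main_head(buf, pos=0):
    """Decode an archive header (MAIN_HEAD) starting at offset pos."""
    _crc, head_type, flags, head_size, _res1, _res2 = MAIN_HEAD.unpack_from(buf, pos)
    if head_type != 0x73:
        raise ValueError("not an archive header")
    return {
        "flags": [name for bit, name in MAIN_FLAGS if flags & bit],
        "head_size": head_size,
        # bytes to read or skip after the fixed part (e.g. the comment)
        "extra_bytes": head_size - MAIN_HEAD.size,
    }

hdr = MAIN_HEAD.pack(0, 0x73, 0x01 | 0x08, 13, 0, 0)
print(parse_main_head(hdr))
# {'flags': ['volume', 'solid'], 'head_size': 13, 'extra_bytes': 0}
```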

File header (file in the archive)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC        2 bytes   CRC of fields from HEAD_TYPE to ATTR
                          and of the file name
HEAD_TYPE       1 byte    Header type: 0x74
HEAD_FLAGS      2 bytes   Bit flags:
                          0x01 - file continued from the previous
                                 volume
                          0x02 - file continued in the next volume
                          0x04 - file encrypted with a password
                          0x08 - file comment present
                          0x10 - information from the previous
                                 file is used (solid flag)
                                 (for RAR 2.0 and later)

                          Bits 7 6 5 (for RAR 2.0 and later):
                               0 0 0 - dictionary size   64 KB
                               0 0 1 - dictionary size  128 KB
                               0 1 0 - dictionary size  256 KB
                               0 1 1 - dictionary size  512 KB
                               1 0 0 - dictionary size 1024 KB
                               1 0 1 - reserved
                               1 1 0 - reserved
                               1 1 1 - file is a directory

                          0x100  - the HIGH_PACK_SIZE and
                                   HIGH_UNP_SIZE fields are present.
                                   They are used only when archiving
                                   very large files (over 2 GB); for
                                   smaller files these fields are
                                   absent.
                          0x8000 - this bit is always set, since the
                                   full block size is HEAD_SIZE +
                                   PACK_SIZE (plus HIGH_PACK_SIZE if
                                   bit 0x100 is set)
HEAD_SIZE       2 bytes   Full size of the file header, including
                          the file name and comments
PACK_SIZE       4 bytes   Size of the file in the archive
                          (compressed)
UNP_SIZE        4 bytes   Original file size (uncompressed)
HOST_OS         1 byte    Operating system used for archiving:
                          0 - MS-DOS
                          1 - OS/2
                          2 - Win32
                          3 - Unix
                          4 - Mac OS
                          5 - BeOS
FILE_CRC        4 bytes   File CRC
FTIME           4 bytes   Date and time in standard MS-DOS format
UNP_VER         1 byte    RAR version needed to extract the file
METHOD          1 byte    Compression method
NAME_SIZE       2 bytes   File name size
ATTR            4 bytes   File attributes
HIGH_PACK_SIZE  4 bytes   High 4 bytes of the 64-bit compressed
                          file size. Optional; present only if
                          bit 0x100 in HEAD_FLAGS is set.
HIGH_UNP_SIZE   4 bytes   High 4 bytes of the 64-bit uncompressed
                          file size. Optional; present only if
                          bit 0x100 in HEAD_FLAGS is set.
FILE_NAME                 File name: a string of NAME_SIZE bytes

Comment block
present if (HEAD_FLAGS & 0x08) != 0
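A Python sketch of reading a file header, assuming little-endian fields packed in table order without padding (32 fixed bytes up to ATTR). It also decodes bits 7 6 5 and FTIME, assuming the usual MS-DOS timestamp layout with the date in the high 16 bits; `parse_file_head`, `dos_datetime` and the result keys are hypothetical names.

```python
import struct

FILE_HEAD = struct.Struct("<HBHHIIBIIBBHI")   # HEAD_CRC .. ATTR, 32 bytes
HOST_OS = ["MS-DOS", "OS/2", "Win32", "Unix", "Mac OS", "BeOS"]

def dos_datetime(ftime):
    """Split a 4-byte MS-DOS timestamp (date in the high word)."""
    date, time = ftime >> 16, ftime & 0xFFFF
    return (1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
            time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)

def parse_file_head(buf, pos=0):
    """Decode a file header (HEAD_TYPE 0x74) starting at offset pos."""
    (_crc, head_type, flags, head_size, pack_size, unp_size, host_os,
     file_crc, ftime, unp_ver, method, name_size, attr) = FILE_HEAD.unpack_from(buf, pos)
    if head_type != 0x74:
        raise ValueError("not a file header")
    off = pos + FILE_HEAD.size
    if flags & 0x100:                     # 64-bit sizes for files over 2 GB
        high_pack, high_unp = struct.unpack_from("<II", buf, off)
        pack_size |= high_pack << 32
        unp_size |= high_unp << 32
        off += 8
    dict_code = (flags >> 5) & 7          # bits 7 6 5
    return {
        "name": bytes(buf[off:off + name_size]),
        "pack_size": pack_size,
        "unp_size": unp_size,
        "is_dir": dict_code == 7,
        "dict_kb": 64 << dict_code if dict_code <= 4 else None,
        "solid": bool(flags & 0x10),
        "encrypted": bool(flags & 0x04),
        "host_os": HOST_OS[host_os] if host_os < len(HOST_OS) else host_os,
        "file_crc": file_crc,
        "mtime": dos_datetime(ftime),
        "unp_ver": unp_ver,
        "method": method,
        "attr": attr,
        "head_size": head_size,
    }
```

The returned `head_size` minus the bytes consumed so far is what step 5 of the processing loop reads or skips.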

Comment block
~~~~~~~~~~~~~

HEAD_CRC    2 bytes   CRC of fields from HEAD_TYPE to COMM_CRC
HEAD_TYPE   1 byte    Header type: 0x75
HEAD_FLAGS  2 bytes   Bit flags
HEAD_SIZE   2 bytes   Comment header size + comment size
UNP_SIZE    2 bytes   Uncompressed comment size
UNP_VER     1 byte    RAR version needed to extract the comment
METHOD      1 byte    Compression method
COMM_CRC    2 bytes   Comment CRC
COMMENT               Comment text

Extra information block
~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   Block CRC
HEAD_TYPE   1 byte    Header type: 0x76
HEAD_FLAGS  2 bytes   Bit flags
HEAD_SIZE   2 bytes   Total block size
INFO                  Other data

Subblock
~~~~~~~~

Any object in the archive (a block or a header) may be followed by
subblocks. A subblock depends on its main object. A subblock may be
deleted, or carried over into the new version of the archive, when
the archive is updated.

A subblock contains the following fields:

HEAD_CRC      2 bytes   Block CRC
HEAD_TYPE     1 byte    Header type: 0x77
HEAD_FLAGS    2 bytes   Bit flags; (HEAD_FLAGS & 0x8000) != 0,
                        since the full block size is
                        HEAD_SIZE + DATA_SIZE
HEAD_SIZE     2 bytes   Total block size
DATA_SIZE     4 bytes   Total data size
SUB_TYPE      2 bytes   Subblock type
RESERVED      1 byte    Must be 0
Other fields            Fields depending on the subblock type

OS/2 extended attributes subblock
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   Block CRC
HEAD_TYPE   1 byte    Header type: 0x77
HEAD_FLAGS  2 bytes   Bit flags; (HEAD_FLAGS & 0x8000) != 0,
                      since the full block size is
                      HEAD_SIZE + DATA_SIZE
HEAD_SIZE   2 bytes   Total block size
DATA_SIZE   4 bytes   Total data size (compressed size of the
                      extended attributes)
SUB_TYPE    2 bytes   0x100
RESERVED    1 byte    Must be 0
UNP_SIZE    4 bytes   Uncompressed size of the extended attributes
UNP_VER     1 byte    RAR version needed to extract the extended
                      attributes
METHOD      1 byte    Compression method
EA_CRC      4 bytes   CRC of the extended attributes
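A Python sketch that decodes the common subblock fields and, for SUB_TYPE 0x100, the OS/2 extended attribute fields. It assumes little-endian fields in table order without padding; `parse_subblock` and the result keys are hypothetical.

```python
import struct

SUB_HEAD = struct.Struct("<HBHHIHB")   # HEAD_CRC .. RESERVED, 14 bytes
EA_FIELDS = struct.Struct("<IBBI")     # UNP_SIZE, UNP_VER, METHOD, EA_CRC

def parse_subblock(buf, pos=0):
    """Decode a subblock header (HEAD_TYPE 0x77) starting at offset pos."""
    (_crc, head_type, _flags, head_size, data_size,
     sub_type, _reserved) = SUB_HEAD.unpack_from(buf, pos)
    if head_type != 0x77:
        raise ValueError("not a subblock")
    # full block size is HEAD_SIZE + DATA_SIZE (flag 0x8000 is set)
    info = {"sub_type": sub_type, "total_size": head_size + data_size}
    if sub_type == 0x100:              # OS/2 extended attributes
        unp_size, unp_ver, method, ea_crc = EA_FIELDS.unpack_from(
            buf, pos + SUB_HEAD.size)
        info.update(ea_unp_size=unp_size, unp_ver=unp_ver,
                    method=method, ea_crc=ea_crc)
    return info
```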

==========================================

               Notes
==========================================

1. To process an SFX archive, skip the SFX module by searching
   the file for the marker block. The SFX module itself never
   contains the marker block byte sequence
   (0x52 0x61 0x72 0x21 0x1a 0x07 0x00).
2. The CRC is calculated using the standard polynomial
   0xEDB88320. If the CRC field is shorter than 4 bytes, only
   the low-order bytes are used.
3. Compression method encoding:

     0x30 - storing (no compression)
     0x31 - fastest compression
     0x32 - fast compression
     0x33 - normal compression
     0x34 - good compression
     0x35 - best compression
4. The RAR version number needed for extraction is encoded as
   10 * major version + minor version.
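Notes 2 to 4 in Python. This sketch assumes RAR's CRC also uses the usual initial value and final inversion, which is what Python's `zlib.crc32` implements for the polynomial 0xEDB88320; a 2-byte HEAD_CRC is then its low 16 bits. Which bytes to feed in comes from each block table, e.g. `rar_head_crc(hdr[2:13])` for an archive header (HEAD_TYPE through RESERVED2). The helper names and method labels are mine.

```python
import zlib

METHODS = {0x30: "store", 0x31: "fastest", 0x32: "fast",
           0x33: "normal", 0x34: "good", 0x35: "best"}

def rar_crc32(data):
    """CRC-32 with the reflected polynomial 0xEDB88320 (same as zlib's)."""
    return zlib.crc32(data) & 0xFFFFFFFF

def rar_head_crc(header_bytes):
    """2-byte HEAD_CRC: the low-order 16 bits of the CRC-32."""
    return rar_crc32(header_bytes) & 0xFFFF

def version(unp_ver):
    """Decode UNP_VER = 10 * major + minor into (major, minor)."""
    return divmod(unp_ver, 10)

print(hex(rar_crc32(b"123456789")))   # 0xcbf43926, the standard check value
print(version(20), METHODS[0x33])     # (2, 0) normal
```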

---------- End of techinfo.txt ----------
What follows is my own material:


         The internal RAR format.

RAR uses LZ-type compression: pieces of the already unpacked part
of the file are copied to the current position (the cursor). The
following terms are used below:

"Link length" (len) - the length of the copied piece;
"Offset" (disp) - the distance from the cursor back to the
beginning of the copied piece.

To reduce the redundancy of the LZ code, four Huffman trees with
a depth of up to 15 bits are used. They are called BD, LD, DD and
RD; the first one is used to extract the other three.
BD has 19 possible symbols;
LD has 298 possible symbols;
DD has 48 possible symbols;
RD has 28 possible symbols.

Below we also use the term
"tier" (Row): the set of symbols in a tree that have the same bit
length.

Every packed file, unless it is the second or a later file in a
solid archive, begins with a block called "packed trees".


          Packed trees:

1 bit     - "multimedia block" flag (0 if there is no multimedia;
            only that case is considered below)
1 bit     - clearing the old length table to zeros
            (0 - clear it, 1 - not necessary)
19x4 bits - the BD tree: the bit length of each of its 19
            symbols (their meaning is listed below).

The algorithm that generates a tree from the length information
is quite tricky, and I will not give it here. I will only say
that shorter symbols are placed further left in the tree (i.e.
their codes start with 0 rather than 1) than longer ones.
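The start of the packed-trees block can be read with a plain bit reader. This Python sketch assumes bits are taken most-significant-bit first from consecutive bytes, which is how the public UnRAR sources read RAR 2.x data; it stops at the 19 BD lengths and does not build the tree. `BitReader` and `read_trees_prologue` are hypothetical names.

```python
class BitReader:
    """Reads bits MSB-first from a byte string (assumed RAR 2.x bit order)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0                       # current position in bits

    def bits(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

def read_trees_prologue(reader):
    """Parse a "packed trees" block up to and including the BD lengths."""
    multimedia = reader.bits(1)
    if multimedia:
        raise NotImplementedError("multimedia blocks are not covered here")
    keep_old_lengths = reader.bits(1)      # 0: zero the old length table first
    bd_lengths = [reader.bits(4) for _ in range(19)]
    return keep_old_lengths, bd_lengths
```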

Once the above information has been read, work begins on a large
table of lengths called RT_Table, 374 = 298 + 48 + 28 bytes in
size. It holds, in sequence, the lengths (0..15) of all symbols of
the trees LD, DD and RD.

So I set a pointer to the first byte of RT_Table and read symbols
from the file using the BD tree:

0..15 - add this number to the current RT_Table cell (modulo 16)
        and move to the next cell:

        tab[i] = (tab[i] + num) & 15; i++

16    - copy the previous cell N+3 times, where N (2 bits) is
        stored in the file immediately after symbol 16:

        tab[i] = tab[i-1]; i++ (N+3 times)

17    - write 0 into Z+3 cells, starting with the current one,
        where Z (3 bits) is stored immediately after symbol 17:

        tab[i] = 0; i++ (Z+3 times)

18    - write 0 into Z+11 cells, starting with the current one,
        where Z (7 bits) is stored immediately after symbol 18:

        tab[i] = 0; i++ (Z+11 times)

Filling RT_Table ends when all 374 of its cells have been
processed.
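The RT_Table loop above as a Python sketch. To stay independent of any tree code it takes two callables, one returning the next BD-decoded symbol and one returning n raw bits; both are hypothetical hooks. `table` is the previous RT_Table, which the caller zeroes first when the "clear old lengths" bit is 0. Cutting runs short at cell 374 is my own defensive choice.

```python
LD_SIZE, DD_SIZE, RD_SIZE = 298, 48, 28
TABLE_SIZE = LD_SIZE + DD_SIZE + RD_SIZE          # 374 cells

def fill_rt_table(table, next_bd_symbol, read_bits):
    """Update RT_Table in place from BD symbols; return LD, DD, RD lengths.

    table          -- 374 lengths left from the previous trees
    next_bd_symbol -- callable: decode one symbol with the BD tree
    read_bits      -- callable: read n raw bits from the file
    """
    i = 0
    while i < TABLE_SIZE:
        sym = next_bd_symbol()
        if sym < 16:                              # 0..15: add modulo 16
            table[i] = (table[i] + sym) & 15
            i += 1
        elif sym == 16:                           # repeat previous cell N+3 times
            if i == 0:
                raise ValueError("symbol 16 with no previous cell")
            for _ in range(min(read_bits(2) + 3, TABLE_SIZE - i)):
                table[i] = table[i - 1]
                i += 1
        else:                                     # 17 / 18: runs of zeros
            count = read_bits(3) + 3 if sym == 17 else read_bits(7) + 11
            for _ in range(min(count, TABLE_SIZE - i)):
                table[i] = 0
                i += 1
    return (table[:LD_SIZE],
            table[LD_SIZE:LD_SIZE + DD_SIZE],
            table[LD_SIZE + DD_SIZE:])
```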

Thus we have read the structure of all the trees from the file. We
convert the trees into a format convenient for us (in my unpacker
these are a table of "LowRowCode" numbers, a table of
"RowAdr-(LowRowCode>>Row)" numbers and a "tree leaf number ->
symbol" table, where RowAdr is the address of the first symbol in
the tier and LowRowCode is the code of the leftmost element of the
tier) and move on to the compressed data.
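The article leaves out how the trees are built. A standard construction that matches "shorter codes to the left, starting with zeros" is canonical Huffman coding, with codes of equal length assigned in symbol order; I am assuming that here, so treat this as a sketch rather than the author's method. `first_code` plays the role of LowRowCode and `first_index` that of RowAdr; all names are hypothetical.

```python
def build_decoder(lengths):
    """Build canonical-Huffman decode tables from bit lengths (0 = unused)."""
    count = [0] * 16                          # symbols per tier (length 1..15)
    for length in lengths:
        if length:
            count[length] += 1
    # "tree leaf number -> symbol": sorted by length, then by symbol number
    symbols = [s for row in range(1, 16)
               for s, length in enumerate(lengths) if length == row]
    first_code = [0] * 16                     # code of the leftmost leaf in a tier
    first_index = [0] * 16                    # where the tier starts in symbols
    code = index = 0
    for row in range(1, 16):
        code <<= 1
        first_code[row], first_index[row] = code, index
        code += count[row]
        index += count[row]
    return count, first_code, first_index, symbols

def decode_symbol(decoder, read_bit):
    """Read bits one at a time until the code falls inside some tier."""
    count, first_code, first_index, symbols = decoder
    code = 0
    for row in range(1, 16):
        code = (code << 1) | read_bit()
        offset = code - first_code[row]
        if offset < count[row]:
            return symbols[first_index[row] + offset]
    raise ValueError("bit sequence matches no code")

dec = build_decoder([2, 1, 3, 3])   # codes: 1 -> 0, 0 -> 10, 2 -> 110, 3 -> 111
bits = iter([1, 1, 0, 0])
print(decode_symbol(dec, lambda: next(bits)), decode_symbol(dec, lambda: next(bits)))  # 2 1
```

Precomputing `first_index[row] - first_code[row]` per tier saves a subtraction per lookup; the author's second table appears to fold a similar difference into one number.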


          Packed data:

As mentioned above, the trees come before the actual data in the
file. Under the same condition that requires trees at the start of
the file (i.e. the file is not the second or a later one in a
solid archive), the table of previous offsets should (or at any
rate may) be reset. The point is that the unpacker always keeps
the last 4 offsets it has used. These offsets are 20-bit numbers
(maximum link back = 1 MB). However, I am not sure that resetting
them is really necessary, but who knows what the packer has in
mind?

Read a symbol from the file using the LD tree:

0..255   - a plain symbol (literal). Put it into the output
           stream.

256      - repeat the previous link: use the previous offset and
           the previous link length. Execute this link.

257..260 - take one of the 4 previous offsets (257 = the most
           recent, 258 = the one before it, etc.). Read a symbol
           using the RD tree. This symbol is a row number in the
           midBIT table; that row gives the link length and the
           number of extra bits to read from the file and add to
           the length. Then the length is adjusted a little more:
           if disp >= 257, increment the length;
           if disp >= #2000 (8192), increment it again;
           if disp >= #40000, increment it once more.
           Phew. Execute the link.

261..268 - subtract 261 to get a row number in the litBIT table.
           That row gives the offset and the number of extra bits
           to read from the file to correct it. The link length is
           2. Execute the link.

269      - read new trees (format: see above). Nothing else in
           the unpacker is reinitialized; only the trees change!

270..297 - subtract 270 to get a row number in midBIT; take the
           link length and the number of bits for its correction
           from it. Increment the length.
           Read a symbol using the DD tree. It gives a row number
           in the bigBIT table: take the offset and the number of
           bits for its correction from it.
           If disp >= #2000 (8192), increment the length;
           if disp >= #40000, increment it once more.
           Execute the link.

Unpacking ends when as many bytes of the unpacked file have been
extracted as the file header specifies (UNP_SIZE).
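A minimal Python sketch of the LD symbol handling above, under loudly stated assumptions. The article does not list the midBIT, litBIT and bigBIT values; the extra-bit counts below are the ones I believe the public UnRAR 2.x decoder uses, and each table's base values are generated from them (every row starts where the previous one ends). midBIT's length column is modelled as MID_BASE + 2, and the middle offset threshold is #2000, as in that decoder. The `src` object (ld(), rd(), dd(), bits(n), read_trees()) is a hypothetical interface standing in for the Huffman decoding; `ScriptedSource` only replays pre-decoded values for testing. Every executed link, including a 256 repeat, is pushed into the 4-offset history.

```python
def _bases(bits):
    """Each row starts where the previous one ends: base[i+1] = base[i] + 2**bits[i]."""
    bases, base = [], 0
    for b in bits:
        bases.append(base)
        base += 1 << b
    return bases

# Extra-bit counts per row (assumed to match the UnRAR 2.x tables).
MID_BITS = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4       # 28 rows
LIT_BITS = [2, 2, 3, 4, 5, 6, 6, 6]                                         # 8 rows
BIG_BITS = [0] * 4 + [k for k in range(1, 16) for _ in (0, 1)] + [16] * 14  # 48 rows
MID_BASE, LIT_BASE, BIG_BASE = _bases(MID_BITS), _bases(LIT_BITS), _bases(BIG_BITS)


def _row(src, base, bits, row):
    """Base value of a table row plus its extra bits read from the file."""
    return base[row] + (src.bits(bits[row]) if bits[row] else 0)


def unpack_lz(src, size):
    """Decode `size` bytes from a source of LD/RD/DD symbols and raw bits."""
    out = bytearray()
    recent = [0, 0, 0, 0]          # the 4 most recently used offsets (ring)
    done = 0                       # number of links executed so far
    last_len = last_disp = 0

    def link(length, disp):
        nonlocal done, last_len, last_disp
        recent[done & 3] = disp
        done += 1
        last_len, last_disp = length, disp
        for _ in range(length):    # byte by byte, so overlapping copies work
            out.append(out[-disp]) # sketch: assumes disp <= bytes output so far

    while len(out) < size:
        sym = src.ld()
        if sym < 256:                                   # 0..255: literal
            out.append(sym)
        elif sym == 256:                                # repeat the previous link
            link(last_len, last_disp)
        elif sym <= 260:                                # 257..260: older offset
            disp = recent[(done - (sym - 256)) & 3]     # 257 = most recent
            length = 2 + _row(src, MID_BASE, MID_BITS, src.rd())
            length += (disp >= 257) + (disp >= 0x2000) + (disp >= 0x40000)
            link(length, disp)
        elif sym <= 268:                                # 261..268: length-2 link
            link(2, 1 + _row(src, LIT_BASE, LIT_BITS, sym - 261))
        elif sym == 269:                                # new trees, nothing else
            src.read_trees()
        else:                                           # 270..297: full link
            length = 3 + _row(src, MID_BASE, MID_BITS, sym - 270)
            disp = 1 + _row(src, BIG_BASE, BIG_BITS, src.dd())
            length += (disp >= 0x2000) + (disp >= 0x40000)
            link(length, disp)
    return bytes(out[:size])


class ScriptedSource:
    """Test double: replays pre-decoded values instead of real Huffman input."""

    def __init__(self, ld, rd=(), dd=(), raw=()):
        self._ld, self._rd, self._dd, self._raw = map(iter, (ld, rd, dd, raw))

    def ld(self):
        return next(self._ld)

    def rd(self):
        return next(self._rd)

    def dd(self):
        return next(self._dd)

    def bits(self, n):
        return next(self._raw)     # n is ignored: values are given pre-cut

    def read_trees(self):
        pass


print(unpack_lz(ScriptedSource(ld=[97, 98, 99, 273, 256], dd=[2]), 15))
# b'abcabcabcabcabc'
```

With a generated table the largest offset is BIG_BASE[47] + 2**16 - 1 + 1 = 1 MB, which agrees with the 20-bit offsets mentioned above.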

That's all! :) Next time I'll tell you about multimedia compression
and encryption, if, of course, I manage to figure them out :)





