Inferno #04
22 June 2003
Softinka - The RAR 2.x format. Technical information.
... At this point Alone Coder moved over to the Pentagon and continued typing the article on it. Despite the uncooperative keyboard, that was easier :)

The RAR 2.x format.

I really like this format for the speed of its unpacker relative to the quality of its compression, so I decided to tell you about it. To begin with, here is the vendor's technical information, so that it is at least a little clear what is where in the file :)

--------- Begin of techinfo.txt ---------

Technical Information for RAR version 2.70
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

THE FORMAT DESCRIBED BELOW IS VALID ONLY FOR RAR VERSION 1.50 AND LATER

==========================================
RAR archive file format
==========================================

An archive file consists of blocks of varying length. The order of these blocks may vary, but the first block must always be a marker block, followed by an archive header block.

Each block begins with the following fields:

HEAD_CRC    2 bytes   CRC of the whole block or of part of it
HEAD_TYPE   1 byte    Block type
HEAD_FLAGS  2 bytes   Block flags
HEAD_SIZE   2 bytes   Block size
ADD_SIZE    4 bytes   Optional field: added block size

The ADD_SIZE field is present only if (HEAD_FLAGS & 0x8000) != 0.

The total block size is HEAD_SIZE if (HEAD_FLAGS & 0x8000) == 0, or HEAD_SIZE + ADD_SIZE if the ADD_SIZE field is present, that is, when (HEAD_FLAGS & 0x8000) != 0.

In all blocks the following bits of HEAD_FLAGS have the same meaning:

0x4000 - if set, older versions of RAR will ignore this block and remove it when the archive is updated; if clear, the block is copied to the new archive file when the archive is updated;

0x8000 - if set, the ADD_SIZE field is present and the full block size is HEAD_SIZE + ADD_SIZE.
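The common block header described above can be sketched in Python as follows. This is only an illustration, not code from RAR: the function name `parse_block_header` and the returned dictionary are hypothetical, and little-endian byte order is an assumption (it agrees with the marker block values 0x6152, 0x72, 0x1a21, 0x0007 listed further on for the bytes 0x52 0x61 0x72 0x21 0x1a 0x07 0x00).

```python
import struct

LONG_BLOCK = 0x8000  # HEAD_FLAGS bit: the ADD_SIZE field is present


def parse_block_header(buf, pos=0):
    """Parse the 7 common header bytes (plus optional ADD_SIZE) at buf[pos:].

    Hypothetical helper; assumes little-endian fields.
    """
    head_crc, head_type, head_flags, head_size = struct.unpack_from("<HBHH", buf, pos)
    add_size = 0
    if head_flags & LONG_BLOCK:
        # ADD_SIZE follows HEAD_SIZE immediately
        (add_size,) = struct.unpack_from("<I", buf, pos + 7)
    return {
        "head_crc": head_crc,
        "head_type": head_type,
        "head_flags": head_flags,
        "head_size": head_size,
        "add_size": add_size,
        "total_size": head_size + add_size,  # full block size
    }


# The fixed marker block sequence from the specification.
marker = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])
marker_header = parse_block_header(marker)

# A made-up block with bit 0x8000 set: HEAD_SIZE = 14, ADD_SIZE = 100.
long_block = struct.pack("<HBHHI", 0, 0x77, 0x8000, 14, 100)
long_header = parse_block_header(long_block)
```

To walk an archive, a reader would advance by `total_size` from the start of each block to reach the next one.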
Declared block types:

HEAD_TYPE = 0x72   marker block
HEAD_TYPE = 0x73   archive header
HEAD_TYPE = 0x74   file header
HEAD_TYPE = 0x75   comment header
HEAD_TYPE = 0x76   old-style electronic signature
HEAD_TYPE = 0x77   subblock
HEAD_TYPE = 0x78   recovery record
HEAD_TYPE = 0x79   electronic signature

The comment block is used only inside other blocks.

An archive is processed as follows:

1. Read and verify the marker block.
2. Read the archive header.
3. Read or skip HEAD_SIZE - sizeof(MAIN_HEAD) bytes.
4. If the end of the archive is reached, stop processing; otherwise read 7 bytes into the fields HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE.
5. Check HEAD_TYPE.
   if HEAD_TYPE == 0x74
       read the file header (the first 7 bytes are already read)
       read or skip HEAD_SIZE - sizeof(FILE_HEAD) bytes
       read or skip FILE_SIZE bytes
   otherwise
       read the block corresponding to HEAD_TYPE:
           read HEAD_SIZE - 7 bytes
           if (HEAD_FLAGS & 0x8000) read ADD_SIZE bytes
6. Go to step 4.

==========================================
Block formats
==========================================

Marker block (MARK_HEAD)
~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   Always 0x6152
HEAD_TYPE   1 byte    Header type: 0x72
HEAD_FLAGS  2 bytes   Always 0x1a21
HEAD_SIZE   2 bytes   Block size = 0x0007

The marker block is actually treated as a fixed sequence of bytes: 0x52 0x61 0x72 0x21 0x1a 0x07 0x00

(Note by AlCo: a well-chosen CRC!!! The result is an interesting sequence: "Rar!", Ctrl-Z, Bell and Nul, convenient both for visual recognition and for sanity-checking the data medium.)

(Note by Shaitan: in fact the developer of this format, Eugene Roshal, aimed from the start at visual identification of the archive. Much the same story holds for other archive formats, and in general a unique, human-readable signature in files of all kinds has long been good manners.)

Archive header (MAIN_HEAD)
~~~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   CRC of the fields from HEAD_TYPE to RESERVED2
HEAD_TYPE   1 byte    Header type: 0x73
HEAD_FLAGS  2 bytes   Bit flags:
                      0x01 - volume attribute (a volume of a multi-volume archive)
                      0x02 - an archive comment is present
                      0x04 - archive lock attribute
                      0x08 - solid attribute (solid archive)
                      0x10 - not used
                      0x20 - author information or electronic signature (AV) is present
                      The other bits of HEAD_FLAGS are reserved for internal use.
HEAD_SIZE   2 bytes   Total size of the archive header, including archive comments
RESERVED1   2 bytes   Reserved
RESERVED2   4 bytes   Reserved

Comment block   present if (HEAD_FLAGS & 0x02) != 0

File header (a file in the archive)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   CRC of the fields from HEAD_TYPE to FILEATTR and of the file name
HEAD_TYPE   1 byte    Header type: 0x74
HEAD_FLAGS  2 bytes   Bit flags:
                      0x01 - file continued from the previous volume
                      0x02 - file continued in the next volume
                      0x04 - file encrypted with a password
                      0x08 - a file comment is present
                      0x10 - information from previous files is used (solid flag) (for RAR 2.0 and above)

                      Bits 7 6 5 (for RAR 2.0 and above):
                      0 0 0 - dictionary size 64 KB
                      0 0 1 - dictionary size 128 KB
                      0 1 0 - dictionary size 256 KB
                      0 1 1 - dictionary size 512 KB
                      1 0 0 - dictionary size 1024 KB
                      1 0 1 - reserved
                      1 1 0 - reserved
                      1 1 1 - the file is a directory

                      0x100 - the fields HIGH_PACK_SIZE and HIGH_UNP_SIZE are present. These fields are used only for archiving large files (more than 2 GB); for smaller files they are absent.
                      0x8000 - this bit is always set, since the total block size is HEAD_SIZE + PACK_SIZE (plus HIGH_PACK_SIZE, if bit 0x100 is set)
HEAD_SIZE   2 bytes   Full size of the file header, including the file name and comments
PACK_SIZE   4 bytes   File size in the archive (compressed)
UNP_SIZE    4 bytes   Source file size (uncompressed)
HOST_OS     1 byte    Operating system used for archiving:
                      0 - MS-DOS
                      1 - OS/2
                      2 - Win32
                      3 - Unix
                      4 - Mac OS
                      5 - BeOS
FILE_CRC    4 bytes   CRC of the file
FTIME       4 bytes   Date and time in standard MS-DOS format
UNP_VER     1 byte    RAR version required to extract the file
METHOD      1 byte    Compression method
NAME_SIZE   2 bytes   Size of the file name
ATTR        4 bytes   File attributes
HIGH_PACK_SIZE  4 bytes   High 4 bytes of the 64-bit compressed file size. Optional value, present only if bit 0x100 in HEAD_FLAGS is set.
HIGH_UNP_SIZE   4 bytes   High 4 bytes of the 64-bit uncompressed file size. Optional value, present only if bit 0x100 in HEAD_FLAGS is set.
FILE_NAME   File name - a string of NAME_SIZE bytes

Comment block   present if (HEAD_FLAGS & 0x08) != 0

Comment block
~~~~~~~~~~~~~

HEAD_CRC    2 bytes   CRC of the fields from HEAD_TYPE to COMM_CRC
HEAD_TYPE   1 byte    Header type: 0x75
HEAD_FLAGS  2 bytes   Bit flags
HEAD_SIZE   2 bytes   Size of the comment header + size of the comment
UNP_SIZE    2 bytes   Uncompressed size of the comment
UNP_VER     1 byte    RAR version required to extract the comment
METHOD      1 byte    Compression method
COMM_CRC    2 bytes   CRC of the comment
COMMENT     Comment text

Extra information block
~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   CRC of the block
HEAD_TYPE   1 byte    Header type: 0x76
HEAD_FLAGS  2 bytes   Bit flags
HEAD_SIZE   2 bytes   Total block size
INFO        Other data

Subblock
~~~~~~~~

An object in the archive (a block or a header) can be accompanied by subblocks. A subblock depends on its parent object. A subblock may be removed or moved to the new version of the archive when the archive is updated.
A subblock contains the following fields:

HEAD_CRC    2 bytes   CRC of the block
HEAD_TYPE   1 byte    Header type: 0x77
HEAD_FLAGS  2 bytes   Bit flags; (HEAD_FLAGS & 0x8000) != 0, since the full block size is HEAD_SIZE + DATA_SIZE
HEAD_SIZE   2 bytes   Total block size
DATA_SIZE   4 bytes   Total data size
SUB_TYPE    2 bytes   Subblock type
RESERVED    1 byte    Must be 0
Other fields          Other fields, depending on the subblock type

OS/2 extended attributes subblock
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

HEAD_CRC    2 bytes   CRC of the block
HEAD_TYPE   1 byte    Header type: 0x77
HEAD_FLAGS  2 bytes   Bit flags; (HEAD_FLAGS & 0x8000) != 0, since the full block size is HEAD_SIZE + DATA_SIZE
HEAD_SIZE   2 bytes   Total block size
DATA_SIZE   4 bytes   Total data size (compressed size of the extended attributes)
SUB_TYPE    2 bytes   0x100
RESERVED    1 byte    Must be 0
UNP_SIZE    4 bytes   Uncompressed size of the extended attributes
UNP_VER     1 byte    RAR version required to extract the extended attributes
METHOD      1 byte    Compression method
EA_CRC      4 bytes   CRC of the extended attributes

==========================================
Notes
==========================================

1. To process an SFX archive, skip the SFX module by searching the archive for the marker block. The SFX module itself does not contain the marker block byte sequence (0x52 0x61 0x72 0x21 0x1a 0x07 0x00).

2. The CRC is calculated using the standard polynomial 0xEDB88320. If the size of a CRC field is less than 4 bytes, only the low-order bytes are used.

3. Encoding of the compression method:
   0x30 - storing (no compression)
   0x31 - fastest compression
   0x32 - fast compression
   0x33 - normal compression
   0x34 - good compression
   0x35 - maximum compression

4. The RAR version number required for extraction is encoded as 10 * major version number + minor version number.

---------- End of techinfo.txt ----------

What follows is my own ad-libbing:

The internal format of RAR.

The compression used is of the LZ type: it consists of copying pieces of the already unpacked file to the position under the cursor.
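The LZ copying step can be sketched as follows. This is a generic illustration rather than the author's unpacker: the function name `copy_link` is hypothetical, and it assumes that the offset `disp` is counted back from the cursor (disp = 1 means the byte just written) and that `length` is the number of bytes to copy.

```python
def copy_link(out, disp, length):
    """Append `length` bytes to `out`, copied from `disp` bytes behind the cursor.

    Hypothetical helper; `out` is a bytearray holding the data unpacked so far.
    """
    if not 0 < disp <= len(out):
        raise ValueError("link points outside the unpacked data")
    start = len(out) - disp
    for k in range(length):
        # Copy byte by byte: the source may overlap the bytes being written.
        out.append(out[start + k])


unpacked = bytearray(b"abc")
copy_link(unpacked, 3, 6)  # a link longer than its offset repeats the pattern
```

The byte-by-byte loop matters: when the length exceeds the offset, the copy reads bytes that the same link has just written, which is how short patterns get repeated.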
The following terminology is used below:
"Link length" (puts) - the length of the copied piece;
"Offset" (disp) - the offset from the cursor to the beginning of the copied piece.

To reduce the redundancy of the LZ coding, Huffman trees with a depth of up to 15 bits are used, 4 of them. These trees are named BD, LD, DD and RD. The first is used to extract the other three.

BD has 19 possible symbols;
LD has 298 possible symbols;
DD has 48 possible symbols;
RD has 28 possible symbols.

One more term is used below:
"Tier" (Row) - the set of symbols in a tree that have the same bit length.

Every packed block, unless it is the second or a later one in a solid archive, begins with a block called "packed trees".

Packed trees:

1 bit - "multimedia block" flag (0 if not multimedia; only this variant is considered below)
1 bit - clear the old array of lengths with zeros (1 - not needed)
19x4 bits - the BD tree: the length in bits, given for all 19 symbols in turn.

The algorithm that generates a tree from the information about lengths is rather tricky, and I will not give it here. I will only say that shorter symbols are placed further to the left in the tree (that is, they are more likely to start with a zero than with a one) than longer ones.

Once the above information is read, work begins on a large table of lengths called RT_Table, whose size is 374 = 298 + 48 + 28 bytes. It holds, in sequence, the lengths (0..15) of all symbols of the trees LD, DD and RD. So we set the pointer i to the first byte of RT_Table and read symbols from the file using the BD tree:

0..15 - add this number to the current cell of RT_Table and go to the next cell.
        tab[i] = (tab[i] + num) & 15; i++
16 - copy the previous cell N+3 times, where N (2 bits) is stored in the file right after symbol 16.
        tab[i] = tab[i-1]; i++, etc.
17 - put the number 0 into Z+3 cells, starting from the current one, where Z (3 bits) is stored right after symbol 17.
        tab[i] = 0; i++, etc.
18 - put the number 0 into Z+11 cells, starting from the current one, where Z (7 bits) is stored right after symbol 18.
        tab[i] = 0; i++, etc.

Filling RT_Table ends when all 374 of its cells have been processed.

Thus we have obtained the structure of all the trees from the file. We convert the trees into a format convenient for us (in my unpacker this is a table of "LowRowCode" numbers, a table of "RowAdr-(LowRowCode>>Row)" numbers and a table "leaf number in the tree -> symbol", where RowAdr is the address of the first symbol in the tier and LowRowCode is the code corresponding to the leftmost element of the tier) and move on to the packed data.

Packed data:

As mentioned above, the actual data in the file is preceded by the trees. Under the same condition that requires the trees at the beginning of the file (i.e. "the block is not the second or a later file in a solid archive") one should (or perhaps should not, anyway) reset the table of previous offsets. The thing is that the unpacker always keeps the 4 previously used offsets. These offsets are 20-bit numbers (maximum link back = 1 MB). However, I am not sure that resetting them is really necessary, but who knows what the packer has in mind?

Read a token from the file using the LD tree:

0..255 - just a symbol. Put it into the output stream.

256 - repeat the previous offset and the previous link length. Carry out this link.

257..260 - take one of the 4 previous offsets (257 = the most recent, 258 = the one before it, etc.). Read a token using the RD tree. This token gives the row number in the midBIT table. From that row we take the link length and the number of extra bits that must now be read from the file and added to the link length. Then the length is adjusted a little more: if disp >= 257, increment the length; if disp >= 32768, increment the length again; if disp >= #40000, increment the length once more. Phew. Carry out the link.

261..268 - subtract 261 to get the row number in the litBIT table. From that row we take the offset and the number of extra bits that must be read from the file to correct this offset. The link length is equal to 2. Carry out the link.

269 - read new trees (for the format see above). Nothing in the unpacker is reinitialized! Only the trees change!

270..297 - subtract 270 to get the row number in midBIT; from that row take the link length and the number of bits for its correction. Increment the length. Read a token using the DD tree. It gives the row number in the bigBIT table, from which we take the offset and the number of bits for its correction. If disp >= 32768, increment the length; if disp >= #40000, increment the length once more. Carry out the link.

Unpacking ends when as many bytes have been extracted as the file header gives for the unpacked file size.

That's all! :) Next time I'll tell you about multimedia compression and encryption, provided, of course, that I manage to figure them out :)
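The rules for filling the table of lengths with symbols 0..18 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's unpacker: the function name `fill_length_table` is hypothetical, the symbols are assumed to be already decoded with the BD tree, and `read_bits(n)` is a hypothetical callback that returns the next n bits of the file as an integer. The guard that stops a run at the end of the table is also an assumption. A real RT_Table has 374 cells; the demo uses 8 to stay readable.

```python
def fill_length_table(symbols, read_bits, old_table):
    """Return the new table of lengths built from BD-decoded symbols.

    symbols   - iterator of symbols 0..18 (hypothetical: already decoded via BD)
    read_bits - hypothetical callback, read_bits(n) -> next n bits as an int
    old_table - previous lengths (all zeros if the old array was cleared)
    """
    tab = list(old_table)
    size = len(tab)
    i = 0
    while i < size:
        num = next(symbols)
        if num < 16:
            # 0..15: add to the old length, modulo 16
            tab[i] = (tab[i] + num) & 15
            i += 1
        elif num == 16:
            # 16: repeat the previous cell N+3 times, N is 2 bits
            if i == 0:
                raise ValueError("symbol 16 has no previous cell to copy")
            count = read_bits(2) + 3
            while count > 0 and i < size:
                tab[i] = tab[i - 1]
                i += 1
                count -= 1
        else:
            # 17: Z+3 zeros (Z is 3 bits); 18: Z+11 zeros (Z is 7 bits)
            count = read_bits(3) + 3 if num == 17 else read_bits(7) + 11
            while count > 0 and i < size:
                tab[i] = 0
                i += 1
                count -= 1
    return tab


# Demo: symbol 5, then 16 with N=1 (4 copies), then 17 with Z=0 (3 zeros).
extra_bits = iter([1, 0])
demo_table = fill_length_table(iter([5, 16, 17]), lambda n: next(extra_bits), [0] * 8)
```

In the demo `demo_table` comes out as [5, 5, 5, 5, 5, 0, 0, 0]. Because symbols 0..15 are added to the old lengths rather than stored directly, the "clear the old array" bit decides whether `old_table` starts as zeros or as the lengths from the previous set of trees.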