ZX Review #5-6
04 ноября 1997 |
|
Forum - compression programs.
Compression programs To begin with sore topic: compression of the text. Almost ideal is compression of the text by the method of Huffman (net) - for text (large size), this method is more progressive than LZSS (RLE generally keep quiet about). About Huffman compression and will speech. Theory Texts usually written either in Russian or English, so often repeated would be no more than 90 (Latin) or 120-140 characters (Russian). And the rough estimated 30% of them occupy very frequent letters. From all this it follows that we need to convert the algorithm Huffman compression, so that He worked in these conditions. The first thing we need to do - make a table of frequency used items - array of 256 bytes. In the first place should be the most common symbol, at last - the least. Examples will not lead, as do it is very easy. Thus, the array is obtained, now most important: developing format in which data is stored. The principle is: all numbers are divided into 5 groups according to the number of bits: Group Number of bits January 1 February 2 March 3 4 4-8 Numbers with the first of the third group coded as follows: first there is a N included bits - which number group, as many bits. Then is the sheer number (of bits in the group), the first of its value set to 0. (For ease of mastering the material, see ZXRevyu 1 / 95). Example: Number 0. Occupies 1 bit:% 0. The number of the first group. Hence, it will appear as% 10 - total 2 bits. Number 3. Occupies 2 bits: % 11. Group 2:% 11 01 - four bits. Number itself is transformed from% 11 to% 01, the program-decompressor could count the group number. As you can see, the more the group, the smaller the payoff: Group Gain (bits) June 1 April 2 March 2 But still a win great. But if you go further on this way, the 8-bit numbers will encode 16 bits 7-bit - 14, that does not suit us. Therefore, for 4-8 teams have a special format: instead of units with the number of recorded bits = 0 - sign spetsformata. Then we write more one bit: if it is 0, then for it is normal octet number (in the case of 6, 7, 8 groups). If it is 1, then followed by the 5-bit number (4, Group 5). Group (bits) Gain (bits) 4 +1 5 +1 6 -2 7 -2 8 -2 Example: number 255. 8-th group,% 11111111. After compression: % 00% 11111111 (loss of 2 bytes). Number 64. 7-I group,% 01000 000. Result:% 00% 01000000 (Again, a loss). Number 8. Group 4,% 0000100 0. Result:% 01% 01000 (gain 1 bit). But the numbers with the numbers 32 - 255 the vast majority - you might say. Here and makes itself known Huffman algorithm: in fact in the table, we coded most occurring less character codes, and especially in the texts, such symbols and will make us the majority (in Russian and English - the vowels spaces). Algorithm: uncompressed program should rasskompressirovat code, then the table find the corresponding symbol. An example of such a decompressor: (Program damp, not relotsiruemaya, not optimized). This program is only to perform the conversion code from the compressed file. You need to write your procedure to obtain the source code from the table (calculated from the resulting Opcode: LD L, N; N-number code LD H, 0, in the form of double-byte LD DE, TABLE; address table ADD HL, DE; compute address LD A, (HL); received a code Log in to the program: IY - Address compressed block, IX - address where decompression occurs. Compressed file format: first go 2 bytes - length of the file before compression (how many codes) .140. ORG 32768 LD E, (IY +0); take long LD D, (IY +1) INC IY INC IY; in IY-address file LD (LONG), DE; keep the length LD D, 7, bit 7. MAIN CALL GRUPPA; take a number of CALL PRIEM; calculate the code PUSH BC LD BC, (LONG) DEC BC; reduced the length of LD (LONG), BC LD A, B OR C POP BC JR NZ, MAIN; in the cycle RET LONG DW 0; stores the length ; Increase the bit address INCBIT PUSH AF; saved A LD A, D; number of bits AND A; if = 0, then JR Z, NEWBYTE; transition DEC A; reduced the number LD D, A; in the register D POP AF; restored A RET; Output NEWBYTE LD D, 7; new bayt.Bit 7 INC IY; increased byte POP AF; adres.Vosstanovili RET; A and output ; Check bits from the address And if bit = 0, then A will be 0 if bit = 1, , Then A will be 255. GETBIT LD A, # 47; code BIT 0, A GETBIT1 PUSH BC; remember BC LD (BT +1), A; pasted the code LD A, D AND A; if a bit is 0, then JR Z, PROH; transition LD B, A; counter bits UST LD A, (BT +1); set ADD A, 8; team BIT 0, A LD (BT +1), A; 1,2,3,4,5,6,7 DJNZ UST PROH LD A, (IY +0); take a byte BT BIT 0, A; check bits POP BC; restore BC IZ DB 0 JR Z, BIT0; if bit = 0, the transition BIT1 LD A, 255, bit = 1 RET BIT0 XOR A; bit = 0 RET , Inclusion of bits of address SETBIT PUSH AF; remembered A LD A, # C9; use GETBIT LD (IZ), A; of SETBIT: code RET LD A, # C7; code SET 0, A CALL GETBIT1; resulting in an A-B LD (IY +0), A; byte address XOR A; restored LD (IZ), A; GETBIT POP AF; restored A RET ; Getting a number of ; Entry: bit address on the tag team GRUPPA CALL GETBIT; take a bit AND A; equal to 0? JR Z, GRUP58; 5-8 bit number CALL INCBIT; increased address LD C, 1 count of GR1 CALL GETBIT; took bits AND A; equal to 0? JR Z, GR2; If yes, send the number of INC C; increased group CALL INCBIT; increased address JR GR1; in the cycle GR2 CALL SETBIT; included a bit (because 1 LD B, C; bit number equal to 1 RET; set to 0 GRUP58 CALL INCBIT; increased address CALL GETBIT; took bits AND A; equal to 0? JR Z, GRUP5; 5-bit number GRUP8 CALL INCBIT; 8-bit number LD B, 8, Group 8 RET GRUP5 CALL INCBIT; increased address LD B, 5, Group 5 RET , Getting numbers. B-group number ; Bit address is a number. PRIEM XOR A; resets rotating LD (BYTE), A; byte receiver PR CALL GETBIT; took bits AND A; if not equal to 0, then CALL NZ, WKL; include bits BYTE LD A, (BYTE); rotation bytes RLA; receiver LD (BYTE), A CALL INCBIT; increased address DJNZ PR; number of cycles = LD (IX +0), A; received a code INC IX; increased address RET; receiver WKL SCF; inclusion of bits RET BYTE DB 0, byte-receiver 2 From this follows several Huffman compression features: 1. On small files (estimated at up to 3 kb), the compressor does not will win because she decompresses the program will quite large - you need to save the table (already 256 bytes) and etc. 2. The larger the file and the less than it used characters, the more powerful compression. 3. Main feature: unlike other compressors (LZSS, RLE), in a compressed file contains no data, and their Codes! Therefore, to modify them in any Do not. (And here in LZSS and, especially, in the RLE, you can: got a doctor in a file on disk, searched its inscription - perhaps she does not compress - and If found, then changed as needed. Here you not only do not find, but also ruin The entire file ...) 4. Low speed of the decompressor (as compared to other technologies). But that's the theory - so that who tries Huffman in practice - write a Review. Ca. Ed.: You can significantly reduce frequency table of symbols, if we consider that symbols with numbers 32-255 are encoded the same number of bits (ten bits), and therefore, the order they appear in Table does not affect the size of the compressed file. Thus, the table is sufficient to store only the 32 most frequently used symbol. And yet. Try to implement an improved algorithm for Huffman, which are encoded as individual characters, and most frequently occurring pairs of characters. For example, in the English language often found a pair of characters "th". Obviously, the length of the code corresponding to this pair characters will be less than the sum of the lengths of character codes "t" and "h". Due to this superiority is achieved in the compression. * (C) Ivan Roshchin, Moscow, 1997 Features assembler ZX ASM 3.0 When the command "Load sts "(in the menu Setup) in the case if a file with the specified name is found, an error message is issued. If you select Edit mode = Text, you lose access to all points Menu Run, except Inspect, which is not always convenient. Not a message about error when assembling teams to load single-byte register double-byte numbers. For example, the command LD B, # 1234 will be assemble as LD B, # 34. ZX ASM somehow believes that registers, HL, IX, IY equal, ie in any team that uses the HL, can be used and registers IX, IY. If the teams type LD A, (HL) is really so, to commands such as ADD HL, HL, and some others do not applies. Let us illustrate this by Example: Compiles source code ADD HL, IX ADD HL, HL SBC HL, IY SBC HL, HL EX DE, IX EX DE, HL This error may lead to a very unpleasant situation. For example, you wrote in your program ADD HL, IX, forgetting that this team does not. Assembler also "saw nothing", and then you wonder why your program does not work. In teams, working with the halves of the index registers, have used the notation XL, XH, YL, YH. Meanwhile, STS in the same team uses notation LX, HX, LY, HY. Ill-treated situation when you define variable with a DS, and the number of bytes allocated for it zero. Such a case can arise, for example, if you first defined a variable with by DB, and then decided to replace the DB on the DS, while forgetting replace the 0 to 1: NAME DB 0 -> NAME DS 0 If the text of the program changed since the last recording CD, ZX ASM before performing "Dangerous" operations prompted to save it. But what about ZX ASM determines changes the text or No? Obviously, simply by comparing a checksum of the text in-memory, and text the last time recorded on disc. But this method in some cases, gives an incorrect result, and ZX ASM believes that the text has not changed: a) LD HL, # 1234 LD HL, # 3214 DB% DB% 01100110 11001100 (Because changing the order of characters does not affect the checksum text) b) LD A, 5 LD A, 4 LD B, 6 LD B, 7 (As ASC-character codes "5" and "6" in the amount equal to # 35 + # 36 = # 6B, and the same value equal to the sum character code "4" and "7"). *
Other articles:
Similar articles:
В этот день... 21 November