Inferno #07
31 мая 2005 |
|
For Coderz - Fundamentals of optimization for the processor Z80.
Fundamentals of optimization Z80 There are 4 directions of optimization: on size, speed, on the compressed size and volume source. At the last ostanav Lebanese are not going. Optimizing the size of 1. If the program structure is determined, should count the number of calls to each procedure (it's very easy - change thread mark of its name and try ottra nslirovat), and on this basis, the number and her procedure, the size can resolve the issue of depriving her of the status of routines, and making to the calling segment. Sometimes the routine this did not win. For example, in the case set of conditional returns (they are shorter conditionals) in it. CALL addr: RET is replaced by JP addr or, if possible, JR addr. Even better ne renesti fragment with JP to a specified therein address. CALL addr: JP addr2 simplified analogous but: the transfer of close to that in the CALL address and the replacement of the said construction on the LD BC, addr2: PUSH BC. Big cycle like a survey of keys - with a large number of transitions to a single the same point - it is logical to cut the same way holding up the address points to the stack and using the condition conditionally returns. Complex system of conditions can often be lead to any form containing table may be processed command CPIR. The end of the cycles of the form JR NZ, exit: CALL addr: JR Z, loop: exit to the form CALL Z, addr: JR Z, loop. Sometimes, we arrive at converges to change the signs of completion in the subspace gram addr, and sometimes to create this by program if its not:) Reasonable distribution agreements flags on the output from different procedures (not ill use the facts that n / n such a always sets the CY, and other semiconductors limit dpochitaet fold Z). can be achieved more frequent use of conditional vyzo Islands and thus save a lot of bytes and lines of code. It is useful from time to time walk through sources and write the most popular Fra gmenty call certain procedures. Often CALL stand in front of the same type team. This lets make them the top procedure for lzuyas that, in assembly language, unlike HLL, perhaps as many entry points into the procedure! In the end gluing procedure vayutsya a tree like this: OUT0 LD A, 16 JR $ +4 OUT7 LD A, 23 OUTME LD (pg), A OUTNO PUSH BC LD BC, 32765 OUT (C), A POP BC RET And that's not the biggest instance:) If, for example, after a CALL is often worth any operation, which in other cases both cases is harmless (something like OR A or LD HL, (addr)), we can substitute this operation at the end of the procedure itself. There are cases to Then nothing can substitute, but has Xia advantageous to make an intermediate called Since the procedure - with the part of the team associated with this challenge. The above recommendations relate to program organization and almost no concern its internal structure. On the next page and we'll talk about the device. By worthwhile note here that there is a way programming without using commands CALL, as well as a means of saving memory ny interpreted to create or complex Fix the system commands (script). 2. There are a number of designs, with suboptimal In terms of size / speed / registration ditch almost vsegda.Vot several such "Forbidden constructions": SLA A (replaced by ADD A, A); SLA L: RL H (replaced by ADD HL, HL); RL L: RL H (replaced by the ADC HL, HL); Two recent council can apply to other similar cases, if the redistribution pour registers; Complete an alternative form JR C, cy LD reg, the number of JR ok cy LD reg, number2 ok (Replaced by LD reg, number2: JR C, ok: LD reg, number: ok or if reg - accumulators torus: SBC A, A: AND number! number2: XOR number); PUSH HL: SBC HL, DE: POP HL (replaced by SBC HL, DE: ADD HL, DE - check: flags the same state!) DEC B: JR NZ, loop (replaced by DJNZ, if not important flag Z) - can be redistributed share registers for this optimization; LD L, (IX) INC HX LD H, (IX) DEC HX INC IX (Replaced by LD L, (IX-128): INC IX: LD H, (IX +127) with a different beginning. value IX); LD A, reg: OR A (except the index and replace is on the INC reg: DEC reg); In the future you will find others. Caught escaping counters ka cycle, if the output overflow low byte of the address: INC L: JR NZ, loop. Many conventional calculations associated with growth rate (and more), it is more convenient organization call of arithmetically. The most effective way to deploy program variables will almost always embedded in the text of the variables of the form: var1 = $ +1 LD DE, 0 Of all the places where the variable refers camping, for this should be to choose the place in which torus gain maximum (for example, where of the variable is subtracted). Another option is placement - in the block B stemnyh variables BASIC (IY + disp) - you fit with a large number of the same equivalence zoticheskih action variable, as ordered myanutoe vychitanie.Tem more, IY often about melts. It is clear that the option of IY is shorter than the LD HL, var1: SUB (HL), but the lengths it LD HL, var1: LD A, (HL): XOR 1: LD (HL), A. Subroutine with the possibility of plugging login realized exposing or removing the Niemi RET command at the beginning, without any Anyway variables. You can often find a solid piece of The program consisted of several sequences successive actions. Usually unconsciously we write these sections are independent of Achiev ltatov each other. It is important for "fast go "(or" extreme ") programming of (the term for programming without optimization minimization and finding errors - when it is not time must be guaranteed their original absence). But is contraindicated for optimal minimization. Typically, after a chi cluster contains a number of registers nul.Mezhdu those zeros may need to lower (perhaps not in other registers and not in neighboring this groin). Permutation of the registers and the stages can zhno win significant amounts. Due to rearrangement of registers can be move most of the 16-bit Opera tions in the register pair HL, to release John deksnye register, obtain additional operand stack after EX (SP), HL, reduced memory-memory operations to the team LDI (LDD) etc. Pay attention to command initialize tion registers and flagov.Na just in case inscribing them is always highlighted (eg, shift lines of text to the left). This will be remembrance of that successful coincidence cir stances commands you can comment. The same initialization can also be about Led by PUSH ... POP. When optimizing the use and properties cycles: they are multi-directional, are with the entrance in the middle, with the output from the mid- or from standard input-output; pa tric and condition; with preincrement and post-increment (you can transfer part of operations from the end of the cycle to the top and vice versa mouth). Permissible to use fragments ROM that you like. At 48k ROMs realized Vanir many useful actions, not the last them of which are LDDR: RET (e 5729 = # 1661) and LDIR: RET (e = 13251 = # 33C3), ideal for conventional calls and conditional jumps. But the incompatibilities, to Thoraya can happen if you are programming induces on YOUR non-standard ROM will be integer face it on your conscience:). Base option ROM - 1982, there were more supplements (should, however, to distinguish 1982 from SOS 128k, also marked by this date). In terms of memory, often you Dry store packed separately rarely used part of the program - though not code, description, or part of the fonts, maps, and t.p.Pri subsequent "bonding" program there is a reason not to pack already compressed so parts are. Optimization of compressed size In many cases, important, not the amount of memory Tee occupied by the program, and its size by disk or in memory (be it firmware). Gra bunt allows the use of packers address many aspects of this problem. Most used in the compression code Packaging is based on methods LZ (A. Lempel, J. Ziv) - duplicate copy sections of already unpacked half the file had not yet unpacked. Desirable know and the exact format of the compressed data, but the basic ideas is enough. Important role in determining the physical size of the program will play a loader and details, he related: extractor, power SETUP (setup program), the procedure ini tsializatsii.Initsializatsiya also included in this list, because keeping it in memory of the PIC LE run the program often is not justified - on a par with other parts of zagruzchika.No not always: extractor can be used very program to extract the necessary data it from the files and loader (if it turbiro vanny) is not a sin to use CHO Islands, without repeating it in the working part of program. There are, of course, instances where program should be able to retain nyat herself, and she has to keep in memory of all its parts. To achieve the greatest compression and optimal use of tailings sector ditch is best to have only packed vanny block (if not a packed, then At least the download - one), with data GOVERNMENTAL, throw later in time nym parts of memory. The number of rendition is better think about and cut at the stage of developed processing program. However, faced with This is easy once again to fix prog Rummy in terms of memory allocation, changing a few constants. When the compression of music and graphical such data need to choose the most suitable dyaschie kompressory.Odnako formats and at low Scrap quantity of such data overall compressor sor / decompressor profitable. Code optimization for the compression is to combat a particular problem. It only helps trial and error method. But, in particular, obviously shows that: - Similar size blocks of the program should be schat next; - Some cycles beneficial to disclose to the design DUP ... EDUP; - A series of JP to a single address are packed better than a series of the same JR; - All areas with optionally contains mym (variables) is better filled with zeros or codes of neighboring teams; - Field buffer is better to exclude dovogo block. For experiments with packing easier all otassemblirovat and unload Nesco lko versions of the program, after which pooche anterior compress compressor (s). Optimizing for speed Prior to the optimization of soon STI does not interfere to optimize and size - in order to remove stray kus Cove code, but avoid adding superfluous embedding strength CALL ... RET. Further optimized for speed program can either decrease the length, and grow. Originally optimization are al Algorithms. Many things can make you syachey methods: sorting, multiplication of twist the screen, the spectral transformation Huffman decoding, geometrically structure, reading bytes from the disk, etc. That way, which may be less justified in terms of size and ease of realization tion, sometimes turns out to be more rapid. For example, multiplication by a "square floor minus the sum of half-square "requires pre-generate a table square Comrade and, accordingly, the selection under her na cards, an quicksort can be generally not intuitive, and fast decoding Hough fmana - not only visual but also trudnoproveryaemym. Nevertheless, they are you are fit for speed. Of course, all accelerating not have, the more so most barbaric methods tabular form, fill all memory. Need to choose precisely those parts of the program that affect at the time of its execution is particularly strong. In simple cases that portion alone: she called "inner loop" (inner loop) and is easy - in place of the deepest embedded multi-tiered government tsikla.Kak Lo, cycles, number of passes that the calculus is in the thousands, even millions, reduce to table operations, and disclose in the first place. (Disclosure cycle is over mene it repeats DUP ... EDUP.) In addition, think all sorts of options for pisi with different registers and in each case consider the case of bars. May, however, find that the second cycle nesting (if you count the inside) is contained contains so many teams that slows the progressive Rummy is stronger than the inner one. Meanwhile, about him often overlooked and therefore do not account for. A more sophisticated version - a program with set of alternate operations. If operations are enclosed in a series of cycles, difficult even to find out how long in the amount of runs, each of them. Compute assume the number of passes can be put in any location: PUSH AF LD A, 0 ADD A, 1 LD ($ -3), A LD A, 0 ADC A, 0 LD ($ -3), A LD A, 0 ADC A, 0 LD ($ -3), A (Etc.) POP AF If there is a way to estimate the average number of cycles in each operation, then the bonds Nav number of passes, we get and the time me. You can try counting and interrupt vaniya.Mozhno exclude procedures for one and compare the total time. Depending on the contribution that each Doi procedure, you can choose two or three or ten current is the most relaxing and start acceleration Niemi ih.Uskorenie it is to you choice of different implementations of the simplest operations and convenient for them to separate registers processor. It is desirable to include all the formation data were processed as possible, as a result Giustra - with minimal use of PA myati.Ispolzuyte alternative sets. IX and IY is better to use as a sum Mataram or counters the cycle because of increment through them to the memory longer than usual st. (Again, there are also the opposite case.) Be as careful with IY - under the regime of Interrupt an IM or a call to the modified RST 1956 to differentiate it is dangerous. These conveniently placed on the round-hell resam that will simplify the calculation of these addresses, and at the same time be able to afford to replace anything or be like INC HL on, say, INC L. After the introduction of a family of tables can be proceed to consider the possibilities OBHO CWD procedures in any case, with the corresponding corresponding substitution of the result. At the end of disable all interrupts and enable stack - either as a repository for data stream input or output, or as an additional 16razryadnoe term calculations. If it seemed that close to a dead end, look for different algorithm. As a rule, it is. More pre-counted data using The use features of the stack (LD SP, HL, AB toinkrement, tables, procedures, chain vyzo Islands in the stack); other table formats, a decrease deprives the amount referred to constants, for Visual Effects - coarsening computation tions, the output of blocks ... and generally any wild ideas, which only come into your head. Not all options are used, and large chance to find something new, with no one owl tuyas! Procedures ROM for use in programs Add some interesting procedures ROM to the list of known 8880,8020,3584 etc. Some of these procedures is very short Kie, and one may wonder: why should they be used I use it? However, even these are very useful to leave as the address on the stack - A sort of fortress. List is published for the first time. # E88: ; HL = address of the display with the addition of # 800 LD A, H RRCA * 3 DEC A OR # 50 LD H, A EX DE, HL; DE = address of an attribute LD H, C, L, B; B = for example, the number of ; Attribute strings ; C = eg 0 ADD HL, HL * 5 LD B, H, C, L; BC = HL = CB * 32 RET # E9b: LD A, 24 SUB B # E9e: LD D, A RRCA * 3 AND # E0 LD L, A LD A, D AND # 18 OR # 40 LD H, A; HL = address of A-th znakomestnoy ; Line RET # 1117: INC HL LD (HL), E INC HL LD (HL), D SCF RET # 1660: EX DE, HL LDDR RET # 169a: LD D, (HL) INC HL LD E, (HL) RET # 16fd: LD (HL), C INC HL LD (HL), B RET # 172d: LD C, A LD B, 0 ADD HL, BC LD C, (HL) INC HL LD B, (HL) DEC HL RET # 18b9: INC HL * 6 LD A, (HL) RET # 19f5: ADD HL, DE PUSH DE LDIR POP HL RET # 1bb0: RST 8 DB # FF; "0 OK ..." # 1c19: LD C, (HL) INC HL LD B, (HL) EX DE, HL PUSH BC RET # 1e7d: OUT (C), A RET # 229b: sets the color of border and 23,624 in A # 231a: LD C, 1 RET Z LD C, -1 RET # 2aee: EX DE, HL INC HL LD E, (HL) INC HL LD D, (HL) RET # 2ba6: EX DE, HL LD A, B OR C RET Z PUSH DE LDIR POP HL RET # 2f8b: PUSH DE LD L, A, H, 0 LD E, L, D, H ADD HL, HL ADD HL, HL ADD HL, DE ADD HL, HL; HL = A * 10 LD E, C ADD HL, DE; HL = A * 10 + C LD C, H, A, L POP DE RET # 3406: A = BC = A * 5 ADD HL, BC RET # 350b: PUSH HL LD A, 0 LD (HL), A INC HL LD (HL), A INC HL RLA LD (HL), A RRA INC HL LD (HL), A INC HL LD (HL), A POP HL RET A. Coder
Other articles:
Similar articles:
В этот день... 21 November