Inferno #07
31 мая 2005

For Coderz - Fundamentals of optimization for the processor Z80.

<b>For Coderz</b> - Fundamentals of optimization for the processor Z80.
   Fundamentals of optimization Z80 There are 4 directions of 
optimization: on size, speed, on the compressed size and

volume source. At the last ostanav
Lebanese are not going.


          Optimizing the size of


                    1.


   If the program structure is determined,
should count the number of calls to
each procedure (it's very easy - change
thread mark of its name and try ottra
nslirovat), and on this basis, the number and
her procedure, the size can resolve the issue of
depriving her of the status of routines, and making
to the calling segment. Sometimes the routine
this did not win. For example, in the case
set of conditional returns (they are shorter
 conditionals) in it.

   CALL addr: RET is replaced by JP addr
or, if possible, JR addr. Even better ne
renesti fragment with JP to a specified therein
 address.

   CALL addr: JP addr2 simplified analogous
but: the transfer of close to that in the CALL
address and the replacement of the said construction on the
LD BC, addr2: PUSH BC.

   Big cycle like a survey of keys -
with a large number of transitions to a single
the same point - it is logical to cut the same way
holding up the address points to the stack and using the 
condition conditionally returns.



   Complex system of conditions can often be
lead to any form containing
table may be processed command
CPIR.


   The end of the cycles of the form JR NZ, exit: CALL
addr: JR Z, loop: exit to the form CALL
Z, addr: JR Z, loop. Sometimes, we arrive at
converges to change the signs of completion in the subspace
gram addr, and sometimes to create this by
program if its not:)

   Reasonable distribution agreements
flags on the output from different procedures (not
ill use the facts that n / n such a
always sets the CY, and other semiconductors limit
dpochitaet fold Z). can be achieved
more frequent use of conditional vyzo
Islands and thus save a lot of bytes and
lines of code.


   It is useful from time to time walk through
sources and write the most popular Fra
gmenty call certain procedures. Often
CALL stand in front of the same type team. This
lets make them the top procedure for
lzuyas that, in assembly language, unlike
HLL, perhaps as many entry points into the
procedure! In the end gluing procedure
vayutsya a tree like this:
 OUT0 LD A, 16

        JR $ +4
OUT7 LD A, 23
OUTME LD (pg), A
 OUTNO PUSH BC

         LD BC, 32765

         OUT (C), A

         POP BC

        RET

   And that's not the biggest instance:)

   If, for example, after a CALL is often worth
any operation, which in other cases
both cases is harmless (something like OR A or LD
HL, (addr)), we can substitute this operation
at the end of the procedure itself. There are cases to
Then nothing can substitute, but has
Xia advantageous to make an intermediate called
Since the procedure - with the part of the team
associated with this challenge.


   The above recommendations relate to
program organization and almost no concern
its internal structure. On the next
page and we'll talk about the device. By
worthwhile note here that there is a way
programming without using commands
CALL, as well as a means of saving memory ny
interpreted to create or complex
Fix the system commands (script).


                    2.


   There are a number of designs, with suboptimal
In terms of size / speed / registration
ditch almost vsegda.Vot several such
 "Forbidden constructions":

    SLA A (replaced by ADD A, A);

    SLA L: RL H (replaced by ADD HL, HL);

   RL L: RL H (replaced by the ADC HL, HL);

   Two recent council can apply to
other similar cases, if the redistribution
pour registers;

    Complete an alternative form

         JR C, cy

         LD reg, the number of

        JR ok
cy LD reg, number2
 ok

   (Replaced by LD reg, number2: JR C, ok:
LD reg, number: ok or if reg - accumulators
 torus: SBC A, A: AND number! number2: XOR number);

   PUSH HL: SBC HL, DE: POP HL (replaced by
SBC HL, DE: ADD HL, DE - check: flags
 the same state!)

   DEC B: JR NZ, loop (replaced by DJNZ,
if not important flag Z) - can be redistributed
 share registers for this optimization;

         LD L, (IX)

         INC HX

         LD H, (IX)

         DEC HX

        INC IX
(Replaced by LD L, (IX-128): INC IX: LD H,
 (IX +127) with a different beginning. value IX);

   LD A, reg: OR A (except the index and replace
is on the INC reg: DEC reg);

   In the future you will find others.


   Caught escaping counters
ka cycle, if the output overflow
low byte of the address: INC L: JR NZ, loop.


   Many conventional calculations associated with
growth rate (and more), it is more convenient organization
call of arithmetically.


   The most effective way to deploy
program variables will almost always
embedded in the text of the variables of the form:
 var1 = $ +1

        LD DE, 0

   Of all the places where the variable refers
camping, for this should be to choose the place in which
torus gain maximum (for example, where
of the variable is subtracted).

   Another option is placement - in the block B
stemnyh variables BASIC (IY + disp) - you
fit with a large number of the same equivalence
zoticheskih action variable, as ordered
myanutoe vychitanie.Tem more, IY often about
melts. It is clear that the option of IY is
shorter than the LD HL, var1: SUB (HL), but the lengths
it LD HL, var1: LD A, (HL): XOR 1: LD (HL), A.


   Subroutine with the possibility of plugging
login realized exposing or removing the
Niemi RET command at the beginning, without any
Anyway variables.


   You can often find a solid piece of
The program consisted of several sequences
successive actions. Usually unconsciously
we write these sections are independent of Achiev
ltatov each other. It is important for "fast
go "(or" extreme ") programming
of (the term for programming without optimization
minimization and finding errors - when it is not
time must be guaranteed their original
absence). But is contraindicated for optimal
minimization. Typically, after a chi
cluster contains a number of registers nul.Mezhdu those
zeros may need to lower (perhaps not
in other registers and not in neighboring this
groin). Permutation of the registers and the stages can
zhno win significant amounts.

   Due to rearrangement of registers can be
move most of the 16-bit Opera
tions in the register pair HL, to release John
deksnye register, obtain additional
operand stack after EX (SP), HL, reduced
memory-memory operations to the team LDI (LDD)
etc.


   Pay attention to command initialize
tion registers and flagov.Na just in case
inscribing them is always highlighted (eg,
shift lines of text to the left). This will be
remembrance of that successful coincidence cir
stances commands you can comment.
The same initialization can also be about
Led by PUSH ... POP.


   When optimizing the use and properties
cycles: they are multi-directional, are
with the entrance in the middle, with the output from the mid-
or from standard input-output; pa
tric and condition; with preincrement
and post-increment (you can transfer part of
operations from the end of the cycle to the top and vice versa
mouth).

   Permissible to use fragments ROM
that you like. At 48k ROMs realized
Vanir many useful actions, not the last
them of which are LDDR: RET (e
5729 = # 1661) and LDIR: RET (e = 13251
= # 33C3), ideal for conventional calls and
conditional jumps. But the incompatibilities, to
Thoraya can happen if you are programming
induces on YOUR non-standard ROM will be integer
face it on your conscience:). Base
option ROM - 1982, there were more
supplements (should, however, to distinguish 1982
from SOS 128k, also marked by this date).

   In terms of memory, often you
Dry store packed separately rarely
used part of the program - though not
code, description, or part of the fonts, maps, and
t.p.Pri subsequent "bonding" program
there is a reason not to pack already compressed so
parts are.


      Optimization of compressed size


   In many cases, important, not the amount of memory
Tee occupied by the program, and its size by
disk or in memory (be it firmware). Gra
bunt allows the use of packers
address many aspects of this problem.

   Most used in the compression code
Packaging is based on methods LZ (A.
Lempel, J. Ziv) - duplicate copy
sections of already unpacked half
the file had not yet unpacked. Desirable
know and the exact format of the compressed data,
but the basic ideas is enough.

   Important role in determining the physical
size of the program will play a loader and
details, he related: extractor, power
SETUP (setup program), the procedure ini
tsializatsii.Initsializatsiya also included in this
list, because keeping it in memory of the PIC
LE run the program often is not justified -
on a par with other parts of zagruzchika.No not
always: extractor can be used very
program to extract the necessary data it
from the files and loader (if it turbiro
vanny) is not a sin to use CHO
Islands, without repeating it in the working part of
program. There are, of course, instances where
program should be able to retain
nyat herself, and she has to keep in
memory of all its parts.

   To achieve the greatest compression and
optimal use of tailings sector
ditch is best to have only packed
vanny block (if not a packed, then
At least the download - one), with data
GOVERNMENTAL, throw later in time
nym parts of memory. The number of rendition is better
think about and cut at the stage of developed
processing program. However, faced with
This is easy once again to fix prog
Rummy in terms of memory allocation,
changing a few constants.

   When the compression of music and graphical
such data need to choose the most suitable
dyaschie kompressory.Odnako formats and at low
Scrap quantity of such data overall compressor
sor / decompressor profitable.

   Code optimization for the compression is to
combat a particular problem. It only helps
trial and error method. But, in particular, obviously
 shows that:

  - Similar size blocks of the program should be
 schat next;

  - Some cycles beneficial to disclose to the
 design DUP ... EDUP;

  - A series of JP to a single address are packed better
 than a series of the same JR;

  - All areas with optionally contains
mym (variables) is better filled with zeros
 or codes of neighboring teams;

  - Field buffer is better to exclude
dovogo block.

   For experiments with packing easier
all otassemblirovat and unload Nesco
lko versions of the program, after which pooche
anterior compress compressor (s).


         Optimizing for speed


   Prior to the optimization of soon
STI does not interfere to optimize and
size - in order to remove stray kus
Cove code, but avoid adding superfluous embedding
strength CALL ... RET.

   Further optimized for speed
program can either decrease the length,
and grow.

   Originally optimization are al
Algorithms. Many things can make you
syachey methods: sorting, multiplication of
twist the screen, the spectral transformation
Huffman decoding, geometrically
structure, reading bytes from the disk, etc. That
way, which may be less justified
in terms of size and ease of realization
tion, sometimes turns out to be more rapid.

   For example, multiplication by a "square floor
minus the sum of half-square "requires
pre-generate a table square
Comrade and, accordingly, the selection under her na
cards, an quicksort can be generally
not intuitive, and fast decoding Hough
fmana - not only visual but also
trudnoproveryaemym. Nevertheless, they are you
are fit for speed.


   Of course, all accelerating not have, the more so
most barbaric methods tabular form,
fill all memory. Need to choose
precisely those parts of the program that affect
at the time of its execution is particularly strong. In
simple cases that portion alone: ​​she
called "inner loop" (inner loop)
and is easy - in place of the deepest
embedded multi-tiered government tsikla.Kak
Lo, cycles, number of passes that the calculus
is in the thousands, even millions, reduce to
table operations, and disclose in the first
place. (Disclosure cycle is over
mene it repeats DUP ... EDUP.) In
addition, think all sorts of options for
pisi with different registers and in each case
consider the case of bars.

   May, however, find that the second cycle
nesting (if you count the inside) is contained
contains so many teams that slows the progressive
Rummy is stronger than the inner one. Meanwhile, about him
often overlooked and therefore do not account for.

   A more sophisticated version - a program with
set of alternate operations.
If operations are enclosed in a series of cycles,
difficult even to find out how long
in the amount of runs, each of them. Compute
assume the number of passes can be put in any
 location:

         PUSH AF

         LD A, 0

         ADD A, 1

         LD ($ -3), A

         LD A, 0

         ADC A, 0

         LD ($ -3), A

         LD A, 0

         ADC A, 0

         LD ($ -3), A

         (Etc.)

        POP AF

   If there is a way to estimate the average
number of cycles in each operation, then the bonds
Nav number of passes, we get and the time
me. You can try counting and interrupt
vaniya.Mozhno exclude procedures for one and
compare the total time.


   Depending on the contribution that each
Doi procedure, you can choose two or three or ten
current is the most relaxing and start acceleration
Niemi ih.Uskorenie it is to you
choice of different implementations of the simplest operations
and convenient for them to separate registers
processor. It is desirable to include all the formation
data were processed as possible, as a result
Giustra - with minimal use of PA
myati.Ispolzuyte alternative sets. IX
and IY is better to use as a sum
Mataram or counters the cycle because of
increment through them to the memory longer than usual
st. (Again, there are also the opposite case.)
Be as careful with IY - under the regime of
Interrupt an IM or a call to the modified RST 1956
to differentiate it is dangerous.

   These conveniently placed on the round-hell
resam that will simplify the calculation of these addresses, and
at the same time be able to afford to replace anything or
be like INC HL on, say, INC L.

   After the introduction of a family of tables can be
proceed to consider the possibilities OBHO
CWD procedures in any case, with the corresponding
corresponding substitution of the result. At the end of
disable all interrupts and enable
stack - either as a repository for data stream
input or output, or as an additional 16razryadnoe term 
calculations. 

   If it seemed that close to a dead end, look for
different algorithm. As a rule, it is.
More pre-counted data using
The use features of the stack (LD SP, HL, AB
toinkrement, tables, procedures, chain vyzo
Islands in the stack); other table formats, a decrease
deprives the amount referred to constants, for
Visual Effects - coarsening computation
tions, the output of blocks ... and generally any wild
ideas, which only come into your head.
Not all options are used, and large
chance to find something new, with no one owl
tuyas!


              Procedures ROM

      for use in programs


   Add some interesting procedures
ROM to the list of known 8880,8020,3584
etc.

   Some of these procedures is very short
Kie, and one may wonder: why should they be used
I use it? However, even these are very useful
to leave as the address on the stack
- A sort of fortress. List is published for the first time.

# E88:
 ; HL = address of the display with the addition of # 800

         LD A, H

         RRCA * 3

         DEC A

         OR # 50

         LD H, A

         EX DE, HL; DE = address of an attribute

        LD H, C, L, B; B = for example, the number of

                   ; Attribute strings

                    ; C = eg 0

         ADD HL, HL * 5

         LD B, H, C, L; BC = HL = CB * 32

        RET

 # E9b:

         LD A, 24

        SUB B
 # E9e:

         LD D, A

         RRCA * 3

         AND # E0

         LD L, A

         LD A, D

         AND # 18

         OR # 40

         LD H, A; HL = address of A-th znakomestnoy

                                    ; Line

        RET

 # 1117:

         INC HL

         LD (HL), E

         INC HL

         LD (HL), D

         SCF

        RET

 # 1660:

         EX DE, HL

         LDDR

        RET

 # 169a:

         LD D, (HL)

         INC HL

         LD E, (HL)

        RET

 # 16fd:

         LD (HL), C

         INC HL

         LD (HL), B

        RET

 # 172d:

         LD C, A

         LD B, 0

         ADD HL, BC

         LD C, (HL)

         INC HL

         LD B, (HL)

         DEC HL

        RET

 # 18b9:

         INC HL * 6

         LD A, (HL)

        RET

 # 19f5:

         ADD HL, DE

         PUSH DE

         LDIR

         POP HL

        RET

 # 1bb0:

         RST 8

        DB # FF; "0 OK ..."

 # 1c19:

         LD C, (HL)

         INC HL

         LD B, (HL)

         EX DE, HL

         PUSH BC

        RET

 # 1e7d:

         OUT (C), A

        RET

# 229b:
sets the color of border and 23,624 in A

 # 231a:

         LD C, 1

         RET Z

         LD C, -1

        RET

 # 2aee:

         EX DE, HL

         INC HL

         LD E, (HL)

         INC HL

         LD D, (HL)

        RET

 # 2ba6:

         EX DE, HL

         LD A, B

         OR C

         RET Z

         PUSH DE

         LDIR

         POP HL

        RET

 # 2f8b:

         PUSH DE

         LD L, A, H, 0

         LD E, L, D, H

         ADD HL, HL

         ADD HL, HL

         ADD HL, DE

         ADD HL, HL; HL = A * 10

         LD E, C

         ADD HL, DE; HL = A * 10 + C

         LD C, H, A, L

         POP DE

        RET

 # 3406:

         A = BC = A * 5

         ADD HL, BC

        RET

 # 350b:

         PUSH HL

         LD A, 0

         LD (HL), A

         INC HL

         LD (HL), A

         INC HL

         RLA

         LD (HL), A

         RRA

         INC HL

         LD (HL), A

         INC HL

         LD (HL), A

         POP HL

        RET

A. Coder




Other articles:

Classics - Almanashnik. Alexander Pushkin.

For Coderz - Recognition and computation of arithmetic expressions on their character record.

Inferno - The authors of the magazine.

For Coderz - the discipline to create large projects.

Interview - Questions Konstantin Sviridov (Conan) on the site zxnext.narod.ru.

Likbez - The principles of converting graphics PC-ZX.

For Coderz - Programming disc changer / drive in Scorpio.

Softinka - DNA_OS v0.431 - package of utilities for working with hard drives, RAM-drives and floppy disks.

For Coderz - Programming under DNA_OS ZET-9, a package of tools to work with storage devices.

Softinka - The problems and shortcomings package of tools to work with storage devices DNA_OS.

Likbez - details about disk formats that are FAT.

Inferno - Entered from the editor.

Inferno - Errors in the previous numbers.

For Coderz - Small programmers' tricks.

Gameland - On the new games: Oneyroid, Dizzy forever, Dridlock.

For Coderz - Writing archive. Practical principles LZ packaging.

Gameland - Passage of new shipments for the game "Black Crow".

For Coderz - Programming for the video mode 384x304.

Inferno - Letters to the Editor.

Sound - Eden Megus'a about the tracker for the AY / YM.

Inferno - On the shell.

For Coderz - Fundamentals of optimization for the processor Z80.

Likbez - The location of partitions on your hard drive.

Gamedev - 3D projection of the floor / road in the games.

Sound - Wild ideas for AY trackers.

Advertising - Ads by Roman Chuunin.

Advertising - Ads by V. Bogdanovich

For Coderz - How a large Flexible Program.

Repair - Faults Pentagon 128 + and their repair.

Inferno - Content.

Miscellaneous - Thoughts on the contest for the best software.

Others - Transfer software on ZX Spectrum with a PC.

Video - On packaging for a video ZX Spectrum.


Темы: Игры, Программное обеспечение, Пресса, Аппаратное обеспечение, Сеть, Демосцена, Люди, Программирование

Similar articles:
Tutorials - New trend in demomaking: Chunks 2x2.
AD & D - a description of the system of AD & D (Part 3).
Retro - 40 best procedures: Copying data in memory.
List BBS - The list of stations BBS.

В этот день...   21 November