Miracle #02
10 августа 1997

Tip GURU - Code optimization for execution time.

<b>Tip GURU</b> - Code optimization for execution time.
Now we invite you to read very
interesting article, write to all known PSV from DREAM TEAM 
GROUP. Perhaps This article does not solve your problem one

in terms of coding, as well as be able to improve
your professional skills. It is worth noting that very soon be 
published - SECOND BREATH - a season hit demomeykerstva 1997. 
This will be the latest creation DREAM TEAM by ZX-SPECTRUM'e. 
--------------------------------------- 

     OPTIMIZATION PROGRAM FOR TIME

               PERFORMANCE


                 PROLOGUE



    At the current stage of development of science and
Technology can not imagine any
one area of ​​their application, which gradually penetrated to 
a microprocessor technique. And gone are the days when

the result of simple calculations had to wait for tedious 
dolgo.Sovremennye problems call for solutions not just fast

and issued in "real time", then
is the rate-making should be higher
speed production tasks in a continuous mode
continuous time protsessov.V our doors
already knocking on the device, which recognize and
generating human conversation and
moving images of real objects

... Unfortunately the giant processing speed available so far 
only su per-computer and do not show today and tomorrow. Yet 
available the majority of microprocessor technology

operates at speeds of about several million simple operations 
per second (for clarity, I will say that acceptable to the eye 
parameter screen this computer is not up to speed

computing: he was just keeping up with
allotted to one frame time, just show up on the screen of 
predetermined images). It is not surprising tends refractive 
index of the programs to improve opportunities for due to their 
literate writing. And sometimes obtained remarkable results: 
the speed of the program increases are sometimes a few times 
just for through some minor changes, and only

5 - 10% can expedite any "raw"
program (of course we are talking about programs in assembler, 
as a first step acceleration of the program - write it on the 
low-level language). 


   Fundamentally differ in the two approaches
Process optimization: a heuristic method is based on logical 
thinking and not requires specialized knowledge of any 
language, nor processor or hardware environment, but requires a 
deep knowledge of algorithms; hardware method of fitting a more 
accurate (Even its implementation could be allocated to

computer) and is based on evidence is very
in-depth knowledge of all the properties of the devices on 
which the program operates. 


   Program optimization is usually
in two stages: first - the initial drafting of the program with 
some of speed considerations, the second - a modification

debugged programs for improving the parameters.


    Here are some that I know of optimization techniques in 
software According to this classification. Immediately it 
should be noted that due to scant coverage of this issue in the 
literature, almost all have been developed by a small circle of 
elite programmers (Hacker) and me and my friends. As a result, 
the material has an ambiguous character. 



              CHAPTER ONE

  (On how to write fast programs

     without knowing the specific equipment)



   First of all, if your program works fast enough to
think about all the other ways of achieving
goal and, perhaps, there
priori faster algorithms. For example, conversion of the 
object's coordinates in three dimensions do not necessarily for 
the Euler equations, the same result can be achieved by several 
different methods and pick the best. Another example - 
Automatic strip conductors in the design of the PCB, you can do

a few notable ways. All
ways to explore and choose
Again, the fastest.


   If the algorithm has already been selected and implemented, 
then the easiest way to do program faster - to remove from it 
cycles by repeating the program of the loop body. When the loop 
is executed at least 100 times a loop body just insert the 
number of times times should run a cycle. If the cycle is 
longer than 100 times, then repeat it so much times as 
possible, and leave the outer loop, but running fewer raz.Takim 
by the gain in speed is obtained by cutting away from the 
program processes such as change of variable

cycle, its control and the transition operators
to the top of the cycle. Also exempted register occupied 
previously by a variable cycle. 


   In any program that goes beyond the transfer of data, there 
are algebraic operatsii.Nakoplenny programmers experience

allows us to derive some rules, guided by which these 
operations can be write so that they work faster.

For example, A = VxV will always work
faster than A = ^ 2 (unless, of course, these operations are 
written as a universal procedure, and not specially optimized 
for What will be discussed later). Also faster to run A = Bx (C 
+ D) compared with A + = VxS VxD, because even if the processor 
provides a team of multiplication it will always be slower than 
addition. We generalize this subject: here in what order should 
the speed of the algebraic operations (from fast to slow) 
addition, subtraction, multiplication, division, 
exponentiation, taking kornya.A if you torment the question 
"which is faster: two multiply or four of addition? ", then you

have to do a more detailed study of the equipment (see 
potaktovoe response teams). Next, consider a more advanced 
features, such as COS, SIN, TG, LOG, EXP. On this occasion it 
should be noted that in most cases the calculation of these 
functions, we can write integral if they scaled that will give 
an enormous gain in speed. Example: suppose that we need to 
calculate the expression A = SIN (C). In the calculation of 
sine, we calculate it is not fractional numbers, and integers 
(due to the fact that the result will not change from 0 to 1, 
and from 0 to assume 256), multiplying everything on

256. Next, multiplying the resulting
result should be divided by 256 (the
is simply discarded low byte).


   Another way to avoid the application of complicated 
calculation (and therefore slow) functions to calculate them in 
advance and set in an array. Such a path finds in recent years 
increasingly used as memory expansion is much easier and 
cheaper to improve performance. For example, instead of 
calculating the sine series expansion in Most programs can be 
universal pre-calculate its value to say

increments of one degree and within 4.5 decimal places that 
will take some 720 bytes of memory, but will speed up the 
program several times. 


   Another area of ​​application of the accelerating
efforts - programs that most
Part-time work does not require the speed, but spoil the rare 
moments of life When you want to quickly do something (to

this type are almost all online programs - at first they did not
are doing and wait for commands from slow to
Compared to them, man, and then, when
team filed, they are forced to wait for a man, until executed 
some complex action). In this case, helps such

medium-long program is free
cycles (assuming the keyboard can be polled no more than 100 
times per second, which will not take a one percent CPU speed), 
it tries to predict what follow-up and prepare everything in an 
instant process: find free space on the media, to calculate

required constant check errors and to
things like that. Even if cooking
will be in vain harm from them will not. But if you come in 
handy - required work is done much faster.



   The above-mentioned methods and complicate
program is extended, but there's a trick
allows to reduce not only the time used
complement procedure, but also reduce the required pamyat.Rech 
it comes to self-modifying programmah.Rassmotrim next Example: 
suppose a program needs to transfer some information from one 
carrier to another, thus adding to each byte constant whose 
value every time you start user defined

Telem. Naturally in this case to allocate to
constant of some memory location where
somewhere in unoccupied areas. Let her address will NNNN. It 
should be action will be written like this:



    to take the next byte

    MOVE register, (NNNN)

    ADD another byte register

    put the next byte


   In this case, firstly in the team MOVE featured some of no 
importance to the algorithm address, but the most

program in memory to take
place under the constant. Using the same methods
self-modification can be written as follows:


    to take the next byte

    MOVE register, constant

    ADD another byte register

    put the next byte


   In this case, one of the teams primary
cycle, running many thousands of times
(That is, the speed of which must be
fight), we get almost 2-fold
gain in execution time due to
removal of indirection to the constant
ones. But how? Because of the constant condition
not fixed but set by the user,
and our program will have to write one
specific number! But the fact that instead of one particular 
number will always be be exactly the constant that has entered

user, because, as is known, in memory of the program is a set 
of numbers (B) and our specific number will also hold a certain 
number of bytes in memory for quite a specific address, and

What this address is worse than that which we wish to allocate 
to storing the constants? Here and is obtained as the program 
faster, and head does not hurt when searching for free memory 
for the constant, and the program is shorter, and memory for 
the data must be less. Except constants in the program can also 
be changed the code of operations that much more difficult to

writing and requires a thorough knowledge of equipment.



              CHAPTER TWO
(On how to write for a specific hardware)



   To apply the methods of the type described below is 
necessary to fully study the specifics of all the hardware

which will operate the program, and
First and foremost, a processor and storage media. Moreover, 
the study should not limit logic devices, and

should include an examination of the temporal characteristics.


   Huge role in the performance of the program is the exchange 
rate with carriers information. In a large number of cases

program is running slow not because of
core processor, but because of slow exchange with the media. As 
a rule, even in electronic media (memory ICs)

speed is much lower than the processor, and is easy to see the 
lion proportion of teams just up and running with the memory

(Remember that even the team retrieved the processor from 
memory). In such media as magnetic and optical discs speed and 
did a funny. The moral: do not be lazy, and thoroughly study 
the characteristics of the carrier of your information, and You 
will not regret it. Usually immediately find ways to increase 
the rate of exchange: it turns out that some part of memory

faster than the other, it appears that the media can be cached 
(to provide rapid buffer memory), then suddenly find

that even addresses the work faster
than with odd and so on. In the presence of
differences in the characteristics necessary to
ibolee used data have a
the faster sections. For example, many computers, the memory 
used as video memory used to store non-graphical data, which 
gives an increase memory for the data. But having studied the 
performance of the memory usually turns out that this memory is 
much slower and it is desirable to store only infrequently used 
data. 


   The next step - studying potaktovogo
execution of commands appears protsessorom.Srazu numerical 
criterion performance: you write well - 1200 cycles, and to 
think and write simpler - 750 cycles. Also there is a pleasant 
way evaluate the performance of the program is not even

introducing it into the car and found it necessary to
to perform the number of cycles. For example, making the 
program that processes signals with frequencies up to 100 hertz 
income pointless to enter into a computer program

Therefore, signal processing, if one cycle
It will take on the calculation of more than 10000 cycles
(When clocked at 1 MHz). At this
stage is useful to create for himself a black
list and bring back the team for running an excessively large 
number cycles to always avoid their use.


   If your program works in any other intellectual partners
parameters (disk drives, graphics / arithmetic / music and 
other co-processor, or another computer on the network) then 
work with them, you can also speed up detail by studying their 
logical and temporal parameters. For example do not need to 
write vector drawing program with specified

coordinates, if it is able to quickly make himself the video 
processor (a "smart" video processors are able to draw not only 
with age torus, but the three-dimensional polygons on

defines a chart pattern, light and
so on). Another example: why the program when printing text on 
paper to align it along the edges where it can do printer 
itself is not wasting any time in this or computer memory. Or 
anything to control the checksum of the array at transmission 
of data, summing up a vast array of numbers, if at the hardware 
level controlled I / O port and sufficient

sufficiently control a bit in his
the flag state. These and similar errors
lead to an annoying pauses where they
can be eliminated with proper approach to
speed of the program in writing it.


    In addition to "Iron" nice to know and
features a human if va
step program provides its participation
For example, in recent programs with the creation of 
vysokorealnyh computed image tions used a strong simplification 
of the calculation representations scene when it moves fast, so

the human eye is still not time to consider detaley.Takzhe on 
similar properties are designed algorithms strong package of 
images and sound for by the disappearance of small and 
imperceptible de tackles.




     Continued in the next issue





Other articles:

From the authors - A new shell in the magazine.

Igrofiliya - Secrets and Passwords for games: Adams Famaly, Astro Marine Corp, Black Magic, Bubble Bobble, Chicago, Cybernoid Part 1 & 2, Double Xinox, Dynasty Wars, Fargo, Freestyle Bmx, Garfield, Grand Prix-2, Herbert'S Dummy Run, Impossamole, Mad Mix-2 Indiana Jones And Last Crusade, Jet Bike Simulator, Kraal, Kgb Superspy, Last Mission, Line Of Fire, Magic Stripes, Monty Python Flying Circuit, New Zeland Story, Predator-2 Power Piramids, Rau Recruits, Road Runner, Score 3020, Task Force, Ulises, War Machine, Wild West Seymour, Zona0.

Rumors and News - Who does what: Dream Team Group, OHG, Pungency, Velichutin Nicholas, Virtual vision group, Golden Disk Group, KDF Soft, X-Project, Mafia, Flash Inc.

Tip GURU - Code optimization for execution time.

Familiar faces - An Interview with DJ.IRONMAN 'om.

Videoglobus - "A man with a thousand persons." Filmography Jimmy Kerry.

Videoglobus - Overview of new products video rental store.

Videoglobus - an overview of recent receipts of electronic music.

Room with laughter - The triumph of progress.

Room with laughter - "No beer, life is beautiful" story Blade.

Review - Overview of new products ON: NMWC, Mortal Combat, Kombatris, Feodal Wars, Tetroid, Babbona, Santa Claus, The Fury, Star Control, Pinc Floyd, Test Ramdoctor V2, Total Commander V1.4.

PC World - On the game: KRUSH KILL'N'DESTROY; DOMINION, MYTH.

PC World - Podskzki to games: Heroes of Migic 2 & 3, Mission Force, Cyber Storm, Leisure Suit Larry 7, Death Rally, NBA Full Court Press, XS, Tomb Raider, Redneck, Rampage, NHL'1997.

Private Lessons - PRO TRACKER for beginners from Ironman'a.

Progress - History houmtaypinga.

Progress - musical vechernka in Grozny.

Library - The story "Halo," by Henry Kuttner.

you a gift - On application to the magazine: Empire Demo Version, Rom Track Copyer, Dynamite Dux, Treble Player, Snoopy and peanuts, INSANE, Vicomm Modem 2.


Темы: Игры, Программное обеспечение, Пресса, Аппаратное обеспечение, Сеть, Демосцена, Люди, Программирование

Similar articles:
Music - Let us return to the long-promised on "music for the specified samples.
ABOUT - On the BBS with the "anarchist" rules.
Guru - Life in ehah. Step 1: How to write an echo.

В этот день...   21 November