Miracle #02
10 августа 1997 |
|
Tip GURU - Code optimization for execution time.
Now we invite you to read very interesting article, write to all known PSV from DREAM TEAM GROUP. Perhaps This article does not solve your problem one in terms of coding, as well as be able to improve your professional skills. It is worth noting that very soon be published - SECOND BREATH - a season hit demomeykerstva 1997. This will be the latest creation DREAM TEAM by ZX-SPECTRUM'e. --------------------------------------- OPTIMIZATION PROGRAM FOR TIME PERFORMANCE PROLOGUE At the current stage of development of science and Technology can not imagine any one area of their application, which gradually penetrated to a microprocessor technique. And gone are the days when the result of simple calculations had to wait for tedious dolgo.Sovremennye problems call for solutions not just fast and issued in "real time", then is the rate-making should be higher speed production tasks in a continuous mode continuous time protsessov.V our doors already knocking on the device, which recognize and generating human conversation and moving images of real objects ... Unfortunately the giant processing speed available so far only su per-computer and do not show today and tomorrow. Yet available the majority of microprocessor technology operates at speeds of about several million simple operations per second (for clarity, I will say that acceptable to the eye parameter screen this computer is not up to speed computing: he was just keeping up with allotted to one frame time, just show up on the screen of predetermined images). It is not surprising tends refractive index of the programs to improve opportunities for due to their literate writing. And sometimes obtained remarkable results: the speed of the program increases are sometimes a few times just for through some minor changes, and only 5 - 10% can expedite any "raw" program (of course we are talking about programs in assembler, as a first step acceleration of the program - write it on the low-level language). Fundamentally differ in the two approaches Process optimization: a heuristic method is based on logical thinking and not requires specialized knowledge of any language, nor processor or hardware environment, but requires a deep knowledge of algorithms; hardware method of fitting a more accurate (Even its implementation could be allocated to computer) and is based on evidence is very in-depth knowledge of all the properties of the devices on which the program operates. Program optimization is usually in two stages: first - the initial drafting of the program with some of speed considerations, the second - a modification debugged programs for improving the parameters. Here are some that I know of optimization techniques in software According to this classification. Immediately it should be noted that due to scant coverage of this issue in the literature, almost all have been developed by a small circle of elite programmers (Hacker) and me and my friends. As a result, the material has an ambiguous character. CHAPTER ONE (On how to write fast programs without knowing the specific equipment) First of all, if your program works fast enough to think about all the other ways of achieving goal and, perhaps, there priori faster algorithms. For example, conversion of the object's coordinates in three dimensions do not necessarily for the Euler equations, the same result can be achieved by several different methods and pick the best. Another example - Automatic strip conductors in the design of the PCB, you can do a few notable ways. All ways to explore and choose Again, the fastest. If the algorithm has already been selected and implemented, then the easiest way to do program faster - to remove from it cycles by repeating the program of the loop body. When the loop is executed at least 100 times a loop body just insert the number of times times should run a cycle. If the cycle is longer than 100 times, then repeat it so much times as possible, and leave the outer loop, but running fewer raz.Takim by the gain in speed is obtained by cutting away from the program processes such as change of variable cycle, its control and the transition operators to the top of the cycle. Also exempted register occupied previously by a variable cycle. In any program that goes beyond the transfer of data, there are algebraic operatsii.Nakoplenny programmers experience allows us to derive some rules, guided by which these operations can be write so that they work faster. For example, A = VxV will always work faster than A = ^ 2 (unless, of course, these operations are written as a universal procedure, and not specially optimized for What will be discussed later). Also faster to run A = Bx (C + D) compared with A + = VxS VxD, because even if the processor provides a team of multiplication it will always be slower than addition. We generalize this subject: here in what order should the speed of the algebraic operations (from fast to slow) addition, subtraction, multiplication, division, exponentiation, taking kornya.A if you torment the question "which is faster: two multiply or four of addition? ", then you have to do a more detailed study of the equipment (see potaktovoe response teams). Next, consider a more advanced features, such as COS, SIN, TG, LOG, EXP. On this occasion it should be noted that in most cases the calculation of these functions, we can write integral if they scaled that will give an enormous gain in speed. Example: suppose that we need to calculate the expression A = SIN (C). In the calculation of sine, we calculate it is not fractional numbers, and integers (due to the fact that the result will not change from 0 to 1, and from 0 to assume 256), multiplying everything on 256. Next, multiplying the resulting result should be divided by 256 (the is simply discarded low byte). Another way to avoid the application of complicated calculation (and therefore slow) functions to calculate them in advance and set in an array. Such a path finds in recent years increasingly used as memory expansion is much easier and cheaper to improve performance. For example, instead of calculating the sine series expansion in Most programs can be universal pre-calculate its value to say increments of one degree and within 4.5 decimal places that will take some 720 bytes of memory, but will speed up the program several times. Another area of application of the accelerating efforts - programs that most Part-time work does not require the speed, but spoil the rare moments of life When you want to quickly do something (to this type are almost all online programs - at first they did not are doing and wait for commands from slow to Compared to them, man, and then, when team filed, they are forced to wait for a man, until executed some complex action). In this case, helps such medium-long program is free cycles (assuming the keyboard can be polled no more than 100 times per second, which will not take a one percent CPU speed), it tries to predict what follow-up and prepare everything in an instant process: find free space on the media, to calculate required constant check errors and to things like that. Even if cooking will be in vain harm from them will not. But if you come in handy - required work is done much faster. The above-mentioned methods and complicate program is extended, but there's a trick allows to reduce not only the time used complement procedure, but also reduce the required pamyat.Rech it comes to self-modifying programmah.Rassmotrim next Example: suppose a program needs to transfer some information from one carrier to another, thus adding to each byte constant whose value every time you start user defined Telem. Naturally in this case to allocate to constant of some memory location where somewhere in unoccupied areas. Let her address will NNNN. It should be action will be written like this: to take the next byte MOVE register, (NNNN) ADD another byte register put the next byte In this case, firstly in the team MOVE featured some of no importance to the algorithm address, but the most program in memory to take place under the constant. Using the same methods self-modification can be written as follows: to take the next byte MOVE register, constant ADD another byte register put the next byte In this case, one of the teams primary cycle, running many thousands of times (That is, the speed of which must be fight), we get almost 2-fold gain in execution time due to removal of indirection to the constant ones. But how? Because of the constant condition not fixed but set by the user, and our program will have to write one specific number! But the fact that instead of one particular number will always be be exactly the constant that has entered user, because, as is known, in memory of the program is a set of numbers (B) and our specific number will also hold a certain number of bytes in memory for quite a specific address, and What this address is worse than that which we wish to allocate to storing the constants? Here and is obtained as the program faster, and head does not hurt when searching for free memory for the constant, and the program is shorter, and memory for the data must be less. Except constants in the program can also be changed the code of operations that much more difficult to writing and requires a thorough knowledge of equipment. CHAPTER TWO (On how to write for a specific hardware) To apply the methods of the type described below is necessary to fully study the specifics of all the hardware which will operate the program, and First and foremost, a processor and storage media. Moreover, the study should not limit logic devices, and should include an examination of the temporal characteristics. Huge role in the performance of the program is the exchange rate with carriers information. In a large number of cases program is running slow not because of core processor, but because of slow exchange with the media. As a rule, even in electronic media (memory ICs) speed is much lower than the processor, and is easy to see the lion proportion of teams just up and running with the memory (Remember that even the team retrieved the processor from memory). In such media as magnetic and optical discs speed and did a funny. The moral: do not be lazy, and thoroughly study the characteristics of the carrier of your information, and You will not regret it. Usually immediately find ways to increase the rate of exchange: it turns out that some part of memory faster than the other, it appears that the media can be cached (to provide rapid buffer memory), then suddenly find that even addresses the work faster than with odd and so on. In the presence of differences in the characteristics necessary to ibolee used data have a the faster sections. For example, many computers, the memory used as video memory used to store non-graphical data, which gives an increase memory for the data. But having studied the performance of the memory usually turns out that this memory is much slower and it is desirable to store only infrequently used data. The next step - studying potaktovogo execution of commands appears protsessorom.Srazu numerical criterion performance: you write well - 1200 cycles, and to think and write simpler - 750 cycles. Also there is a pleasant way evaluate the performance of the program is not even introducing it into the car and found it necessary to to perform the number of cycles. For example, making the program that processes signals with frequencies up to 100 hertz income pointless to enter into a computer program Therefore, signal processing, if one cycle It will take on the calculation of more than 10000 cycles (When clocked at 1 MHz). At this stage is useful to create for himself a black list and bring back the team for running an excessively large number cycles to always avoid their use. If your program works in any other intellectual partners parameters (disk drives, graphics / arithmetic / music and other co-processor, or another computer on the network) then work with them, you can also speed up detail by studying their logical and temporal parameters. For example do not need to write vector drawing program with specified coordinates, if it is able to quickly make himself the video processor (a "smart" video processors are able to draw not only with age torus, but the three-dimensional polygons on defines a chart pattern, light and so on). Another example: why the program when printing text on paper to align it along the edges where it can do printer itself is not wasting any time in this or computer memory. Or anything to control the checksum of the array at transmission of data, summing up a vast array of numbers, if at the hardware level controlled I / O port and sufficient sufficiently control a bit in his the flag state. These and similar errors lead to an annoying pauses where they can be eliminated with proper approach to speed of the program in writing it. In addition to "Iron" nice to know and features a human if va step program provides its participation For example, in recent programs with the creation of vysokorealnyh computed image tions used a strong simplification of the calculation representations scene when it moves fast, so the human eye is still not time to consider detaley.Takzhe on similar properties are designed algorithms strong package of images and sound for by the disappearance of small and imperceptible de tackles. Continued in the next issue
Other articles:
Similar articles:
В этот день... 21 November