Bugs #01
30 сентября 1999 |
|
Small Coding - Procedures for squaring and square root.
(C) The Devils Corporation 1999. Squaring. As you have guessed from the title of the article, here it goes as just about squared. This feature is one of the most common in the implementation of the computer. I offer two options for obtaining square numbers. 1. 8-BIT VERSION. HL = A ^ 2 LD H, 128; 7: 2 LD L, A; 4: 1 LD A, (HL); 7: 1 INC H; 4: 1 LD H, (HL); 7: 1 LD L, A; 4: 1 RET; 10: 1 ----------------------- 8 bytes. Time = 43 clock cycles. It is seen that uses a table of 512 bytes, which is pre-placed squares of all the 256 numbers. So to speak - no brainer I did not invent and do not intend to. Again, recall that the subroutine can be-optimized if: a) Receiving no response to HL. b) A number that is squared - send in L. c) Get rid of the RET, introducing a subroutine in the main body. d) Keep the number 128 somewhere else. Example for BC = L ^ 2: LD H, D; 4: 1 D = 128 LD C, (HL); 7: 1 INC H; 4: 1 LD B, (HL); 7: 1 ---------------------- 4 bytes. Time = 22 clock cycles. Yes, I almost forgot to complete the picture lead the process of creating a table of squares of 8-bit numbers. LD HL, 32768; start address table LD D, L LD E, D LD B, D LD C, 1 M1: LD (HL), E INC H LD (HL), D DEC H EX DE, HL ADD HL, BC EX DE, HL INC BC INC BC INC L JR NZ, M1 RET ----------------- 21 bytes. Well, that's cool? How not?? Okay, well: LD HL, 32768; start address table LD D, L LD E, D LD B, D LD C, D M1: LD (HL), E INC H LD (HL), D DEC H INC BC EX DE, HL ADD HL, BC EX DE, HL INC BC INC L JR NZ, M1 RET ----------------- 20 bytes. What - again crap? Granted, you can short - my limit was 18 bytes. 2. 16-BIT VERSION. Must be squared BC. From school for all of us (and may be not all) is well known that: (A + B) ^ 2 = A ^ 2 + 2 * A * B + B ^ 2 So, if you take the register-pair BC and expanded it to 'A' and 'B', then we obtain the following: (B * 256 + L) ^ 2. In this and squaring based on my version: A) First, create a table of squares of 8-bit numbers. (Above) B) And now, the procedure itself: The principle is: DE: HL = BC ^ 2 LD H, 128 LD L, C LD E, L LD A, (HL) LD (M1 +1), A INC H LD C, (HL) LD L, B LD A, (HL) DEC H LD B, (HL) LD HX, A LD H, L LD D, 0 LD L, D And now 8 times the following three lines - this multiplication! ADD HL, HL JR NC, $ +3 ADD HL, DE ; --------------------------------- LD D, HX ADD HL, HL JR NC, $ +3 INC D ADD HL, BC LD E, H LD H, L M1: LD L, 0 RET NC INC D RET ------------------ 66 bytes. Time = 400 cycles. Written outright and not optimized. So go for it! To begin, try to understand - what actually happened? Here, here purely, purely here You understand that you are not cool! Square root. In this article I'll show you how literate guys calculate the square root. Just want to clarify that I calculate the root with no fractional part and the answer is rounded to the nearest integer. Well, so happened with me, ie: SQR (16384) = 128, not 127 or 129. SQR (17000) = 130, not 129, and certainly not 130.3840484619141. It should be noted that I was satisfied with this round up to a certain extent only because of the speed of calculation. All the procedures operate on the principle A = SQR (HL) 1. Brake which no one saw! LD DE, 1, 10: 3 XOR A; 4: 1 M1: SBC HL, DE; 15: 2 RET C; 11 / 5: 1 INC DE; 6: 1 INC DE; 6: 1 INC A; 4: 1 JR M1; 12: 2 -------------------- 12 bytes. Min. time = 14 + 26 = 40 cycles Maks.vremya = 14 + 255 * 48 = 12254 cycles. Well, let's say - it works based on the fact that the number of responses with each successive integer response increases by 2. Written by me for the first time in 1994. 2. Quickening brake ... LD DE, 65535; 10: 3 XOR A; 4: 1 M1: ADD HL, DE; 11: 1 RET NC; 11 / 5: 1 DEC DE; 6: 1 DEC DE; 6: 1 INC A; 4: 1 JP M1; 10: 3 Place JR - will be shorter! -------------------- 12 bytes. Min. time = 14 + 22 = 36 cycles Maks.vremya = 14 + 255 * 42 = 10724 cycles. From quickening becomes evident only one brake - he is still quickening brake. The only advantage of these procedures - they are very short. 3. The most appropriate size and speed. LD DE, 64, 10: 3 LD A, L; 4: 1 LD L, H; 4: 1 LD H, D; 4: 1 LD B, 8, 7: 2 M1: SBC HL, DE; 15: 2 JP NC, M2; 10: 3 (or JR NC, M2) ADD HL, DE; 11: 1 M2: CCF; 4: 1 RL D; 8: 2 RLCA; 4: 1 ADC HL, HL; 15: 2 RLCA; 4: 1 ADC HL, HL; 15: 2 DJNZ M1; 13 / 8: 2 LD A, D; 4: 1 RET; 10: 1 ---------------------- 27 bytes. Execution Time = 838 cycles. Go tell - JR and JP does not matter. The very method of calculation was taken, as I recall, from the ROM 48. 4. Implicit spreadsheet. (Beware - an explosion of brains!) LD A, H; 4: 1 CP 16, 7: 2 JR NC, M1; 12 / 7: 2 OR 128, 7: 2 LD H, A; 4: 1 LD A, (HL); 7: 1 RET; 10: 1 M1: LD L, H; 4: 1 LD H, 127; 7: 2 LD H, (HL); 7: 1 LD L, A; 4: 1 ADD A, (HL); 7: 1 RET; 10: 1 --------------------- 17 bytes. When HL = 0 to 4095, time = 46 cycles. When HL = from 4096 to 65535, Time = 62 clock cycles. Table size 5120. (Memory address 32768) A table is: 0000-4095 - responses to HL from 0-4095 4096-4351 - interpolation range 4096-19199 4352-4607 - interpolation range 19200-33791 4608-4863 - interpolation range 33792-34815 4864-5119 - interpolation range 34816-65535 Why so - you can not ask. Additional 256-byte table with the address 32512. Ingredients: DS 75, 144 DS 57, 145 DS 4, 146 DS 120,147 ===== Well, that - TOWER tear?? ===== Now let go, when to tell about the accuracy of the calculations: Absolutely = 50046 numbers. 77% Error on -1 = 10321 number. 15% Error on +1 = 5199 numbers. 8% Well, what to say in his defense - should be better to interpolate! (But vpadlu ... yes and no time at all ...) Further, in principle, can not read anything until paragraph 5. !!!!!!! -------------------------------------------------- ------------ You can get rid of 4096-byte table if you modify the program. LD A, H; 4: 1 CP 16, 7: 2 JR NC, M1; 12 / 7: 2 LD DE, 65535; 10: 3 XOR A; 4: 1 M2: ADD HL, DE; 11: 1 RET NC; 11 / 5: 1 DEC DE; 6: 1 DEC DE; 6: 1 INC A; 4: 1 JP M2; 10: 3 M1: LD L, H; 4: 1 LD H, 127; 7: 2 LD H, (HL); 7: 1 LD L, A; 4: 1 ADD A, (HL); 7: 1 RET; 10: 1 --------------------- 24 bytes. Of course, you guessed it, I put way 2 inside. The table has a total of 1024 bytes. All HL, which are greater than 4096 on the table, but the other for: Min.vremya = 32 + 22 = 54 clock cycles. Maks.vremya = 32 + 42 * 63 = 2678 cycles. Well, that's probably very smart to add - consider themselves steep optimizers. Only a fool does not see that here you can unroll the loop, get rid of two DEC DE and a JP and get something like this: ; First three lines of procedure, see above LD DE, 65535; 10: 1 XOR A; 4: 1 ADD HL, DE; 11: 1 RET NC; 11 / 5: 1 DUP 1963 LD DE, 65533, short, DE = previous DE - 2 INC A ADD HL, DE RET NC EDUP ; Last 6 lines see above DUP 63 - Means to repeat 63 times the block of code. Now the time: Min. time = 18 (initial CP-scale) + 36 = 54 clock cycles. Maks.vremya = 18 (------ / / ------) + 30 * 64 = 1938 cycles. Something like 740 cycles snatched a very large response. Blog cycle is: 6 * 64 = 384 bytes. In such cases, I would like to recall the procedure 3, which is any root of 838 cycles. Thus, if use it to find the roots of these accidents 4096 You can win in time as follows: 1. (838-18) / 30 = 27. ... 2. 28 ^ 2 = 784 That is, the interval from 0 to 784 procedure 2 "makes" the procedure 3, but further ... bummer. -------------------------------------------------- ------------ 5. Explicit table calculation. In advance, a table of answers to SQR (HL), and then you can use something like this. Example for the roots of 0 to 16383 (Table 1 segment). LD A, H; 4: 1 OR 192; 7: 2 LD H, A; 4: 1 LD A, (HL); 7: 1 RET; 10: 1 ------------------ 6 bytes. Time = 32 clocks. or ADD HL, BC; BC = 49152 LD A, (HL) RET ---------------- 3 bytes, 28 cycles. In principle, this method does not work that way. Number 192 should be stored in any case (for example in C) but about the RET should be forgotten altogether, as do CALL 6 bytes is complete insanity. For those who do not understand - a subroutine should be built into the main body of the program! Here then we obtain an almost full speed - 19 strokes. And this is not the limit. Because if the table a little lift to 16384 bytes so, the routine can be as follows: SET 7, H; 8: 2 LD A, (HL); 7: 1 RET; 10: 1 ----------------- 4 bytes. Time = 25 cycles. (No RET - 15 cycles). But this is not the limit. Since many computers are the ability to connect instead of ROM in your page, if you can pay under the table, this piece of memory the calculation take the square root of 7 cycles (LD A, (HL)). From a mathematical point of view is a complete mess. But it is too fast. Okay - so far enough to engage in this nonsense. In fact, mode 3, in principle, can make a malicious nelyubiteley tables (I have something did not understand - surely there are still such Downey and Thugs). Or that - a table of 512 bytes or 16,384 - it - all - Khan? Guys, I understand that all of you, as if to programmers, such as - hackers, and all that, but sank to the ground, because the rate of ZX SPECTRUMa not great, and spend time on the conversion of some formula - it's a luxury. I do not remember where, was given a procedure of extracting the square root of the HL. There used multiplication. I do not understand - that there needed to multiply, and why? For all that the procedures of multiplication was not a fountain. What kind of nonsense? You that, sorry publish the normal (I'm not saying sverhklassnye) algorithms? Or mind only enough to write a braking procedure? Guys, it's time to stop using the old formula - Pythagoras Heron, the Newton-Leibniz (square root) and other elders. It is now almost 2000. Think of something my! PS: I hope you people are smart and do not take offense to criticism, and Notes. Did not want to offend anyone and did not try. You are right only in that no matter how bad your algorithm rhythms - they still work in real programs - and this is perhaps their only plus. If someone wants to express its opinion on the matter and without ... Accept the challenge from anyone in terms of writing itself short What the procedure or program of your choice (broken off almost every dogo! ) My E-Mail: devils@ellink.ru PPS: I propose to write a short procedure for calculating the cube root of HL in integers. A = HL ^ (1 / 3) Here's my version: LD DE, 65535 LD BC, 0 M1: ADD HL, DE RET NC INC A EX DE, HL DEC BC DEC BC DEC BC DEC BC DEC BC DEC BC ADD HL, BC EX DE, HL JR M1 --------- 20 bytes. ________________________________( C) The Devils Corporation 1999.
Other articles:
Similar articles:
В этот день... 21 November