Scenergy #02
31 декабря 1999 |
|
Coding - procedure for rapid multiplication.
FAST EXPONENTIAL MULTIPLY [Baze/3SC] Some smart man said hundreds years ago: "When there are no limits, there is no creativity ". Unfortunately we don't live in his era. He would be probably glad to see how Z80 assembly coders perform multiply. When we look at common calculations, we mostly need to multiply some integer by fractional number in range <0, 1>. Good examples are sinus, cosinus or perspective. People which are not familiar with fixed point arithmetics might wonder how to achieve this. As always ... simply: 1) Pre-multiply fraction by 256 [to make it appear as 8-bit integer]. 2) Multiply our two 8-bit integers. 3) Divide whole 16-bit result by 256 [actually, just take higher byte]. This article is about doing it FAST. The main idea is hidden in equation: a * b = exp (log (a) + log (b)) [where LOG is natural logarithm] Smart coder will immediately find out that we need just three table lookups and one addition. So our first attempt would look like: B, C - operands ld h, LOG; set pointer to LOG table ld l, b ld a, (hl); get logarithm of the first integer ld l, c add a, (hl); add logarithm of the second integer inc h; move to EXP table ld l, a ld a, (hl); get higher byte of the result Unfortunately, this will not work. To prevent ADD A, (HL) from overflow we must store logarithms with only 7-bit precision which would cause our results to be inaccurate. What do I mean by 7-bit precision? Look. When we are making LOG table, the biggest entry we can get is: log (255) = 5.54126 ... => 6 It is a good idea to pre-multiply all entries by 127 / log (255) and save the logarithmic curve with more precision [so there will be no numbers 0 - 6 but 0 - 127]. Let's call this constant S [scale]. Of course, we must slightly modify the calculation of EXP table: for X = 0 to 254 EXP [X] = exp (X / S) / 256 Anyway, this is just an example. To make our routine really work, we must increase scale. After various experiments with Busy we found out that 10-bits is optimum. Results are accurate [Many times they are even correctly rounded!] And memory requirements are still acceptable. Logarithms are no more 8-bit. They are stored in 512 bytes long table. EXP table has 1024 + 1024 entries and S = 1023 / log (255). So we can move a step forward and write our second attempt: ld h, LOG; set pointer to LOG table ld l, b ld e, (hl) inc h ld d, (hl); store logarithm of B into DE ld l, c ld b, (hl) dec h ld c, (hl); store logarithm of C into BC > Ld hl, EXP; set pointer to EXP table > Add hl, bc add hl, de ld a, (hl); get the result This routine would work perfectly. Anyway, it has one disatvantage - it can be faster. We can cut off highlighted instructions by famous table offset trick. Imagine that we stored EXP table at address 200 * 256. Look what happens if we increase higher byte of logarithms by 100: ld h, LOG ld l, b ld e, (hl); get lower byte of log (B) inc h ld d, (hl); get higher byte of log (B) increased by 100 ld l, c ld a, (hl); get higher byte of log (C) increased by 100 dec h ld l, (hl); get lower byte of log (C) ld h, a; store complete logarithms into HL and DE add hl, de; 100 + 100 = 200! we have pointer to EXP table ld a, (hl); get the result We have just developed complete multiply in 1973 T! Now let's talk about handling negative numbers. It would be nice to call the same routine dependless of operand signs. And it is possible to do it. We will just extend our trick and add two different table offsets to higher byte of the logarithm. One for positive numbers [entries 0 - 127] and one for negative numbers [entries 128 - 255]. Since EXP table is 2048 [8 * 256] bytes long, difference between these values must be at least 8. Let's take values 100 and 108 for example and make out possible variations: [+] [+] = [+] [+] * [-] = [-] 100 + 100 = 200 100 + 108 = 208 [-] [+] = [-] [-] * [-] = [+] 108 + 100 = 208 108 + 108 = 216 So, at address 200 * 256 will begin ordinary EXP table. At address 208 * 256 will be stored EXP table with negative results. And once again, there will be ordinary positive EXP table at address 216 * 256. We will lose 6K of memory instead of 2K, but we don't need to care about signs. In addition, tables will be more accurate. Because all integers will lie in interval <-127, 127>, we can use S = 1023 / log (127). Those of you which are extremely smart can ask: "Hah! And what if one operand will be zero? Everybody knows that log (0) is undefined! ". Yes ... but can we give up after such voyage? Definitely no! We just fill table entries for log (0) with zeroes [But we must add correct table offsets to higher byte]. The worsest case is: log (0) + log (127). When we add them together, we will get 1,023 [supposing that logarithms were scaled] and final result will be: exp (1023 / S) = 255 Since higher byte of 255 is 0, our problem is solved. If you decide to use correct rounding you can get value 1. In most cases nobody will notice this little inaccuracy. However, it depends on the nature of whole problem. Either we decide to round [and risk] or to ignore fractional part [and make all multiplies slightly inaccurate]. It's upon you. I suppose that nobody will make any additional zero checks. Finally, I will put there X * sin (Y) routine, which is very useful in vector rotations. The only difference between "Classic" multiply and this routine is in additional LOG table, which actually contains no S * log (abs (Y)), but S * log (abs (127 * Sin (Y * PI / 128))) entries. If you are wondering about ABS, you better jump few pages back and think again about handling signs by table offsets. ld h, LOG ld l, b ld e, (hl) inc h ld d, (hl); get log (B) inc h ld l, c ld a, (hl) inc h ld h, (hl) ld l, a; get log (sin (C)) add hl, de ld a, (hl); and we are finished You will probably spend lot of time by building tables, but it is well worth the effort. See ya next time. -Baze-3SC->
Other articles:
Similar articles:
В этот день... 21 November