Along the way, a few SM-specific oddities are noted, and the benefit of subtracting 0.5 (or 0.49999999) from the operand of frndint still eludes me, but please do tell let me know if you think my understanding is incorrect.
Chris
- Code: Select all
streamin base;
streamin exponent;
streamout result;
// Assembly-language exponentiation algorithm: computes result = base^exponent
// or z = x^y for positive base and any exponent.
// Functions in the x86 FPU instruction set operate on the contents of the
// 'stack' which is a series of registers: st(0), st(1), st(2), etc., that can
// be addressed individually but are interdependent.
// The fld (load) instruction 'pushes' the stack, shuffling any prior contents
// down one level, so st(0) becomes st(1), etc., and then stores the loaded
// value in st(0). The fstp instruction copies the value that was in st(0) to
// the specified destination and then 'pops' the stack; the opposite of pushing.
// A function with one operand takes st(0) from the stack, computes the
// function and places the result back in st(0). This action is equivalent to
// popping the stack to take the operand and then pushing the result back onto
// the stack. After one pop and one push any content in st(1) and lower levels
// remains in place.
// In most cases, a function with two operands pops st(0) and st(1) from the
// stack, computes the function and pushes the result back into st(0). After
// two pops and one push, any content in st(2) and below is raised one level:
// st(2) to st(1), etc. However, there are exceptions!
// The required result is z = x^y where x = base and y = exponent.
// Take log2 of both sides, giving log2(z) = y*log2(x) and recall that
// 2^[log2(v)] = v, where v is any value, so z = 2^[y*log2(x)].
// The FPU functions available to calculate this are:
// fyl2x (where the third character is a lower-case L, not a figure 1) which
// calculates st(0) = st(1) * log2[st(0)] for st(0)>0.
// fscale which calculates st(0) = st(0) * 2^<st(1)> where <v> is the integer
// part of v, always rounding down so <n.99999> becomes n and <n.00001> becomes
// n. The implementation of fscale used by SM leaves the content of st(1) and
// st(2) in place so it must obtain the two operands by popping the stack only
// once, before pushing the result back into st(0). A notable exception.
// f2xm1 (where the final character is a figure 1, not a lower-case L) which
// calculates st(0) = [2^{st(0)}] - 1 where {v} must have a value < 1.
// In order to be able to handle arbitrarily-large values of z without
// compromising accuracy, it is necessary to split [y*log2(x)] into integer and
// fractional parts, I and F. Then z = 2^[I+F] = 2^I * 2^F.
// I = <y*log2(x)> and F = y*log2(x) - <y*log2(x)>
// frndint applies 'Banker's' rounding so <n.5 becomes n and >n.5 becomes n+1.
// Exactly n.5 becomes whichever of n or n+1 is even. For compatibility with
// the rounding in fscale, the alternate rounding up and down of n.5 values can
// be overcome by subtracting 0.5 from the operand. However, for a scheme using
// a single frndint this just moves the issue to integer operands.
// Such subtraction was used in an earlier incarnation of this algorithm but it
// has been disabled here - it can be re-enabled by un-commenting the relevant
// two lines. More-clever schemes may be possible using multiple instances of
// frndint but this function is 'expensive' in terms of CPU cycles.
// fsub is used and this computes st(0) = st(1) - st(0), not the other way round
// so SM appears to implement fsubr as defined in 'Art of Assembly' Chapter 14.
float half=0.5; // in case you want to re-enable 0.5 subtraction (see below),
// but not otherwise used
fld exponent[0]; // loads x=base into st(0)
fld base[0];// and y=exponent into st(1)
fyl2x; // st(0) = y*log2(x) leaving only st(0) in the stack
fld st(0); // copies the value that was in st(0) and pushes it into st(0)
// pushing the value that was in st(0) down to st(1),
// effectively copying st(0) into st(1)
// * * Remove // at the beginning of each of the following two lines * *
// * * to re-enable subtraction of 0.5 as described above * *
//fld half[0]; // loads 0.5 into st(0) << neglecting this
//fsub; // st(0) = st(1) - st(0) = y*log2(x) - 0.5 << neglecting this
frndint; // rounds st(0) to an integer in the manner described above and
// places it in st(0) so now, st(0) = <y*log2(x)> = I and st(1) = y*log2(x)
fxch; // exchanges st(0) with st(1) so now st(0) = y*log2(x) and st(1) = I
fld st(1); // loads I into st(0) and pushes y*log2(x) down into st(1)
// so now st(0) = I, st(1) = y*log2(x) and st(2) = I. This last item is
// important - the value of I is being kept stored for later use - and one
// way to place a value into st(2) is to push it down in the stack
fsub; // st(0) = st(1) - st(0), so st(0) = y*log2(x) - I = F.
// Popping the two operands and then pushing the result brings the
// 'stored copy' of I up into st(1) so now st(0) = F and st(1) = I
f2xm1; // st(0) = [2^{st(0)}]-1 = (2^F)-1 and st(1) still contains I
fld1; // (where the final character is a figure 1, not a lower-case L)
// means st(0) = 1, pushing down st(1) = (2^F)-1 and st(2) = I
fadd; // st(0) = st(0) + st(1) = 2^F.
// Popping the two operands and then pushing the result brings the
// 'stored copy' of I up into st(1) so now st(0) = 2^F and st(1) = I
fxch; // exchange st(0) with st(1) so now st(0) = I and st(1) = 2^F
fld1; // st(0) = 1, pushing down st(1) = I and st(2) = 2^F
fscale; // st(0) = st(0)*2^st(1) = 1*2^I. The SM implementation, as described
// above, pops the stack only once and pushes the result onto the stack, so now
// st(0) = 2^I, st(1) = I and st(2) = 2^F
fstp st(1); // copies the value of st(0), places it in st(1) and pops the stack
// bringing this value up into st(0), and the value that was in st(2) up into
// st(1), so now st(0) = 2^I and st(1) = 2^F
fmul; // st(0) = st(0) * st(1) = 2^I * 2^F giving the required result
fstp result[0];
















