Best non-trigonometric floating point approximation of tanh(x) in 10 instructions or less
Description
I need a reasonably accurate fast hyperbolic tangent for a machine that has no built-in floating point trigonometry, so e.g. the usual tanh(x) = (exp(2x) - 1) / (exp(2x) + 1)
formula is going to need an approximation of exp(2x)
.
All other instructions like addition, subtraction, multiplication, division, and even FMA (= MUL+ADD in 1 op) are present.
Right now I have several approximations, but none of them are satisfactory in terms of accuracy.
[Update from the comments:]
- The instruction for
trunc()
/floor()
is available - There is a way to transparently reinterpret floats as integers and do all kinds of bit ops
- There is a family of instructions called SEL.xx (.GT, .LE, etc.) which compare 2 values and choose what to write to the destination
- DIVs are twice as slow, so nothing exceptional, DIVs are okay to use
Approach 1
Accuracy: ±1.2% absolute error, see here.
Pseudocode (A = accumulator register, T = temporary register):
[1] FMA T, 36.f / 73.f, A, A // T := 36/73 + X^2
[2] MUL A, A, T // A := X(36/73 + X^2)
[3] ABS T, A // T := |X(36/73 + X^2)|
[4] ADD T, T, 32.f / 73.f // T := |X(36/73 + X^2)| + 32/73
[5] DIV A, A, T // A := X(36/73 + X^2) / (|X(36/73 + X^2)| + 32/73)
Approach 2
Accuracy: ±0.9% absolute error, see here.
Pseudocode (A = accumulator register, T = temporary register):
[1] FMA T, 3.125f, A, A // T := 3.125 + X^2
[2] DIV T, 25.125f, T // T := 25.125/(3.125 + X^2)
[3] MUL A, A, 0.1073f // A := 0.1073*X
[4] FMA A, A, A, T // A := 0.1073*X + 0.1073*X*25.125/(3.125 + X^2)
[5] MIN A, A, 1.f // A := min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1)
[6] MAX A, A, -1.f // A := max(min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1), -1)
Approach 3
Accuracy: ±0.13% absolute error, see here.
Pseudocode (A = accumulator register, T = temporary register):
[1] FMA T, 14.f, A, A // T := 14 + X^2
[2] FMA T, -133.f, T, T // T := (14 + X^2)^2 - 133
[3] DIV T, A, T // T := X/((14 + X^2)^2 - 133)
[4] FMA A, 52.5f, A, A // A := 52.5 + X^2
[5] MUL A, A, RSQRT(15.f) // A := (52.5 + X^2)/sqrt(15)
[6] FMA A, -120.75f, A, A // A := (52.5 + X^2)^2/15 - 120.75
[7] MUL A, A, T // A := ((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133)
[8] MIN A, A, 1.f // A := min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1)
[9] MAX A, A, -1.f // A := max(min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1), -1)
The question
Is there anything better that can possibly fit in 10 non-trigonometric float32 instructions?
Comments
Post a Comment